Sample records for group-based query learning

  1. Query Health: standards-based, cross-platform population health surveillance.

    PubMed

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. 

  2. Generalized query-based active learning to identify differentially methylated regions in DNA.

    PubMed

    Haque, Md Muksitul; Holder, Lawrence B; Skinner, Michael K; Cook, Diane J

    2013-01-01

    Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries to the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features will create a shorter query with only relevant features, and it will be easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially methylated DNA regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that it performs better than another popular active learning technique.

  3. A 5E Learning Cycle Approach-Based, Multimedia-Supplemented Instructional Unit for Structured Query Language

    ERIC Educational Resources Information Center

    Piyayodilokchai, Hongsiri; Panjaburee, Patcharin; Laosinchai, Parames; Ketpichainarong, Watcharee; Ruenwongsa, Pintip

    2013-01-01

    With the benefit of multimedia and the learning cycle approach in promoting effective active learning, this paper proposed a learning cycle approach-based, multimedia-supplemented instructional unit for Structured Query Language (SQL) for second-year undergraduate students with the aim of enhancing their basic knowledge of SQL and ability to apply…

  4. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  5. Active Learning by Querying Informative and Representative Examples.

    PubMed

    Huang, Sheng-Jun; Jin, Rong; Zhou, Zhi-Hua

    2014-10-01

    Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted considerable interest given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which could significantly limit their performance. Although several active learning algorithms were proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.

  6. Enhancing Learning Outcomes in Computer-Based Training via Self-Generated Elaboration

    ERIC Educational Resources Information Center

    Cuevas, Haydee M.; Fiore, Stephen M.

    2014-01-01

    The present study investigated the utility of an instructional strategy known as the "query method" for enhancing learning outcomes in computer-based training. The query method involves an embedded guided, sentence generation task requiring elaboration of key concepts in the training material that encourages learners to "stop and…

  7. Web Image Search Re-ranking with Click-based Similarity and Typicality.

    PubMed

    Yang, Xiaopeng; Mei, Tao; Zhang, Yong Dong; Liu, Jie; Satoh, Shin'ichi

    2016-07-20

    In image search re-ranking, besides the well-known semantic gap, the intent gap, which is the gap between the representation of a user's query or demand and the user's real intent, is becoming a major problem restricting the development of image retrieval. To reduce human effects, in this paper we use image click-through data, which can be viewed as "implicit feedback" from users, to help overcome the intent gap and further improve image search performance. Generally, the hypothesis that visually similar images should be close in a ranking list and the strategy that images with higher relevance should be ranked higher than others are widely accepted. Thus, to obtain satisfying search results, image similarity and the level of relevance typicality are the respective determining factors. However, when measuring image similarity and typicality, conventional re-ranking approaches only consider visual information and initial ranks of images, while overlooking the influence of click-through data. This paper presents a novel re-ranking approach, named spectral clustering re-ranking with click-based similarity and typicality (SCCST). First, to learn an appropriate similarity measurement, we propose a click-based multi-feature similarity learning algorithm (CMSL), which conducts metric learning based on click-based triplet selection and integrates multiple features into a unified similarity space via multiple kernel learning. Then, based on the learnt click-based image similarity measure, we conduct spectral clustering to group visually and semantically similar images into the same clusters, and obtain the final re-ranked list by calculating click-based cluster typicality and within-cluster click-based image typicality in descending order. Our experiments conducted on two real-world query-image datasets with diverse representative queries show that our proposed re-ranking approach can significantly improve initial search results and outperform several existing re-ranking approaches.

  8. Learning and retention through predictive inference and classification.

    PubMed

    Sakamoto, Yasuaki; Love, Bradley C

    2010-12-01

    Work in category learning addresses how humans acquire knowledge and, thus, should inform classroom practices. In two experiments, we apply and evaluate intuitions garnered from laboratory-based research in category learning to learning tasks situated in an educational context. In Experiment 1, learning through predictive inference and classification were compared for fifth-grade students using class-related materials. Making inferences about properties of category members and receiving feedback led to the acquisition of both queried (i.e., tested) properties and nonqueried properties that were correlated with a queried property (e.g., even if not queried, students learned about a species' habitat because it correlated with a queried property, like the species' size). In contrast, classifying items according to their species and receiving feedback led to knowledge of only the property most diagnostic of category membership. After a multiple-day delay, the fifth-graders who learned through inference selectively retained information about the queried properties, and the fifth-graders who learned through classification retained information about the diagnostic property, indicating a role for explicit evaluation in establishing memories. Overall, inference learning resulted in fewer errors, better retention, and more liking of the categories than did classification learning. Experiment 2 revealed that querying a property only a few times was enough to manifest the full benefits of inference learning in undergraduate students. These results suggest that classroom teaching should emphasize reasoning from the category to multiple properties rather than from a set of properties to the category. (PsycINFO Database Record (c) 2010 APA, all rights reserved).

  9. Multi-documents summarization based on clustering of learning object using hierarchical clustering

    NASA Astrophysics Data System (ADS)

    Mustamiin, M.; Budi, I.; Santoso, H. B.

    2018-03-01

    The Open Educational Resources (OER) portal provides teaching, learning and research resources that are in the public domain and freely accessible. Learning contents or Learning Objects (LO) are granular and can be reused for constructing new learning materials. LO ontology-based searching techniques can be used to search for LO in the Indonesia OER. In this research, LO from search results are used as ingredients for creating new learning materials according to the topic searched by users. Summarization based on grouping LO with Hierarchical Agglomerative Clustering (HAC), taking into account the dependency context of the user’s query, achieves an average F-Measure of 0.487, while summarization with K-Means achieves an average F-Measure of only 0.336.

  10. Evaluation methodology for query-based scene understanding systems

    NASA Astrophysics Data System (ADS)

    Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.

    2015-05-01

    In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.

  11. Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

    PubMed

    Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

    2014-01-01

    Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
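
The domain-diversity figure reported above (distinct domains among the first ten results) can be illustrated with a minimal sketch. The abstract does not say how a "domain" was defined, so treating the URL hostname as the domain is an assumption here, and `domain_diversity` and the example URLs are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse


def domain_diversity(result_urls, top_n=10):
    """Count distinct hostnames among the first top_n result URLs.

    Assumption: a "domain" is the URL hostname; the study may have
    used a different definition (e.g., registered domain).
    """
    hosts = set()
    for url in result_urls[:top_n]:
        host = urlparse(url).hostname
        if host:  # skip entries without a parseable hostname
            hosts.add(host)
    return len(hosts)


# Hypothetical result list for one expanded query.
example_results = [
    "https://www.example.org/flu",
    "https://www.example.org/flu/symptoms",
    "https://health.example.com/influenza",
]
print(domain_diversity(example_results))
```

Averaging this count over all expanded queries would give a per-method figure comparable in form to the 6.72 and 6.73 reported in the abstract.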

  12. Cross-domain active learning for video concept detection

    NASA Astrophysics Data System (ADS)

    Li, Huan; Li, Chao; Shi, Yuan; Xiong, Zhang; Hauptmann, Alexander G.

    2011-08-01

    As video data from a variety of different domains (e.g., news, documentaries, entertainment) have distinctive data distributions, cross-domain video concept detection becomes an important task, in which one can reuse the labeled data of one domain to benefit the learning task in another domain with insufficient labeled data. In this paper, we approach this problem by proposing a cross-domain active learning method which iteratively queries labels of the most informative samples in the target domain. Traditional active learning assumes that the training (source domain) and test data (target domain) are from the same distribution. However, it may fail when the two domains have different distributions because querying informative samples according to a base learner that initially learned from source domain may no longer be helpful for the target domain. In our paper, we use the Gaussian random field model as the base learner which has the advantage of exploring the distributions in both domains, and adopt uncertainty sampling as the query strategy. Additionally, we present an instance weighting trick to accelerate the adaptability of the base learner, and develop an efficient model updating method which can significantly speed up the active learning process. Experimental results on TRECVID collections highlight the effectiveness.
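
Uncertainty sampling, the query strategy named above, can be sketched as follows for a binary concept detector. This is a generic illustration and not the authors' method: it assumes the base learner exposes a positive-class probability per unlabeled sample, and it does not model the Gaussian random field learner or the instance weighting described in the abstract. The function names are hypothetical.

```python
def most_uncertain(probabilities):
    """Return the index of the sample whose predicted positive-class
    probability is closest to 0.5 (the least confident prediction)."""
    return min(range(len(probabilities)),
               key=lambda i: abs(probabilities[i] - 0.5))


def query_batch(probabilities, batch_size):
    """Return the indices of the batch_size most uncertain samples,
    most uncertain first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


# Hypothetical predictions for four unlabeled target-domain shots.
predicted = [0.9, 0.45, 0.1, 0.7]
print(most_uncertain(predicted))
print(query_batch(predicted, 2))
```

In an active learning loop, the returned indices would be sent to the annotator, the new labels added to the training set, and the base learner updated before the next round.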

  13. KBGIS-II: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, Terence; Peuquet, Donna; Menon, Sudhakar; Agarwal, Pankaj

    1986-01-01

    The architecture and working of a recently implemented Knowledge-Based Geographic Information System (KBGIS-II), designed to satisfy several general criteria for the GIS, is described. The system has four major functions including query-answering, learning and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is performing all its designated tasks successfully. Future reports will relate performance characteristics of the system.

  14. Query-based learning for aerospace applications.

    PubMed

    Saad, E W; Choi, J J; Vian, J L; Wunsch, D C Ii

    2003-01-01

    Models of real-world applications often include a large number of parameters with a wide dynamic range, which contributes to the difficulties of neural network training. Creating the training data set for such applications becomes costly, if not impossible. In order to overcome the challenge, one can employ an active learning technique known as query-based learning (QBL) to add performance-critical data to the training set during the learning phase, thereby efficiently improving the overall learning/generalization. The performance-critical data can be obtained using an inverse mapping called network inversion (discrete network inversion and continuous network inversion) followed by oracle query. This paper investigates the use of both inversion techniques for QBL, and introduces an original heuristic to select the inversion target values for the continuous network inversion method. Efficiency and generalization were further enhanced by employing node decoupled extended Kalman filter (NDEKF) training and a causality index (CI) as a means to reduce the input search dimensionality. The benefits of the overall QBL approach are experimentally demonstrated in two aerospace applications: a classification problem with large input space and a control distribution problem.

  15. Comparing the performance of two CBIRS indexing schemes

    NASA Astrophysics Data System (ADS)

    Mueller, Wolfgang; Robbert, Guenter; Henrich, Andreas

    2003-01-01

    Content based image retrieval (CBIR) as it is known today has to deal with a number of challenges. Quickly summarized, the main challenges are firstly, to bridge the semantic gap between high-level concepts and low-level features using feedback, secondly to provide performance under adverse conditions. High-dimensional spaces, as well as a demanding machine learning task make the right way of indexing an important issue. When indexing multimedia data, most groups opt for extraction of high-dimensional feature vectors from the data, followed by dimensionality reduction like PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees, for example. Other projects, such as MARS and Viper propose the adaptation of text indexing techniques, notably the inverted file. Here, the Viper system is the most direct adaptation of text retrieval techniques to quantized vectors. However, while the Viper query engine provides decent performance together with impressive user-feedback behavior, as well as the possibility for easy integration of long-term learning algorithms, and support for potentially infinite feature vectors, there has been no comparison of vector-based methods and inverted-file-based methods under similar conditions. In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine based on a relational database), and a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing using the same feature sets. The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and corresponding user feedback. 
When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over the Internet to the SUT, performing a number of queries using an agreed-upon protocol, the multimedia retrieval markup language (MRML). Using this benchmark one can measure the quality of retrieval, as well as the overall (speed) performance of the benchmarked system. Our benchmarks will draw on the Benchathlon's work for documenting the retrieval performance of both inverted-file-based and LSD-tree-based techniques. However, in addition to these results, we will present statistics that can be obtained only inside the system under test. These statistics will include the number of complex mathematical operations, as well as the amount of data that has to be read from disk during operation of a query.

  16. Accelerating Research Impact in a Learning Health Care System

    PubMed Central

    Elwy, A. Rani; Sales, Anne E.; Atkins, David

    2017-01-01

    Background: Since 1998, the Veterans Health Administration (VHA) Quality Enhancement Research Initiative (QUERI) has supported more rapid implementation of research into clinical practice. Objectives: With the passage of the Veterans Access, Choice and Accountability Act of 2014 (Choice Act), QUERI further evolved to support VHA’s transformation into a Learning Health Care System by aligning science with clinical priority goals based on a strategic planning process and alignment of funding priorities with updated VHA priority goals in response to the Choice Act. Design: QUERI updated its strategic goals in response to independent assessments mandated by the Choice Act that recommended VHA reduce variation in care by providing a clear path to implement best practices. Specifically, QUERI updated its application process to ensure its centers (Programs) focus on cross-cutting VHA priorities and specify roadmaps for implementation of research-informed practices across different settings. QUERI also increased funding for scientific evaluations of the Choice Act and other policies in response to Commission on Care recommendations. Results: QUERI’s national network of Programs deploys effective practices using implementation strategies across different settings. QUERI Choice Act evaluations informed the law’s further implementation, setting the stage for additional rigorous national evaluations of other VHA programs and policies including community provider networks. Conclusions: Grounded in implementation science and evidence-based policy, QUERI serves as an example of how to operationalize core components of a Learning Health Care System, notably through rigorous evaluation and scientific testing of implementation strategies to ultimately reduce variation in quality and improve overall population health. PMID:27997456

  17. A Deep Learning Method to Automatically Identify Reports of Scientifically Rigorous Clinical Research from the Biomedical Literature: Comparative Analytic Study.

    PubMed

    Del Fiol, Guilherme; Michelson, Matthew; Iorio, Alfonso; Cotoi, Chris; Haynes, R Brian

    2018-06-25

    A major barrier to the practice of evidence-based medicine is efficiently finding scientifically sound studies on a given clinical topic. To investigate a deep learning approach to retrieve scientifically sound treatment studies from the biomedical literature. We trained a Convolutional Neural Network using a noisy dataset of 403,216 PubMed citations with title and abstract as features. The deep learning model was compared with state-of-the-art search filters, such as PubMed's Clinical Query Broad treatment filter, McMaster's textword search strategy (no Medical Subject Heading, MeSH, terms), and Clinical Query Balanced treatment filter. A previously annotated dataset (Clinical Hedges) was used as the gold standard. The deep learning model obtained significantly lower recall than the Clinical Queries Broad treatment filter (96.9% vs 98.4%; P<.001); and equivalent recall to McMaster's textword search (96.9% vs 97.1%; P=.57) and Clinical Queries Balanced filter (96.9% vs 97.0%; P=.63). Deep learning obtained significantly higher precision than the Clinical Queries Broad filter (34.6% vs 22.4%; P<.001) and McMaster's textword search (34.6% vs 11.8%; P<.001), but was significantly lower than the Clinical Queries Balanced filter (34.6% vs 40.9%; P<.001). Deep learning performed well compared to state-of-the-art search filters, especially when citations were not indexed. Unlike previous machine learning approaches, the proposed deep learning model does not require feature engineering, or time-sensitive or proprietary features, such as MeSH terms and bibliometrics. Deep learning is a promising approach to identifying reports of scientifically rigorous clinical research. Further work is needed to optimize the deep learning model and to assess generalizability to other areas, such as diagnosis, etiology, and prognosis. ©Guilherme Del Fiol, Matthew Michelson, Alfonso Iorio, Chris Cotoi, R Brian Haynes. 
Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.06.2018.

  18. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    PubMed Central

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379

  19. Enhancing user privacy in SARG04-based private database query protocols

    NASA Astrophysics Data System (ADS)

    Yu, Fang; Qiu, Daowen; Situ, Haozhen; Wang, Xiaoming; Long, Shun

    2015-11-01

    The well-known SARG04 protocol can be used in a private query application to generate an oblivious key. Using this key, the user can retrieve one out of N items from a database without revealing which one he/she is interested in. However, the existing SARG04-based private query protocols are vulnerable to attacks using faked data from the database, since in its canonical form the SARG04 protocol lacks means for one party to defend against attacks from the other. Because such attacks can cause significant loss of user privacy, a variant of the SARG04 protocol is proposed in this paper with new mechanisms designed to help the user protect its privacy in private query applications. In the protocol, it is the user who starts the session with the database, trying to learn from it bits of a raw key in an oblivious way. An honesty test is used to detect a cheating database that has transmitted faked data. The whole private query protocol has O(N) communication complexity for conveying at least N encrypted items. Compared with the existing SARG04-based protocols, it is efficient in communication for per-bit learning.

  20. Out-of-Sample Extrapolation utilizing Semi-Supervised Manifold Learning (OSE-SSL): Content Based Image Retrieval for Histopathology Images

    PubMed Central

    Sparks, Rachel; Madabhushi, Anant

    2016-01-01

    Content-based image retrieval (CBIR) retrieves database images most similar to the query image by (1) extracting quantitative image descriptors and (2) calculating similarity between database and query image descriptors. Recently, manifold learning (ML) has been used to perform CBIR in a low dimensional representation of the high dimensional image descriptor space to avoid the curse of dimensionality. ML schemes are computationally expensive, requiring an eigenvalue decomposition (EVD) for every new query image to learn its low dimensional representation. We present out-of-sample extrapolation utilizing semi-supervised ML (OSE-SSL) to learn the low dimensional representation without recomputing the EVD for each query image. OSE-SSL incorporates semantic information, in the form of partial class labels, into an ML scheme such that the low dimensional representation co-localizes semantically similar images. In the context of prostate histopathology, gland morphology is an integral component of the Gleason score, which enables discrimination of prostate cancer aggressiveness. Images are represented by shape features extracted from the prostate gland. CBIR with OSE-SSL for prostate histology obtained from 58 patient studies yielded an area under the precision-recall curve (AUPRC) of 0.53 ± 0.03; by comparison, CBIR with Principal Component Analysis (PCA) to learn a low dimensional space yielded an AUPRC of 0.44 ± 0.01. PMID:27264985
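
The two CBIR steps named in the abstract, descriptor extraction followed by similarity ranking, can be sketched for step (2) alone. This is a minimal illustration under stated assumptions: descriptors are taken to be already extracted (and possibly already reduced to a low dimensional space), similarity is taken to be Euclidean distance, and the manifold learning and out-of-sample extrapolation of OSE-SSL are not modeled. The names `retrieve` and `database` are hypothetical.

```python
import math


def retrieve(query, database, k=3):
    """Return the k database keys whose descriptors are nearest to the
    query descriptor under Euclidean distance (an assumed similarity)."""
    ranked = sorted(database,
                    key=lambda name: math.dist(query, database[name]))
    return ranked[:k]


# Hypothetical 2-D descriptors, e.g., after dimensionality reduction.
database = {
    "img_a": (0.10, 0.90),
    "img_b": (0.80, 0.20),
    "img_c": (0.15, 0.85),
}
print(retrieve((0.12, 0.88), database, k=2))
```

The cost the abstract refers to lies upstream of this step: mapping each new query image into the low dimensional space, which is where avoiding a fresh eigenvalue decomposition matters.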

  1. Supporting ontology-based keyword search over medical databases.

    PubMed

    Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

    2008-11-06

    The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of queries, study the advantages and limitations of each approach, and summarize the lessons learned as best practices.

  2. A "Simple Query Interface" Adapter for the Discovery and Exchange of Learning Resources

    ERIC Educational Resources Information Center

    Massart, David

    2006-01-01

    Developed as part of CEN/ISSS Workshop on Learning Technology efforts to improve interoperability between learning resource repositories, the Simple Query Interface (SQI) is an Application Program Interface (API) for querying heterogeneous repositories of learning resource metadata. In the context of the ProLearn Network of Excellence, SQI is used…

  3. Learning Object Retrieval and Aggregation Based on Learning Styles

    ERIC Educational Resources Information Center

    Ramirez-Arellano, Aldo; Bory-Reyes, Juan; Hernández-Simón, Luis Manuel

    2017-01-01

    The main goal of this article is to develop a Management System for Merging Learning Objects (msMLO), which offers an approach that retrieves learning objects (LOs) based on students' learning styles and term-based queries, which produces a new outcome with a better score. The msMLO faces the task of retrieving LOs via two steps: The first step…

  4. Progressive content-based retrieval of image and video with adaptive and iterative refinement

    NASA Technical Reports Server (NTRS)

    Li, Chung-Sheng (Inventor); Turek, John Joseph Edward (Inventor); Castelli, Vittorio (Inventor); Chen, Ming-Syan (Inventor)

    1998-01-01

    A method and apparatus for minimizing the time required to obtain results for a content based query in a data base. More specifically, with this invention, the data base is partitioned into a plurality of groups. Then, a schedule or sequence of groups is assigned to each of the operations of the query, where the schedule represents the order in which an operation of the query will be applied to the groups in the schedule. Each schedule is arranged so that each application of the operation operates on the group which will yield intermediate results that are closest to final results.

  5. KBGIS-2: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, T.; Peuquet, D.; Menon, S.; Agarwal, P.

    1986-01-01

    The architecture and working of a recently implemented knowledge-based geographic information system (KBGIS-2) that was designed to satisfy several general criteria for the geographic information system are described. The system has four major functions that include query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial objects language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is currently performing all its designated tasks successfully, although currently implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various new extensions are planned in order to enhance the power of KBGIS-2.

  6. Query Health: standards-based, cross-platform population health surveillance

    PubMed Central

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Objective: Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods: Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results: We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussion: This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions: Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371

  7. Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.

    PubMed

    Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie

    2017-03-01

    Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for similarity matching between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may subsequently yield very different results. In fact, dealing with landmarks with low-quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely multi-query expansions, to retrieve semantically robust landmarks in two steps. First, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible low-quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by typical collaborative filtering methods, we propose to learn collaborative deep-network-based, semantic, nonlinear, high-level features over the latent factors for landmark photos as the training set, which is formed by matrix factorization over the collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, meanwhile resulting in a compact multi-query set within such a space. The final ranking scores are then calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world social media data, comprising landmark photos together with their user information, to show the superior performance over existing methods, especially our recently proposed multi-query based mid-level pattern representation method [1].

  8. Human use regulatory affairs advisor (HURAA): learning about research ethics with intelligent learning modules.

    PubMed

    Hu, Xiangen; Graesser, Arthur C

    2004-05-01

    The Human Use Regulatory Affairs Advisor (HURAA) is a Web-based facility that provides help and training on the ethical use of human subjects in research, based on documents and regulations in United States federal agencies. HURAA has a number of standard features of conventional Web facilities and computer-based training, such as hypertext, multimedia, help modules, glossaries, archives, links to other sites, and page-turning didactic instruction. HURAA also has these intelligent features: (1) an animated conversational agent that serves as a navigational guide for the Web facility, (2) lessons with case-based and explanation-based reasoning, (3) document retrieval through natural language queries, and (4) a context-sensitive Frequently Asked Questions segment, called Point & Query. This article describes the functional learning components of HURAA, specifies its computational architecture, and summarizes empirical tests of the facility on learners.

  9. A unified framework for image retrieval using keyword and visual features.

    PubMed

    Jing, Feng; Li, Mingling; Zhang, Hong-Jiang; Zhang, Bo

    2005-07-01

    In this paper, a unified image retrieval framework based on both keyword annotations and visual features is proposed. In this framework, a set of statistical models are built based on visual features of a small set of manually labeled images to represent semantic concepts and used to propagate keywords to other unlabeled images. These models are updated periodically when more images implicitly labeled by users become available through relevance feedback. In this sense, the keyword models serve the function of accumulation and memorization of knowledge learned from user-provided relevance feedback. Furthermore, two sets of effective and efficient similarity measures and relevance feedback schemes are proposed for query by keyword scenario and query by image example scenario, respectively. Keyword models are combined with visual features in these schemes. In particular, a new, entropy-based active learning strategy is introduced to improve the efficiency of relevance feedback for query by keyword. Furthermore, a new algorithm is proposed to estimate the keyword features of the search concept for query by image example. It is shown to be more appropriate than two existing relevance feedback algorithms. Experimental results demonstrate the effectiveness of the proposed framework.

  10. Semantic-based surveillance video retrieval.

    PubMed

    Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve

    2007-04-01

    Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.

  11. Query construction, entropy, and generalization in neural-network models

    NASA Astrophysics Data System (ADS)

    Sollich, Peter

    1994-05-01

    We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and ``noninvertible'' versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.
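
    The contrast between constructed queries and random examples can be illustrated with a simplified version of the high-low game. The sketch assumes the unknown rule is a threshold `theta` on the unit interval and that each example reveals whether the input lies below it. Under a uniform prior, the maximum-information-gain query bisects the interval still consistent with the data. The set-up and the names (`learn_threshold`, `bisect`, `rand`) are illustrative assumptions, not the paper's formulation.

```python
import random

def learn_threshold(theta, n, choose):
    """Shrink the interval [lo, hi] known to contain theta using n labelled examples."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = choose(lo, hi)
        if x < theta:
            lo = max(lo, x)   # label "low": theta lies above x
        else:
            hi = min(hi, x)   # label "high": theta lies at or below x
    return hi - lo            # remaining uncertainty about theta

# Query construction: always ask about the midpoint of the current interval.
bisect = lambda lo, hi: (lo + hi) / 2

# Random examples: inputs drawn uniformly, ignoring what is already known.
rng = random.Random(0)
rand = lambda lo, hi: rng.random()

theta = 0.3141
width_query = learn_threshold(theta, 10, bisect)
width_random = learn_threshold(theta, 10, rand)
```

    Each bisecting query halves the remaining interval, so ten queries leave a width of exactly 2^-10. Random examples often land outside the interval and contribute nothing, so the uncertainty shrinks far more slowly.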

  12. Practical private database queries based on a quantum-key-distribution protocol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobi, Markus; Humboldt-Universitaet zu Berlin, D-10117 Berlin; Simon, Christoph

    2011-02-15

    Private queries allow a user, Alice, to learn an element of a database held by a provider, Bob, without revealing which element she is interested in, while limiting her information about the other elements. We propose to implement private queries based on a quantum-key-distribution protocol, with changes only in the classical postprocessing of the key. This approach makes our scheme both easy to implement and loss tolerant. While unconditionally secure private queries are known to be impossible, we argue that an interesting degree of security can be achieved by relying on fundamental physical principles instead of unverifiable security assumptions in order to protect both the user and the database. We think that the scope exists for such practical private queries to become another remarkable application of quantum information in the footsteps of quantum key distribution.

  13. A Research on E - learning Resources Construction Based on Semantic Web

    NASA Astrophysics Data System (ADS)

    Rui, Liu; Maode, Deng

    Traditional e-learning platforms have two flaws: resources are usually difficult to query or locate, and cross-platform sharing and interoperability are hard to realize. In this paper, the semantic web and metadata standards are discussed, and an e-learning system framework based on the semantic web is put forward to address these flaws of traditional e-learning platforms.

  14. Reading a Critical Review of Evidence: Notes and Queries on Research Programmes in Environmental Education.

    ERIC Educational Resources Information Center

    Reid, Alan D.; Nikel, Jutta

    2003-01-01

    Explores the notion that a review communicates a research program and how it might extend and disrupt readings of Rickinson's (2001) review of the evidence base for environmental education learning. Investigates, through a series of notes and queries using Lakatos's ideas, the production and possibilities of the review rather than the findings.…

  15. Designing a Syntax-Based Retrieval System for Supporting Language Learning

    ERIC Educational Resources Information Center

    Tsao, Nai-Lung; Kuo, Chin-Hwa; Wible, David; Hung, Tsung-Fu

    2009-01-01

    In this paper, we propose a syntax-based text retrieval system for on-line language learning and use a fast regular expression search engine as its main component. Regular expression searches provide more scalable querying and search results than keyword-based searches. However, without a well-designed index scheme, the execution time of regular…

  16. Fast Nonparametric Machine Learning Algorithms for High-Dimensional Massive Data and Applications

    DTIC Science & Technology

    2006-03-01

    know the probability of that from Lemma 2. Using the union bound, we know that for any query q, the probability that i-am-feeling-lucky search algorithm...and each point in a d-dimensional space, a naive k-NN search needs to do a linear scan of T for every single query q, and thus the computational time...algorithm based on partition trees with priority search, and give an expected query time O((1/)d log n). But the constant in the O((1/)d log n
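
    The naive k-NN search mentioned in this excerpt, a linear scan of the training set T for every query q, can be sketched as below. The sample points and the function name `knn_linear_scan` are hypothetical, and Euclidean distance is assumed.

```python
import heapq
import math

def knn_linear_scan(T, q, k):
    """Return the k points of T nearest to q by visiting every point once."""
    # Each query costs one distance computation per point in T.
    return heapq.nsmallest(k, T, key=lambda p: math.dist(p, q))

# Hypothetical training set of 2-dimensional points.
T = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
neighbours = knn_linear_scan(T, (0.9, 1.2), 2)
```

    The per-query cost grows linearly with the size of T and with the dimension d, which is the cost that tree-based and approximate methods aim to reduce.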

  17. An investigative, cooperative learning approach to the general microbiology laboratory.

    PubMed

    Seifert, Kyle; Fenster, Amy; Dilts, Judith A; Temple, Louise

    2009-01-01

    Investigative- and cooperative-based learning strategies have been used effectively in a variety of classrooms to enhance student learning and engagement. In the General Microbiology laboratory for juniors and seniors at James Madison University, these strategies were combined to make a semester-long, investigative, cooperative learning experience involving culture and identification of microbial isolates that the students obtained from various environments. To assess whether this strategy was successful, students were asked to complete a survey at the beginning and at the end of the semester regarding their comfort level with a variety of topics. For most of the topics queried, the students reported that their comfort had increased significantly during the semester. Furthermore, this group of students thought that the quality of this investigative lab experience was much better than that of any of their previous lab experiences.

  18. An Investigative, Cooperative Learning Approach to the General Microbiology Laboratory

    PubMed Central

    Seifert, Kyle; Fenster, Amy; Dilts, Judith A.

    2009-01-01

    Investigative- and cooperative-based learning strategies have been used effectively in a variety of classrooms to enhance student learning and engagement. In the General Microbiology laboratory for juniors and seniors at James Madison University, these strategies were combined to make a semester-long, investigative, cooperative learning experience involving culture and identification of microbial isolates that the students obtained from various environments. To assess whether this strategy was successful, students were asked to complete a survey at the beginning and at the end of the semester regarding their comfort level with a variety of topics. For most of the topics queried, the students reported that their comfort had increased significantly during the semester. Furthermore, this group of students thought that the quality of this investigative lab experience was much better than that of any of their previous lab experiences. PMID:19487504

  19. Content-based image retrieval with ontological ranking

    NASA Astrophysics Data System (ADS)

    Tsai, Shen-Fu; Tsai, Min-Hsuan; Huang, Thomas S.

    2010-02-01

    Images are a much more powerful medium of expression than text, as the adage says: "One picture is worth a thousand words." This is because, compared with text consisting of an array of words, an image has more degrees of freedom and therefore a more complicated structure. However, the less constrained structure of images presents researchers in the computer vision community with the tough task of teaching machines to understand and organize images, especially when only a limited number of learning examples and limited background knowledge are given. The advance of internet and web technology in the past decade has changed the way humans gain knowledge. People can now exchange knowledge with others by discussing and contributing information on the web. As a result, web pages on the internet have become a living and growing source of information. One is therefore tempted to wonder whether machines can learn from the web knowledge base as well. Indeed, it is possible to make computers learn from the internet and provide humans with more meaningful knowledge. In this work, we explore this novel possibility for image understanding applied to semantic image search. We exploit web resources to obtain links from images to keywords and a semantic ontology constituting humans' general knowledge. The former maps visual content to related text, in contrast to the traditional way of associating images with surrounding text; the latter provides relations between concepts for machines to understand to what extent and in what sense an image is close to the image search query. With the aid of these two tools, the resulting image search system is thus content-based and, moreover, organized. The returned images are ranked and organized such that semantically similar images are grouped together and given a rank based on their semantic closeness to the input query. The novelty of the system is twofold: first, images are retrieved based not only on text cues but on their actual contents as well; second, the grouping is different from pure visual similarity clustering. More specifically, the inferred concepts of each image in the group are examined in the context of a huge concept ontology to determine their true relations with what people have in mind when doing image search.

  20. Issues in the design of a pilot concept-based query interface for the neuroinformatics information framework.

    PubMed

    Marenco, Luis; Li, Yuli; Martone, Maryann E; Sternberg, Paul W; Shepherd, Gordon M; Miller, Perry L

    2008-09-01

    This paper describes a pilot query interface that has been constructed to help us explore a "concept-based" approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface.

  1. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    PubMed

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.

  2. An Information Retrieval and Recommendation System for Astronomical Observatories

    NASA Astrophysics Data System (ADS)

    Mukund, Nikhil; Thakur, Saurabh; Abraham, Sheelu; Aniyan, A. K.; Mitra, Sanjit; Sajeeth Philip, Ninan; Vaghmare, Kaustubh; Acharjya, D. P.

    2018-03-01

    We present a machine-learning-based information retrieval system for astronomical observatories that tries to address user-defined queries related to an instrument. In the modern instrumentation scenario where heterogeneous systems and talents are simultaneously at work, the ability to supply people with the right information helps speed up the tasks for detector operation, maintenance, and upgradation. The proposed method analyzes existing documented efforts at the site to intelligently group related information to a query and to present it online to the user. The user in response can probe the suggested content and explore previously developed solutions or probable ways to address the present situation optimally. We demonstrate natural language-processing-backed knowledge rediscovery by making use of the open source logbook data from the Laser Interferometric Gravitational Observatory (LIGO). We implement and test a web application that incorporates the above idea for LIGO Livingston, LIGO Hanford, and Virgo observatories.

  3. Query-based biclustering of gene expression data using Probabilistic Relational Models.

    PubMed

    Zhao, Hui; Cloots, Lore; Van den Bulcke, Tim; Wu, Yan; De Smet, Riet; Storms, Valerie; Meysman, Pieter; Engelen, Kristof; Marchal, Kathleen

    2011-02-15

    With the availability of large-scale expression compendia, it is now possible to view one's own findings in the light of what is already available and to retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering, but that at the same time is robust against the presence of noisy genes in the seed set, as seed genes are often assumed, but not guaranteed, to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that exploits the use of prior distributions to extract the information contained within the seed set. We applied ProBic to a large-scale Escherichia coli compendium to extend partially described regulons with potentially novel members. We compared ProBic's performance with previously published query-based biclustering algorithms, namely ISA and QDB, from the perspective of bicluster expression quality, robustness of the outcome against noisy seed sets, and biological relevance. This comparison shows that ProBic is able to retrieve biologically relevant, high-quality biclusters that retain their seed genes and that it is particularly strong in handling noisy seeds. ProBic is a query-based biclustering algorithm developed in a flexible framework, designed to detect biologically relevant, high-quality biclusters that retain relevant seed genes even in the presence of noise or when dealing with low-quality seed sets.

  4. Active learning based segmentation of Crohns disease from abdominal MRI.

    PubMed

    Mahapatra, Dwarikanath; Vos, Franciscus M; Buhmann, Joachim M

    2016-05-01

    This paper proposes a novel active learning (AL) framework and combines it with semi-supervised learning (SSL) for segmenting Crohn's disease (CD) tissues from abdominal magnetic resonance (MR) images. Robust fully supervised learning (FSL) based classifiers require large amounts of labeled data covering different disease severities. Obtaining such data is time-consuming and requires considerable expertise. SSL methods use a few labeled samples and leverage the information from many unlabeled samples to train an accurate classifier. AL queries the labels of the most informative samples and maximizes the gain from the labeling effort. Our primary contribution is in designing a query strategy that combines novel context information with classification uncertainty and feature similarity. Combining SSL and AL gives a robust segmentation method that (1) optimally uses few labeled samples and many unlabeled samples and (2) requires lower training time. Experimental results show that our method achieves higher segmentation accuracy than FSL methods with fewer samples and reduced training effort.
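
    The idea of querying labels for the most informative samples can be sketched with plain uncertainty sampling, which covers only the classification-uncertainty part of the strategy described in the abstract. The context and feature-similarity terms are omitted. The predicted class probabilities and the names `entropy` and `select_queries` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(predictions, budget):
    """Pick the `budget` unlabeled sample ids with the highest predictive entropy."""
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:budget]

# Hypothetical classifier outputs for three unlabeled samples (two classes).
predictions = {
    "a": [0.95, 0.05],  # confident prediction
    "b": [0.55, 0.45],  # near the decision boundary
    "c": [0.70, 0.30],
}
to_label = select_queries(predictions, 2)
```

    The samples closest to the decision boundary are sent to the annotator first, so each labeling effort is spent where the classifier is least certain.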

  5. Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework

    PubMed Central

    Li, Yuli; Martone, Maryann E.; Sternberg, Paul W.; Shepherd, Gordon M.; Miller, Perry L.

    2009-01-01

    This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface. PMID:18953674

  6. A rank-based Prediction Algorithm of Learning User's Intention

    NASA Astrophysics Data System (ADS)

    Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing

    Internet search has become an important part of people's daily lives. People can find many types of information to meet different needs through search engines on the Internet. Current search engines have two issues. First, users must decide in advance which type of information they want and then switch to the appropriate search engine interface. Second, most search engines support multiple kinds of search functions, each with its own separate search interface, so users who need different types of information must switch between interfaces. In practice, most queries correspond to results of various information types and can retrieve relevant results from various search engines; for example, the query "Palace" returns websites introducing the National Palace Museum, blogs, Wikipedia, and some picture and video information. This paper presents a new aggregation algorithm for all kinds of search results. It filters and sorts the search results by learning from three sources (the query words, the search results, and the search history logs) in order to detect the user's intention. Experiments demonstrate that this rank-based method for multiple types of search results is effective. It can meet the user's search needs well, enhance user satisfaction, provide an effective and rational model for optimizing search engines, and improve the user's search experience.

  7. Semantic Services in e-Learning: An Argumentation Case Study

    ERIC Educational Resources Information Center

    Moreale, Emanuela; Vargas-Vera, Maria

    2004-01-01

    This paper outlines an e-Learning services architecture offering semantic-based services to students and tutors, in particular ways to browse and obtain information through web services. Services could include registration, authentication, tutoring systems, smart question answering for students' queries, automated marking systems and a student…

  8. Research Data Alliance: Understanding Big Data Analytics Applications in Earth Science

    NASA Astrophysics Data System (ADS)

    Riedel, Morris; Ramachandran, Rahul; Baumann, Peter

    2014-05-01

    The Research Data Alliance (RDA) enables data to be shared across barriers through focused working groups and interest groups, formed of experts from around the world - from academia, industry and government. Its Big Data Analytics (BDA) interest group seeks to develop community-based recommendations on feasible data analytics approaches to address the scientific community's need to utilize large quantities of data. BDA seeks to analyze different scientific domain applications (e.g. earth science use cases) and their potential use of various big data analytics techniques. These techniques range from hardware deployment models up to various algorithms (e.g. machine learning algorithms such as support vector machines for classification). A systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries will be covered in these recommendations. This contribution will outline initial parts of such a classification and recommendations in the specific context of the field of Earth Sciences. The lessons learned and experiences reported are based on a survey of use cases and also provide detailed insights into a few of those use cases.

  9. Research Data Alliance: Understanding Big Data Analytics Applications in Earth Science

    NASA Technical Reports Server (NTRS)

    Riedel, Morris; Ramachandran, Rahul; Baumann, Peter

    2014-01-01

    The Research Data Alliance (RDA) enables data to be shared across barriers through focused working groups and interest groups, formed of experts from around the world - from academia, industry and government. Its Big Data Analytics (BDA) interest group seeks to develop community-based recommendations on feasible data analytics approaches to address the scientific community's need to utilize large quantities of data. BDA seeks to analyze different scientific domain applications (e.g. earth science use cases) and their potential use of various big data analytics techniques. These techniques range from hardware deployment models up to various algorithms (e.g. machine learning algorithms such as support vector machines for classification). A systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries will be covered in these recommendations. This contribution will outline initial parts of such a classification and recommendations in the specific context of the field of Earth Sciences. The lessons learned and experiences reported are based on a survey of use cases and also provide detailed insights into a few of those use cases.

  10. Indexing Guidelines: Applications in Use of Pulmonary Artery Catheters and Pressure Ulcer Prevention

    PubMed Central

    Jenders, Robert A.; Estey, Greg; Martin, Martha; Hamilton, Glenys; Ford-Carleton, Penny; Thompson, B. Taylor; Oliver, Diane E.; Eccles, Randy; Barnett, G. Octo; Zielstorff, Rita D.; Fitzmaurice, Joan B.

    1994-01-01

    In a busy clinical environment, access to knowledge must be rapid and specific to the clinical query at hand. This requires indices which support easy navigation within a knowledge source. We have developed a computer-based tool for trouble-shooting pulmonary artery waveforms using a graphical index. Preliminary results of domain knowledge tests for a group of clinicians exposed to the system (N=33) show a mean improvement on a 30-point test of 5.33 (p<0.001) compared to a control group (N=19) improvement of 0.47 (p=0.61). Survey of the experimental group (N=25) showed 84% (p=0.001) found the system easy to use. We discuss lessons learned in indexing this domain area to computer-based indexing of guidelines for pressure ulcer prevention. PMID:7950035

  11. Pattern Activity Clustering and Evaluation (PACE)

    NASA Astrophysics Data System (ADS)

    Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna

    2012-06-01

    With the vast amount of network information available on activities of people (i.e. motions, transportation routes, and site visits) there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking, learning and clustering approaches; as well as utilize information theoretical metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.

  12. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2005-01-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  13. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2004-12-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  14. Assisting Consumer Health Information Retrieval with Query Recommendations

    PubMed Central

    Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

    2006-01-01

    Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
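
The abstract above says recommended terms are chosen by semantic distance derived partly from concept co-occurrence, but it does not give the formula. The following is a minimal sketch of that general idea, assuming a Jaccard distance over document-level co-occurrence as a stand-in; the function names and the toy corpus are hypothetical and are not taken from HIQuA.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(documents):
    """Count documents per concept and documents per (sorted) concept pair."""
    single = defaultdict(int)
    pair = defaultdict(int)
    for concepts in documents:
        unique = sorted(set(concepts))
        for concept in unique:
            single[concept] += 1
        for a, b in combinations(unique, 2):
            pair[(a, b)] += 1
    return single, pair


def jaccard_distance(a, b, single, pair):
    """Assumed distance: 1 - |docs with both| / |docs with either|."""
    key = (a, b) if a < b else (b, a)
    both = pair.get(key, 0)
    either = single.get(a, 0) + single.get(b, 0) - both
    return 1.0 if either == 0 else 1.0 - both / either


def recommend(query, documents, k=3):
    """Return up to k concepts closest to the query, skipping unrelated ones."""
    single, pair = cooccurrence_counts(documents)
    candidates = [c for c in single if c != query]
    scored = sorted((jaccard_distance(query, c, single, pair), c) for c in candidates)
    return [c for distance, c in scored[:k] if distance < 1.0]


# Toy corpus (hypothetical): each inner list is the concept set of one document.
docs = [
    ["asthma", "inhaler", "wheezing"],
    ["asthma", "inhaler"],
    ["asthma", "allergy"],
    ["diabetes", "insulin"],
]
suggestions = recommend("asthma", docs)
```

A production system would combine several evidence sources, as the abstract describes (literature, log data, vocabulary relations); this sketch covers only the co-occurrence part.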

  15. Neural networks and logical reasoning systems: a translation table.

    PubMed

    Martins, J; Mendes, R V

    2001-04-01

    A correspondence is established between the basic elements of logic reasoning systems (knowledge bases, rules, inference and queries) and the structure and dynamical evolution laws of neural networks. The correspondence is pictured as a translation dictionary which might allow to go back and forth between symbolic and network formulations, a desirable step in learning-oriented systems and multicomputer networks. In the framework of Horn clause logics, it is found that atomic propositions with n arguments correspond to nodes with nth order synapses, rules to synaptic intensity constraints, forward chaining to synaptic dynamics and queries either to simple node activation or to a query tensor dynamics.

  16. Support patient search on pathology reports with interactive online learning based data extraction.

    PubMed

    Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng

    2015-01-01

    Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests.
Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.

  17. Classification of ECG beats using deep belief network and active learning.

    PubMed

    G, Sayantan; T, Kien P; V, Kadambari K

    2018-04-12

    A new semi-supervised approach based on deep learning and active learning for classification of electrocardiogram (ECG) signals is proposed. The objective of the proposed work is to model a scientific method for classification of cardiac irregularities using electrocardiogram beats. The model follows the Association for the Advancement of Medical Instrumentation (AAMI) standards and consists of three phases. In phase I, a feature representation of the ECG is learnt using a Gaussian-Bernoulli deep belief network, followed by linear support vector machine (SVM) training in the subsequent phase. It yields three deep models which are based on AAMI-defined classes, namely N, V, S, and F. In the last phase, a query generator is introduced to interact with the expert, who labels a few beats to improve accuracy and sensitivity. The proposed approach shows significant improvement in accuracy with minimal queries posed to the expert and fast online training as tested on the MIT-BIH Arrhythmia Database and the MIT-BIH Supra-ventricular Arrhythmia Database (SVDB). With 100 queries labeled by the expert in phase III, the method achieves an accuracy of 99.5% in "S" versus all classifications (SVEB) and 99.4% accuracy in "V" versus all classifications (VEB) on the MIT-BIH Arrhythmia Database. Similarly, it achieves accuracies of 97.5% for SVEB and 98.6% for VEB on the SVDB database. Graphical abstract: Deep belief network augmented by active learning for efficient prediction of arrhythmia.
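
The abstract above does not state how the query generator chooses which beats to send to the expert. A common choice in active learning with a linear SVM is margin-based uncertainty sampling, sketched below as an assumption rather than as the authors' method; the weights, bias, and feature rows are made-up illustration values.

```python
def linear_scores(weights, bias, features):
    """Signed decision value w.x + b of a linear classifier for each row."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in features]


def select_queries(scores, budget):
    """Pick the `budget` samples whose decision values are closest to zero.

    Samples near the decision boundary are the ones the classifier is least
    sure about, so their labels are assumed to be the most informative.
    Ties are broken by sample index to keep the result deterministic.
    """
    ranked = sorted(range(len(scores)), key=lambda i: (abs(scores[i]), i))
    return ranked[:budget]


# Hypothetical two-dimensional beat features and classifier parameters.
beat_features = [[2.0, 0.0], [1.0, 0.75], [0.0, 3.0], [0.5, 1.0]]
scores = linear_scores([1.0, -1.0], 0.0, beat_features)
to_label = select_queries(scores, budget=2)
```

After the expert labels the selected beats, the classifier would be retrained and the selection repeated until the query budget (100 queries in the abstract) is spent.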

  18. Learning of Multimodal Representations With Random Walks on the Click Graph.

    PubMed

    Wu, Fei; Lu, Xinyan; Song, Jun; Yan, Shuicheng; Zhang, Zhongfei Mark; Rui, Yong; Zhuang, Yueting

    2016-02-01

    In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from the users' searching behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate the clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationship between the vertices in the click graph. By minimizing both the truncated random walk loss as well as the distance between the learned representation of vertices and their corresponding deep neural network output, the proposed model which is named multimodal random walk neural network (MRW-NN) can be applied to not only learn robust representation of the existing multimodal data in the click graph, but also deal with the unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on a public large-scale click log data set Clickture and further show that MRW-NN achieves much better cross-modal retrieval performance on the unseen queries/images than the other state-of-the-art methods.
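
To make the click-graph idea above concrete, here is a minimal sketch of building a bipartite query/image graph from click pairs and sampling a truncated (fixed-length) random walk on it. It illustrates only the walk sampling, not the MRW-NN loss or network; the vertex naming scheme (`q:` and `img:` prefixes) and the click data are hypothetical.

```python
import random


def build_click_graph(clicks):
    """Undirected bipartite graph: an edge joins a text query and a clicked image."""
    graph = {}
    for query, image in clicks:
        graph.setdefault(query, []).append(image)
        graph.setdefault(image, []).append(query)
    return graph


def truncated_walk(graph, start, length, rng):
    """Random walk from `start`, stopped after `length` vertices."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


clicks = [("q:cat", "img:1"), ("q:cat", "img:2"), ("q:kitten", "img:2")]
graph = build_click_graph(clicks)
walk = truncated_walk(graph, "q:cat", 5, random.Random(0))
```

Vertices that co-occur within such walks (for example `q:cat` and `q:kitten`, linked only through `img:2`) are the kind of implicit connection the abstract refers to.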

  19. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

    2014-01-01

    Background: The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective: The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods: Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results: Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”.
The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than words related to men or any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions: This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537
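
The study above groups queries into keyword-based, wh-question, and yes/no question types but the abstract does not give its classification rules. The sketch below assumes a simple first-word heuristic purely for illustration; the word lists and the `classify_query` name are hypothetical and will misclassify some real queries.

```python
WH_WORDS = {"what", "why", "how", "when", "where", "which", "who", "whom", "whose"}
YES_NO_STARTERS = {
    "is", "are", "was", "were", "do", "does", "did",
    "can", "could", "should", "will", "would", "has", "have",
}


def classify_query(query):
    """Assign a query type from its first word (assumed heuristic)."""
    words = query.lower().strip().rstrip("?").split()
    if not words:
        return "empty"
    if words[0] in WH_WORDS:
        return "wh-question"
    if words[0] in YES_NO_STARTERS:
        return "yes/no question"
    return "keyword-based"


examples = ["What causes migraines?", "is strep throat contagious", "knee pain treatment"]
labels = [classify_query(q) for q in examples]
```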

  20. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    PubMed

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". 
The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than words related to men or any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.

  1. A similarity learning approach to content-based image retrieval: application to digital mammography.

    PubMed

    El-Naqa, Issam; Yang, Yongyi; Galatsanos, Nikolas P; Nishikawa, Robert M; Wernick, Miles N

    2004-10-01

    In this paper, we describe an approach to content-based retrieval of medical images from a database, and provide a preliminary demonstration of our approach as applied to retrieval of digital mammograms. Content-based image retrieval (CBIR) refers to the retrieval of images from a database using information derived from the images themselves, rather than solely from accompanying text indices. In the medical-imaging context, the ultimate aim of CBIR is to provide radiologists with a diagnostic aid in the form of a display of relevant past cases, along with proven pathology and other suitable information. CBIR may also be useful as a training tool for medical students and residents. The goal of information retrieval is to recall from a database information that is relevant to the user's query. The most challenging aspect of CBIR is the definition of relevance (similarity), which is used to guide the retrieval machine. In this paper, we pursue a new approach, in which similarity is learned from training examples provided by human observers. Specifically, we explore the use of neural networks and support vector machines to predict the user's notion of similarity. Within this framework we propose using a hierarchical learning approach, which consists of a cascade of a binary classifier and a regression module to optimize retrieval effectiveness and efficiency. We also explore how to incorporate online human interaction to achieve relevance feedback in this learning framework. Our experiments are based on a database consisting of 76 mammograms, all of which contain clustered microcalcifications (MCs). Our goal is to retrieve mammogram images containing similar MC clusters to that in a query. The performance of the retrieval system is evaluated using precision-recall curves computed using a cross-validation procedure.
Our experimental results demonstrate that: 1) the learning framework can accurately predict the perceptual similarity reported by human observers, thereby serving as a basis for CBIR; 2) the learning-based framework can significantly outperform a simple distance-based similarity metric; 3) the use of the hierarchical two-stage network can improve retrieval performance; 4) relevance feedback can be effectively incorporated into this learning framework to achieve improvement in retrieval precision based on online interaction with users; and 5) the images retrieved by the network can have predictive value for the disease condition of the query.
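
The abstract above evaluates retrieval with precision-recall curves. As a small worked illustration (not the authors' evaluation code), the sketch below computes the (recall, precision) point at every cutoff of one ranked result list; the item identifiers are made up.

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """Return one (recall, precision) pair per cutoff k = 1..len(ranked_ids)."""
    relevant = set(relevant_ids)
    points = []
    hits = 0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
        recall = hits / len(relevant) if relevant else 0.0
        precision = hits / k
        points.append((recall, precision))
    return points


# Hypothetical ranking returned for one query; "a" and "c" are the relevant cases.
points = precision_recall_points(["a", "b", "c", "d"], {"a", "c"})
```

Averaging such curves over the queries of each cross-validation fold gives a summary curve of the kind the abstract mentions.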

  2. Performing private database queries in a real-world environment using a quantum protocol.

    PubMed

    Chan, Philip; Lucio-Martinez, Itzel; Mo, Xiaofan; Simon, Christoph; Tittel, Wolfgang

    2014-06-10

    In the well-studied cryptographic primitive 1-out-of-N oblivious transfer, a user retrieves a single element from a database of size N without the database learning which element was retrieved. While it has previously been shown that a secure implementation of 1-out-of-N oblivious transfer is impossible against arbitrarily powerful adversaries, recent research has revealed an interesting class of private query protocols based on quantum mechanics in a cheat sensitive model. Specifically, a practical protocol does not need to guarantee that the database provider cannot learn what element was retrieved if doing so carries the risk of detection. The latter is sufficient motivation to keep a database provider honest. However, none of the previously proposed protocols could cope with noisy channels. Here we present a fault-tolerant private query protocol, in which the novel error correction procedure is integral to the security of the protocol. Furthermore, we present a proof-of-concept demonstration of the protocol over a deployed fibre.

  3. Performing private database queries in a real-world environment using a quantum protocol

    PubMed Central

    Chan, Philip; Lucio-Martinez, Itzel; Mo, Xiaofan; Simon, Christoph; Tittel, Wolfgang

    2014-01-01

    In the well-studied cryptographic primitive 1-out-of-N oblivious transfer, a user retrieves a single element from a database of size N without the database learning which element was retrieved. While it has previously been shown that a secure implementation of 1-out-of-N oblivious transfer is impossible against arbitrarily powerful adversaries, recent research has revealed an interesting class of private query protocols based on quantum mechanics in a cheat sensitive model. Specifically, a practical protocol does not need to guarantee that the database provider cannot learn what element was retrieved if doing so carries the risk of detection. The latter is sufficient motivation to keep a database provider honest. However, none of the previously proposed protocols could cope with noisy channels. Here we present a fault-tolerant private query protocol, in which the novel error correction procedure is integral to the security of the protocol. Furthermore, we present a proof-of-concept demonstration of the protocol over a deployed fibre. PMID:24913129

  4. Automatic Depth Extraction from 2D Images Using a Cluster-Based Learning Framework.

    PubMed

    Herrera, Jose L; Del-Blanco, Carlos R; Garcia, Narciso

    2018-07-01

    There has been a significant increase in the availability of 3D players and displays in recent years. Nonetheless, the amount of 3D content has not experienced an increase of such magnitude. To alleviate this problem, many algorithms for converting images and videos from 2D to 3D have been proposed. Here, we present an automatic learning-based 2D-3D image conversion approach, based on the key hypothesis that color images with similar structure likely present a similar depth structure. The presented algorithm estimates the depth of a color query image using the prior knowledge provided by a repository of color + depth images. The algorithm clusters this database according to structural similarity, and then creates a representative of each color-depth image cluster that will be used as a prior depth map. The selection of the appropriate prior depth map corresponding to one given color query image is accomplished by comparing the structural similarity in the color domain between the query image and the database. The comparison is based on a K-Nearest Neighbor framework that uses a learning procedure to build an adaptive combination of image feature descriptors. The best correspondences determine the cluster, and in turn the associated prior depth map. Finally, this prior estimation is enhanced through a segmentation-guided filtering that obtains the final depth map estimation. This approach has been tested using two publicly available databases, and compared with several state-of-the-art algorithms in order to demonstrate its efficiency.

  5. Active learning for semi-supervised clustering based on locally linear propagation reconstruction.

    PubMed

    Chang, Chin-Chun; Lin, Po-Yi

    2015-03-01

    The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.
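
For readers unfamiliar with the must-link and cannot-link constraints mentioned above, the sketch below shows the standard way such pairwise constraints propagate: must-links are transitive, and a cannot-link between two samples extends to everything must-linked to either of them. This is generic constraint bookkeeping with a union-find structure, assumed here for illustration; it is not the paper's manifold-based procedure for inferring must-links, and the class and method names are hypothetical.

```python
class ConstraintSet:
    """Pairwise clustering constraints over samples numbered 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))  # union-find forest for must-link groups
        self.cannot = set()           # recorded cannot-link pairs

    def find(self, i):
        """Representative of the must-link group containing sample i."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def add_must_link(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def add_cannot_link(self, a, b):
        self.cannot.add((a, b))

    def must_linked(self, a, b):
        return self.find(a) == self.find(b)

    def cannot_linked(self, a, b):
        """True if some recorded cannot-link separates the groups of a and b."""
        roots = {self.find(a), self.find(b)}
        if len(roots) == 1:
            return False
        return any({self.find(x), self.find(y)} == roots for x, y in self.cannot)


constraints = ConstraintSet(5)
constraints.add_must_link(0, 1)
constraints.add_must_link(1, 2)
constraints.add_cannot_link(2, 3)
constraints.add_must_link(3, 4)
```

In this example, samples 0 and 4 were never compared directly, yet the recorded constraints imply they cannot share a cluster, so an active learner has no need to spend a query on that pair.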

  6. Queries for Bias Testing

    NASA Technical Reports Server (NTRS)

    Gordon, Diana F.

    1992-01-01

    Selecting a good bias prior to concept learning can be difficult. Therefore, dynamic bias adjustment is becoming increasingly popular. Current dynamic bias adjustment systems, however, are limited in their ability to identify erroneous assumptions about the relationship between the bias and the target concept. Without proper diagnosis, it is difficult to identify and then remedy faulty assumptions. We have developed an approach that makes these assumptions explicit, actively tests them with queries to an oracle, and adjusts the bias based on the test results.

  7. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.

    PubMed

    Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin

    2010-04-16

    Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and the users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
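
    A RankSVM-style learner is commonly trained on ordered document pairs rather than on individual labels. The sketch below shows how graded (multi-level) feedback could be turned into such preference pairs. It is only an illustration under that assumption: the function name `preference_pairs`, the document identifiers, and the relevance levels are hypothetical and are not taken from RefMed.

```python
from itertools import combinations


def preference_pairs(feedback):
    """Turn graded feedback into (preferred, less_preferred) document pairs.

    feedback maps a document id to a relevance level, where a higher
    level means more relevant. Documents with equal levels yield no pair.
    """
    pairs = []
    for (doc_a, level_a), (doc_b, level_b) in combinations(feedback.items(), 2):
        if level_a > level_b:
            pairs.append((doc_a, doc_b))
        elif level_b > level_a:
            pairs.append((doc_b, doc_a))
    return pairs


# Hypothetical feedback on three articles, once with three levels
# and once collapsed to a binary relevant / not relevant judgment.
multi_level = {"d1": 2, "d2": 0, "d3": 1}
binary = {"d1": 1, "d2": 0, "d3": 1}

multi_pairs = preference_pairs(multi_level)
binary_pairs = preference_pairs(binary)
print(multi_pairs)   # [('d1', 'd2'), ('d1', 'd3'), ('d3', 'd2')]
print(binary_pairs)  # [('d1', 'd2'), ('d3', 'd2')]
```

    In this toy case the same three judgments produce three training pairs with graded levels but only two with binary levels, which is one plausible reading of why multi-level feedback can reach a given accuracy with less feedback.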

  8. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

    PubMed Central

    2010-01-01

    Background: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and the users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. Results: RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504

  9. Agent-Based Computing Integration and Testing

    DTIC Science & Technology

    2006-12-01

    Query Language (DQL). Regrettably, DQL never became a W3C Member Submission itself, but likely had some influence on the SPARQL Protocol And RDF... Query Language (SPARQL) subsequently produced by the W3C Data Access Working Group (DAWG) as that working group also contained members from the DAML...Sponsored by Defense Advanced Research Projects Agency DARPA Order No. K536 APPROVED FOR PUBLIC RELEASE

  10. Hybrid ontology for semantic information retrieval model using keyword matching indexing system.

    PubMed

    Uthayan, K R; Mala, G S Anandha

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms characterize warm area of researches in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology.

  11. Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System

    PubMed Central

    Uthayan, K. R.; Anandha Mala, G. S.

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms characterize warm area of researches in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology. PMID:25922851

  12. A boosting framework for visuality-preserving distance metric learning and its application to medical image retrieval.

    PubMed

    Yang, Liu; Jin, Rong; Mummert, Lily; Sukthankar, Rahul; Goode, Adam; Zheng, Bin; Hoi, Steven C H; Satyanarayanan, Mahadev

    2010-01-01

    Similarity measurement is a critical component in content-based image retrieval systems, and learning a good distance metric can significantly improve retrieval performance. However, despite extensive study, there are several major shortcomings with the existing approaches for distance metric learning that can significantly affect their application to medical image retrieval. In particular, "similarity" can mean very different things in image retrieval: resemblance in visual appearance (e.g., two images that look like one another) or similarity in semantic annotation (e.g., two images of tumors that look quite different yet are both malignant). Current approaches for distance metric learning typically address only one goal without consideration of the other. This is problematic for medical image retrieval where the goal is to assist doctors in decision making. In these applications, given a query image, the goal is to retrieve similar images from a reference library whose semantic annotations could provide the medical professional with greater insight into the possible interpretations of the query image. If the system were to retrieve images that did not look like the query, then users would be less likely to trust the system; on the other hand, retrieving images that appear superficially similar to the query but are semantically unrelated is undesirable because that could lead users toward an incorrect diagnosis. Hence, learning a distance metric that preserves both visual resemblance and semantic similarity is important. We emphasize that, although our study is focused on medical image retrieval, the problem addressed in this work is critical to many image retrieval systems. We present a boosting framework for distance metric learning that aims to preserve both visual and semantic similarities. 
The boosting framework first learns a binary representation using side information, in the form of labeled pairs, and then computes the distance as a weighted Hamming distance using the learned binary representation. A boosting algorithm is presented to efficiently learn the distance function. We evaluate the proposed algorithm on a mammographic image reference library with an Interactive Search-Assisted Decision Support (ISADS) system and on the medical image data set from ImageCLEF. Our results show that the boosting framework compares favorably to state-of-the-art approaches for distance metric learning in retrieval accuracy, with much lower computational cost. Additional evaluation with the COREL collection shows that our algorithm works well for regular image data sets.
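
    The distance computation described above, a weighted Hamming distance over a learned binary representation, can be sketched as follows. This is a minimal illustration only: the 4-bit codes and the weights are hypothetical values, and the boosting step that would learn them is not shown.

```python
from typing import Sequence


def weighted_hamming(a: Sequence[int], b: Sequence[int],
                     weights: Sequence[float]) -> float:
    """Sum the weights of the bit positions where the two codes differ."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("codes and weights must have the same length")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Hypothetical binary codes for a query image and a reference image,
# with one weight per bit (in the paper these would be learned).
query_code = [1, 0, 1, 1]
reference_code = [1, 1, 0, 1]
bit_weights = [0.5, 0.25, 2.0, 1.0]

distance = weighted_hamming(query_code, reference_code, bit_weights)
print(distance)  # 2.25 (bits 1 and 2 differ: 0.25 + 2.0)
```

    Because each comparison is a pass over short binary codes, a distance of this form is cheap to evaluate, which is consistent with the low computational cost reported in the abstract.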

  13. Web-based resources for critical care education.

    PubMed

    Kleinpell, Ruth; Ely, E Wesley; Williams, Ged; Liolios, Antonios; Ward, Nicholas; Tisherman, Samuel A

    2011-03-01

    To identify, catalog, and critically evaluate Web-based resources for critical care education. A multilevel search strategy was utilized. Literature searches were conducted (from 1996 to September 30, 2010) using OVID-MEDLINE, PubMed, and the Cumulative Index to Nursing and Allied Health Literature with the terms "Web-based learning," "computer-assisted instruction," "e-learning," "critical care," "tutorials," "continuing education," "virtual learning," and "Web-based education." The Web sites of relevant critical care organizations (American College of Chest Physicians, American Society of Anesthesiologists, American Thoracic Society, European Society of Intensive Care Medicine, Society of Critical Care Medicine, World Federation of Societies of Intensive and Critical Care Medicine, American Association of Critical Care Nurses, and World Federation of Critical Care Nurses) were reviewed for the availability of e-learning resources. Finally, Internet searches and e-mail queries to critical care medicine fellowship program directors and members of national and international acute/critical care listserves were conducted to 1) identify the use of and 2) review and critique Web-based resources for critical care education. To ensure credibility of Web site information, Web sites were reviewed by three independent reviewers on the basis of the criteria of authority, objectivity, authenticity, accuracy, timeliness, relevance, and efficiency in conjunction with suggested formats for evaluating Web sites in the medical literature. Literature searches using OVID-MEDLINE, PubMed, and the Cumulative Index to Nursing and Allied Health Literature resulted in >250 citations. 
Those pertinent to critical care provide examples of the integration of e-learning techniques, the development of specific resources, reports of the use of types of e-learning, including interactive tutorials, case studies, and simulation, and reports of student or learner satisfaction, among other general reviews of the benefits of utilizing e-learning. Review of the Web sites of relevant critical care organizations revealed the existence of a number of e-learning resources, including online critical care courses, tutorials, podcasts, webcasts, slide sets, and continuing medical education resources, some requiring membership or a fee to access. Respondents to listserve queries (>100) and critical care medicine fellowship director and advanced practice nursing educator e-mail queries (>50) identified the use of a number of tutorials, self-directed learning modules, and video-enhanced programs for critical care education and practice. In all, >135 Web-based education resources exist, including video Web resources for critical care education in a variety of e-learning formats, such as tutorials, self-directed learning modules, interactive case studies, webcasts, podcasts, and video-enhanced programs. As identified by critical care educators and practitioners, e-learning is actively being integrated into critical care medicine and nursing training programs for continuing medical education and competency training purposes. Knowledge of available Web-based educational resources may enhance critical care practitioners' ongoing learning and clinical competence, although this has not been objectively measured to date.

  14. Learning context-sensitive shape similarity by graph transduction.

    PubMed

    Bai, Xiang; Yang, Xingwei; Latecki, Longin Jan; Liu, Wenyu; Tu, Zhuowen

    2010-05-01

    Shape similarity and shape retrieval are very important topics in computer vision. The recent progress in this domain has been mostly driven by designing smart shape descriptors for providing better similarity measure between pairs of shapes. In this paper, we provide a new perspective to this problem by considering the existing shapes as a group, and study their similarity measures to the query shape in a graph structure. Our method is general and can be built on top of any existing shape similarity measure. For a given similarity measure, a new similarity is learned through graph transduction. The new similarity is learned iteratively so that the neighbors of a given shape influence its final similarity to the query. The basic idea here is related to PageRank ranking, which forms a foundation of Google Web search. The presented experimental results demonstrate that the proposed approach yields significant improvements over the state-of-art shape matching algorithms. We obtained a retrieval rate of 91.61 percent on the MPEG-7 data set, which is the highest ever reported in the literature. Moreover, the learned similarity by the proposed method also achieves promising improvements on both shape classification and shape clustering.
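
    The iterative idea above, in which the neighbors of a shape influence its final similarity to the query, can be sketched as a simple propagation over a row-normalized affinity graph. This is a simplified illustration and not the authors' exact procedure: the affinity values, the fixed iteration count, and the function name `propagate_similarity` are all assumptions made for the example.

```python
def propagate_similarity(affinity, query_index, iterations=10):
    """Propagate similarity-to-query scores over a shape affinity graph.

    affinity is a square list of lists of non-negative pairwise
    similarities. The query score is clamped to 1.0 after every step.
    """
    n = len(affinity)
    # Row-normalize so each row sums to 1 (transition probabilities).
    transition = [[v / sum(row) for v in row] for row in affinity]
    scores = [0.0] * n
    scores[query_index] = 1.0
    for _ in range(iterations):
        scores = [sum(transition[i][j] * scores[j] for j in range(n))
                  for i in range(n)]
        scores[query_index] = 1.0  # the query stays maximally similar to itself
    return scores


# Hypothetical affinities: shape 0 is the query, shapes 1 and 2 form a
# tight group near the query, shapes 3 and 4 form a separate group.
affinity = [
    [0.0, 0.9, 0.2, 0.3, 0.05],
    [0.9, 0.0, 0.9, 0.1, 0.05],
    [0.2, 0.9, 0.0, 0.1, 0.05],
    [0.3, 0.1, 0.1, 0.0, 0.9],
    [0.05, 0.05, 0.05, 0.9, 0.0],
]
scores = propagate_similarity(affinity, query_index=0)
ranking = sorted(range(1, len(scores)), key=lambda i: -scores[i])
print(ranking)
```

    In this toy graph, shape 3 has a higher direct similarity to the query than shape 2 (0.3 versus 0.2), yet shape 2 ends up ranked above shape 3 because it is strongly tied to shape 1, which is close to the query. The iteration count is kept small on purpose: with only the query clamped, running such a propagation to convergence would drive every score toward 1.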

  15. Recommender System for Learning SQL Using Hints

    ERIC Educational Resources Information Center

    Lavbic, Dejan; Matek, Tadej; Zrnec, Aljaž

    2017-01-01

    Today's software industry requires individuals who are proficient in as many programming languages as possible. Structured query language (SQL), as an adopted standard, is no exception, as it is the most widely used query language to retrieve and manipulate data. However, the process of learning SQL turns out to be challenging. The need for a…

  16. Querying Proofs (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2011-01-01

    We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.

  17. Measuring the development of conceptual understanding in chemistry

    NASA Astrophysics Data System (ADS)

    Claesgens, Jennifer Marie

    The purpose of this dissertation research is to investigate and characterize how students learn chemistry from pre-instruction to deeper understanding of the subject matter in their general chemistry coursework. Based on preliminary work, I believe that students have a general pathway of learning across the "big ideas," or concepts, in chemistry that can be characterized over the course of instruction. My hypothesis is that as students learn chemistry they build from experience and logical reasoning then relate chemistry specific ideas in a pair-wise fashion before making more complete multi-relational links for deeper understanding of the subject matter. This proposed progression of student learning, which starts at Notions, moves to Recognition, and then to Formulation, is described in the ChemQuery Perspectives framework. My research continues the development of ChemQuery, an NSF-funded assessment system that uses a framework of the key ideas in the discipline and criterion-referenced analysis using item response theory (IRT) to map student progress. Specifically, this research investigates the potential for using criterion-referenced analysis to describe and measure how students learn chemistry followed by more detailed task analysis of patterns in student responses found in the data. My research question asks: does IRT work to describe and measure how students learn chemistry and if so, what is discovered about how students learn? Although my findings seem to neither entirely support nor entirely refute the pathway of student understanding proposed in the ChemQuery Perspectives framework, my research does provide an indication of trouble spots. For example, it seems like the pathway from Notions to Recognition is holding but there are difficulties around the transition from Recognition to Formulation that cannot be resolved with this data.
Nevertheless, this research has produced the following, which has contributed to the development of the ChemQuery assessment system, (a) 13 new change items with good fits, 3 new change items that need further study, (b) a refined scoring guide and (c) a set of item exemplars that can then be developed further into a computer-adapted model so that more data can be captured.

  18. Learning from the U.S. Department of Veterans Affairs Quality Enhancement Research Initiative: QUERI Series

    PubMed Central

    Graham, Ian D; Tetroe, Jacqueline

    2009-01-01

    As the recent collection of papers from the Quality Enhancement Research Initiative (QUERI) Series indicates, knowledge is leading to considerable action in the United States (U.S.) Department of Veterans Affairs (VA). The QUERI Series offers clinical researchers, implementation scientists, health systems, and health research funders from around the globe a unique window into both the practice and science of implementation or knowledge translation (KT) in the VA. By describing successes and challenges as well as setbacks and disappointments, the QUERI Series is all the more useful. From the vantage point of Canadian KT researchers and officials at a national health research funding agency, we offer a number of observations and lessons that can be learned from QUERI. "Knowledge, if it does not determine action, is dead to us." Plotinus (Roman philosopher 205AD-270AD) PMID:19267920

  19. Cross-modal learning to rank via latent joint representation.

    PubMed

    Wu, Fei; Jiang, Xinyang; Li, Xi; Tang, Siliang; Lu, Weiming; Zhang, Zhongfei; Zhuang, Yueting

    2015-05-01

    Cross-modal ranking is a research topic that is imperative to many applications involving multimodal data. Discovering a joint representation for multimodal data and learning a ranking function are essential in order to boost the cross-media retrieval (i.e., image-query-text or text-query-image). In this paper, we propose an approach to discover the latent joint representation of pairs of multimodal data (e.g., pairs of an image query and a text document) via a conditional random field and structural learning in a listwise ranking manner. We call this approach cross-modal learning to rank via latent joint representation (CML²R). In CML²R, the correlations between multimodal data are captured in terms of their sharing hidden variables (e.g., topics), and a hidden-topic-driven discriminative ranking function is learned in a listwise ranking manner. The experiments show that the proposed approach achieves a good performance in cross-media retrieval and meanwhile has the capability to learn the discriminative representation of multimodal data.

  20. Collaborative Supervised Learning for Sensor Networks

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Rebbapragada, Umaa; Lane, Terran

    2011-01-01

    Collaboration methods for distributed machine-learning algorithms involve the specification of communication protocols for the learners, which can query other learners and/or broadcast their findings preemptively. Each learner incorporates information from its neighbors into its own training set, and they are thereby able to bootstrap each other to higher performance. Each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. After being seeded with an initial labeled training set, each learner proceeds to learn in an iterative fashion. New data is collected and classified. The learner can then either broadcast its most confident classifications for use by other learners, or can query neighbors for their classifications of its least confident items. As such, collaborative learning combines elements of both passive (broadcast) and active (query) learning. It also uses ideas from ensemble learning to combine the multiple responses to a given query into a single useful label. This approach has been evaluated against current non-collaborative alternatives, including training a single classifier and deploying it at all nodes with no further learning possible, and permitting learners to learn from their own most confident judgments, absent interaction with their neighbors. On several data sets, it has been consistently found that active collaboration is the best strategy for a distributed learner network. The main advantages include the ability for learning to take place autonomously by collaboration rather than by requiring intervention from an oracle (usually human), and also the ability to learn in a distributed environment, permitting decisions to be made in situ and to yield faster response time.
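
    The broadcast-or-query decision and the ensemble-style combination of neighbor responses described above can be sketched as follows. This is an illustration under stated assumptions: the confidence thresholds, the majority-vote combiner, and all names and labels are hypothetical choices and are not details given in the abstract.

```python
from collections import Counter


def plan_actions(predictions, broadcast_at=0.9, query_below=0.6):
    """Split a learner's predictions into items to broadcast and items to query.

    predictions is a list of (item_id, label, confidence) tuples.
    The two thresholds are assumed values for illustration.
    """
    broadcast = [(item, label) for item, label, conf in predictions
                 if conf >= broadcast_at]
    queries = [item for item, _, conf in predictions if conf < query_below]
    return broadcast, queries


def combine_responses(labels):
    """Merge several neighbor answers into one label by majority vote."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical classifications made by one node in the network.
predictions = [("a", "rock", 0.95), ("b", "sand", 0.55), ("c", "rock", 0.75)]
to_broadcast, to_query = plan_actions(predictions)
print(to_broadcast)  # [('a', 'rock')]
print(to_query)      # ['b']

# Hypothetical answers from three neighbors about item "b".
label_for_b = combine_responses(["sand", "rock", "sand"])
print(label_for_b)   # sand
```

    The combined label could then be added to the learner's own training set, so that no human oracle is needed to resolve the low-confidence item.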

  1. Automated rule-base creation via CLIPS-Induce

    NASA Technical Reports Server (NTRS)

    Murphy, Patrick M.

    1994-01-01

    Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that performs tasks such as accessing a database or displaying information.
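
    A top-down traversal that turns a decision tree into classification rules can be sketched as below. This is a generic illustration and not the CLIPS-Induce implementation: the nested-dictionary tree layout, the attribute names, and the (conditions, class) output format are assumptions, and no CLIPS syntax is generated.

```python
def extract_rules(node, conditions=()):
    """Walk the tree top-down and emit one rule per root-to-leaf path.

    An internal node is a dict with an "attribute" and its "branches";
    a leaf is a plain class label. Each rule is (conditions, label),
    where conditions is a list of (attribute, value) tests.
    """
    if not isinstance(node, dict):  # leaf reached: emit the finished rule
        return [(list(conditions), node)]
    rules = []
    for value, child in node["branches"].items():
        test = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules


# Hypothetical tree of the kind a decision tree learner such as ID3 might induce.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
    },
}

rules = extract_rules(tree)
for conditions, label in rules:
    tests = " and ".join(f"{attr} = {value}" for attr, value in conditions)
    print(f"if {tests} then class = {label}")
```

    Each emitted rule corresponds to one path through the tree, so the rule set classifies exactly the cases the tree does.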

  2. Flexible Querying of Lifelong Learner Metadata

    ERIC Educational Resources Information Center

    Poulovassilis, A.; Selmer, P.; Wood, P. T.

    2012-01-01

    This paper discusses the provision of flexible querying facilities over heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities is to allow learners to identify possible choices for their future learning and professional development by seeing what others have done. We motivate and…

  3. Enabling Incremental Query Re-Optimization.

    PubMed

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  4. Enabling Incremental Query Re-Optimization

    PubMed Central

    Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

    2017-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658

  5. Active learning for noisy oracle via density power divergence.

    PubMed

    Sogawa, Yasuhiro; Ueno, Tsuyoshi; Kawahara, Yoshinobu; Washio, Takashi

    2013-10-01

    The accuracy of active learning is critically influenced by the existence of noisy labels given by a noisy oracle. In this paper, we propose a novel pool-based active learning framework through robust measures based on density power divergence. By minimizing density power divergence, such as β-divergence and γ-divergence, one can estimate the model accurately even under the existence of noisy labels within data. Accordingly, we develop query selecting measures for pool-based active learning using these divergences. In addition, we propose an evaluation scheme for these measures based on asymptotic statistical analyses, which enables us to perform active learning by evaluating an estimation error directly. Experiments with benchmark datasets and real-world image datasets show that our active learning scheme performs better than several baseline methods. Copyright © 2013 Elsevier Ltd. All rights reserved.

  6. Web document ranking via active learning and kernel principal component analysis

    NASA Astrophysics Data System (ADS)

    Cai, Fei; Chen, Honghui; Shu, Zhen

    2015-09-01

    Web document ranking arises in many information retrieval (IR) applications, such as the search engine, recommendation system and online advertising. A challenging issue is how to select the representative query-document pairs and informative features as well for better learning and exploring new ranking models to produce an acceptable ranking list of candidate documents of each query. In this study, we propose an active sampling (AS) plus kernel principal component analysis (KPCA) based ranking model, viz. AS-KPCA Regression, to study the document ranking for a retrieval system, i.e. how to choose the representative query-document pairs and features for learning. More precisely, we fill those documents gradually into the training set by AS such that each of which will incur the highest expected DCG loss if unselected. Then, the KPCA is performed via projecting the selected query-document pairs onto p-principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and depress the impact incurred by noise simultaneously. To the best of our knowledge, we are the first to perform the document ranking via dimension reductions in two dimensions, namely, the number of documents and features simultaneously. Our experiments demonstrate that the performance of our approach is better than that of the baseline methods on the public LETOR 4.0 datasets. Our approach brings an improvement against RankBoost as well as other baselines near 20% in terms of MAP metric and less improvements using P@K and NDCG@K, respectively. Moreover, our approach is particularly suitable for document ranking on the noisy dataset in practice.
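
    The DCG and NDCG@K measures mentioned above can be computed as in the following sketch. It assumes the widely used exponential-gain form of DCG, with gain 2^rel - 1 and a log2(position + 1) discount; the abstract does not state which variant the authors used, and the relevance grades below are hypothetical.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    top = relevances if k is None else relevances[:k]
    return sum((2 ** rel - 1) / math.log2(position + 1)
               for position, rel in enumerate(top, start=1))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of the documents returned for one query,
# listed in the order the ranker returned them.
good_ranking = [3, 2, 0]
bad_ranking = [0, 2, 3]

print(dcg(good_ranking) > dcg(bad_ranking))  # True
print(ndcg(good_ranking))                    # 1.0 (already in ideal order)
```

    Under this definition, leaving a highly relevant document out of the top positions lowers DCG the most, which is the kind of loss the active sampling step is described as targeting.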

  7. An Efficient Quantum Somewhat Homomorphic Symmetric Searchable Encryption

    NASA Astrophysics Data System (ADS)

    Sun, Xiaoqiang; Wang, Ting; Sun, Zhiwei; Wang, Ping; Yu, Jianping; Xie, Weixin

    2017-04-01

    In 2009, Gentry introduced the first fully homomorphic encryption (FHE) scheme, based on ideal lattices. Since then, FHE based on the approximate greatest common divisor problem, the learning with errors problem, or the learning with errors over rings problem has developed rapidly, although these schemes have low efficiency and offer only computational security. Combining FHE with quantum mechanics, Liang proposed a symmetric quantum somewhat homomorphic encryption (QSHE) scheme based on the quantum one-time pad, which is unconditionally secure. It was then converted to a quantum fully homomorphic encryption scheme whose evaluation algorithm depends on the secret key. Compared with Liang's QSHE scheme, we propose a more efficient QSHE scheme for classical input states with perfect security, which is used to encrypt the classical message, and the secret key is not required in the evaluation algorithm. Furthermore, an efficient symmetric searchable encryption (SSE) scheme is constructed based on our QSHE scheme. SSE is important in cloud storage because it allows users to offload search queries to the untrusted cloud. The cloud is then responsible for returning encrypted files that match search queries (also encrypted), which protects users' privacy.

  8. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  9. A Semantic Basis for Proof Queries and Transformations

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen W.; Luth, Christoph

    2013-01-01

    We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations. We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that

  10. Data augmentation-assisted deep learning of hand-drawn partially colored sketches for visual search

    PubMed Central

    Muhammad, Khan; Baik, Sung Wook

    2017-01-01

    In recent years, image databases are growing at exponential rates, making their management, indexing, and retrieval, very challenging. Typical image retrieval systems rely on sample images as queries. However, in the absence of sample query images, hand-drawn sketches are also used. The recent adoption of touch screen input devices makes it very convenient to quickly draw shaded sketches of objects to be used for querying image databases. This paper presents a mechanism to provide access to visual information based on users’ hand-drawn partially colored sketches using touch screen devices. A key challenge for sketch-based image retrieval systems is to cope with the inherent ambiguity in sketches due to the lack of colors, textures, shading, and drawing imperfections. To cope with these issues, we propose to fine-tune a deep convolutional neural network (CNN) using augmented dataset to extract features from partially colored hand-drawn sketches for query specification in a sketch-based image retrieval framework. The large augmented dataset contains natural images, edge maps, hand-drawn sketches, de-colorized, and de-texturized images which allow CNN to effectively model visual contents presented to it in a variety of forms. The deep features extracted from CNN allow retrieval of images using both sketches and full color images as queries. We also evaluated the role of partial coloring or shading in sketches to improve the retrieval performance. The proposed method is tested on two large datasets for sketch recognition and sketch-based image retrieval and achieved better classification and retrieval performance than many existing methods. PMID:28859140

  11. Changes in science classrooms resulting from collaborative action research initiatives

    NASA Astrophysics Data System (ADS)

    Oh, Phil Seok

    Collaborative action research was undertaken over two years between a Korean science teacher and science education researchers at the University of Iowa. For the purpose of realizing science learning as envisioned by constructivist principles, Group-Investigations were implemented three or five times per project year. In addition, the second year project enacted Peer Assessments among students. Student perceptions of their science classrooms, as measured by the Constructivist Learning Environment Survey (CLES), provided evidence that the collaborative action research was successful in creating constructivist learning environments. Student attitudes toward science lessons, as examined by the Enjoyment of Science Lessons Scale (ESLS), indicated that the action research also contributed to developing more positive attitudes of students about science learning. Discourse analysis was conducted on video-recordings of in-class presentations and discussions. The results indicated that students in science classrooms which were moving toward constructivist learning environments engaged in such discursive practices as: (1) Communicating their inquiries to others, (2) Seeking and providing information through dialogues, and (3) Negotiating conflicts in their knowledge and beliefs. Based on these practices, science learning was viewed as the process of constructing knowledge and understanding of science as well as the process of engaging in scientific inquiry and discourse. The teacher's discursive practices included: (1) Wrapping up student presentations, (2) Addressing misconceptions, (3) Answering student queries, (4) Coaching, (5) Assessing and advising, (6) Guiding students discursively into new knowledge, and (7) Scaffolding. Science teaching was defined as situated acts of the teacher to facilitate the learning process. 
In particular, when the classrooms became more constructivist, the teacher intervened more frequently and carefully in student activities to fulfill a variety of pedagogical functions. Students perceived Group-Investigations and Peer Assessments as positive in that they contributed to realizing constructivist features in their classrooms. The students also reported that they gained several learning outcomes through Group-Investigations, including more positive attitudes, new knowledge, greater learning capabilities, and improved self-esteem. However, the Group-Investigation and Peer Assessment methods were perceived as negative and problematic by those who had rarely been exposed to such inquiry-based, student-centered approaches.

  12. Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy.

    PubMed

    Tian, Yuling; Zhang, Hongxian

    2016-01-01

    For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic; there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions.

  13. Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy

    PubMed Central

    Tian, Yuling; Zhang, Hongxian

    2016-01-01

    For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic–there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions. PMID:27487242

  14. Discriminative structural approaches for enzyme active-site prediction.

    PubMed

    Kato, Tsuyoshi; Nagano, Nozomi

    2011-02-15

    Predicting enzyme active-sites in proteins is an important issue not only for protein sciences but also for a variety of practical applications such as drug design. Because enzyme reaction mechanisms are based on the local structures of enzyme active-sites, various template-based methods that compare local structures in proteins have been developed to date. In comparing such local sites, a simple measurement, RMSD, has been used so far. This paper introduces new machine learning algorithms that refine the similarity/deviation for comparison of local structures. The similarity/deviation is applied to two types of applications, single template analysis and multiple template analysis. In the single template analysis, a single template is used as a query to search proteins for active sites, whereas a protein structure is examined as a query to discover the possible active-sites using a set of templates in the multiple template analysis. This paper experimentally illustrates that the machine learning algorithms effectively improve the similarity/deviation measurements for both the analyses.

  15. Neuro-symbolic representation learning on biological knowledge graphs.

    PubMed

    Alshahrani, Mona; Khan, Mohammad Asif; Maddouri, Omar; Kinjo, Akira R; Queralt-Rosinach, Núria; Hoehndorf, Robert

    2017-09-01

    Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In recent years, feature learning methods that are applicable to graph-structured data have become available, but they have not yet been widely applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation: https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  16. Efficient privacy-preserving string search and an application in genomics.

    PubMed

    Shimizu, Kana; Nuida, Koji; Rätsch, Gunnar

    2016-06-01

    Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contacts: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  17. Efficient privacy-preserving string search and an application in genomics

    PubMed Central

    Shimizu, Kana; Nuida, Koji; Rätsch, Gunnar

    2016-01-01

    Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows–Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows–Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contacts: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153731
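
    The abstract above relies on additive homomorphic encryption, where combining two ciphertexts yields an encryption of the sum of the plaintexts. As an illustration of that property only, the sketch below uses a toy Paillier cryptosystem; this is an assumed stand-in, since the abstract does not name the scheme the authors use, and the tiny primes and function names are hypothetical choices that offer no real security. It needs Python 3.8 or later for the modular inverse via pow.

```python
import math
import random

# Toy parameters only (assumption for illustration); real keys use very large primes.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # inverse of LAM mod N; this form holds for generator g = N + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover m as L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts modulo N."""
    return (c1 * c2) % N2
```

    For example, decrypt(add_encrypted(encrypt(20), encrypt(22))) recovers 42 without the party doing the addition ever seeing 20 or 22. The oblivious transfer and positional Burrows-Wheeler transform steps described in the abstract are not covered by this sketch.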

  18. Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

    PubMed

    Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D

    2014-03-25

    A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretations produced link closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure-activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.

  19. Image Search Reranking With Hierarchical Topic Awareness.

    PubMed

    Tian, Xinmei; Yang, Linjun; Lu, Yijuan; Tian, Qi; Tao, Dacheng

    2015-10-01

    With much attention from both academia and industrial communities, visual search reranking has recently been proposed to refine image search results obtained from text-based image search engines. Most of the traditional reranking methods cannot capture both relevance and diversity of the search results at the same time. Or they ignore the hierarchical topic structure of search result. Each topic is treated equally and independently. However, in real applications, images returned for certain queries are naturally in hierarchical organization, rather than simple parallel relation. In this paper, a new reranking method "topic-aware reranking (TARerank)" is proposed. TARerank describes the hierarchical topic structure of search results in one model, and seamlessly captures both relevance and diversity of the image search results simultaneously. Through a structured learning framework, relevance and diversity are modeled in TARerank by a set of carefully designed features, and then the model is learned from human-labeled training samples. The learned model is expected to predict reranking results with high relevance and diversity for testing queries. To verify the effectiveness of the proposed method, we collect an image search dataset and conduct comparison experiments on it. The experimental results demonstrate that the proposed TARerank outperforms the existing relevance-based and diversified reranking methods.

  20. End-to-End ASR-Free Keyword Search From Speech

    NASA Astrophysics Data System (ADS)

    Audhkhasi, Kartik; Rosenberg, Andrew; Sethy, Abhinav; Ramabhadran, Bhuvana; Kingsbury, Brian

    2017-12-01

    End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end system for text query-based keyword search (KWS) from speech trained with minimal supervision. Our E2E KWS system consists of three sub-systems. The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation. The second sub-system is a character-level RNN language model using embeddings learned from a convolutional neural network. Since the acoustic and text query embeddings occupy different representation spaces, they are input to a third feed-forward neural network that predicts whether the query occurs in the acoustic utterance or not. This E2E ASR-free KWS system performs respectably despite lacking a conventional ASR system and trains much faster.

  1. An Intelligent System for Document Retrieval in Distributed Office Environments.

    ERIC Educational Resources Information Center

    Mukhopadhyay, Uttam; And Others

    1986-01-01

    MINDS (Multiple Intelligent Node Document Servers) is a distributed system of knowledge-based query engines for efficiently retrieving multimedia documents in an office environment of distributed workstations. By learning document distribution patterns and user interests and preferences during system usage, it customizes document retrievals for…

  2. Impact of Adapted Hypermedia on Undergraduate Students' Learning of Astronomy in an Elearning Environment

    NASA Astrophysics Data System (ADS)

    Zuel, Brian

    The purpose of this dissertation was to examine the effectiveness of matching learners' optimal learning styles to their overall knowledge retention. The study attempted to determine if learners who are placed in an online learning environment that matches their optimal learning styles will retain the information at a higher rate than those learners who are not in an adapted learning environment. There were 56 participants who took one of two lessons; the first lesson was textual based, had no hypertext, and was not influenced heavily by the coherence principle, while the second lesson was multimedia based utilizing hypermedia guided by the coherence principle. Each participant took Felder and Soloman's (1991, 2000) Index of Learning Styles (ILS) questionnaire and was classified using the Felder-Silverman Learning Style Model (FSLSM; 1998) into four individual categories. Groups were separated using the Visual/Verbal section of the FSLSM with 55% (n = 31) of participants going to the adapted group, and 45% (n = 25) of participants going to the non-adapted group. Each participant completed an immediate posttest directly after the lesson and a retention posttest a week later. Several repeated measures MANOVA tests were conducted to measure the significance of differences in the tests between groups and within groups. Repeated measures MANOVA tests were conducted to determine if significance existed between the immediate posttest results and the retention posttest results. Also, participants were asked whether they perceived the lesson type they received as beneficial to their learning of the material. Of the 56 students who took part in this study, 31 students were placed in the adapted group and 25 in the non-adapted group based on outcomes of the ILS and the FSLSM. No significant differences were found between groups taking the multimedia lesson and the textual lesson in the immediate posttest.
No significant differences were found between the adapted and the non-adapted groups on the immediate posttest. No significant difference was found between the adapted and the non-adapted groups on the retention posttest. However, results also revealed that the adapted group scored significantly higher on the retention posttest when compared with the immediate posttest. Interestingly, the non-adapted group scored significantly higher on the immediate posttest when compared with the retention posttest. When queried about the perception of benefit of the lesson style, 42% of the adapted group replied in the affirmative following the immediate posttest, yet that percentage grew to 81% following the retention posttest. The non-adapted group had 28% reply in the affirmative following the immediate posttest, and that percentage grew to 48% following the retention posttest. Both groups found benefit, yet the numbers associated with the adapted group were higher. Overall perceptions of benefit corresponded to higher test scores as opposed to those who did not find benefit, who had a lower score.

  3. Mining Student Data Captured from a Web-Based Tutoring Tool: Initial Exploration and Results

    ERIC Educational Resources Information Center

    Merceron, Agathe; Yacef, Kalina

    2004-01-01

    In this article we describe the initial investigations that we have conducted on student data collected from a web-based tutoring tool. We have used some data mining techniques such as association rule and symbolic data analysis, as well as traditional SQL queries to gain further insight on the students' learning and deduce information to improve…

  4. Query-by-example surgical activity detection.

    PubMed

    Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

    2016-06-01

    Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. 
Our method demonstrated adequate recall across different complexity datasets and experimental conditions.
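
    The query-by-example abstract above scores candidate segments with a subsequence variant of dynamic time warping. The sketch below shows plain subsequence DTW on one-dimensional series, as a simplified illustration of the idea rather than the authors' asymmetric AS-DTW; it assumes an absolute-difference local cost and unit step sizes, and the function name is hypothetical.

```python
def subsequence_dtw(query, target):
    """Find the span of `target` that best matches `query` under DTW.

    Returns (cost, start, end), where target[start:end + 1] is the match.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[:i + 1] to a span ending at target[j]
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: index in target where that cheapest alignment begins
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first query sample may align with any target index.
        cost[0][j] = abs(query[0] - target[j])
        start[0][j] = j
    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + abs(query[i] - target[0])
        start[i][0] = 0
        for j in range(1, m):
            # Pick the cheapest predecessor; ties fall to the earlier start.
            best_cost, best_start = min(
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
            )
            cost[i][j] = best_cost + abs(query[i] - target[j])
            start[i][j] = best_start
    # Free end: the match may finish at any target index.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

    Sorting the final row of the cost table, instead of taking only its minimum, would give a ranked list of candidate end points in the spirit of the ranked candidate list the abstract describes.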

  5. Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.

    PubMed

    Safari, Leila; Patrick, Jon D

    2018-06-01

    This paper reports on a generic framework to provide clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries which can have up to five levels of criteria starting with a query to define subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables which all together simulate a real scientific experiment. According to the complexity of the research questions, the cascaded query model has the flexibility of merging some lower level queries for simple research questions or adding a nested query to each level to compose more complex queries. Three different scenarios (one of them contains two studies) are described and used for evaluation of the proposed solution. CliniDAL's complex analyses solution enables answering complex queries with time-event dependencies at most in a few hours which manually would take many days. An evaluation of results of the research studies based on the comparison between CliniDAL and SQL solutions reveals high usability and efficiency of CliniDAL's solution. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Building a semi-automatic ontology learning and construction system for geosciences

    NASA Astrophysics Data System (ADS)

    Babaie, H. A.; Sunderraman, R.; Zhu, Y.

    2013-12-01

    We are developing an ontology learning and construction framework that allows continuous, semi-automatic knowledge extraction, verification, validation, and maintenance by potentially a very large group of collaborating domain experts in any geosciences field. The system brings geoscientists from the side-lines to the center stage of ontology building, allowing them to collaboratively construct and enrich new ontologies, and merge, align, and integrate existing ontologies and tools. These constantly evolving ontologies can more effectively address the community's interests, purposes, tools, and change. The goal is to minimize the cost and time of building ontologies, and maximize the quality, usability, and adoption of ontologies by the community. Our system will be a domain-independent ontology learning framework that applies natural language processing, allowing users to enter their ontology in a semi-structured form, and a combined Semantic Web and Social Web approach that enables direct participation by geoscientists who have no skill in the design and development of their domain ontologies. A controlled natural language (CNL) interface and an integrated authoring and editing tool automatically convert syntactically correct CNL text into formal OWL constructs. The WebProtege-based system will allow a potentially large group of geoscientists, from multiple domains, to crowdsource and participate in the structuring of their knowledge model by sharing their knowledge through critiquing, testing, verifying, adopting, and updating of the concept models (ontologies). We will use cloud storage for all data and knowledge base components of the system, such as users, domain ontologies, discussion forums, and semantic wikis that can be accessed and queried by geoscientists in each domain. We will use NoSQL databases such as MongoDB as a service in the cloud environment.
MongoDB uses the lightweight JSON format, which makes it convenient and easy to build Web applications using just HTML5 and Javascript, thereby avoiding cumbersome server side coding present in the traditional approaches. The JSON format used in MongoDB is also suitable for storing and querying RDF data. We will store the domain ontologies and associated linked data in JSON/RDF formats. Our Web interface will be built upon the open source and configurable WebProtege ontology editor. We will develop a simplified mobile version of our user interface which will automatically detect the hosting device and adjust the user interface layout to accommodate different screen sizes. We will also use the Semantic Media Wiki that allows the user to store and query the data within the wiki pages. By using HTML 5, JavaScript, and WebGL, we aim to create an interactive, dynamic, and multi-dimensional user interface that presents various geosciences data sets in a natural and intuitive way.

  7. NELasso: Group-Sparse Modeling for Characterizing Relations Among Named Entities in News Articles.

    PubMed

    Tariq, Amara; Karim, Asim; Foroosh, Hassan

    2017-10-01

    Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommending systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach forces the weights of word groups that do not influence the prediction towards zero. The resulting sparse structure is utilized for defining the type and strength of relations. Our unsupervised system yields a named entities' network where each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.
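
    The group-level shrinkage described in this abstract can be illustrated with the proximal operator of the group-LASSO penalty. This is a minimal sketch, not the authors' implementation: the function name, the toy weights, and the word groups are hypothetical, and a full sparse group LASSO would also apply an element-wise L1 term, which is omitted here.

```python
import math

def group_soft_threshold(weights, groups, lam):
    """Proximal operator of the penalty lam * sum_g ||w_g||_2.

    weights: list of floats (one weight per word/predictor)
    groups:  list of index lists, one list per word group (hypothetical grouping)
    lam:     non-negative threshold
    """
    out = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        # A group whose norm is below the threshold is zeroed out as a whole;
        # otherwise every weight in the group is shrunk by the same factor.
        scale = 0.0 if norm <= lam else 1.0 - lam / norm
        for i in idx:
            out[i] = scale * weights[i]
    return out

# Toy example: the first group is influential, the second is not.
w = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(w, groups, lam=1.0)
```

    In this toy run the first group (norm 5) is scaled by 0.8, while the second group (norm below the threshold) is set to zero, which is the behaviour that produces the sparse structure over word groups.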

  8. Visualizing and enhancing a deep learning framework using patients age and gender for chest x-ray image retrieval

    NASA Astrophysics Data System (ADS)

    Anavi, Yaron; Kogan, Ilya; Gelbart, Elad; Geva, Ofer; Greenspan, Hayit

    2016-03-01

    We explore the combination of text metadata, such as patients' age and gender, with image-based features, for X-ray chest pathology image retrieval. We focus on a feature set extracted from a pre-trained deep convolutional network shown in earlier work to achieve state-of-the-art results. Two distance measures are explored: a descriptor-based measure, which computes the distance between image descriptors, and a classification-based measure, which is computed by comparing the corresponding SVM classification probabilities. We show that retrieval results improve once the age and gender information is combined with the features extracted from the last layers of the network, with the best results obtained using the classification-based scheme. The X-ray data are visualized by embedding the high-dimensional deep learning features in a 2-D space while preserving the pairwise distances using the t-SNE algorithm. The 2-D visualization gives the unique ability to find groups of X-ray images that are similar to the query image and among themselves, which is a characteristic we do not see in a traditional 1-D ranking.

  9. Bidirectional Active Learning: A Two-Way Exploration Into Unlabeled and Labeled Data Set.

    PubMed

    Zhang, Xiao-Yu; Wang, Shupeng; Yun, Xiaochun

    2015-12-01

    In practical machine learning applications, human instruction is indispensable for model construction. To utilize the precious labeling effort effectively, active learning queries the user with selective sampling in an interactive way. Traditional active learning techniques merely focus on the unlabeled data set under a unidirectional exploration framework and suffer from model deterioration in the presence of noise. To address this problem, this paper proposes a novel bidirectional active learning algorithm that explores into both unlabeled and labeled data sets simultaneously in a two-way process. For the acquisition of new knowledge, forward learning queries the most informative instances from unlabeled data set. For the introspection of learned knowledge, backward learning detects the most suspiciously unreliable instances within the labeled data set. Under the two-way exploration framework, the generalization ability of the learning model can be greatly improved, which is demonstrated by the encouraging experimental results.
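
    The two query directions can be sketched as follows. The abstract does not state the selection criteria, so this sketch substitutes standard stand-ins: uncertainty sampling for the forward step and "lowest model probability for the assigned label" for the backward step. The function names and the assumption of a binary classifier that outputs P(y=1) are hypothetical.

```python
def forward_query(unlabeled_probs):
    """Forward step (assumed criterion): pick the unlabeled item whose
    predicted P(y=1) is closest to 0.5, i.e. the most uncertain one."""
    return min(range(len(unlabeled_probs)),
               key=lambda i: abs(unlabeled_probs[i] - 0.5))

def backward_query(labeled_probs, labels):
    """Backward step (assumed criterion): pick the labeled item whose
    assigned label receives the lowest probability from the model,
    i.e. the most suspicious label."""
    def label_prob(i):
        p = labeled_probs[i]
        return p if labels[i] == 1 else 1.0 - p
    return min(range(len(labels)), key=label_prob)

# Toy model outputs P(y=1) for three unlabeled and three labeled items.
next_to_label = forward_query([0.9, 0.55, 0.1])
next_to_recheck = backward_query([0.95, 0.2, 0.1], [1, 1, 0])
```

    Here the second unlabeled item is queried for a label, and the second labeled item (labeled 1 but predicted at 0.2) is flagged for re-examination.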

  10. A proposal of fuzzy connective with learning function and its application to fuzzy retrieval system

    NASA Technical Reports Server (NTRS)

    Hayashi, Isao; Naito, Eiichi; Ozawa, Jun; Wakami, Noboru

    1993-01-01

    A new fuzzy connective and a network structure built from fuzzy connectives are proposed to overcome a drawback of conventional fuzzy retrieval systems. This network represents a retrieval query, and the fuzzy connectives in the network have a learning function that adjusts their parameters using data from a database and outputs of a user. Fuzzy retrieval systems employing this network are also constructed. Users can retrieve results even with a query whose attributes do not exist in the database schema, and the learning function allows satisfactory results to be obtained for a variety of ways of thinking.

  11. A comparative study for chest radiograph image retrieval using binary texture and deep learning classification.

    PubMed

    Anavi, Yaron; Kogan, Ilya; Gelbart, Elad; Geva, Ofer; Greenspan, Hayit

    2015-08-01

    In this work various approaches are investigated for X-ray image retrieval and specifically chest pathology retrieval. Given a query image taken from a data set of 443 images, the objective is to rank images according to similarity. Different features, including binary features, texture features, and deep learning (CNN) features are examined. In addition, two approaches are investigated for the retrieval task. One approach is based on the distance of image descriptors using the above features (hereon termed the "descriptor"-based approach); the second approach ("classification"-based approach) is based on a probability descriptor, generated by a pair-wise classification of each two classes (pathologies) and their decision values using an SVM classifier. Best results are achieved using deep learning features in a classification scheme.
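
    The "descriptor"-based approach amounts to ranking database images by the distance between their descriptors and the query descriptor. The sketch below assumes Euclidean distance and tiny hand-made descriptors; the abstract does not specify the distance function, and the names used here are hypothetical.

```python
import math

def rank_by_distance(query, database):
    """Return database indices sorted from most to least similar,
    using Euclidean distance between descriptors (an assumed choice)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sorted(range(len(database)), key=lambda i: dist(query, database[i]))

# Toy 2-D descriptors; real descriptors would be binary, texture, or CNN features.
ranking = rank_by_distance([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
```

    The "classification"-based approach would use the same ranking step, with each descriptor replaced by a vector of pair-wise SVM decision values.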

  12. Evolution of Protein Lipograms: A Bioinformatics Problem

    ERIC Educational Resources Information Center

    White, Harold B., III; Dhurjati, Prasad

    2006-01-01

    A protein lacking one of the 20 common amino acids is a protein lipogram. This open-ended problem-based learning assignment deals with the evolution of proteins with biased amino acid composition. It has students query protein and metabolic databases to test the hypothesis that natural selection has reduced the frequency of each amino acid…

  13. A Comparison of Evaluation Metrics for Biomedical Journals, Articles, and Websites in Terms of Sensitivity to Topic

    PubMed Central

    Fu, Lawrence D.; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F.

    2011-01-01

    Evaluating the biomedical literature and health-related websites for quality are challenging information retrieval tasks. Current commonly used methods include impact factor for journals, PubMed’s clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic adjusted metrics and other topic robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations. PMID:21419864

  14. A topic clustering approach to finding similar questions from large question and answer archives.

    PubMed

    Zhang, Wei-Nan; Liu, Ting; Yang, Yang; Cao, Liujuan; Zhang, Yu; Ji, Rongrong

    2014-01-01

    With the blooming of Web 2.0, Community Question Answering (CQA) services such as Yahoo! Answers (http://answers.yahoo.com), WikiAnswer (http://wiki.answers.com), and Baidu Zhidao (http://zhidao.baidu.com) have emerged as alternatives for knowledge and information acquisition. Over time, a large number of high-quality question and answer (Q&A) pairs contributed by human users have accumulated into a comprehensive knowledge base. Unlike search engines, which return long lists of results, CQA services can return correct answers to question queries by automatically finding similar questions that have already been answered by other users. Hence, they greatly improve the efficiency of online information retrieval. However, given a question query, finding similar and well-answered questions is a non-trivial task. The main challenge is the word mismatch between the question query (query) and the candidate question for retrieval (question). To investigate this problem, in this study, we capture the word semantic similarity between query and question by introducing a topic modeling approach. We then propose an unsupervised machine-learning approach to finding similar questions in CQA Q&A archives. The experimental results show that our proposed approach significantly outperforms the state-of-the-art methods.

  15. Identification of Functionally Related Enzymes by Learning-to-Rank Methods.

    PubMed

    Stock, Michiel; Fober, Thomas; Hüllermeier, Eyke; Glinca, Serghei; Klebe, Gerhard; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

    2014-01-01

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

  16. A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text

    PubMed Central

    Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia

    2013-01-01

    Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The success of the query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008

  17. Do-It-Yourself: A Special Library's Approach to Creating Dynamic Web Pages Using Commercial Off-The-Shelf Applications

    NASA Technical Reports Server (NTRS)

    Steeman, Gerald; Connell, Christopher

    2000-01-01

    Many librarians may feel that dynamic Web pages are out of their reach, financially and technically. Yet we are reminded in library and Web design literature that static home pages are a thing of the past. This paper describes how librarians at the Institute for Defense Analyses (IDA) library developed a database-driven, dynamic intranet site using commercial off-the-shelf applications. Administrative issues include surveying a library users group for interest and needs evaluation; outlining metadata elements; and, committing resources from managing time to populate the database and training in Microsoft FrontPage and Web-to-database design. Technical issues covered include Microsoft Access database fundamentals, lessons learned in the Web-to-database process (including setting up Database Source Names (DSNs), redesigning queries to accommodate the Web interface, and understanding Access 97 query language vs. Standard Query Language (SQL)). This paper also offers tips on editing Active Server Pages (ASP) scripting to create desired results. A how-to annotated resource list closes out the paper.

  18. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
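
    The reported accuracy follows directly from the counts given in the abstract (1396 agreeing pairs out of 1539). A short check, with variable names chosen here for illustration:

```python
# Counts taken from the abstract above.
agreeing_pairs = 1396
total_pairs = 1539

# Accuracy as a percentage, rounded to one decimal place.
accuracy_percent = round(100 * agreeing_pairs / total_pairs, 1)
```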

  19. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  20. Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking.

    PubMed

    Yu, Jun; Yang, Xiaokang; Gao, Fei; Tao, Dacheng

    2017-12-01

    How do we retrieve images accurately? Also, how do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers seeking to build a novel image search engine. First, it is important to obtain an appropriate description that effectively represents the images. In this paper, multimodal features are considered for describing images. The images' unique properties are reflected by visual features, which are correlated with each other. However, semantic gaps always exist between images' visual features and semantics. Therefore, we utilize click features to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set. Multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain an initial distance metric in different visual spaces, and an MDML method is used to assign optimal weights for different modalities. Next, we conduct alternating optimization to train the ranking model, which is used for the ranking of new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model to use multimodal features, including click features and visual features in DML. We conducted experiments to analyze the proposed Deep-MDML on two benchmark data sets, and the results validate the effects of the method.

  1. Demonstration of quantum advantage in machine learning

    NASA Astrophysics Data System (ADS)

    Ristè, Diego; da Silva, Marcus P.; Ryan, Colm A.; Cross, Andrew W.; Córcoles, Antonio D.; Smolin, John A.; Gambetta, Jay M.; Chow, Jerry M.; Johnson, Blake R.

    2017-04-01

    The main promise of quantum computing is to efficiently solve certain problems that are prohibitively expensive for a classical computer. Most problems with a proven quantum advantage involve the repeated use of a black box, or oracle, whose structure encodes the solution. One measure of the algorithmic performance is the query complexity, i.e., the scaling of the number of oracle calls needed to find the solution with a given probability. Few-qubit demonstrations of quantum algorithms, such as Deutsch-Jozsa and Grover, have been implemented across diverse physical systems such as nuclear magnetic resonance, trapped ions, optical systems, and superconducting circuits. However, at the small scale, these problems can already be solved classically with a few oracle queries, limiting the obtained advantage. Here we solve an oracle-based problem, known as learning parity with noise, on a five-qubit superconducting processor. Executing classical and quantum algorithms using the same oracle, we observe a large gap in query count in favor of quantum processing. We find that this gap grows by orders of magnitude as a function of the error rates and the problem size. This result demonstrates that, while complex fault-tolerant architectures will be required for universal quantum computing, a significant quantum advantage already emerges in existing noisy systems.
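
    For orientation, the classical side of learning parity with noise can be sketched as below: an oracle returns a random input together with the parity of the bits selected by a hidden string, flipped with probability eta. This is a toy illustration under stated assumptions (a 3-bit secret, eta = 0.1, a brute-force learner, and hypothetical function names), not the algorithm or hardware setup used in the paper.

```python
import random

def parity(x):
    """Parity (0 or 1) of the set bits of a non-negative integer."""
    return bin(x).count("1") % 2

def lpn_oracle(secret, n_bits, eta, rng):
    """One noisy oracle call: random input x and parity(x & secret),
    flipped with probability eta."""
    x = rng.getrandbits(n_bits)
    y = parity(x & secret)
    if rng.random() < eta:
        y ^= 1
    return x, y

def brute_force_learn(samples, n_bits):
    """Classical learner: try every candidate string and keep the one
    that agrees with the most samples (exponential in n_bits)."""
    def agreement(candidate):
        return sum(1 for x, y in samples if parity(x & candidate) == y)
    return max(range(2 ** n_bits), key=agreement)

rng = random.Random(0)          # fixed seed so the run is repeatable
secret = 0b101                  # hypothetical hidden bit string
samples = [lpn_oracle(secret, 3, 0.1, rng) for _ in range(200)]
recovered = brute_force_learn(samples, 3)
```

    The number of oracle calls (200 here) is the query count that the abstract compares between classical and quantum learners; the classical learner needs more samples as the noise rate and problem size grow.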

  2. Effective use of pause procedure to enhance student engagement and learning.

    PubMed

    Bachhel, Rachna; Thaman, Richa Ghay

    2014-08-01

    Active learning strategies have been documented to enhance learning. We created an active learning environment in neuromuscular physiology lectures for first year medical students by using 'Pause Procedure'. A class of one hundred and fifty medical students was divided into two groups (Group A and Group B), which were taught in separate classes. Each lecture for Group A (experimental group) was divided into short presentations of 12-15 min each. Each presentation was followed by a pause of 2-3 min, three times in a 50 min lecture. During the pauses students worked in pairs to discuss and rework their notes. Any queries were directed towards the teacher and discussed forthwith. At the end of each lecture students were given 2-3 minutes to write down the key points they remembered about the lecture (free-recall). Fifteen days after completion of the lectures a 30 item MCQ test was administered to measure long term recall. Group B (control group) received the same lectures without the use of pause procedure and was similarly tested. Experimental group students did significantly better on the MCQ test (p-value<0.05) in comparison to the control group. Most of the students (83.6%) agreed that the 'pause procedure' helped them to enhance lecture recall. Pause procedure is a good active learning strategy which helps students review their notes, reflect on them, discuss and explain the key ideas with their partners. Moreover, it requires only 6-7 min of the classroom time and can significantly enhance student learning.

  3. Effective Use of Pause Procedure to Enhance Student Engagement and Learning

    PubMed Central

    Thaman, Richa Ghay

    2014-01-01

    Introduction: Active learning strategies have been documented to enhance learning. We created an active learning environment in neuromuscular physiology lectures for first year medical students by using ‘Pause Procedure’. Materials and Methods: A class of one hundred and fifty medical students was divided into two groups (Group A and Group B), which were taught in separate classes. Each lecture for Group A (experimental group) was divided into short presentations of 12-15 min each. Each presentation was followed by a pause of 2-3 min, three times in a 50 min lecture. During the pauses students worked in pairs to discuss and rework their notes. Any queries were directed towards the teacher and discussed forthwith. At the end of each lecture students were given 2-3 minutes to write down the key points they remembered about the lecture (free-recall). Fifteen days after completion of the lectures a 30 item MCQ test was administered to measure long term recall. Group B (control group) received the same lectures without the use of pause procedure and was similarly tested. Results: Experimental group students did significantly better on the MCQ test (p-value<0.05) in comparison to the control group. Most of the students (83.6%) agreed that the ‘pause procedure’ helped them to enhance lecture recall. Conclusion: Pause procedure is a good active learning strategy which helps students review their notes, reflect on them, discuss and explain the key ideas with their partners. Moreover, it requires only 6-7 min of the classroom time and can significantly enhance student learning. PMID:25302251

  4. Active Learning with Irrelevant Examples

    NASA Technical Reports Server (NTRS)

    Mazzoni, Dominic; Wagstaff, Kiri L.; Burl, Michael

    2006-01-01

    Active learning algorithms attempt to accelerate the learning process by requesting labels for the most informative items first. In real-world problems, however, there may exist unlabeled items that are irrelevant to the user's classification goals. Queries about these points slow down learning because they provide no information about the problem of interest. We have observed that when irrelevant items are present, active learning can perform worse than random selection, requiring more time (queries) to achieve the same level of accuracy. Therefore, we propose a novel approach, Relevance Bias, in which the active learner combines its default selection heuristic with the output of a simultaneously trained relevance classifier to favor items that are likely to be both informative and relevant. In our experiments on a real-world problem and two benchmark datasets, the Relevance Bias approach significantly improved the learning rate of three different active learning approaches.
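
    The idea of favoring items that are both informative and relevant can be sketched as a combined score. The abstract does not say how the two signals are combined, so the product used below is an assumption, as are the function name and the toy scores.

```python
def select_query(informativeness, relevance):
    """Pick the index of the item with the highest combined score.

    informativeness: scores from the learner's default selection heuristic
    relevance:       P(relevant) from a separately trained relevance classifier
    The product is an assumed combination rule, used only for illustration.
    """
    scores = [u * r for u, r in zip(informativeness, relevance)]
    return max(range(len(scores)), key=scores.__getitem__)

# Item 0 is the most informative but probably irrelevant, so item 1 wins.
chosen = select_query([0.9, 0.6, 0.2], [0.1, 0.8, 0.9])
```

    A plain active learner would have queried item 0 here; weighting by relevance redirects the query to an item that is still informative but likely to matter for the user's classification goal.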

  5. A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

    PubMed

    Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-10-08

    Because of the increased adoption rate of electronic medical record (EMR) systems, health care records are increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the following three considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, a clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data were then employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated using the clinical data generator in experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine successfully retrieved the clinical data based on the query tasks formatted using GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (i.e., "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.

  6. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
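
    To give a feel for how a list-comprehension query can connect multiple object types without an explicit join, here is an analogy in Python. This is not BioVelo syntax, and the gene and pathway records are made-up illustrative data.

```python
# Hypothetical records standing in for two object types in a DB.
genes = [
    {"name": "trpA", "pathway_id": "P1"},
    {"name": "trpB", "pathway_id": "P1"},
    {"name": "lacZ", "pathway_id": "P2"},
]
pathways = [
    {"id": "P1", "name": "tryptophan biosynthesis"},
    {"id": "P2", "name": "lactose degradation"},
]

# One comprehension with two conditions: a link between the object types
# and a filter on the gene name. No separate join step is written.
result = [
    (g["name"], p["name"])
    for g in genes
    for p in pathways
    if g["pathway_id"] == p["id"] and "trp" in g["name"]
]
```

    The comprehension reads as "for each gene and pathway, keep the pairs that match", which is closer to how a biologist might phrase the question than a SQL join clause.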

  7. Query-Biased Preview over Outsourced and Encrypted Data

    PubMed Central

    Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as found in modern search engines, could help users learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on a private information retrieval protocol and the core concept of searchable encryption, we propose a single-server, two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation of a best-matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798
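
    As a rough feel for the communication bound O(log(d/s) + s + d/s), the sketch below simply evaluates the three terms for sample sizes. The base-2 logarithm, the unit constants, and the function name communication_terms are assumptions made for illustration; big-O notation hides the real constants, so the numbers indicate growth only, not actual bytes transferred by the scheme.

```python
import math


def communication_terms(d, s):
    """Evaluate log2(d/s) + s + d/s with all hidden constants assumed to be 1."""
    blocks = d / s  # number of snippet-sized blocks in the document
    return math.log2(blocks) + s + blocks


d = 1_000_000  # assumed document size
for s in (100, 1_000, 10_000):  # assumed snippet lengths
    print(s, round(communication_terms(d, s), 2))
```

    Under these assumptions the s + d/s part is smallest near s = sqrt(d), and in every case shown the total stays far below d, the cost of downloading the whole document.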

  8. Query-biased preview over outsourced and encrypted data.

    PubMed

    Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as found in modern search engines, could help users learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on a private information retrieval protocol and the core concept of searchable encryption, we propose a single-server, two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation of a best-matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.

  9. Binary Multidimensional Scaling for Hashing.

    PubMed

    Huang, Yameng; Lin, Zhouchen

    2017-10-04

    Hashing is a useful technique for fast nearest neighbor search due to its low storage cost and fast query speed. Unsupervised hashing aims at learning binary hash codes for the original features so that the pairwise distances can be best preserved. While several works have targeted this task, the results are not satisfactory, mainly due to the oversimplified model. In this paper, we propose a unified and concise unsupervised hashing framework, called Binary Multidimensional Scaling (BMDS), which is able to learn the hash code for distance preservation in both batch and online mode. In the batch mode, unlike most existing hashing methods, we do not need to simplify the model by predefining the form of the hash map. Instead, we learn the binary codes directly based on the pairwise distances among the normalized original features by Alternating Minimization. This enables a stronger expressive power of the hash map. In the online mode, we consider the holistic distance relationship between the current query example and those we have already learned, rather than only focusing on the current data chunk. This is useful when the data come in a streaming fashion. Empirical results show that while being efficient for training, our algorithm outperforms state-of-the-art methods by a large margin in terms of distance preservation, which is practical for real-world applications.
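
    The reason binary hash codes give fast queries can be shown with a small Python sketch of nearest neighbor search by Hamming distance. This is not the BMDS learning procedure; it assumes the codes have already been learned by some method, and the example codes and the helper names hamming and nearest are made up for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def nearest(query, codes, k=1):
    """Indices of the k stored codes closest to the query (ties: lower index first)."""
    order = sorted(range(len(codes)), key=lambda i: (hamming(query, codes[i]), i))
    return order[:k]


# Assumed 8-bit codes for four database items and one query.
codes = [0b10110010, 0b10110011, 0b01001100, 0b11110000]
query = 0b10110000

top2 = nearest(query, codes, k=2)
print(top2)
```

    Each comparison is one XOR plus a bit count, and each item costs only a few bytes to store, which is where the low storage cost and fast query speed come from.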

  10. RDF-GL: A SPARQL-Based Graphical Query Language for RDF

    NASA Astrophysics Data System (ADS)

    Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.

  11. A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

    PubMed Central

    Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-01-01

    Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. 
The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078

  12. To compare PubMed Clinical Queries and UpToDate in teaching information mastery to clinical residents: a crossover randomized controlled trial.

    PubMed

    Sayyah Ensan, Ladan; Faghankhani, Masoomeh; Javanbakht, Anna; Ahmadi, Seyed-Foad; Baradaran, Hamid Reza

    2011-01-01

    To compare PubMed Clinical Queries and UpToDate regarding the amount and speed of information retrieval and users' satisfaction. A cross-over randomized trial was conducted in February 2009 in Tehran University of Medical Sciences that included 44 year-one or two residents who participated in an information mastery workshop. A one-hour lecture on the principles of information mastery was organized followed by self learning slide shows before using each database. Subsequently, participants were randomly assigned to answer 2 clinical scenarios using either UpToDate or PubMed Clinical Queries then crossed to use the other database to answer 2 different clinical scenarios. The proportion of relevantly answered clinical scenarios, time to answer retrieval, and users' satisfaction were measured in each database. Based on intention-to-treat analysis, participants retrieved the answer of 67 (76%) questions using UpToDate and 38 (43%) questions using PubMed Clinical Queries (P<0.001). The median time to answer retrieval was 17 min (95% CI: 16 to 18) using UpToDate compared to 29 min (95% CI: 26 to 32) using PubMed Clinical Queries (P<0.001). The satisfaction with the accuracy of retrieved answers, interaction with UpToDate and also overall satisfaction were higher among UpToDate users compared to PubMed Clinical Queries users (P<0.001). For first time users, using UpToDate compared to Pubmed Clinical Queries can lead to not only a higher proportion of relevant answer retrieval within a shorter time, but also a higher users' satisfaction. So, addition of tutoring pre-appraised sources such as UpToDate to the information mastery curricula seems to be highly efficient.

  13. Automatic generation of investigator bibliographies for institutional research networking systems.

    PubMed

    Johnson, Stephen B; Bales, Michael E; Dine, Daniel; Bakken, Suzanne; Albert, Paul J; Weng, Chunhua

    2014-10-01

    Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93 while Scopus had 0.85 and name-based queries had 0.31. ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Automatic generation of investigator bibliographies for institutional research networking systems

    PubMed Central

    Johnson, Stephen B.; Bales, Michael E.; Dine, Daniel; Bakken, Suzanne; Albert, Paul J.; Weng, Chunhua

    2014-01-01

    Objective Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. Methods ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. Results About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss’ kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93 while Scopus had 0.85 and name-based queries had 0.31. Discussion ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators. PMID:24694772
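
    For readers unfamiliar with the two measures reported above, the following Python sketch computes precision (positive predictive value) and recall (sensitivity) for one retrieved bibliography against a gold standard. The citation identifiers are invented toy data, not results from the study, and the function name precision_recall is a placeholder.

```python
def precision_recall(retrieved, gold):
    """Precision and recall of a retrieved set against a gold-standard set."""
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    # Precision: share of retrieved citations that are correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of gold-standard citations that were retrieved.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy citation IDs (hypothetical).
gold = {"c1", "c2", "c3", "c4", "c5"}
retrieved = {"c2", "c3", "c4", "c9"}

precision, recall = precision_recall(retrieved, gold)
print(precision, recall)
```

    A method that returns few but mostly correct citations scores high on precision and lower on recall, which is the pattern the abstract reports for ReCiter relative to name-based queries.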

  15. Mining the SDSS SkyServer SQL queries log

    NASA Astrophysics Data System (ADS)

    Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

    2016-05-01

    SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

  16. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from National Geographic Bee serialized into SPARQL and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  17. Coaching the exploration and exploitation in active learning for interactive video retrieval.

    PubMed

    Wei, Xiao-Yong; Yang, Zhen-Qun

    2013-03-01

    Conventional active learning approaches for interactive video/image retrieval usually assume the query distribution is unknown, as it is difficult to estimate with only a limited number of labeled instances available. Thus, it is easy to put the system in a dilemma whether to explore the feature space in uncertain areas for a better understanding of the query distribution or to harvest in certain areas for more relevant instances. In this paper, we propose a novel approach called coached active learning that makes the query distribution predictable through training and, therefore, avoids the risk of searching on a completely unknown space. The estimated distribution, which provides a more global view of the feature space, can be used to schedule not only the timing but also the step sizes of the exploration and the exploitation in a principled way. The results of the experiments on a large-scale data set from TRECVID 2005-2009 validate the efficiency and effectiveness of our approach, which demonstrates an encouraging performance when facing domain-shift, outperforms eight conventional active learning methods, and shows superiority to six state-of-the-art interactive video retrieval systems.

  18. A comparison of evaluation metrics for biomedical journals, articles, and websites in terms of sensitivity to topic.

    PubMed

    Fu, Lawrence D; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F

    2011-08-01

    Evaluating the biomedical literature and health-related websites for quality are challenging information retrieval tasks. Current commonly used methods include impact factor for journals, PubMed's clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic-adjusted metrics and other topic robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Manually Classifying User Search Queries on an Academic Library Web Site

    ERIC Educational Resources Information Center

    Chapman, Suzanne; Desai, Shevon; Hagedorn, Kat; Varnum, Ken; Mishra, Sonali; Piacentine, Julie

    2013-01-01

    The University of Michigan Library wanted to learn more about the kinds of searches its users were conducting through the "one search" search box on the Library Web site. Library staff conducted two investigations. A preliminary investigation in 2011 involved the manual review of the 100 most frequently occurring queries conducted…

  20. Manchester visual query language

    NASA Astrophysics Data System (ADS)

    Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

    1993-04-01

    We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.

  1. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.

  2. Collaborative E-Learning Using Semantic Course Blog

    ERIC Educational Resources Information Center

    Lu, Lai-Chen; Yeh, Ching-Long

    2008-01-01

    Collaborative e-learning delivers many enhancements to e-learning technology; it enables students to collaborate with each other and improves their learning efficiency. Semantic blog combines semantic Web and blog technology that users can import, export, view, navigate, and query the blog. We developed a semantic course blog for collaborative…

  3. Optimizing Maintenance of Constraint-Based Database Caches

    NASA Astrophysics Data System (ADS)

    Klein, Joachim; Braun, Susanne

    Caching data reduces user-perceived latency and often enhances availability in case of server crashes or network failures. DB caching aims at local processing of declarative queries in a DBMS-managed cache close to the application. Query evaluation must produce the same results as if done at the remote database backend, which implies that all data records needed to process such a query must be present and controlled by the cache, i.e., to achieve “predicate-specific” loading and unloading of such record sets. Hence, cache maintenance must be based on cache constraints such that “predicate completeness” of the caching units currently present can be guaranteed at any point in time. We explore how cache groups can be maintained to provide the data currently needed. Moreover, we design and optimize loading and unloading algorithms for sets of records keeping the caching units complete, before we empirically identify the costs involved in cache maintenance.

  4. An energy-aware routing protocol for query-based applications in wireless sensor networks.

    PubMed

    Ahvar, Ehsan; Ahvar, Shohreh; Lee, Gyu Myoung; Crespi, Noel

    2014-01-01

    A wireless sensor network (WSN) typically has energy consumption restrictions. Designing an energy-aware routing protocol can significantly reduce energy consumption in WSNs. Energy-aware routing protocols can be classified into two categories, energy savers and energy balancers. Energy saving protocols are used to minimize the overall energy consumed by a WSN, while energy balancing protocols attempt to efficiently distribute the consumption of energy throughout the network. In general terms, energy saving protocols are not necessarily good at balancing energy consumption and energy balancing protocols are not always good at reducing energy consumption. In this paper, we propose an energy-aware routing protocol (ERP) for query-based applications in WSNs, which offers a good trade-off between traditional energy balancing and energy saving objectives and supports soft real-time packet delivery. This is achieved by means of fuzzy sets and learning automata techniques along with zonal broadcasting to decrease total energy consumption.

  5. An Energy-Aware Routing Protocol for Query-Based Applications in Wireless Sensor Networks

    PubMed Central

    Crespi, Noel

    2014-01-01

    A wireless sensor network (WSN) typically has energy consumption restrictions. Designing an energy-aware routing protocol can significantly reduce energy consumption in WSNs. Energy-aware routing protocols can be classified into two categories, energy savers and energy balancers. Energy saving protocols are used to minimize the overall energy consumed by a WSN, while energy balancing protocols attempt to efficiently distribute the consumption of energy throughout the network. In general terms, energy saving protocols are not necessarily good at balancing energy consumption and energy balancing protocols are not always good at reducing energy consumption. In this paper, we propose an energy-aware routing protocol (ERP) for query-based applications in WSNs, which offers a good trade-off between traditional energy balancing and energy saving objectives and supports soft real-time packet delivery. This is achieved by means of fuzzy sets and learning automata techniques along with zonal broadcasting to decrease total energy consumption. PMID:24696640

  6. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. 
The second step is to evaluate the usability of the proposed language and tool, involving physicians from several hospitals in Latvia and working with real data from these hospitals. Our third step is to develop an efficient implementation of the query language.

  7. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, the researches mainly aim at building sensor network based systems to leverage the sensed data to applications. However, the existing works seldom exploited spatial aggregation query considering the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation query over dynamic geosensor networks where both the sink node and sensor nodes are mobile and propose several novel improvements on enabling techniques. The mobility of sensors makes the existing routing protocol based on information of fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by query window, a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window, finally considering the location changing of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  8. An Approach for Externalization of Expert Tacit Knowledge Using a Query Management System in an E-Learning Environment

    ERIC Educational Resources Information Center

    Khan, Abdul Azeez; Khader, Sheik Abdul

    2014-01-01

    E-learning or electronic learning platforms facilitate delivery of the knowledge spectrum to the learning community through information and communication technologies. The transfer of knowledge takes place from experts to learners, and externalization of the knowledge transfer is significant. In the e-learning environment, the learners seek…

  9. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model, FS-QAC, based on query semantic similarity as well as query frequency. We use the word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that the FS-QAC model improves performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
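
    The hybrid ranking idea in this abstract can be sketched as follows. This is a minimal illustration, not the FS-QAC model itself: the linear weighting `lam`, the max-frequency normalization, the tiny 2-dimensional vectors standing in for word2vec embeddings, and the function names are all assumptions made for the example. The reciprocal rank helper shows the per-query quantity that MRR averages.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_completions(prefix, candidates, freq, vectors, context_vec, lam=0.5):
    """Rank candidates that start with `prefix` by a weighted sum of
    normalized frequency and semantic similarity to a context vector.
    `lam` (hypothetical) trades frequency against similarity."""
    matching = [q for q in candidates if q.startswith(prefix)]
    if not matching:
        return []
    max_freq = max(freq.get(q, 0) for q in matching) or 1

    def score(q):
        f = freq.get(q, 0) / max_freq          # frequency feature in [0, 1]
        s = cosine(vectors[q], context_vec)    # semantic feature
        return lam * f + (1 - lam) * s

    return sorted(matching, key=score, reverse=True)

def reciprocal_rank(ranked, target):
    """1 / position of `target` in `ranked`, or 0.0 if it is absent."""
    return 1.0 / (ranked.index(target) + 1) if target in ranked else 0.0

# Toy data: the vectors stand in for embeddings of whole queries.
candidates = ["java tutorial", "java island", "jazz"]
freq = {"java tutorial": 10, "java island": 8}
vectors = {"java tutorial": [1.0, 0.0], "java island": [0.0, 1.0]}
travel_context = [0.0, 1.0]  # e.g. the previous query was about travel
ranked = rank_completions("jav", candidates, freq, vectors, travel_context)
```

    With `lam=1.0` the ranking falls back to frequency alone, which corresponds to the traditional baseline the abstract contrasts against.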

  10. Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?

    PubMed

    Kendra, Rachel Lynn; Karki, Suman; Eickholt, Jesse Lee; Gandy, Lisa

    2015-06-19

    User content posted through Twitter has been used for biosurveillance, to characterize public perception of health-related topics, and as a means of distributing information to the general public. Most of the existing work surrounding Twitter and health care has shown Twitter to be an effective medium for these problems but more could be done to provide finer and more efficient access to all pertinent data. Given the diversity of user-generated content, small samples or summary presentations of the data arguably omit a large part of the virtual discussion taking place in the Twittersphere. Still, managing, processing, and querying large amounts of Twitter data is not a trivial task. This work describes tools and techniques capable of handling larger sets of Twitter data and demonstrates their use with the issue of antibiotics. This work has two principal objectives: (1) to provide an open-source means to efficiently explore all collected tweets and query health-related topics on Twitter, specifically, questions such as what users are saying and how messages are spread, and (2) to characterize the larger discourse taking place on Twitter with respect to antibiotics. Open-source software suites Hadoop, Flume, and Hive were used to collect and query a large number of Twitter posts. To classify tweets by topic, a deep network classifier was trained using a limited number of manually classified tweets. The particular machine learning approach used also allowed the use of a large number of unclassified tweets to increase performance. Query-based analysis of the collected tweets revealed that a large number of users contributed to the online discussion and that a frequent topic mentioned was resistance. A number of prominent events related to antibiotics led to a number of spikes in activity but these were short in duration.
The category-based classifier developed was able to correctly classify 70% of manually labeled tweets (using a 10-fold cross validation procedure and 9 classes). The classifier also performed well when evaluated on a per category basis. Using existing tools such as Hive, Flume, Hadoop, and machine learning techniques, it is possible to construct tools and workflows to collect and query large amounts of Twitter data to characterize the larger discussion taking place on Twitter with respect to a particular health-related topic. Furthermore, using newer machine learning techniques and a limited number of manually labeled tweets, an entire body of collected tweets can be classified to indicate what topics are driving the virtual, online discussion. The resulting classifier can also be used to efficiently explore collected tweets by category and search for messages of interest or exemplary content.

  11. Features: Real-Time Adaptive Feature and Document Learning for Web Search.

    ERIC Educational Resources Information Center

    Chen, Zhixiang; Meng, Xiannong; Fowler, Richard H.; Zhu, Binhai

    2001-01-01

    Describes Features, an intelligent Web search engine that is able to perform real-time adaptive feature (i.e., keyword) and document learning. Explains how Features learns from users' document relevance feedback and automatically extracts and suggests indexing keywords relevant to a search query, and learns from users' keyword relevance feedback…

  12. TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gu, Shengyin; Anderson, Iain; Kunin, Victor

    2007-05-07

    Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.

  13. Implementation of a Clinical Documentation Improvement Curriculum Improves Quality Metrics and Hospital Charges in an Academic Surgery Department.

    PubMed

    Reyes, Cynthia; Greenbaum, Alissa; Porto, Catherine; Russell, John C

    2017-03-01

    Accurate clinical documentation (CD) is necessary for many aspects of modern health care, including excellent communication, quality metrics reporting, and legal documentation. New requirements have mandated adoption of ICD-10-CM coding systems, adding another layer of complexity to CD. A clinical documentation improvement (CDI) and ICD-10 training program was created for health care providers in our academic surgery department. We aimed to assess the impact of our CDI curriculum by comparing quality metrics, coding, and reimbursement before and after implementation of our CDI program. A CDI/ICD-10 training curriculum was instituted in September 2014 for all members of our university surgery department. The curriculum consisted of didactic lectures, 1-on-1 provider training, case reviews, e-learning modules, and CD queries from nurse CDI staff and hospital coders. Outcomes parameters included monthly documentation completion rates, severity of illness (SOI), risk of mortality (ROM), case-mix index (CMI), all-payer refined diagnosis-related groups (APR-DRG), and Surgical Care Improvement Program (SCIP) metrics. Financial gain from responses to CDI queries was determined retrospectively. Surgery department delinquent documentation decreased by 85% after CDI implementation. Compliance with SCIP measures improved from 85% to 97%. Significant increases in surgical SOI, ROM, CMI, and APR-DRG (all p < 0.01) were found after CDI/ICD-10 training implementation. Provider responses to CDI queries resulted in an estimated $4,672,786 increase in charges. Clinical documentation improvement/ICD-10 training in an academic surgery department is an effective method to improve documentation rates, increase the hospital estimated reimbursement based on more accurate CD, and provide better compliance with surgical quality measures. Copyright © 2016 American College of Surgeons. All rights reserved.

  14. Building a Smart Portal for Astronomy

    NASA Astrophysics Data System (ADS)

    Derriere, S.; Boch, T.

    2011-07-01

    The development of a portal for accessing astronomical resources is not an easy task. The ever-increasing complexity of the data products can result in very complex user interfaces, requiring a lot of effort and learning from the user in order to perform searches. This is often a design choice, where the user must explicitly set many constraints, while the portal search logic remains simple. We investigated a different approach, where the query interface is kept as simple as possible (ideally, a simple text field, like for Google search), and the search logic is made much more complex to interpret the query in a relevant manner. We will present the implications of this approach in terms of interpretation and categorization of the query parameters (related to astronomical vocabularies), translation (mapping) of these concepts into the portal components metadata, identification of query schemes and use cases matching the input parameters, and delivery of query results to the user.

  15. Targeted exploration and analysis of large cross-platform human transcriptomic compendia

    PubMed Central

    Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

    2016-01-01

    We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801

  16. Active Learning with Irrelevant Examples

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri; Mazzoni, Dominic

    2009-01-01

    An improved active learning method has been devised for training data classifiers. One example of a data classifier is the algorithm used by the United States Postal Service since the 1960s to recognize scans of handwritten digits for processing zip codes. Active learning algorithms enable rapid training with minimal investment of time on the part of human experts to provide training examples consisting of correctly classified (labeled) input data. They function by identifying which examples would be most profitable for a human expert to label. The goal is to maximize classifier accuracy while minimizing the number of examples the expert must label. Although there are several well-established methods for active learning, they may not operate well when irrelevant examples are present in the data set. That is, they may select an item for labeling that the expert simply cannot assign to any of the valid classes. In the context of classifying handwritten digits, the irrelevant items may include stray marks, smudges, and mis-scans. Querying the expert about these items results in wasted time or erroneous labels, if the expert is forced to assign the item to one of the valid classes. In contrast, the new algorithm provides a specific mechanism for avoiding querying the irrelevant items. This algorithm has two components: an active learner (which could be a conventional active learning algorithm) and a relevance classifier. The combination of these components yields a method, denoted Relevance Bias, that enables the active learner to avoid querying irrelevant data so as to increase its learning rate and efficiency when irrelevant items are present. The algorithm collects irrelevant data in a set of rejected examples, then trains the relevance classifier to distinguish between labeled (relevant) training examples and the rejected ones. 
The active learner combines its ranking of the items with the probability that they are relevant to yield a final decision about which item to present to the expert for labeling. Experiments on several data sets have demonstrated that the Relevance Bias approach significantly decreases the number of irrelevant items queried and also accelerates learning speed.
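
    The selection step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Relevance Bias implementation: the abstract only says the learner's ranking is combined with the relevance probability, so the product rule used here, the dictionary inputs, and the function names are hypothetical.

```python
def pick_next(items, informativeness, relevance_prob):
    """Choose the item to show the expert.
    Assumption: the two signals are combined by multiplication."""
    return max(items, key=lambda i: informativeness[i] * relevance_prob[i])

def query_round(items, informativeness, relevance_prob, oracle, labeled, rejected):
    """One round: query the expert, then file the item as labeled or rejected.
    `oracle` returns a class label, or None when the item is irrelevant."""
    item = pick_next(items, informativeness, relevance_prob)
    label = oracle(item)
    if label is None:
        rejected.add(item)      # later used to retrain the relevance classifier
    else:
        labeled[item] = label   # later used to retrain the active learner
    return item

# Toy pool: the smudge looks informative but is probably irrelevant.
pool = ["smudge", "digit7", "digit1"]
informativeness = {"smudge": 0.9, "digit7": 0.6, "digit1": 0.2}
relevance_prob = {"smudge": 0.1, "digit7": 0.9, "digit1": 0.95}
labeled, rejected = {}, set()
chosen = query_round(pool, informativeness, relevance_prob,
                     lambda item: 7 if item == "digit7" else None,
                     labeled, rejected)
```

    Without the relevance term the learner would pick the smudge, which is the wasted query the abstract describes.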

  17. Flipped Learning: Can Rheumatology Lead the Shift in Medical Education?

    PubMed

    El Miedany, Yasser; El Gaafary, Maha; El Aroussy, Nadia; Youssef, Sally

    2018-04-16

    Objectives: (1) to implement flipped classroom rheumatology teaching for undergraduate education; (2) to evaluate outcomes of teaching using OSCE assessment and a student perceived effectiveness and satisfaction survey. The flipped classroom education was conducted in 3 phases. Phase 1: carried out in the students' own time; web links were emailed to give exposure to the instructional part of the lesson online. Phase 2: interactive in-class activity to share personal reflection and reinforce the key aspects. Phase 3: a simulated OSCE assessment. A cohort of 56 students, who were taught in the last educational year on the same topics according to standard teaching protocols, was included as a control group. The clinical outcomes were assessed using the scores of the OSCE examination model. Academic outcomes included the engagement measure as well as the students' answers to the perceived effectiveness and satisfaction survey. There was no significant difference regarding demographics between the 2 student groups. There was significant improvement (p < 0.05) in the flipped learning group, in contrast to the control group, in terms of clinical (OSCE score) as well as communication skills. Student perceived effectiveness and satisfaction were significantly higher in the flipped learning group (p < 0.05). Scores from the flipped learning cohort showed a state of engagement significantly higher than the control group (p < 0.01). Implementing flipped learning for musculoskeletal teaching demonstrated a promising platform for using technology to make better use of the students' time and for increasing their satisfaction. Active learning increases student engagement and can lead to improved retention of knowledge. Copyright © Bentham Science Publishers.

  18. Personalized query suggestion based on user behavior

    NASA Astrophysics Data System (ADS)

    Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

    Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.

  19. START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

    PubMed

    Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y

    2017-09-22

    A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.

  20. To Compare PubMed Clinical Queries and UpToDate in Teaching Information Mastery to Clinical Residents: A Crossover Randomized Controlled Trial

    PubMed Central

    Sayyah Ensan, Ladan; Faghankhani, Masoomeh; Javanbakht, Anna; Ahmadi, Seyed-Foad; Baradaran, Hamid Reza

    2011-01-01

    Purpose: To compare PubMed Clinical Queries and UpToDate regarding the amount and speed of information retrieval and users' satisfaction. Method: A crossover randomized trial was conducted in February 2009 at Tehran University of Medical Sciences that included 44 year-one or year-two residents who participated in an information mastery workshop. A one-hour lecture on the principles of information mastery was followed by self-learning slide shows before using each database. Subsequently, participants were randomly assigned to answer 2 clinical scenarios using either UpToDate or PubMed Clinical Queries, then crossed over to use the other database to answer 2 different clinical scenarios. The proportion of relevantly answered clinical scenarios, time to answer retrieval, and users' satisfaction were measured in each database. Results: Based on intention-to-treat analysis, participants retrieved the answer of 67 (76%) questions using UpToDate and 38 (43%) questions using PubMed Clinical Queries (P<0.001). The median time to answer retrieval was 17 min (95% CI: 16 to 18) using UpToDate compared to 29 min (95% CI: 26 to 32) using PubMed Clinical Queries (P<0.001). The satisfaction with the accuracy of retrieved answers, interaction with UpToDate and also overall satisfaction were higher among UpToDate users compared to PubMed Clinical Queries users (P<0.001). Conclusions: For first-time users, using UpToDate compared to PubMed Clinical Queries can lead not only to a higher proportion of relevant answer retrieval within a shorter time, but also to higher user satisfaction. So, the addition of tutoring on pre-appraised sources such as UpToDate to the information mastery curricula seems to be highly efficient. PMID:21858142

  1. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

    2009-12-01

    The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as do existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
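
    A client for such a framework mostly has to fill in the URL template taken from a provider's OpenSearch Description Document. The sketch below illustrates that step only. The endpoint `https://example.org/search` and the function name are hypothetical; the placeholders `{searchTerms}`, `{geo:box}`, `{time:start}` and `{time:end}` follow the OpenSearch core, Geo and Time conventions, where a trailing `?` marks a parameter as optional.

```python
import re
from urllib.parse import quote

def fill_template(template, values):
    """Substitute OpenSearch-style {name} / {name?} placeholders.
    Optional placeholders without a value become empty strings;
    a missing required placeholder raises KeyError."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            # Keep commas and colons readable in bounding boxes and timestamps.
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""
        raise KeyError(name)
    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)

# Hypothetical template as it might appear in a description document.
template = ("https://example.org/search?q={searchTerms}"
            "&bbox={geo:box?}&start={time:start?}&end={time:end?}")

# geo:box is west,south,east,north; the values are illustrative only.
url = fill_template(template, {
    "searchTerms": "aerosol",
    "geo:box": "-25,63,-13,67",
    "time:start": "2010-04-14T00:00:00Z",
})
```

    Because every provider exposes the same placeholder names, the same dictionary of values can be applied to each provider's template in turn, which is what makes the federated search simple to implement on the client side.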

  2. Big Data Analytics with Datalog Queries on Spark.

    PubMed

    Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo

    2016-01-01

    There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
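
    To make the recursion problem concrete, the sketch below evaluates the classic transitive-closure Datalog program with semi-naive evaluation, a standard technique for recursive Datalog in which each iteration joins only the facts discovered in the previous one. It is plain single-machine Python for illustration and does not use or reproduce BigDatalog or the Spark API; the function name is hypothetical.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of:
         tc(X, Y) <- edge(X, Y).
         tc(X, Z) <- tc(X, Y), edge(Y, Z).
    `edges` is a set of (source, target) pairs."""
    tc = set(edges)       # all facts derived so far
    delta = set(edges)    # facts that were new in the last iteration
    while delta:
        # Join only the new facts with edge, not the whole tc relation.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - tc   # keep only facts not seen before
        tc |= delta
    return tc

chain = {(1, 2), (2, 3), (3, 4)}
closure = transitive_closure(chain)
```

    The loop stops at a fixpoint, when an iteration derives no new facts, so it also terminates on cyclic graphs.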

  3. Big Data Analytics with Datalog Queries on Spark

    PubMed Central

    Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo

    2017-01-01

    There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296

  4. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
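
    The match-or-reject step can be illustrated with a small sketch. It assumes the expressions and their formal queries have already been collected, as the abstract describes. The word-overlap (Jaccard) similarity, the threshold value, the string form of the formal queries, and the function names are stand-ins chosen for the example, not the matching model used in the paper.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def translate(question, expression_to_query, threshold=0.6):
    """Return the formal query of the best-matching collected expression,
    or None (reject) when the match confidence is below `threshold`."""
    best = max(expression_to_query, key=lambda e: jaccard(question, e))
    if jaccard(question, best) < threshold:
        return None
    return expression_to_query[best]

# Hypothetical pre-collected table: expression -> formal query.
table = {
    "who is the author of the book": "(?x, author_of, Book)",
    "where was the person born": "(Person, birth_place, ?x)",
}
answerable = translate("Who is the author of the book", table)
unanswerable = translate("what is the weather today", table)
```

    Returning `None` instead of the closest match is what keeps the interface concordant with the knowledge base: a question with no sufficiently similar expression is left unanswered.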

  5. Game-powered machine learning

    PubMed Central

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-01-01

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786

  6. Game-powered machine learning.

    PubMed

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  7. Fast Query-Optimized Kernel-Machine Classification

    NASA Technical Reports Server (NTRS)

    Mazzoni, Dominic; DeCoste, Dennis

    2004-01-01

    A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only a small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs.
Before query time, statistics are gathered about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, upper and lower thresholds are derived for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1. For more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal-component dimensions and neighbor orderings were computed in that subspace. This approach ensured that the overhead of the nearest-neighbor computations was insignificant, relative to that of the exact KM computation.
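
    The query-time loop described above can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes an RBF kernel, orders the support vectors by exact distance (the abstract uses approximate orderings in a principal-component subspace so that ordering stays cheap), and takes hand-set thresholds where the abstract derives them from statistics gathered before query time. All names are hypothetical.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel value between two points."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def exact_output(query, svs, coefs, bias, gamma=1.0):
    """Exact kernel-machine output: weighted sum over all support vectors."""
    return bias + sum(c * rbf(query, sv, gamma) for sv, c in zip(svs, coefs))

def incremental_classify(query, svs, coefs, bias, lower, upper, gamma=1.0):
    """Add support vectors nearest-first and stop once the partial sum
    leaves the [lower[k-1], upper[k-1]] band for step k.
    Returns (predicted sign, number of support vectors touched)."""
    # Exact ordering for clarity only; it costs as much as the exact sum.
    order = sorted(range(len(svs)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(query, svs[i])))
    partial = bias
    for k, i in enumerate(order, start=1):
        partial += coefs[i] * rbf(query, svs[i], gamma)
        if partial > upper[k - 1]:
            return 1, k       # leans positive strongly enough to stop
        if partial < lower[k - 1]:
            return -1, k      # leans negative strongly enough to stop
    return (1 if partial >= 0 else -1), len(svs)

# Toy model: three support vectors with signed coefficients.
svs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
coefs = [1.0, -0.5, -1.0]
upper = [0.8, 0.5, 0.0]    # hypothetical per-step thresholds
lower = [-0.8, -0.5, 0.0]
easy = incremental_classify((0.0, 0.0), svs, coefs, 0.0, lower, upper)
```

    For the query at the origin the first support vector alone pushes the partial sum above the step-1 threshold, so classification stops after one kernel evaluation and agrees in sign with the exact output.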

  8. Interactive communication with the public: qualitative exploration of the use of social media by food and health organizations.

    PubMed

    Shan, Liran Christine; Panagiotopoulos, Panagiotis; Regan, Áine; De Brún, Aoife; Barnett, Julie; Wall, Patrick; McConnon, Áine

    2015-01-01

    To examine the use and impact of social media on 2-way communication between consumers and public organizations in the food safety and nutrition area. In-depth qualitative study conducted between October, 2012 and January, 2013, using semi-structured interviews in the United Kingdom and Ireland. Sixteen professionals who worked on the public interface within 5 national organizations with a role in communicating on food safety and nutrition issues were interviewed, and the data were analyzed thematically. Five main themes were identified: gradual shift toward social media-based queries and complaints; challenges and limitations of social media to deal with queries and complaints; benefits of using social media in query and complaint services; content redesign driven by social media use; and using social media to learn more about consumers. Social media penetrated and brought new opportunities to food organizations' interactions with the public. Given the increasing use of social media by the public, food organizations need to explore such new opportunities for communication and research. Copyright © 2015 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.

  9. From headache to tumour: An examination of health anxiety, health-related Internet use and 'query escalation'.

    PubMed

    Singh, Karmpaul; Brown, Richard J

    2016-09-01

    The current study aimed to explore the phenomenon of disease-related 'query escalation' in high/low health anxious Internet users (N = 40). During a 15-minute health-related Internet search, participants rated their anxiety and the perceived seriousness of information on each page. Post-search interviews determined the reasons for, and effects of, escalating queries to consider serious diseases. Both groups were found to be significantly more anxious after escalating queries. The high group was significantly more likely to escalate queries. Evaluating personal relevance of material was the main reason for escalations and moderated anxiety post-escalation. We conclude that searching for online disease information can increase anxiety, particularly for people worried about their health. © The Author(s) 2015.

  10. Noise-tolerant parity learning with one quantum bit

    NASA Astrophysics Data System (ADS)

    Park, Daniel K.; Rhee, June-Koo K.; Lee, Soonchil

    2018-03-01

    Demonstrating quantum advantage with less powerful but more realistic devices is of great importance in modern quantum information science. Recently, a significant quantum speedup was achieved in the problem of learning a hidden parity function with noise. However, if all data qubits at the query output are completely depolarized, the algorithm fails. In this work, we present a quantum parity learning algorithm that exhibits quantum advantage as long as one qubit is provided with nonzero polarization in each query. In this scenario, the quantum parity learning naturally becomes deterministic quantum computation with one qubit. Then the hidden parity function can be revealed by performing a set of operations that can be interpreted as measuring nonlocal observables on the auxiliary result qubit having nonzero polarization and each data qubit. We also discuss the source of the quantum advantage in our algorithm from the resource-theoretic point of view.

  11. Physical and Psychological Well-Being and University Student Satisfaction with E-Learning

    ERIC Educational Resources Information Center

    Johnson, Genevieve Marie

    2015-01-01

    Although research establishes that student characteristics exert considerable influence on learning outcomes, research concerned with e-learning satisfaction most typically focuses on factors associated with instructional design, curriculum and pedagogy. Fifty-eight first-year university e-students completed an online survey that queried their…

  12. The role of economics in the QUERI program: QUERI Series

    PubMed Central

    Smith, Mark W; Barnett, Paul G

    2008-01-01

    Background The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199

  13. The role of economics in the QUERI program: QUERI Series.

    PubMed

    Smith, Mark W; Barnett, Paul G

    2008-04-22

    The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  14. Querying and Ranking XML Documents.

    ERIC Educational Resources Information Center

    Schlieder, Torsten; Meuss, Holger

    2002-01-01

    Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…

  15. Lessons learned from participating in D3R 2016 Grand Challenge 2: compounds targeting the farnesoid X receptor

    NASA Astrophysics Data System (ADS)

    Duan, Rui; Xu, Xianjin; Zou, Xiaoqin

    2018-01-01

    D3R 2016 Grand Challenge 2 focused on predictions of binding modes and affinities for 102 compounds against the farnesoid X receptor (FXR). In this challenge, two distinct methods, a docking-based method and a template-based method, were employed by our team for the binding mode prediction. For the new template-based method, 3D ligand similarities were calculated for each query compound against the ligands in the co-crystal structures of FXR available in Protein Data Bank. The binding mode was predicted based on the co-crystal protein structure containing the ligand with the best ligand similarity score against the query compound. For the FXR dataset, the template-based method achieved a better performance than the docking-based method on the binding mode prediction. For the binding affinity prediction, an in-house knowledge-based scoring function ITScore2 and MM/PBSA approach were employed. Good performance was achieved for MM/PBSA, whereas the performance of ITScore2 was sensitive to ligand composition, e.g. the percentage of carbon atoms in the compounds. The sensitivity to ligand composition could be a clue for the further improvement of our knowledge-based scoring function.

  16. In-context query reformulation for failing SPARQL queries

    NASA Astrophysics Data System (ADS)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  17. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely tool- or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.

  18. Deep Correlated Holistic Metric Learning for Sketch-Based 3D Shape Retrieval.

    PubMed

    Dai, Guoxian; Xie, Jin; Fang, Yi

    2018-07-01

    How to effectively retrieve desired 3D models with simple queries is a long-standing problem in the computer vision community. The model-based approach is quite straightforward but nontrivial in practice, since users do not always have the desired 3D query model at hand. Recently, wide-screen electronic devices have become prevalent in our daily lives, which makes sketch-based 3D shape retrieval a promising candidate due to its simplicity and efficiency. The main challenge of the sketch-based approach is the huge modality gap between sketch and 3D shape. In this paper, we propose a novel deep correlated holistic metric learning (DCHML) method to mitigate the discrepancy between sketch and 3D shape domains. The proposed DCHML trains two distinct deep neural networks (one for each domain) jointly, learning two deep nonlinear transformations that map features from both domains into a new feature space. The proposed loss, including discriminative loss and correlation loss, aims to increase the discrimination of features within each domain as well as the correlation between different domains. In the new feature space, the discriminative loss minimizes the intra-class distance of the deep transformed features and maximizes the inter-class distance of the deep transformed features to a large margin within each domain, while the correlation loss focuses on mitigating the distribution discrepancy across different domains. Unlike existing deep metric learning methods, which apply a loss only at the output layer, our proposed DCHML is trained with loss at both the hidden layer and the output layer to further improve performance by encouraging features in the hidden layer to have the desired properties as well. Our proposed method is evaluated on three benchmarks, including 3D Shape Retrieval Contest 2013, 2014, and 2016 benchmarks, and the experimental results demonstrate the superiority of our proposed method over the state-of-the-art methods.

  19. MetaSEEk: a content-based metasearch engine for images

    NASA Astrophysics Data System (ADS)

    Beigi, Mandis; Benitez, Ana B.; Chang, Shih-Fu

    1997-12-01

    Search engines are the most powerful resources for finding information on the rapidly expanding World Wide Web (WWW). Finding the desired search engines and learning how to use them, however, can be very time consuming. The integration of such search tools enables the users to access information across the world in a transparent and efficient manner. These systems are called meta-search engines. The recent emergence of visual information retrieval (VIR) search engines on the web is leading to the same efficiency problem. This paper describes and evaluates MetaSEEk, a content-based meta-search engine used for finding images on the Web based on their visual information. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. We compare MetaSEEk with a baseline version of the meta-search engine, which does not use the past performance of the different search engines in recommending target search engines for future queries.
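
    The idea of ranking search engines by past performance per query class, refined by user feedback, can be sketched as a small score table. This is not MetaSEEk's actual scoring scheme, which the abstract does not specify; the class name `EngineRanker`, the +1/-1 feedback update, and the engine and query-class names in the usage note are all hypothetical.

```python
from collections import defaultdict

class EngineRanker:
    """Keeps one performance score per (query class, engine) pair."""

    def __init__(self, engines):
        self.engines = list(engines)
        # Every query class starts with all engines at score 0.
        self.scores = defaultdict(lambda: {e: 0 for e in self.engines})

    def record_feedback(self, query_class, engine, liked):
        """Raise or lower an engine's score for this class of queries."""
        self.scores[query_class][engine] += 1 if liked else -1

    def select(self, query_class, n):
        """Return the n best-scoring engines; ties keep the original order."""
        table = self.scores[query_class]
        return sorted(self.engines, key=lambda e: -table[e])[:n]
```

    For example, after two positive ratings for a hypothetical engine "c" and one negative rating for "a" on "sunset" queries, `select("sunset", 2)` prefers "c" and then "b", while an unseen query class falls back to the original engine order. That fallback corresponds to the baseline in the abstract that ignores past performance.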

  20. Information Retrieval Using UMLS-based Structured Queries

    PubMed Central

    Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

    2001-01-01

    During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.

  1. G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.

    PubMed

    Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H

    2009-01-01

    Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others. Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty to be indexed in a graph database. Our objective is to bridge graph kernel function and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and for fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. 
Most importantly, the new similarity measurement and the index structure are scalable to large databases, with smaller index size, faster index construction time, and faster query processing time than state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
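
    The combination of local node features, a hash table, and a kernel-based similarity query can be sketched as follows. This is a deliberately simplified stand-in rather than the G-hash method itself: the node feature (a node label plus the sorted labels of its neighbors) and the kernel (a dot product of feature counts) are assumptions chosen for brevity, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def node_features(graph):
    """Count local features; graph maps node -> (label, list of neighbor nodes)."""
    feats = Counter()
    for node, (label, nbrs) in graph.items():
        nbr_labels = tuple(sorted(graph[n][0] for n in nbrs))
        feats[(label, nbr_labels)] += 1
    return feats

def build_index(graphs):
    """Hash table: local feature -> {graph id: number of occurrences}."""
    index = defaultdict(dict)
    for gid, g in graphs.items():
        for feat, count in node_features(g).items():
            index[feat][gid] = count
    return index

def top_k(query, index, k):
    """Score database graphs by the count-based kernel and return the k best."""
    scores = Counter()
    for feat, cq in node_features(query).items():
        # Only graphs sharing this feature are touched, via one hash lookup.
        for gid, cg in index.get(feat, {}).items():
            scores[gid] += cq * cg
    return scores.most_common(k)
```

    Because a query only visits hash buckets for features it actually contains, graphs with no shared local features are never scored, which is the property that makes this kind of index attractive for similarity search.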

  2. Research on presentation and query service of geo-spatial data based on ontology

    NASA Astrophysics Data System (ADS)

    Li, Hong-wei; Li, Qin-chao; Cai, Chang

    2008-10-01

    The paper analyzes the deficiencies in the presentation and query of geo-spatial data in current GIS and discusses the advantages of ontology for formalizing geo-spatial data and presenting semantic granularity. Taking a land-use classification system as an example, it constructs a domain ontology and describes it in OWL, and it realizes grade-level and category presentation of land-use data through vertical and horizontal navigation. It then discusses ontology-based query modes for geo-spatial data, including queries based on types and grade levels, queries based on instances and spatial relations, and synthetic queries based on types and instances; these methods enrich the query modes of current GIS and are a useful attempt. The paper points out that the key to ontology-based presentation and query of spatial data is to construct a domain ontology that correctly reflects geo-concepts and their spatial relations and to realize a fine formalized description of it.

  3. Learning multiple relative attributes with humans in the loop.

    PubMed

    Qian, Buyue; Wang, Xiang; Cao, Nan; Jiang, Yu-Gang; Davidson, Ian

    2014-12-01

    Semantic attributes have been recognized as a more spontaneous way to describe and annotate image content. It is widely accepted that image annotation using semantic attributes is a significant improvement to the traditional binary or multiclass annotation due to its naturally continuous and relative properties. Though useful, existing approaches rely on abundant supervision and high-quality training data, which limits their applicability. Two standard methods to overcome small amounts of guidance and low-quality training data are transfer and active learning. In the context of relative attributes, this would entail learning multiple relative attributes simultaneously and actively querying a human for additional information. This paper addresses the two main limitations in existing work: 1) it actively adds humans to the learning loop so that minimal additional guidance can be given and 2) it learns multiple relative attributes simultaneously and thereby leverages dependence amongst them. In this paper, we formulate a joint active learning to rank framework with pairwise supervision to achieve these two aims, which also has other benefits such as the ability to be kernelized. The proposed framework optimizes over a set of ranking functions (measuring the strength of the presence of attributes) simultaneously and dependently on each other. The proposed pairwise queries take the form "which one of these two pictures is more natural?" These queries can be easily answered by humans. Extensive empirical study on real image data sets shows that our proposed method, compared with several state-of-the-art methods, achieves superior retrieval performance while requiring significantly less human input.

  4. A journey to Semantic Web query federation in the life sciences.

    PubMed

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. 
We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.

  5. A journey to Semantic Web query federation in the life sciences

    PubMed Central

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-01-01

    Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. 
Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394

  6. XGI: a graphical interface for XQuery creation.

    PubMed

    Li, Xiang; Gennari, John H; Brinkley, James F

    2007-10-11

    XML has become the default standard for data exchange among heterogeneous data sources, and in January 2007 XQuery (XML Query language) was recommended by the World Wide Web Consortium as the query language for XML. However, XQuery is a complex language that is difficult for non-programmers to learn. We have therefore developed XGI (XQuery Graphical Interface), a visual interface for graphically generating XQuery. In this paper we demonstrate the functionality of XGI through its application to a biomedical XML dataset. We describe the system architecture and the features of XGI in relation to several existing querying systems, we demonstrate the system's usability through a sample query construction, and we discuss a preliminary evaluation of XGI. Finally, we describe some limitations of the system, and our plans for future improvements.

  7. The business case for payer support of a community-based health information exchange: a Humana pilot evaluating its effectiveness in cost control for plan members seeking emergency department care.

    PubMed

    Tzeel, Albert; Lawnicki, Victor; Pemble, Kim R

    2011-07-01

    As emergency department utilization continues to increase, health plans must limit their cost exposure, which may be driven by duplicate testing and a lack of medical history at the point of care. Based on previous studies, health information exchanges (HIEs) can potentially provide health plans with the ability to address this need. To assess the effectiveness of a community-based HIE in controlling plan costs arising from emergency department care for a health plan's members. The study design was observational, with an eligible population (N = 1482) of fully insured plan members who sought emergency department care on at least 2 occasions during the study period, from December 2008 through March 2010. Cost and utilization data, obtained from member claims, were matched to a list of persons utilizing the emergency department where HIE querying could have occurred. Eligible members underwent propensity score matching to create a test group (N = 326) in which the HIE database was queried in all emergency department visits, and a control group (N = 325) in which the HIE database was not queried in any emergency department visit. Post-propensity matching analysis showed that the test group achieved an average savings of $29 per emergency department visit compared with the control group. Decreased utilization of imaging procedures and diagnostic tests drove these cost savings. When clinicians utilize HIE in the care of patients who present to the emergency department, the costs borne by a health plan providing coverage for these patients decrease. Although many factors can play a role in this finding, it is likely that HIEs obviate unnecessary service utilization through provision of historical medical information regarding specific patients at the point of care.

  8. Active Learning with Statistical Models.

    DTIC Science & Technology

    1995-01-01

    Active Learning with Statistical Models ASC-9217041, NSF CDA-9309300 6. AUTHOR(S) David A. Cohn, Zoubin Ghahramani, and Michael I. Jordan 7. PERFORMING...TERMS 15. NUMBER OF PAGES AI, MIT, Artificial Intelligence, active learning, queries, locally weighted 6 regression, LOESS, mixtures of gaussians...COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1522 January 9, 1995 C.B.C.L. Paper No. 110 Active Learning with

  9. Active Learning Using Arbitrary Binary Valued Queries

    DTIC Science & Technology

    1990-10-01

    active learning in the sense that the learner has complete choice in the information received. Specifically, we allow the learner to ask arbitrary yes...no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the...a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We

  10. Prediction of Carbohydrate Binding Sites on Protein Surfaces with 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Tsai, Keng-Chang; Jian, Jhih-Wei; Yang, Ei-Wen; Hsu, Po-Chiang; Peng, Hung-Pin; Chen, Ching-Tai; Chen, Jun-Bo; Chang, Jeng-Yih; Hsu, Wen-Lian; Yang, An-Suei

    2012-01-01

    Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. 
The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date. PMID:22848404
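
    The residue-based scores quoted above (MCC, precision, sensitivity) are all derived from a confusion matrix. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Return (precision, recall, mcc) from residue-level confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts chosen only to illustrate the calculation.
precision, recall, mcc = confusion_metrics(tp=45, fp=55, fn=45, tn=855)
print(f"precision={precision:.2f} recall={recall:.2f} MCC={mcc:.2f}")
```

    Unlike precision or recall alone, MCC uses all four cells of the matrix, which is why it is often reported when binding residues are a small minority of all surface residues.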

  11. Toward accelerating landslide mapping with interactive machine learning techniques

    NASA Astrophysics Data System (ADS)

    Stumpf, André; Lachiche, Nicolas; Malet, Jean-Philippe; Kerle, Norman; Puissant, Anne

    2013-04-01

    Despite important advances in the development of more automated methods for landslide mapping from optical remote sensing images, the elaboration of inventory maps after major triggering events still remains a tedious task. Image classification with expert-defined rules typically still requires significant manual labour for the elaboration and adaptation of rule sets for each particular case. Machine learning algorithms, by contrast, have the ability to learn and identify complex image patterns from labelled examples but may require relatively large amounts of training data. In order to reduce the amount of required training data, active learning has evolved as a key concept to guide the sampling for applications such as document classification, genetics and remote sensing. The general underlying idea of most active learning approaches is to initialize a machine learning model with a small training set, and to subsequently exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labelled by the user and added to the training set. With relatively few queries and labelled samples, an active learning strategy should ideally yield at least the same accuracy as an equivalent classifier trained with many randomly selected samples. Our study was dedicated to the development of an active learning approach for landslide mapping from VHR remote sensing images with special consideration of the spatial distribution of the samples. The developed approach is a region-based query heuristic that guides the user's attention towards a few compact spatial batches rather than distributed points, resulting in time savings of 50% or more compared to standard active learning techniques. The approach was tested with multi-temporal and multi-sensor satellite images capturing recent large-scale triggering events in Brazil and China and demonstrated balanced user's and producer's accuracies between 74% and 80%. 
The assessment also included an experimental evaluation of the uncertainties of manual mappings from multiple experts and demonstrated strong relationships between the uncertainty of the experts and the machine learning model.
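
    The idea of querying compact spatial batches instead of scattered points can be sketched as follows. This is a simplified illustration, not the authors' heuristic: it assumes each unlabelled sample already carries a region identifier and a predicted landslide probability, and it ranks regions by mean uncertainty. All names and values are hypothetical.

```python
from collections import defaultdict


def uncertainty(p):
    """Uncertainty of a binary prediction: 1.0 at p=0.5, 0.0 at p=0 or p=1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_regions(pool, n_regions=1):
    """Pick the regions whose samples are, on average, most uncertain.

    pool: iterable of (sample_id, region_id, predicted_probability).
    Returns region ids, most uncertain first.
    """
    by_region = defaultdict(list)
    for _sample_id, region_id, prob in pool:
        by_region[region_id].append(uncertainty(prob))
    ranked = sorted(
        by_region,
        key=lambda r: sum(by_region[r]) / len(by_region[r]),
        reverse=True,
    )
    return ranked[:n_regions]


# Hypothetical pool: three regions with two unlabelled samples each.
pool = [
    ("s1", "A", 0.95), ("s2", "A", 0.90),
    ("s3", "B", 0.55), ("s4", "B", 0.40),
    ("s5", "C", 0.10), ("s6", "C", 0.60),
]
chosen = select_regions(pool, n_regions=1)
print(chosen)
```

    The user would then label every sample in the chosen region in one pass, the model would be retrained, and the selection would be repeated.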

  12. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach is unique among existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.
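
    To make the "query as model checking" idea concrete, the sketch below evaluates two modal-style operators over a tree of symbolic locations. It is only an illustration under assumed names (Location, somewhere, everywhere, contains); the paper's actual query language and hybrid logic are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """A node in a symbolic, tree-structured model of space."""
    name: str
    entities: set = field(default_factory=set)
    children: list = field(default_factory=list)


def somewhere(node, prop):
    """True if prop holds at node or at any location nested inside it."""
    return prop(node) or any(somewhere(child, prop) for child in node.children)


def everywhere(node, prop):
    """True if prop holds at node and at every location nested inside it."""
    return prop(node) and all(everywhere(child, prop) for child in node.children)


def contains(entity):
    """Atomic proposition: the entity is present at this location."""
    return lambda node: entity in node.entities


# Hypothetical building model.
building = Location("building", children=[
    Location("floor1", children=[Location("room101", {"printer"})]),
    Location("floor2", children=[Location("room201", {"alice"})]),
])

print(somewhere(building, contains("printer")))              # whole building
print(somewhere(building.children[1], contains("printer")))  # floor2 only
```

    Checking a query then amounts to evaluating a formula against the tree, which is the sense in which query processing becomes a model checking problem.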

  13. A Practical Ontology Query Expansion Algorithm for Semantic-Aware Learning Objects Retrieval

    ERIC Educational Resources Information Center

    Lee, Ming-Che; Tsai, Kun Hua; Wang, Tzone I.

    2008-01-01

    Following the rapid development of Internet, particularly web page interaction technology, distant e-learning has become increasingly realistic and popular. To solve the problems associated with sharing and reusing teaching materials in different e-learning systems, several standard formats, including SCORM, IMS, LOM, and AICC, etc., recently have…

  14. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.
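
    The two conditions that define a CM range monitoring query can be written as a single predicate. The sketch below is a naive one-shot evaluation with hypothetical names and data; it does not implement the GQR-tree or the continual monitoring described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class MovingObject:
    obj_id: str
    x: float
    y: float
    attrs: dict  # non-spatial attributes, e.g. {"type": "taxi"}


def cm_range_query(objects, rect, wanted):
    """Return ids of objects inside rect whose attributes match wanted.

    rect: (xmin, ymin, xmax, ymax); wanted: dict of required attribute values.
    """
    xmin, ymin, xmax, ymax = rect
    return [
        o.obj_id
        for o in objects
        if xmin <= o.x <= xmax
        and ymin <= o.y <= ymax
        and all(o.attrs.get(k) == v for k, v in wanted.items())
    ]


# Hypothetical snapshot of object positions.
objects = [
    MovingObject("o1", 2.0, 3.0, {"type": "taxi", "free": True}),
    MovingObject("o2", 2.5, 3.5, {"type": "bus", "free": True}),
    MovingObject("o3", 9.0, 9.0, {"type": "taxi", "free": True}),
]
matches = cm_range_query(objects, (0.0, 0.0, 5.0, 5.0), {"type": "taxi"})
print(matches)
```

    A monitoring system would re-evaluate this predicate as objects move; an index such as the one proposed in the paper exists to avoid doing so for every object on every update.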

  15. Proceedings of the Seventh International Symposium on Methodologies for Intelligent Systems (Poster Session)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harber, K.S.

    1993-05-01

    This report contains the following papers: Implications in vivid logic; a self-learning Bayesian expert system; a natural language generation system for a heterogeneous distributed database system; "competence-switching" managed by intelligent systems; strategy acquisition by an artificial neural network: Experiments in learning to play a stochastic game; viewpoints and selective inheritance in object-oriented modeling; multivariate discretization of continuous attributes for machine learning; utilization of the case-based reasoning method to resolve dynamic problems; formalization of an ontology of ceramic science in CLASSIC; linguistic tools for intelligent systems; an application of rough sets in knowledge synthesis; and a relational model for imprecise queries. These papers have been indexed separately.

  16. Proceedings of the Seventh International Symposium on Methodologies for Intelligent Systems (Poster Session)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harber, K.S.

    1993-05-01

    This report contains the following papers: Implications in vivid logic; a self-learning Bayesian expert system; a natural language generation system for a heterogeneous distributed database system; "competence-switching" managed by intelligent systems; strategy acquisition by an artificial neural network: Experiments in learning to play a stochastic game; viewpoints and selective inheritance in object-oriented modeling; multivariate discretization of continuous attributes for machine learning; utilization of the case-based reasoning method to resolve dynamic problems; formalization of an ontology of ceramic science in CLASSIC; linguistic tools for intelligent systems; an application of rough sets in knowledge synthesis; and a relational model for imprecise queries. These papers have been indexed separately.

  17. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
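
    The rank-by-rank assignment from weighted database hits can be illustrated with a small weighted vote. This is a deliberately simplified sketch and not the BLCA implementation: the weights stand in for the posterior probabilities described above, bootstrapping is omitted, and the hits and rank list are hypothetical.

```python
from collections import defaultdict

RANKS = ("phylum", "genus", "species")


def weighted_assignment(hits):
    """Assign a taxon at each rank by weighted vote over database hits.

    hits: list of (lineage, weight), where lineage maps rank -> taxon name.
    Returns {rank: (taxon, support)} with support as a fraction of total weight.
    """
    total = sum(weight for _lineage, weight in hits)
    result = {}
    for rank in RANKS:
        votes = defaultdict(float)
        for lineage, weight in hits:
            votes[lineage[rank]] += weight
        taxon = max(votes, key=votes.get)
        result[rank] = (taxon, votes[taxon] / total)
    return result


# Hypothetical hits for one query sequence; weights are illustrative only.
hits = [
    ({"phylum": "Firmicutes", "genus": "Bacillus", "species": "B. subtilis"}, 0.5),
    ({"phylum": "Firmicutes", "genus": "Bacillus", "species": "B. cereus"}, 0.25),
    ({"phylum": "Firmicutes", "genus": "Listeria", "species": "L. innocua"}, 0.25),
]
assignment = weighted_assignment(hits)
for rank in RANKS:
    print(rank, assignment[rank])
```

    Support naturally drops from phylum to species as the hits disagree, which mirrors why species-level calls need an explicit confidence score.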

  18. Oleanolic Acid Ameliorates Aβ25-35 Injection-induced Spatial Learning and Memory Deficit in Alzheimer's Disease Model Rats.

    PubMed

    Wang, Kai; Sun, Weiming; Zhang, Linlin; Guo, Wei; Xu, Jiachun; Liu, Shuang; Zhou, Zhen; Zhang, Yulian

    2018-05-24

    Abnormal amyloid β (Aβ) accumulation and deposition in hippocampus is an essential process in Alzheimer's disease (AD). This study investigated whether oleanolic acid (OA) could improve learning and memory deficits, and its possible mechanism. Forty-five SD rats were randomly divided into sham operation group, model group, and OA group. AD models were built by injection of Aβ25-35. Morris water maze (MWM) was applied to investigate learning and memory, transmission electron microscope (TEM) to observe the ultrastructure of synapse, western blot to assess the key targets of synapse, and electrophysiology to assess long-term potentiation (LTP); Ca2+ concentration in synapse was also measured. The latency time in model group was significantly longer than that in sham operation group (P=0.0001<0.05), while it was significantly shorter in the OA group than that in model group (P=0.0001<0.05); compared with model group, the number of platform crossings in the OA group significantly increased (P=0.0001<0.05). TEM results showed OA could alleviate neuron damage and synapse changes induced by Aβ25-35. The expression of CaMKII, PKC, NMDAR2B, BDNF, TrkB, and CREB protein was significantly improved by OA; the concentration of Ca2+ was significantly lower and the slope and amplitude of f-EPSP increased in OA group. OA could ameliorate Aβ-induced spatial learning and memory loss of AD rats, and the mechanism might be involved with maintaining synaptic integrity to restore synaptic plasticity and increasing the NMDAR2B protein, CaMKII and PKC protein in postsynaptic density (PSD), reducing synaptic Ca2+ concentration, enhancing LTP by up-regulating BDNF, TrkB, CREB proteins. Copyright © Bentham Science Publishers.

  19. Ontology-Driven Provenance Management in eScience: An Application in Parasite Research

    NASA Astrophysics Data System (ADS)

    Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.

    Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.

  20. Applying Semantic Web Concepts to Support Net-Centric Warfare Using the Tactical Assessment Markup Language (TAML)

    DTIC Science & Technology

    2006-06-01

    SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient. These algorithms...memory as a treelike structure in order for the data to be queried. XML Query (XQuery) is the standard language used when querying XML

  1. Automation and integration of components for generalized semantic markup of electronic medical texts.

    PubMed

    Dugan, J M; Berrios, D C; Liu, X; Kim, D K; Kaizer, H; Fagan, L M

    1999-01-01

    Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.

  2. A memory learning framework for effective image retrieval.

    PubMed

    Han, Junwei; Ngan, King N; Li, Mingjing; Zhang, Hong-Jiang

    2005-04-01

    Most current content-based image retrieval systems are still incapable of providing users with their desired results. The major difficulty lies in the gap between low-level image features and high-level image semantics. To address the problem, this study reports a framework for effective image retrieval by employing a novel idea of memory learning. It forms a knowledge memory model to store the semantic information by simply accumulating user-provided interactions. A learning strategy is then applied to predict the semantic relationships among images according to the memorized knowledge. Image queries are finally performed based on a seamless combination of low-level features and learned semantics. One important advantage of our framework is its ability to efficiently annotate images and also propagate the keyword annotation from the labeled images to unlabeled images. The presented algorithm has been integrated into a practical image retrieval system. Experiments on a collection of 10,000 general-purpose images demonstrate the effectiveness of the proposed framework.

  3. A Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. Due to the large amounts of data and the single-thread processing method, the query speed cannot meet application requirements. On the other hand, the query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method using a multi-thread parallel technique based on regional division and implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Tests and analysis on real datasets show that this method improves the query speed significantly.
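
    The divide, aggregate in parallel, then merge pattern described above can be sketched in a few lines. The cube sizes, function names, and sample points are assumptions made for illustration, and the aggregate here is a simple count; the paper's actual partitioning scheme is not specified in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def cube_key(point, cell, step):
    """Map an (x, y, t) point to the index of its spatiotemporal cube."""
    x, y, t = point
    return (int(x // cell), int(y // cell), int(t // step))


def partition(points, cell, step):
    """Group points by the spatiotemporal cube that contains them."""
    cubes = {}
    for p in points:
        cubes.setdefault(cube_key(p, cell, step), []).append(p)
    return cubes


def aggregate_count(points, cell=10.0, step=60.0, workers=4):
    """Count points per cube, aggregating each cube in a worker thread."""
    cubes = partition(points, cell, step)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so counts line up with cube keys.
        counts = pool.map(len, cubes.values())
        return dict(zip(cubes.keys(), counts))


# Hypothetical (x, y, t) observations of moving objects.
points = [(1, 1, 5), (2, 3, 10), (15, 1, 5), (1, 1, 70)]
result = aggregate_count(points)
print(result)
```

    In CPython, threads mainly help when the per-cube work releases the global interpreter lock (for example I/O or native code); for pure-Python aggregation a process pool would be the closer analogue of the speed-up reported above.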

  4. A Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

    NASA Astrophysics Data System (ADS)

    Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to respond in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of remote sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.

  5. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613

  6. The Research on Automatic Construction of Domain Model Based on Deep Web Query Interfaces

    NASA Astrophysics Data System (ADS)

    JianPing, Gu

    The integration of services is transparent, meaning that users no longer face millions of Web services, need not care where the required data are stored, and need not learn how to obtain these data. In this paper, we analyze the uncertainty of schema matching, and then propose a series of similarity measures. To reduce the cost of execution, we propose a type-based optimization method and a schema matching pruning method for numeric data. Based on the above analysis, we propose an uncertain schema matching method. The experiments demonstrate the effectiveness and efficiency of our method.

  7. Arthroplasty Utilization in the United States is Predicted by Age-Specific Population Groups.

    PubMed

    Bashinskaya, Bronislava; Zimmerman, Ryan M; Walcott, Brian P; Antoci, Valentin

    2012-01-01

    Osteoarthritis is a common indication for hip and knee arthroplasty. An accurate assessment of current trends in healthcare utilization as they relate to arthroplasty may predict the needs of a growing elderly population in the United States. First, incidence data was queried from the United States Nationwide Inpatient Sample from 1993 to 2009. Patients undergoing total knee and hip arthroplasty were identified. Then, the United States Census Bureau was queried for population data from the same study period as well as to provide future projections. Arthroplasty followed linear regression models with the population group >64 years in both hip and knee groups. Projections for procedure incidence in the year 2050 based on these models were calculated to be 1,859,553 cases (hip) and 4,174,554 cases (knee). The need for hip and knee arthroplasty is expected to grow significantly in the upcoming years, given population growth predictions.
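
    The projection step, fitting a line of procedure counts against the size of the >64 population and then extrapolating, can be sketched with ordinary least squares. The numbers below are invented for illustration and are unrelated to the Nationwide Inpatient Sample or Census figures used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: population aged >64 (millions) vs. procedures (thousands).
population = [30.0, 32.0, 34.0, 36.0]
procedures = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(population, procedures)

# Extrapolate to a hypothetical future population of 80 million.
projected = slope * 80.0 + intercept
print(f"slope={slope:.1f} intercept={intercept:.1f} projected={projected:.1f}")
```

    Extrapolating this far beyond the observed range assumes the linear relationship keeps holding, which is the main caveat for any projection of this kind.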

  8. Survey on Uses of Distance Learning in the U.S.

    ERIC Educational Resources Information Center

    Downing, Diane E.

    A December 1983 survey queried the chief state school officers of the 50 states on the extent to which distance learning techniques are used in public education in their states. Respondents were asked to focus on interactive forms of distance learning, such as audio and video teleconferencing. A total of 28 states (56%) responded, with the…

  9. Multiple Perspectives: Whither Scholarship in the Work of Enhancing the Quality of Teaching and Learning?

    ERIC Educational Resources Information Center

    Dangel, Julie Rainer

    2011-01-01

    Whither Scholarship in the Work of Enhancing the Quality of Teaching and Learning? This is an important query because it acknowledges, embraces, questions, and challenges the role of scholarship in enhancing teaching and learning. Interestingly, these four verbs help the author categorize her perspectives on the use of scholarship. Drawing from…

  10. Landmark Image Retrieval by Jointing Feature Refinement and Multimodal Classifier Learning.

    PubMed

    Zhang, Xiaoming; Wang, Senzhang; Li, Zhoujun; Ma, Shuai

    2018-06-01

    Landmark retrieval aims to return a set of images whose landmarks are similar to those of the query images. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for visual similarity matches. However, the visual content of social images is highly diverse for many landmarks, and some images share common patterns across different landmarks. On the other hand, it has been observed that social images usually contain multimodal contents, i.e., visual content and text tags, and each landmark has unique characteristics in both visual content and text content. Therefore, approaches based on similarity matching may not be effective in this environment. In this paper, we investigate whether the geographical correlation among the visual content and the text content could be exploited for landmark retrieval. In particular, we propose an effective multimodal landmark classification paradigm to leverage the multimodal contents of social images for landmark retrieval, which integrates feature refinement and landmark classification with multimodal contents in a joint model. The geo-tagged images are automatically labeled for classifier learning. Visual features are refined based on low rank matrix recovery, and a multimodal classifier combined with group sparsity is learned from the automatically labeled images. Finally, candidate images are ranked by combining the classification result with a measure of semantic consistency between the visual content and text content. Experiments on real-world datasets demonstrate the superiority of the proposed approach as compared to existing methods.

  11. Integrated teaching program using case-based learning

    PubMed Central

    Bhardwaj, Pankaj; Bhardwaj, Nikha; Mahdi, Farzana; Srivastava, J P; Gupta, Uma

    2015-01-01

    Background: At present, in a medical school, students are taught in different departments, subject-wise, without integration to interrelate or unify subjects, and this results in compartmentalization of medical education, with no stress on case-based learning. Therefore, an effort was made to develop and adopt integrated teaching in order to give students better contextual knowledge. Methodology and Implementation: After the faculty orientation training, four “topic committees” with faculty members from different departments were constituted, which decided and agreed on the content material to be taught and the different methodologies to be used, along with their logical sequencing for the purpose of implementation. The teaching methodologies used during the program were didactic lectures, case stimulated sessions, clinical visits, laboratory work, and small-group student seminars. Results: After the implementation of the program, the comparison between two batches, as well as between topics taught with the integrated learning program versus the traditional method, showed that students performed better in the topics taught with the integrated approach. Students rated “clinical visits” as a very good methodology, followed by “case stimulated interactive sessions.” Students felt more actively involved and believed that their queries were better addressed with such interactive sessions. Conclusion: Students have a very good perception of integrated teaching. Students performed better when they were taught using this technique. Although the majority of faculty found integrated teaching a useful method of teaching, the extra work burden and interdepartmental coordination remained challenging. PMID:26380204

  12. Characterizing Listener Engagement with Popular Songs Using Large-Scale Music Discovery Data

    PubMed Central

    Kaneshiro, Blair; Ruan, Feng; Baker, Casey W.; Berger, Jonathan

    2017-01-01

    Music discovery in everyday situations has been facilitated in recent years by audio content recognition services such as Shazam. The widespread use of such services has produced a wealth of user data, specifying where and when a global audience takes action to learn more about music playing around them. Here, we analyze a large collection of Shazam queries of popular songs to study the relationship between the timing of queries and corresponding musical content. Our results reveal that the distribution of queries varies over the course of a song, and that salient musical events drive an increase in queries during a song. Furthermore, we find that the distribution of queries at the time of a song's release differs from the distribution following a song's peak and subsequent decline in popularity, possibly reflecting an evolution of user intent over the “life cycle” of a song. Finally, we derive insights into the data size needed to achieve consistent query distributions for individual songs. The combined findings of this study suggest that music discovery behavior, and other facets of the human experience of music, can be studied quantitatively using large-scale industrial data. PMID:28386241

  13. Characterizing Listener Engagement with Popular Songs Using Large-Scale Music Discovery Data.

    PubMed

    Kaneshiro, Blair; Ruan, Feng; Baker, Casey W; Berger, Jonathan

    2017-01-01

    Music discovery in everyday situations has been facilitated in recent years by audio content recognition services such as Shazam. The widespread use of such services has produced a wealth of user data, specifying where and when a global audience takes action to learn more about music playing around them. Here, we analyze a large collection of Shazam queries of popular songs to study the relationship between the timing of queries and corresponding musical content. Our results reveal that the distribution of queries varies over the course of a song, and that salient musical events drive an increase in queries during a song. Furthermore, we find that the distribution of queries at the time of a song's release differs from the distribution following a song's peak and subsequent decline in popularity, possibly reflecting an evolution of user intent over the "life cycle" of a song. Finally, we derive insights into the data size needed to achieve consistent query distributions for individual songs. The combined findings of this study suggest that music discovery behavior, and other facets of the human experience of music, can be studied quantitatively using large-scale industrial data.

  14. Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment

    PubMed Central

    Anwar, Nadia; Hunt, Ela

    2009-01-01

    Background: TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results: We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion: Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482

  15. Active Exploration of Large 3D Model Repositories.

    PubMed

    Gao, Lin; Cao, Yan-Pei; Lai, Yu-Kun; Huang, Hao-Zhi; Kobbelt, Leif; Hu, Shi-Min

    2015-12-01

    With broader availability of large-scale 3D model repositories, the need for efficient and effective exploration becomes more and more urgent. Existing model retrieval techniques do not scale well with the size of the database since often a large number of very similar objects are returned for a query, and the possibilities to refine the search are quite limited. We propose an interactive approach where the user feeds an active learning procedure by labeling either entire models or parts of them as "like" or "dislike" such that the system can automatically update an active set of recommended models. To provide an intuitive user interface, candidate models are presented based on their estimated relevance for the current query. From the methodological point of view, our main contribution is to exploit not only the similarity between a query and the database models but also the similarities among the database models themselves. We achieve this by an offline pre-processing stage, where global and local shape descriptors are computed for each model and a sparse distance metric is derived that can be evaluated efficiently even for very large databases. We demonstrate the effectiveness of our method by interactively exploring a repository containing over 100 K models.

  16. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

    PubMed

    Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

    2018-03-01

    In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a query profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of that profile, which conflict with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records that are stored in the databases that are going to be queried or, for a general purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which was performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general purpose RESTful web service. The service uses open interfaces to be seamlessly embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
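
    The idea of suggesting alternative keywords from distributed word vectors can be sketched as a nearest-neighbor lookup by cosine similarity. This is only an illustration, not the LAILAPS-QSM implementation: the vectors, the vocabulary, and the function names cosine and suggest are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def suggest(keyword, vectors, top_n=2):
    """Return the top_n words whose vectors are most similar to the keyword's vector."""
    target = vectors[keyword]
    scored = [(cosine(target, vec), word) for word, vec in vectors.items() if word != keyword]
    return [word for _, word in sorted(scored, reverse=True)[:top_n]]

# Hypothetical 3-dimensional word vectors; real ones would be trained on text records.
vectors = {
    "drought": (0.9, 0.1, 0.0),
    "water_stress": (0.8, 0.2, 0.1),
    "salinity": (0.6, 0.5, 0.1),
    "flowering": (0.0, 0.2, 0.9),
}
suggestions = suggest("drought", vectors)
```

    With these toy vectors, the terms closest in direction to "drought" are ranked first, and an unrelated term such as "flowering" is left out.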

  17. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

    PubMed

    Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

    2014-08-15

    Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

  18. An automated algorithm for determining photometric redshifts of quasars

    NASA Astrophysics Data System (ADS)

    Wang, Dan; Zhang, Yanxia; Zhao, Yongheng

    2010-07-01

    We employ the k-nearest neighbor (KNN) algorithm for photometric redshift measurement of quasars with the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance-based learning algorithm in which the result of a new instance query is predicted from the closest training samples. The regressor does not fit any model and is based only on memory. Given a query quasar, we find the known quasars (training points) closest to the query point, and the redshift of the query is simply assigned the average of the values of its k nearest neighbors. Three different kinds of colors (PSF, Model or Fiber) and spectral redshifts are used as input parameters, separately. The combination of the three kinds of colors is also taken as input. The experimental results indicate that the best input pattern is PSF + Model + Fiber colors in all experiments. With this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts are obtained within Δz < 0.1, 0.2 and 0.3, respectively. When only one kind of color is used as input, the Model colors achieve the best performance. However, when two kinds of colors are used, the best result is achieved by PSF + Fiber colors. In addition, the nearest neighbor method (k = 1) shows its superiority compared to KNN (k ≠ 1) for the given sample.
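
    The averaging rule described above can be sketched in a few lines. This is a minimal sketch under assumptions: Euclidean distance in color space, toy data, and the function name knn_redshift are all illustrative, and the abstract does not state which distance metric was used.

```python
import math

def knn_redshift(query_colors, training, k=3):
    """Estimate a redshift as the mean redshift of the k nearest training quasars."""
    # training is a list of (color_vector, redshift) pairs.
    # Euclidean distance is an assumption made for this sketch.
    ranked = sorted(training, key=lambda item: math.dist(query_colors, item[0]))
    neighbors = ranked[:k]
    return sum(z for _, z in neighbors) / len(neighbors)

# Hypothetical toy training set: (color vector, spectroscopic redshift).
training = [
    ((0.1, 0.2), 1.0),
    ((0.2, 0.1), 1.2),
    ((0.9, 0.8), 2.5),
    ((1.0, 1.0), 2.7),
]
estimate = knn_redshift((0.15, 0.15), training, k=2)
```

    Setting k=1 reduces the rule to copying the redshift of the single closest training quasar, which is the nearest neighbor variant the abstract reports as performing best on its sample.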

  19. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    Rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and can rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

  20. An evolution-based DNA-binding residue predictor using a dynamic query-driven learning scheme.

    PubMed

    Chai, H; Zhang, J; Yang, G; Ma, Z

    2016-11-15

    DNA-binding proteins play a pivotal role in various biological activities. Identification of DNA-binding residues (DBRs) is of great importance for understanding the mechanism of gene regulation and chromatin remodeling. Most traditional computational methods construct their predictors on static non-redundant datasets. They exclude many homologous DNA-binding proteins so as to guarantee the generalization capability of their models. However, those ignored samples, which have not received enough attention, may provide useful clues when studying protein-DNA interactions. In view of this, we propose a novel method, namely DQPred-DBR, to fill the gap of DBR predictions. First, a large-scale extensible sample pool was compiled. Second, evolution-based features in the form of a relative position specific score matrix and covariant evolutionary conservation descriptors were used to encode the feature space. Third, a dynamic query-driven learning scheme was designed to make more use of proteins with known structure and functions. In comparison with a traditional static model, the introduction of dynamic models clearly improved the prediction performance. Experimental results from the benchmark and independent datasets showed that DQPred-DBR had promising generalization capability. It was capable of producing decent predictions and outperformed many state-of-the-art methods. For the convenience of academic use, our proposed method was also implemented as a web server at .

  1. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed Central

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database. PMID:8653451

  2. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database.

  3. A Re-Unification of Two Competing Models for Document Retrieval.

    ERIC Educational Resources Information Center

    Bodoff, David

    1999-01-01

    Examines query-oriented versus document-oriented information retrieval and feedback learning. Highlights include a reunification of the two approaches for probabilistic document retrieval and for vector space model (VSM) retrieval; learning in VSM and in probabilistic models; multi-dimensional scaling; and ongoing field studies. (LRW)

  4. Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

    PubMed Central

    Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

    2015-01-01

    We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
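
    The mapping from a Boolean query to SQL built on common-table expressions can be illustrated with a small sketch. It is not the authors' generator: the facts table, its columns, and the helper boolean_query are hypothetical, and SQLite stands in for the warehouse database. Each concept becomes one CTE, inclusion criteria are combined with INTERSECT, and exclusion criteria are removed with EXCEPT.

```python
import sqlite3

# Hypothetical, tiny stand-in for a clinical fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (patient_id INTEGER, concept TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [(1, "diabetes"), (1, "metformin"), (2, "diabetes"), (2, "insulin"), (3, "metformin")],
)

def boolean_query(include, exclude):
    """Build SQL for: every concept in `include` AND NOT any concept in `exclude`.

    `include` must contain at least one concept.
    """
    concepts = include + exclude
    names = [f"c{i}" for i in range(len(concepts))]
    # One CTE per concept, each yielding the patients who have that concept.
    ctes = [
        f"{name} AS (SELECT DISTINCT patient_id FROM facts WHERE concept = ?)"
        for name in names
    ]
    inc, exc = names[:len(include)], names[len(include):]
    body = " INTERSECT ".join(f"SELECT patient_id FROM {n}" for n in inc)
    for n in exc:
        body += f" EXCEPT SELECT patient_id FROM {n}"
    sql = "WITH " + ", ".join(ctes) + " " + body + " ORDER BY patient_id"
    return sql, concepts

sql, params = boolean_query(["diabetes"], ["insulin"])
patients = [row[0] for row in conn.execute(sql, params)]
```

    Concept values are passed as bound parameters rather than interpolated into the SQL string, so only the generated CTE names appear in the query text.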

  5. An SSVEP-Based Brain-Computer Interface for Text Spelling With Adaptive Queries That Maximize Information Gain Rates.

    PubMed

    Akce, Abdullah; Norton, James J S; Bretl, Timothy

    2015-09-01

    This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries, on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information to be received per unit of time, a measure of mutual information that we call information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
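
    A simplified view of the selection criterion is sketched below. It assumes the user's response is decoded without error, so the information gained from a query equals the entropy of the stimulus distribution induced by the character prior; the speller in the paper uses mutual information, which also accounts for decoding errors. The prior, the two queries, and the function name information_gain_rate are hypothetical.

```python
import math
from collections import defaultdict

def information_gain_rate(prior, query, seconds):
    """Bits per second gained from one query, assuming an error-free response."""
    # query maps each character to the stimulus the user would attend to.
    mass = defaultdict(float)
    for char, p in prior.items():
        mass[query[char]] += p
    bits = -sum(p * math.log2(p) for p in mass.values() if p > 0)
    return bits / seconds

# Hypothetical prior over the next character and two candidate queries.
prior = {"a": 0.5, "b": 0.25, "c": 0.25}
balanced = {"a": 0, "b": 1, "c": 1}  # stimuli receive probability mass 0.5 / 0.5
skewed = {"a": 0, "b": 0, "c": 1}    # stimuli receive probability mass 0.75 / 0.25

best = max([balanced, skewed], key=lambda q: information_gain_rate(prior, q, 2.0))
```

    Under this simplification the query that splits the probability mass most evenly across stimuli is preferred, and a shorter response time raises the rate for the same number of bits.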

  6. The Influence of Student Learning Characteristics on Purchase of Paper Book and eBook for University Study and Personal Interest

    ERIC Educational Resources Information Center

    Johnson, Genevieve Marie

    2016-01-01

    First-year university students (n = 199) completed an online questionnaire that queried their purchase of paper books and eBooks for university study and personal interest. The questionnaire also required students to rate their learning characteristics including reading strategies, study self-regulation, learning control beliefs and achievement…

  7. Diverse expected gradient active learning for relative attributes.

    PubMed

    You, Xinge; Wang, Ruxin; Tao, Dacheng

    2014-07-01

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in a large amount of data. One option is to incorporate active learning, so that the informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called diverse expected gradient active learning. This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis enforces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multiclass distributions. Empirical evaluations on three different databases demonstrate the effectiveness and efficiency of the proposed approach.
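
    A two-step batch selection of this general shape can be sketched as follows. This is a loose illustration rather than the published algorithm: gradient length stands in for the expected pairwise gradient length, a fixed minimum angle stands in for the diversity constraint, and the names angle and select_batch and the toy gradients are hypothetical.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def select_batch(gradients, batch_size, min_angle):
    """Pick informative samples whose gradient directions differ by at least min_angle."""
    # Step 1: rank candidates by gradient length (informativeness proxy).
    order = sorted(gradients, key=lambda name: math.hypot(*gradients[name]), reverse=True)
    # Step 2: greedily keep candidates that are diverse with respect to the batch so far.
    batch = []
    for name in order:
        if all(angle(gradients[name], gradients[b]) >= min_angle for b in batch):
            batch.append(name)
        if len(batch) == batch_size:
            break
    return batch

# Hypothetical per-sample gradient vectors.
gradients = {"s1": (3.0, 0.0), "s2": (2.9, 0.1), "s3": (0.0, 2.0), "s4": (0.5, 0.5)}
batch = select_batch(gradients, batch_size=2, min_angle=math.radians(30))
```

    Here s2 is nearly as informative as s1 but points in almost the same direction, so the greedy step skips it in favor of s3.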

  8. Diverse Expected Gradient Active Learning for Relative Attributes.

    PubMed

    You, Xinge; Wang, Ruxin; Tao, Dacheng

    2014-06-02

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in a large amount of data. One option is to incorporate active learning, so that the informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called Diverse Expected Gradient Active Learning (DEGAL). This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis enforces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multi-class distributions. Empirical evaluations on three different databases demonstrate the effectiveness and efficiency of the proposed approach.

  9. An overview of the EOSDIS V0 information management system: Lessons learned from the implementation of a distributed data system

    NASA Technical Reports Server (NTRS)

    Ryan, Patrick M.

    1994-01-01

    The EOSDIS Version 0 system, released in July, 1994, is a working prototype of a distributed data system. One of the purposes of the V0 project is to take several existing data systems and coordinate them into one system while maintaining the independent nature of the original systems. The project is a learning experience and the lessons are being passed on to the architects of the system which will distribute the data received from the planned EOS satellites. In the V0 system, the data resides on heterogeneous systems across the globe but users are presented with a single, integrated interface. This interface allows users to query the participating data centers based on a wide set of criteria. Because this system is a prototype, we used many novel approaches in trying to connect a diverse group of users with the huge amount of available data. Some of these methods worked and others did not. Now that V0 has been released to the public, we can look back at the design and implementation of the system and also consider some possible future directions for the next generation of EOSDIS.

  10. Entrez Neuron RDFa: a pragmatic semantic web application for data integration in neuroscience research.

    PubMed

    Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi

    2009-01-01

    The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present "Entrez Neuron", a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the 'HCLS knowledgebase' developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup.

  11. Learning to rank using user clicks and visual features for image retrieval.

    PubMed

    Yu, Jun; Tao, Dacheng; Wang, Meng; Rui, Yong

    2015-04-01

    The inconsistency between textual features and visual contents can cause poor image search results. To solve this problem, click features, which are more reliable than textual information in justifying the relevance between a query and clicked images, are adopted in the image ranking model. However, the existing ranking model cannot integrate visual features, which are efficient in refining the click-based search results. In this paper, we propose a novel ranking model based on the learning to rank framework. Visual features and click features are simultaneously utilized to obtain the ranking model. Specifically, the proposed approach is based on large margin structured output learning and the visual consistency is integrated with the click features through a hypergraph regularizer term. In accordance with the fast alternating linearization method, we design a novel algorithm to optimize the objective function. This algorithm alternately minimizes two different approximations of the original objective function by keeping one function unchanged and linearizing the other. We conduct experiments on a large-scale dataset collected from the Microsoft Bing image search engine, and the results demonstrate that the proposed learning to rank models based on visual features and user clicks outperform state-of-the-art algorithms.

  12. Classification of Automated Search Traffic

    NASA Astrophysics Data System (ADS)

    Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

    As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

  13. FINDbase: a relational database recording frequencies of genetic defects leading to inherited disorders worldwide.

    PubMed

    van Baal, Sjozef; Kaimakis, Polynikis; Phommarinh, Manyphong; Koumbi, Daphne; Cuppens, Harry; Riccardino, Francesca; Macek, Milan; Scriver, Charles R; Patrinos, George P

    2007-01-01

    Frequency of INherited Disorders database (FINDbase) (http://www.findbase.org) is a relational database, derived from the ETHNOS software, recording frequencies of causative mutations leading to inherited disorders worldwide. Database records include the population and ethnic group, the disorder name and the related gene, accompanied by links to any corresponding locus-specific mutation database, to the respective Online Mendelian Inheritance in Man entries and the mutation together with its frequency in that population. The initial information is derived from the published literature, locus-specific databases and genetic disease consortia. FINDbase offers a user-friendly query interface, providing instant access to the list and frequencies of the different mutations. Query outputs can be either in a table or graphical format, accompanied by reference(s) on the data source. Registered users from three different groups, namely administrator, national coordinator and curator, are responsible for database curation and/or data entry/correction online via a password-protected interface. Database access is free of charge and there are no registration requirements for data querying. FINDbase provides a simple, web-based system for population-based mutation data collection and retrieval and can serve not only as a valuable online tool for molecular genetic testing of inherited disorders but also as a non-profit model for sustainable database funding, in the form of a 'database-journal'.

  14. Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques

    DTIC Science & Technology

    2007-08-01

    In this domain, queries typically show a deeply nested structure, which makes the semantic parsing task rather challenging, e.g.: What states border...only 80% of the GEOQUERY queries are semantically tractable, which shows that GEOQUERY is indeed a more challenging domain than ATIS. Note that none...a particularly challenging task, because of the inherent ambiguity of natural languages on both sides. It has inspired a large body of research. In

  15. Supervised learning of tools for content-based search of image databases

    NASA Astrophysics Data System (ADS)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically-constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  16. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easy to understand even for non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  17. Comment on ‘Are some people suffering as a result of increasing mass exposure of the public to ultrasound in air?’

    PubMed Central

    2017-01-01

    A number of queries regarding the paper ‘Are some people suffering as a result of increasing mass exposure of the public to ultrasound in air?’ (Leighton 2016 Proc. R. Soc. A 472, 20150624 (doi:10.1098/rspa.2015.0624)) have been sent in from readers, almost all based around some or all of a small set of questions. These can be grouped into issues of engineering, human factors and timeliness. Those issues (represented by the most typical wording used in queries) and my responses are summarized in this comment. PMID:28413349

  18. Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

    NASA Technical Reports Server (NTRS)

    Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

    1986-01-01

    The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, and tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.

  19. Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

    ERIC Educational Resources Information Center

    Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

    2003-01-01

    Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…

  20. A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2016-11-01

    The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in search queries has caused search systems to fail to retrieve the desired information. One of the most powerful and promising methods to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce a more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.

  1. RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks

    PubMed Central

    Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng

    2017-01-01

    Recently, data privacy in wireless sensor networks (WSNs) has received increased attention. The characteristics of WSNs determine that users’ queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts a ring structure to perform aggregation. It uses a pseudonym mechanism for anonymous communication and uses a homomorphic encryption technique to add noise to the data that are liable to be disclosed. RiPPAS can handle both sum() queries and min()/max() queries, while the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, compared with the existing methods, RiPPAS has advantages in the aspects of privacy preservation and communication efficiency, which can be proved by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197
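
    The abstract above describes aggregation around a ring of nodes. As a rough illustration only, the sketch below shows a generic masked ring sum: the initiating node hides its reading behind a random mask, each node adds its own reading to the running total, and the initiator removes the mask at the end. This is not the RiPPAS protocol (which relies on pseudonyms and homomorphic encryption); the function name `ring_sum`, the `MODULUS` constant, and the list-based ring are all assumptions made for the example.

```python
import secrets
from typing import Sequence

# Hypothetical modulus for the running total; readings are assumed to be
# non-negative integers whose true sum is smaller than this value.
MODULUS = 2 ** 32


def ring_sum(readings: Sequence[int]) -> int:
    """Sum readings around a ring while masking every intermediate total.

    readings[0] belongs to the initiating node; the remaining entries are
    the other nodes in ring order. Generic illustration, not RiPPAS.
    """
    if not readings:
        raise ValueError("the ring needs at least one node")
    # The initiator hides its own reading behind a random mask.
    mask = secrets.randbelow(MODULUS)
    running = (mask + readings[0]) % MODULUS
    # Each subsequent node only sees a masked running total and adds to it.
    for reading in readings[1:]:
        running = (running + reading) % MODULUS
    # Back at the initiator: remove the mask to recover the true sum.
    return (running - mask) % MODULUS


if __name__ == "__main__":
    print(ring_sum([3, 5, 7]))
```

    In this toy version no intermediate node learns the partial sum, but colluding neighbours could still recover a reading, which is one reason real schemes add stronger mechanisms.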

  2. Population-specific documentation of pharmacogenomic markers and their allelic frequencies in FINDbase.

    PubMed

    Georgitsi, Marianthi; Viennas, Emmanouil; Gkantouna, Vassiliki; Christodoulopoulou, Elena; Zagoriti, Zoi; Tafrali, Christina; Ntellos, Fotios; Giannakopoulou, Olga; Boulakou, Athanassia; Vlahopoulou, Panagiota; Kyriacou, Eva; Tsaknakis, John; Tsakalidis, Athanassios; Poulas, Konstantinos; Tzimas, Giannis; Patrinos, George P

    2011-01-01

    Population and ethnic group-specific allele frequencies of pharmacogenomic markers are poorly documented and not systematically collected in structured data repositories. We developed the Frequency of Inherited Disorders Pharmacogenomics database (FINDbase-PGx), a separate module of the FINDbase, aiming to systematically document pharmacogenomic allele frequencies in various populations and ethnic groups worldwide. We critically collected and curated 214 scientific articles reporting pharmacogenomic markers allele frequencies in various populations and ethnic groups worldwide. Subsequently, in order to host the curated data, support data visualization and data mining, we developed a website application, utilizing Microsoft™ PivotViewer software. Curated allelic frequency data pertaining to 144 pharmacogenomic markers across 14 genes, representing approximately 87,000 individuals from 150 populations worldwide, are currently included in FINDbase-PGx. A user-friendly query interface allows for easy data querying, based on numerous content criteria, such as population, ethnic group, geographical region, gene, drug and rare allele frequency. FINDbase-PGx is a comprehensive database, which, unlike other pharmacogenomic knowledgebases, fulfills the much needed requirement to systematically document pharmacogenomic allelic frequencies in various populations and ethnic groups worldwide.

  3. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    PubMed

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.

  4. The Business Case for Payer Support of a Community-Based Health Information Exchange: A Humana Pilot Evaluating Its Effectiveness in Cost Control for Plan Members Seeking Emergency Department Care

    PubMed Central

    Tzeel, Albert; Lawnicki, Victor; Pemble, Kim R.

    2011-01-01

    Background: As emergency department utilization continues to increase, health plans must limit their cost exposure, which may be driven by duplicate testing and a lack of medical history at the point of care. Based on previous studies, health information exchanges (HIEs) can potentially provide health plans with the ability to address this need. Objective: To assess the effectiveness of a community-based HIE in controlling plan costs arising from emergency department care for a health plan's members. Methods: The study design was observational, with an eligible population (N = 1482) of fully insured plan members who sought emergency department care on at least 2 occasions during the study period, from December 2008 through March 2010. Cost and utilization data, obtained from member claims, were matched to a list of persons utilizing the emergency department where HIE querying could have occurred. Eligible members underwent propensity score matching to create a test group (N = 326) in which the HIE database was queried in all emergency department visits, and a control group (N = 325) in which the HIE database was not queried in any emergency department visit. Results: Post–propensity matching analysis showed that the test group achieved an average savings of $29 per emergency department visit compared with the control group. Decreased utilization of imaging procedures and diagnostic tests drove these cost savings. Conclusions: When clinicians utilize HIE in the care of patients who present to the emergency department, the costs borne by a health plan providing coverage for these patients decrease. Although many factors can play a role in this finding, it is likely that HIEs obviate unnecessary service utilization through provision of historical medical information regarding specific patients at the point of care. PMID:25126351

  5. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  6. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  7. Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus

    ERIC Educational Resources Information Center

    Lyall-Wilson, Jennifer Rae

    2013-01-01

    The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…

  8. Automatic classification and detection of clinically relevant images for diabetic retinopathy

    NASA Astrophysics Data System (ADS)

    Xu, Xinyu; Li, Baoxin

    2008-03-01

    We proposed a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using the Expectation-Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database that bear the same label as the query image and that are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
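
    The retrieval stage described above (keep only database images with the query's predicted label, then rank them by nearness in feature space) can be sketched as follows. This is a minimal illustration that assumes images are already mapped to fixed-length feature vectors and that Euclidean distance stands in for the similarity measure; the function name `retrieve_top_k` and the tuple layout of the database are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (image_id, class_label, feature_vector).
Record = Tuple[str, str, Sequence[float]]


def retrieve_top_k(
    query_vec: Sequence[float],
    query_label: str,
    database: List[Record],
    k: int,
) -> List[str]:
    """Return ids of the k nearest database images sharing the query's label."""
    # Keep only images whose label matches the label predicted for the query.
    candidates = [
        (math.dist(query_vec, vec), image_id)
        for image_id, label, vec in database
        if label == query_label
    ]
    # Smaller Euclidean distance is treated as higher similarity (assumption).
    candidates.sort()
    return [image_id for _, image_id in candidates[:k]]


if __name__ == "__main__":
    db = [
        ("a", "MA", (0.0, 0.0)),
        ("b", "MA", (1.0, 1.0)),
        ("c", "NV", (0.0, 0.1)),
        ("d", "MA", (0.2, 0.0)),
    ]
    print(retrieve_top_k((0.0, 0.0), "MA", db, k=2))
```

    Filtering by label before ranking means a nearby image of a different class (such as "c" here) is never returned, which mirrors the two-stage description in the abstract.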

  9. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper database; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  10. Allie: a database and a search service of abbreviations and long forms.

    PubMed

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/.

  11. Development of a web-based video management and application processing system

    NASA Astrophysics Data System (ADS)

    Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting

    2001-07-01

    Facilitating efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia database and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) Versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content-based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe visual content of videos by content-based analysis method. (3) Query profiling database which records the 'histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.

  12. Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.

  13. Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data

    PubMed Central

    Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick

    2016-01-01

    This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. 
    Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859

  14. Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data.

    PubMed

    Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick

    2016-01-01

    This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. 
    Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.

  15. US consumer interest in non-cigarette tobacco products spikes around the 2009 federal tobacco tax increase.

    PubMed

    Jo, Catherine L; Ayers, John W; Althouse, Benjamin M; Emery, Sherry; Huang, Jidong; Ribisl, Kurt M

    2015-07-01

    This quasi-experimental longitudinal study monitored aggregate Google search queries as a proxy for consumer interest in non-cigarette tobacco products (NTP) around the time of the 2009 US federal tobacco tax increase. Query trends for searches mentioning common NTP were downloaded from Google's public archives. The mean relative increase was estimated by comparing the observed with expected query volume for the 16 weeks around the tax. After the tax was announced, queries spiked for chewing tobacco, cigarillos, electronic cigarettes ('e-cigarettes'), roll-your-own (RYO) tobacco, snuff, and snus. E-cigarette queries were 75% (95% CI 70% to 80%) higher than expected 8 weeks before and after the tax, followed by RYO 59% (95% CI 53% to 65%), snus 34% (95% CI 31% to 37%), chewing tobacco 17% (95% CI 15% to 20%), cigarillos 14% (95% CI 11% to 17%), and snuff 13% (95% CI 10% to 14%). Unique queries increasing the most were 'ryo cigarettes' 427% (95% CI 308% to 534%), 'ryo tobacco' 348% (95% CI 300% to 391%), 'best electronic cigarette' 221% (95% CI 185% to 257%), and 'e-cigarette' 205% (95% CI 163% to 245%). The 2009 tobacco tax increase triggered large increases in consumer interest for some NTP, particularly e-cigarettes and RYO tobacco. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
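
    The abstract reports percentage increases obtained by comparing observed with expected query volume. The exact estimator is not given in the abstract, so the sketch below simply assumes a mean of per-week relative differences, (observed - expected) / expected, expressed as a percentage; the function name and the example numbers are illustrative only and are not data from the study.

```python
from typing import Sequence


def mean_relative_increase(
    observed: Sequence[float], expected: Sequence[float]
) -> float:
    """Mean per-week relative difference between observed and expected
    query volume, in percent (assumed estimator, not the paper's)."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("need two non-empty series of equal length")
    ratios = [(o - e) / e for o, e in zip(observed, expected)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up weekly volumes for illustration.
    observed = [150.0, 200.0]
    expected = [100.0, 100.0]
    print(f"{mean_relative_increase(observed, expected):.1f}% above expected")
```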

  16. Reconstruction based finger-knuckle-print verification with score level adaptive binary fusion.

    PubMed

    Gao, Guangwei; Zhang, Lei; Yang, Jian; Zhang, Lin; Zhang, David

    2013-12-01

    Recently, a new biometric identifier, namely finger knuckle print (FKP), has been proposed for personal authentication with very interesting results. One of the advantages of FKP verification lies in its user friendliness in data collection. However, the user flexibility in positioning fingers also leads to a certain degree of pose variations in the collected query FKP images. The widely used Gabor filtering based competitive coding scheme is sensitive to such variations, resulting in many false rejections. We propose to alleviate this problem by reconstructing the query sample with a dictionary learned from the template samples in the gallery set. The reconstructed FKP image can greatly reduce the enlarged matching distance caused by finger pose variations; however, both the intra-class and inter-class distances will be reduced. We then propose a score level adaptive binary fusion rule to adaptively fuse the matching distances before and after reconstruction, aiming to reduce the false rejections without greatly increasing the false acceptances. Experimental results on the benchmark PolyU FKP database show that the proposed method significantly improves the FKP verification accuracy.

  17. A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms

    PubMed Central

    Roberts, Kirk; Patra, Braja Gopal

    2017-01-01

    This paper presents a method for converting natural language questions about structured data in the electronic health record (EHR) into logical forms. The logical forms can then subsequently be converted to EHR-dependent structured queries. The natural language processing task, known as semantic parsing, has the potential to convert questions to logical forms with extremely high precision, resulting in a system that is usable and trusted by clinicians for real-time use in clinical settings. We propose a hybrid semantic parsing method, combining rule-based methods with a machine learning-based classifier. The overall semantic parsing precision on a set of 212 questions is 95.6%. The parser’s rules furthermore allow it to “know what it does not know”, enabling the system to indicate when unknown terms prevent it from understanding the question’s full logical structure. When combined with a module for converting a logical form into an EHR-dependent query, this high-precision approach allows for a question answering system to provide a user with a single, verifiably correct answer. PMID:29854217

  18. End-User Use of Data Base Query Language: Pros and Cons.

    ERIC Educational Resources Information Center

    Nicholes, Walter

    1988-01-01

    Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)

  19. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.
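
    The greedy extension step described in this abstract can be illustrated with a minimal sketch. It is not the CUFID-query implementation: the toy graph, the function names, and the stopping rule (stop when no neighbor lowers the conductance) are assumptions made only for illustration.

```python
def conductance(adj, nodes):
    """Conductance of node set S: cut(S, rest) / min(vol(S), vol(rest))."""
    nodes = set(nodes)
    cut = sum(1 for u in nodes for v in adj[u] if v not in nodes)
    vol_s = sum(len(adj[u]) for u in nodes)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else 1.0


def greedy_extend(adj, seed, max_size):
    """Grow `seed` by repeatedly adding the neighbor that lowers conductance most."""
    module = set(seed)
    while len(module) < max_size:
        frontier = {v for u in module for v in adj[u]} - module
        if not frontier:
            break
        current = conductance(adj, module)
        # sorted() makes tie-breaking deterministic
        best = min(sorted(frontier), key=lambda v: conductance(adj, module | {v}))
        if conductance(adj, module | {best}) >= current:
            break  # no candidate reduces conductance any further
        module.add(best)
    return module


# Toy network (hypothetical): two triangles joined by the single edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
module = greedy_extend(toy, {"a"}, max_size=6)
```

    On this toy graph the extension started from node a stops at the first triangle, because adding d across the bridge would raise the conductance again.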

  20. An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nicholas; Sellis, Timos

    1994-01-01

    We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods of heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectivities, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.
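
    As an illustration of the curve-fitting idea mentioned above, the sketch below fits a straight line by ordinary least squares to selectivities observed from earlier queries and uses it to estimate the selectivity of a new predicate. The feedback data, the linear model, and the function names are assumptions; the report does not specify them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_selectivity(model, value):
    """Predict the selectivity of `column < value`, clamped to [0, 1]."""
    slope, intercept = model
    return min(1.0, max(0.0, slope * value + intercept))


# Hypothetical feedback: (predicate constant, fraction of rows actually returned).
feedback = [(10, 0.1), (20, 0.2), (40, 0.4)]
model = fit_line([v for v, _ in feedback], [s for _, s in feedback])
estimate = estimate_selectivity(model, 30)
```

    Each executed query adds one more observation, so refitting after execution gradually replaces statically computed estimates with ones that reflect the data actually seen.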

  1. An Application Programming Interface for Synthetic Snowflake Particle Structure and Scattering Data

    NASA Technical Reports Server (NTRS)

    Lammers, Matthew; Kuo, Kwo-Sen

    2017-01-01

    The work by Kuo and colleagues on growing synthetic snowflakes and calculating their single-scattering properties has demonstrated great potential to improve the retrievals of snowfall. To grant colleagues flexible and targeted access to their large collection of sizes and shapes at fifteen (15) microwave frequencies, we have developed a web-based Application Programming Interface (API) integrated with NASA Goddard's Precipitation Processing System (PPS) Group. It is our hope that the API will enable convenient programmatic utilization of the database. To help users better understand the API's capabilities, we have developed an interactive web interface called the OpenSSP API Query Builder, which implements an intuitive system of mechanisms for selecting shapes, sizes, and frequencies to generate queries, with which the API can then extract and return data from the database. The Query Builder also allows for the specification of normalized particle size distributions by setting pertinent parameters, with which the API can also return mean geometric and scattering properties for each size bin. Additionally, the Query Builder interface enables downloading of raw scattering and particle structure data packages. This presentation will describe some of the challenges and successes associated with developing such an API. Examples of its usage will be shown both through downloading output and pulling it into a spreadsheet, as well as querying the API programmatically and working with the output in code.

  2. Automation and integration of components for generalized semantic markup of electronic medical texts.

    PubMed Central

    Dugan, J. M.; Berrios, D. C.; Liu, X.; Kim, D. K.; Kaizer, H.; Fagan, L. M.

    1999-01-01

    Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models. PMID:10566457

  3. Relativistic quantum private database queries

    NASA Astrophysics Data System (ADS)

    Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

    2015-04-01

    Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.

  4. iSMART: Ontology-based Semantic Query of CDA Documents

    PubMed Central

    Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue

    2009-01-01

    The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical documents. With the rich ontological references in CDA documents, ontology-based semantic queries can be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents is extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from the SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using real clinical documents collected from a large hospital in southern China. PMID:20351883

  5. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.
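
    For readers unfamiliar with Rocchio-style expansion, a minimal sketch follows. It uses the positive-feedback form of the Rocchio update (new weight = alpha * query weight + beta * mean weight in the feedback documents) with no negative-feedback term. The parameter values, the toy documents, and the function name are assumptions and do not come from the paper.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=2):
    """Return the query plus the `top_k` best new terms, and all term weights."""
    query_vec = Counter(query_terms)
    if not feedback_docs:
        return list(query_terms), dict(query_vec)
    # Centroid of the feedback documents (mean raw term frequency).
    centroid = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)
    weights = {
        term: alpha * query_vec.get(term, 0) + beta * centroid.get(term, 0)
        for term in set(query_vec) | set(centroid)
    }
    # Rank candidate terms by weight; break ties alphabetically.
    candidates = sorted(
        (t for t in weights if t not in query_vec),
        key=lambda t: (-weights[t], t),
    )
    return list(query_terms) + candidates[:top_k], weights


# Hypothetical top-ranked documents from an initial run of the query.
docs = [["cancer", "expression", "rna"], ["gene", "expression", "tumor"]]
expanded, weights = rocchio_expand(["cancer", "gene"], docs)
```

    The need for an initial retrieval run to obtain the feedback documents is the extra cost that the abstract contrasts with lexicon-based expansion.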

  6. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457

  7. How popular is waterpipe tobacco smoking? Findings from internet search queries.

    PubMed

    Salloum, Ramzi G; Osman, Amira; Maziak, Wasim; Thrasher, James F

    2015-09-01

    Waterpipe tobacco smoking (WTS), a traditional tobacco consumption practice in the Middle East, is gaining popularity worldwide. Estimates of population-level interest in WTS over time are not documented. We assessed the popularity of WTS using World Wide Web search query results across four English-speaking countries. We analysed trends in Google search queries related to WTS, comparing these trends with those for electronic cigarettes between 2004 and 2013 in Australia, Canada, the UK and the USA. Weekly search volumes were reported as percentages relative to the week with the highest volume of searches. Web-based searches for WTS have increased steadily since 2004 in all four countries. Search volume for WTS was higher than for e-cigarettes in three of the four nations, with the highest volume in the USA. Online searches were primarily targeted at WTS products for home use, followed by searches for WTS cafés/lounges. Online demand for information on WTS-related products and venues is large and increasing. Given the rise in WTS popularity, increasing evidence of exposure-related harms, and relatively lax government regulation, WTS is a serious public health concern and could reach epidemic levels in Western societies. Published by the BMJ Publishing Group Limited.
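
    The normalization of weekly search volumes described above (each week expressed as a percentage of the peak week) can be written in a few lines. The counts below are invented for illustration, and the function name is an assumption.

```python
def relative_volume(weekly_counts):
    """Scale weekly counts so that the busiest week equals 100."""
    peak = max(weekly_counts)
    if peak == 0:
        return [0.0 for _ in weekly_counts]
    return [100.0 * count / peak for count in weekly_counts]


# Hypothetical raw weekly search counts.
scaled = relative_volume([50, 200, 100])
```

    Because every series is scaled to its own peak, values of this kind describe the shape of a trend within one series rather than absolute search counts.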

  8. A Cybernetic Design Methodology for 'Intelligent' Online Learning Support

    NASA Astrophysics Data System (ADS)

    Quinton, Stephen R.

    The World Wide Web (WWW) provides learners and knowledge workers convenient access to vast stores of information, so much so that present methods for refinement of a query or search result are inadequate - there is far too much potentially useful material. The problem often encountered is that users usually do not recognise what may be useful until they have progressed some way through the discovery, learning, and knowledge acquisition process. Additional support is needed to structure and identify potentially relevant information, and to provide constructive feedback. In short, support for learning is needed. The learning envisioned here is not simply the capacity to recall facts or to recognise objects. The focus is on learning that results in the construction of knowledge. Although most online learning platforms are efficient at delivering information, most do not provide tools that support learning as envisaged in this chapter. It is conceivable that Web-based learning environments can incorporate software systems that assist learners to form new associations between concepts and synthesise information to create new knowledge. This chapter details the rationale and theory behind a research study that aims to evolve Web-based learning environments into 'intelligent thinking' systems that respond to natural language human input. Rather than functioning simply as a means of delivering information, it is argued that online learning solutions will one day interact directly with students to support their conceptual thinking and cognitive development.

  9. Practical quantum private query of blocks based on unbalanced-state Bennett-Brassard-1984 quantum-key-distribution protocol

    NASA Astrophysics Data System (ADS)

    Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin

    2014-12-01

    Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message from the database, the user Alice has to query l times to get each a_i. In this condition, the server Bob could gain Alice's privacy once he obtains the address she queried in any of the l queries, since each a_i contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jakobi et al., but some nontrivial modifications are necessary.

  10. Practical quantum private query of blocks based on unbalanced-state Bennett-Brassard-1984 quantum-key-distribution protocol

    PubMed Central

    Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin

    2014-01-01

    Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message from the database, the user Alice has to query l times to get each a_i. In this condition, the server Bob could gain Alice's privacy once he obtains the address she queried in any of the l queries, since each a_i contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jakobi et al., but some nontrivial modifications are necessary. PMID:25518810

  11. Secure and Privacy-Preserving Body Sensor Data Collection and Query Scheme.

    PubMed

    Zhu, Hui; Gao, Lijuan; Li, Hui

    2016-02-01

    With the development of body sensor networks and the pervasiveness of smart phones, different types of personal data can be collected in real time by body sensors, and the potential value of massive personal data has attracted considerable interest recently. However, the privacy issues of sensitive personal data are still challenging today. Aiming at these challenges, in this paper, we focus on the threats from telemetry interface and present a secure and privacy-preserving body sensor data collection and query scheme, named SPCQ, for outsourced computing. In the proposed SPCQ scheme, users' personal information is collected by body sensors in different types and converted into multi-dimension data, and each dimension is converted into the form of a number and uploaded to the cloud server, which provides a secure, efficient and accurate data query service, while the privacy of sensitive personal information and users' query data is guaranteed. Specifically, based on an improved homomorphic encryption technology over composite order group, we propose a special weighted Euclidean distance contrast algorithm (WEDC) for multi-dimension vectors over encrypted data. With the SPCQ scheme, the confidentiality of sensitive personal data, the privacy of data users' queries and accurate query service can be achieved in the cloud server. Detailed analysis shows that SPCQ can resist various security threats from telemetry interface. In addition, we also implement SPCQ on an embedded device, smart phone and laptop with a real medical database, and extensive simulation results demonstrate that our proposed SPCQ scheme is highly efficient in terms of computation and communication costs.
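
    The quantity that a weighted Euclidean distance comparison works with can be shown in plaintext. The sketch below is only that plaintext calculation; it does not model the homomorphic encryption that the SPCQ scheme applies to the data. The vectors, weights, threshold, and function names are assumptions.

```python
import math


def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2) for equal-length vectors."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def within_threshold(x, y, weights, threshold):
    """Compare the weighted distance against a query threshold."""
    return weighted_euclidean(x, y, weights) <= threshold


# Hypothetical two-dimensional sensor record and query vector.
distance = weighted_euclidean((0, 0), (3, 4), (1, 1))
```

    Raising the weight of one dimension makes differences in that dimension count for more when records are compared with the query.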

  12. Secure and Privacy-Preserving Body Sensor Data Collection and Query Scheme

    PubMed Central

    Zhu, Hui; Gao, Lijuan; Li, Hui

    2016-01-01

    With the development of body sensor networks and the pervasiveness of smart phones, different types of personal data can be collected in real time by body sensors, and the potential value of massive personal data has attracted considerable interest recently. However, the privacy issues of sensitive personal data are still challenging today. Aiming at these challenges, in this paper, we focus on the threats from telemetry interface and present a secure and privacy-preserving body sensor data collection and query scheme, named SPCQ, for outsourced computing. In the proposed SPCQ scheme, users’ personal information is collected by body sensors in different types and converted into multi-dimension data, and each dimension is converted into the form of a number and uploaded to the cloud server, which provides a secure, efficient and accurate data query service, while the privacy of sensitive personal information and users’ query data is guaranteed. Specifically, based on an improved homomorphic encryption technology over composite order group, we propose a special weighted Euclidean distance contrast algorithm (WEDC) for multi-dimension vectors over encrypted data. With the SPCQ scheme, the confidentiality of sensitive personal data, the privacy of data users’ queries and accurate query service can be achieved in the cloud server. Detailed analysis shows that SPCQ can resist various security threats from telemetry interface. In addition, we also implement SPCQ on an embedded device, smart phone and laptop with a real medical database, and extensive simulation results demonstrate that our proposed SPCQ scheme is highly efficient in terms of computation and communication costs. PMID:26840319

  13. Deep Constrained Siamese Hash Coding Network and Load-Balanced Locality-Sensitive Hashing for Near Duplicate Image Detection.

    PubMed

    Hu, Weiming; Fan, Yabo; Xing, Junliang; Sun, Liang; Cai, Zhaoquan; Maybank, Stephen

    2018-09-01

    We construct a new efficient near duplicate image detection method using a hierarchical hash code learning neural network and load-balanced locality-sensitive hashing (LSH) indexing. We propose a deep constrained siamese hash coding neural network combined with deep feature learning. Our neural network is able to extract effective features for near duplicate image detection. The extracted features are used to construct an LSH-based index. We propose a load-balanced LSH method to produce load-balanced buckets in the hashing process. The load-balanced LSH significantly reduces the query time. Based on the proposed load-balanced LSH, we design an effective and feasible algorithm for near duplicate image detection. Extensive experiments on three benchmark data sets demonstrate the effectiveness of our deep siamese hash encoding network and load-balanced LSH.
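
    To make the indexing idea concrete, here is a sketch of plain bit-sampling LSH over binary hash codes. It is the textbook scheme for Hamming distance, not the load-balanced variant proposed in the paper, and the code length, table count, image identifiers, and function names are assumptions.

```python
import random
from collections import defaultdict


def build_index(codes, num_tables=4, bits_per_key=3, seed=0):
    """Build `num_tables` hash tables, each keyed on a random subset of bit positions."""
    rng = random.Random(seed)
    code_len = len(next(iter(codes.values())))
    tables = []
    for _ in range(num_tables):
        positions = rng.sample(range(code_len), bits_per_key)
        buckets = defaultdict(set)
        for item_id, code in codes.items():
            buckets[tuple(code[p] for p in positions)].add(item_id)
        tables.append((positions, buckets))
    return tables


def query(tables, code):
    """Union of the buckets that `code` falls into across all tables."""
    candidates = set()
    for positions, buckets in tables:
        candidates |= buckets.get(tuple(code[p] for p in positions), set())
    return candidates


# Hypothetical 8-bit codes: img2 duplicates img1, img3 is the bitwise complement.
codes = {
    "img1": (0, 1, 1, 0, 1, 0, 0, 1),
    "img2": (0, 1, 1, 0, 1, 0, 0, 1),
    "img3": (1, 0, 0, 1, 0, 1, 1, 0),
}
tables = build_index(codes)
hits = query(tables, codes["img1"])
```

    Only the candidates returned by the buckets need an exact comparison, which is where the query-time saving comes from; uneven bucket sizes are the problem a load-balanced variant targets.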

  14. Unsupervised classification of variable stars

    NASA Astrophysics Data System (ADS)

    Valenzuela, Lucas; Pichara, Karim

    2018-03-01

    During the past 10 years, a considerable amount of effort has been made to develop algorithms for automatic classification of variable stars. That has been primarily achieved by applying machine learning methods to photometric data sets where objects are represented as light curves. Classifiers require training sets to learn the underlying patterns that allow the separation among classes. Unfortunately, building training sets is an expensive process that demands a lot of human effort. Every time data come from new surveys, the only available training instances are the ones that have a cross-match with previously labelled objects, consequently generating insufficient training sets compared with the large amounts of unlabelled sources. In this work, we present an algorithm that performs unsupervised classification of variable stars, relying only on the similarity among light curves. We tackle the unsupervised classification problem by proposing an untraditional approach. Instead of trying to match classes of stars with clusters found by a clustering algorithm, we propose a query-based method where astronomers can find groups of variable stars ranked by similarity. We also develop a fast similarity function specific for light curves, based on a novel data structure that allows scaling the search over the entire data set of unlabelled objects. Experiments show that our unsupervised model achieves high accuracy in the classification of different types of variable stars and that the proposed algorithm scales up to massive amounts of light curves.

  15. Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes

    NASA Astrophysics Data System (ADS)

    Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel

    RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has recently become a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent Relational Database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques like magic set rewriting our system remains competitive with state-of-the-art RDFS querying systems.
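
    The close relationship between RDFS inference and logic rules can be illustrated with two of the standard RDFS entailment rules, rdfs9 (type propagation along rdfs:subClassOf) and rdfs11 (transitivity of rdfs:subClassOf). The sketch below applies them by naive forward chaining; it is not how GiaBATA evaluates rules, and the example triples and function name are assumptions.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply rules rdfs9 and rdfs11 until no new triple can be derived."""
    closure = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        inferred = set()
        for sub, sup in subclass_pairs:
            for s, p, o in closure:
                if p == SUBCLASS and o == sub:
                    inferred.add((s, SUBCLASS, sup))  # rdfs11: transitivity
                elif p == TYPE and o == sub:
                    inferred.add((s, TYPE, sup))      # rdfs9: type propagation
        if inferred <= closure:
            return closure
        closure |= inferred


# Hypothetical data graph and ontology.
graph = {
    (":rex", TYPE, ":Dog"),
    (":Dog", SUBCLASS, ":Mammal"),
    (":Mammal", SUBCLASS, ":Animal"),
}
closed = rdfs_closure(graph)
```

    A query for all instances of :Animal returns :rex only when it is evaluated against the closure, which is the difference between simple graph matching and querying under an entailment regime.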

  16. Active learning methods for interactive image retrieval.

    PubMed

    Gosselin, Philippe Henri; Cord, Matthieu

    2008-07-01

    Active learning methods have been considered with increased interest in the statistical learning community. Initially developed within a classification framework, a lot of extensions are now being proposed to handle multimedia applications. This paper provides algorithms within a statistical framework to extend active learning for online content-based image retrieval (CBIR). The classification framework is presented with experiments to compare several powerful classification techniques in this information retrieval context. Focusing on interactive methods, active learning strategy is then described. The limitations of this approach for CBIR are emphasized before presenting our new active selection process RETIN. First, as any active method is sensitive to the boundary estimation between classes, the RETIN strategy carries out a boundary correction to make the retrieval process more robust. Second, the criterion of generalization error to optimize the active learning selection is modified to better represent the CBIR objective of database ranking. Third, a batch processing of images is proposed. Our strategy leads to a fast and efficient active learning scheme to retrieve sets of online images (query concept). Experiments on large databases show that the RETIN method performs well in comparison to several other active strategies.

  17. Leveraging hospital big data to monitor flu epidemics.

    PubMed

    Bouzillé, Guillaume; Poirier, Canelle; Campillo-Gimenez, Boris; Aubert, Marie-Laure; Chabot, Mélanie; Chazard, Emmanuel; Lavenu, Audrey; Cuggia, Marc

    2018-02-01

    Influenza epidemics are a major public health concern and require a costly and time-consuming surveillance system at different geographical scales. The main challenge is being able to predict epidemics. Besides traditional surveillance systems, such as the French Sentinel network, several studies proposed prediction models based on internet-user activity. Here, we assessed the potential of hospital big data to monitor influenza epidemics. We used the clinical data warehouse of the Academic Hospital of Rennes (France) and then built different queries to retrieve relevant information from electronic health records to gather weekly influenza-like illness activity. We found that the query most highly correlated with Sentinel network estimates was based on emergency reports concerning discharged patients with a final diagnosis of influenza (Pearson's correlation coefficient (PCC) of 0.931). The other tested queries were based on structured data (ICD-10 codes of influenza in Diagnosis-related Groups, and influenza PCR tests) and performed best (PCC of 0.981 and 0.953, respectively) during the flu season 2014-15. This suggests that both ICD-10 codes and PCR results are associated with severe epidemics. Finally, our approach allowed us to obtain additional patients' characteristics, such as the sex ratio or age groups, comparable with those from the Sentinel network. Conclusions: Hospital big data seem to have a great potential for monitoring influenza epidemics in near real-time. Such a method could constitute a complementary tool to standard surveillance systems by providing additional characteristics on the concerned population or by providing information earlier. This system could also be easily extended to other diseases with possible activity changes. Additional work is needed to assess the real efficacy of predictive models based on hospital big data to predict flu epidemics. Copyright © 2017 Elsevier B.V. All rights reserved.
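
    The Pearson correlation coefficient used above to compare a query's weekly counts with the reference network's estimates can be computed as follows. The two short series are invented for illustration, and the function name is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical weekly counts: hospital query results vs. reference estimates.
hospital = [12, 30, 55, 41, 18]
reference = [100, 260, 480, 350, 150]
pcc = pearson(hospital, reference)
```

    A value close to 1 means the two weekly series rise and fall together; it says nothing about whether their absolute levels agree.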

  18. A data-driven soft sensor for needle deflection in heterogeneous tissue using just-in-time modelling.

    PubMed

    Rossa, Carlos; Lehmann, Thomas; Sloboda, Ronald; Usmani, Nawaid; Tavakoli, Mahdi

    2017-08-01

    Global modelling has traditionally been the approach taken to estimate needle deflection in soft tissue. In this paper, we propose a new method based on local data-driven modelling of needle deflection. External measurement of needle-tissue interactions is collected from several insertions in ex vivo tissue to form a cloud of data. Inputs to the system are the needle insertion depth, axial rotations, and the forces and torques measured at the needle base by a force sensor. When a new insertion is performed, the just-in-time learning method estimates the model outputs given the current inputs to the needle-tissue system and the historical database. The query is compared to every observation in the database, and each observation is given a weight according to some similarity criteria. Only a subset of historical data that is most relevant to the query is selected and a local linear model is fit to the selected points to estimate the query output. The model outputs the 3D deflection of the needle tip and the needle insertion force. The proposed approach is validated in ex vivo multilayered biological tissue in different needle insertion scenarios. Experimental results in five different case studies indicate an accuracy in predicting needle deflection of 0.81 and 1.24 mm in the horizontal and vertical planes, respectively, and an accuracy of 0.5 N in predicting the needle insertion force over 216 needle insertions.
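
    The just-in-time procedure outlined above (weight the stored observations by similarity to the query, keep the most relevant ones, fit a local linear model) can be sketched for a single input variable. The paper uses several inputs and does not give its similarity measure here, so the Gaussian weighting, the neighbor count k, the bandwidth, and the data are all assumptions.

```python
import math


def jit_predict(history, query_x, k=3, bandwidth=1.0):
    """Predict y at `query_x` from a weighted linear fit to the k nearest points."""
    nearest = sorted(history, key=lambda point: abs(point[0] - query_x))[:k]
    weights = [math.exp(-((x - query_x) / bandwidth) ** 2) for x, _ in nearest]
    total = sum(weights)
    mean_x = sum(w * x for w, (x, _) in zip(weights, nearest)) / total
    mean_y = sum(w * y for w, (_, y) in zip(weights, nearest)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, nearest))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, (x, y) in zip(weights, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (query_x - mean_x)


# Hypothetical database of (insertion depth, measured deflection) pairs.
database = [(0, 1), (1, 3), (2, 5), (3, 7), (10, 21)]
prediction = jit_predict(database, 1.5)
```

    No global model is stored: a new local fit is computed for every query, so predictions adapt as observations are added to the database.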

  19. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.

  20. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background: As suggested as early as 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective: In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods: Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results: In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions: These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea.
In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323

  1. ENT COBRA (Consortium for Brachytherapy Data Analysis): interdisciplinary standardized data collection system for head and neck patients treated with interventional radiotherapy (brachytherapy).

    PubMed

    Tagliaferri, Luca; Kovács, György; Autorino, Rosa; Budrukkar, Ashwini; Guinot, Jose Luis; Hildebrand, Guido; Johansson, Bengt; Monge, Rafael Martìnez; Meyer, Jens E; Niehoff, Peter; Rovirosa, Angeles; Takàcsi-Nagy, Zoltàn; Dinapoli, Nicola; Lanzotti, Vito; Damiani, Andrea; Soror, Tamer; Valentini, Vincenzo

    2016-08-01

    The aim of the COBRA (Consortium for Brachytherapy Data Analysis) project is to create a multicenter group (consortium) and a web-based system for standardized data collection. GEC-ESTRO (Groupe Européen de Curiethérapie - European Society for Radiotherapy & Oncology) Head and Neck (H&N) Working Group participated in the project and in the implementation of the consortium agreement, the ontology (data-set) and the necessary COBRA software services as well as the peer reviewing of the general anatomic site-specific COBRA protocol. The ontology was defined by a multicenter task-group. Eleven centers from 6 countries signed an agreement and the consortium approved the ontology. We identified 3 tiers for the data set: Registry (epidemiology analysis), Procedures (prediction models and DSS), and Research (radiomics). The COBRA-Storage System (C-SS) is not time-consuming as, thanks to the use of "brokers", data can be extracted directly from the single center's storage systems through a connection with "structured query language database" (SQL-DB), Microsoft Access®, FileMaker Pro®, or Microsoft Excel®. The system is also structured to perform automatic archiving directly from the treatment planning system or afterloading machine. The architecture is based on the concept of "on-purpose data projection". The C-SS architecture is privacy protecting because it will never make visible data that could identify an individual patient. This C-SS can also benefit from the so called "distributed learning" approaches, in which data never leave the collecting institution, while learning algorithms and proposed predictive models are commonly shared. Setting up a consortium is a feasible and practicable tool in the creation of an international and multi-system data sharing system. COBRA C-SS seems to be well accepted by all involved parties, primarily because it does not influence the center's own data storing technologies, procedures, and habits.
Furthermore, the method preserves the privacy of all patients.

  2. Federated ontology-based queries over cancer data

    PubMed Central

    2012-01-01

    Background: Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results: Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures.
A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions: To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043

  3. Nearest private query based on quantum oblivious key distribution

    NASA Astrophysics Data System (ADS)

    Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan

    2017-12-01

    Nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without revealing their respective private information. In this paper, we first present a quantum protocol for nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared to the classical related protocols, our protocol has the advantages of the higher security and the better feasibility, so it has a better prospect of applications.

  4. Design notes for the next generation persistent object manager for CAP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Isely, M.; Fischler, M.; Galli, M.

    1995-05-01

    The CAP query system software at Fermilab has several major components, including SQS (for managing the query), the retrieval system (for fetching auxiliary data), and the query software itself. The central query software in particular is essentially a modified version of the `ptool` product created at UIC (University of Illinois at Chicago) as part of the PASS project under Bob Grossman. The original UIC version was designed for use in a single-user non-distributed Unix environment. The Fermi modifications were an attempt to permit multi-user access to a data set distributed over a set of storage nodes. (The hardware is an IBM SP-x system - a cluster of AIX POWER2 nodes with an IBM-proprietary high speed switch interconnect). Since the implementation work of the Fermi-ized ptool, the CAP members have learned quite a bit about the nature of queries and where the current performance bottlenecks exist. This has led them to design a persistent object manager that will overcome these problems. For backwards compatibility with ptool, the ptool persistent object API will largely be retained, but the implementation will be entirely different.

  5. Queries over Unstructured Data: Probabilistic Methods to the Rescue

    NASA Astrophysics Data System (ADS)

    Sarawagi, Sunita

    Unstructured data like emails, addresses, invoices, call transcripts, reviews, and press releases are now an integral part of any large enterprise. A challenge of modern business intelligence applications is analyzing and querying data seamlessly across structured and unstructured sources. This requires the development of automated techniques for extracting structured records from text sources and resolving entity mentions in data from various sources. The success of any automated method for extraction and integration depends on how effectively it unifies diverse clues in the unstructured source and in existing structured databases. We argue that statistical learning techniques like Conditional Random Fields (CRFs) provide an accurate, elegant and principled framework for tackling these tasks. Given the inherent noise in real-world sources, it is important to capture the uncertainty of the above operations via imprecise data models. CRFs provide a sound probability distribution over extractions but are not easy to represent and query in a relational framework. We present methods of approximating this distribution to query-friendly row and column uncertainty models. Finally, we present models for representing the uncertainty of de-duplication and algorithms for various Top-K count queries on imprecise duplicates.

  6. NCBI2RDF: enabling full RDF-based access to NCBI databases.

    PubMed

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.

  7. Incremental Query Rewriting with Resolution

    NASA Astrophysics Data System (ADS)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  8. Facebook as a learning environment for teaching medical emergencies in dental practice.

    PubMed

    Alshiekhly, Ulla; Arrar, Rebal; Barngkgei, Imad; Dashash, Mayssoon

    2015-01-01

    Social media can be part of the formal education of health professionals and of their lifelong learning activities. The effectiveness of applying Facebook, an online social medium, for educational purposes was evaluated in this study. It was used to serve as a teaching medium of a course in medical emergencies in dental practice (MEDP). Syrian dental students were invited to join a Facebook group "Medical emergencies in dental practice" during the second semester of the academic year 2013-2014. The group privacy settings were changed from an open group to a closed group after the registration period. Administrators of the group published 61 posts during the course period, which extended for one month. Students' progress in learning was evaluated using self-assessment questionnaires administered to the students before and after the course. These questionnaires also queried their opinions regarding the use of Facebook as an educational modality. Qualitative statistics, Wilcoxon signed ranks and Mann-Whitney U-tests were used to analyze the data. Out of 388 students registered in this course, 184 completed it. Two-thirds of students agreed that Facebook was useful in education. Their impressions of this course were 17.4% as excellent, 52.2% as very good. P values of the self-assessment questions of Wilcoxon signed ranks test were <0.001, indicating self-assessed improvement in MEDP skills. Facebook as a social medium provides a unique learning environment. It allows students to discuss topics more openly in a flexible setting with less rigid time and place constraints. In the light of this study it was found that Facebook may be useful in teaching medical emergencies in dental practice in its theoretical aspect.

  9. Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

    ERIC Educational Resources Information Center

    Choi, Youngok; Rasmussen, Edie M.

    2003-01-01

    Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principal categories of search terms.…

  10. Searching and Filtering Tweets: CSIRO at the TREC 2012 Microblog Track

    DTIC Science & Technology

    2012-11-01

    stages. We first evaluate the effect of tweet corpus pre-processing in vanilla runs (no query expansion), and then assess the effect of query expansion...Effect of a vanilla run on D4 index (both realtime and non-real-time), and query expansion methods based on the submitted runs for two sets of queries

  11. Entrez Neuron RDFa: a pragmatic Semantic Web application for data integration in neuroscience research

    PubMed Central

    Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi

    2013-01-01

    The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present “Entrez Neuron”, a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the ‘HCLS knowledgebase’ developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup. PMID:19745321

  12. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  13. Hybrid Collaborative Learning for Classification and Clustering in Sensor Networks

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Sosnowski, Scott; Lane, Terran

    2012-01-01

    Traditionally, nodes in a sensor network simply collect data and then pass it on to a centralized node that archives, distributes, and possibly analyzes the data. However, analysis at the individual nodes could enable faster detection of anomalies or other interesting events as well as faster responses, such as sending out alerts or increasing the data collection rate. There is an additional opportunity for increased performance if learners at individual nodes can communicate with their neighbors. In previous work, methods were developed by which classification algorithms deployed at sensor nodes can communicate information about event labels to each other, building on prior work with co-training, self-training, and active learning. The idea of collaborative learning was extended to function for clustering algorithms as well, similar to ideas from penta-training and consensus clustering. However, collaboration between these learner types had not been explored. A new protocol was developed by which classifiers and clusterers can share key information about their observations and conclusions as they learn. This is an active collaboration in which learners of either type can query their neighbors for information that they then use to re-train or re-learn the concept they are studying. The protocol also supports broadcasts from the classifiers and clusterers to the rest of the network to announce new discoveries. Classifiers observe an event and assign it a label (type). Clusterers instead group observations into clusters without assigning them a label, and they collaborate in terms of pairwise constraints between two events [same-cluster (must-link) or different-cluster (cannot-link)]. Fundamentally, these two learner types speak different languages.
To bridge this gap, the new communication protocol provides four types of exchanges: hybrid queries for information, hybrid "broadcasts" of learned information, each specified for classifiers-to-clusterers, and clusterers-to-classifiers. The new capability has the potential to greatly expand the in situ analysis abilities of sensor networks. Classifiers seeking to categorize incoming data into different types of events can operate in tandem with clusterers that are sensitive to the occurrence of new kinds of events not known to the classifiers. In contrast to current approaches that treat these operations as independent components, a hybrid collaborative learning system can enable them to learn from each other.
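
    As a rough illustration of how label information from a classifier might be expressed in the pairwise "language" of a clusterer, the sketch below converts event labels into must-link and cannot-link pairs and, in the other direction, uses a must-link pair to suggest a label. The function names, event identifiers and label values are hypothetical; the record does not describe the protocol's actual message format.

```python
from itertools import combinations


def labels_to_constraints(labeled_events):
    """Turn {event_id: label} into (must_link, cannot_link) pair lists.

    Assumption: two events with the same label should share a cluster,
    and two events with different labels should not.
    """
    must_link, cannot_link = [], []
    for (a, label_a), (b, label_b) in combinations(sorted(labeled_events.items()), 2):
        if label_a == label_b:
            must_link.append((a, b))
        else:
            cannot_link.append((a, b))
    return must_link, cannot_link


def label_hint(event, must_link, known_labels):
    """Suggest a label for `event` if it is must-linked to a labeled event."""
    for a, b in must_link:
        other = b if a == event else a if b == event else None
        if other is not None and other in known_labels:
            return known_labels[other]
    return None


# Hypothetical events labeled by a classifier node.
events = {"e1": "rockfall", "e2": "rockfall", "e3": "dust_devil"}
ml, cl = labels_to_constraints(events)

# A clusterer reports that unlabeled event e4 belongs with e1.
hint = label_hint("e4", [("e4", "e1")], events)
```

    The two functions correspond loosely to the classifier-to-clusterer and clusterer-to-classifier directions named in the abstract.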

  14. "These People Are Never Going to Stop Labeling Me": Educational Experiences of African American Male Students Labeled with Learning Disabilities

    ERIC Educational Resources Information Center

    Banks, Joy

    2017-01-01

    This investigation employs Disability Critical Race Studies as a theoretical framework to determine the interdependence of racism and ableism in school settings. African American male students with learning disabilities are queried about their interpretations of special education placement and labeling while attempting to secure educational…

  15. Teaching Tip: Active Learning via a Sample Database: The Case of Microsoft's Adventure Works

    ERIC Educational Resources Information Center

    Mitri, Michel

    2015-01-01

    This paper describes the use and benefits of Microsoft's Adventure Works (AW) database to teach advanced database skills in a hands-on, realistic environment. Database management and querying skills are a key element of a robust information systems curriculum, and active learning is an important way to develop these skills. To facilitate active…

  16. Designing Adult Learning Strategies: The Case of South Eastern Europe

    ERIC Educational Resources Information Center

    Gunny, Madeleine; Viertel, Evelyn

    2006-01-01

    The importance of lifelong learning is generally well understood and few people today would query the need for adults to regularly update their skills in line with labour market needs, and for governments and social partners to provide an environment that supports skills acquisition and updating. However, it is clear when we look at data from the…

  17. Practical quantum private query of blocks based on unbalanced-state Bennett-Brassard-1984 quantum-key-distribution protocol.

    PubMed

    Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin

    2014-12-18

    Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message a_1a_2···a_l from the database, the user Alice has to query l times to get each a_i. In this condition, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each a_i contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary.

  18. Active learning reduces annotation time for clinical concept extraction.

    PubMed

    Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

    2017-10-01

    To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. There are 73 and 120 discharge summary reports provided by Beth Israel institute in the train and test sets of the concept extraction task in the i2b2/VA 2010 challenge, respectively. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches as demonstrated by high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.
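
    The abstract does not name the query strategies that were compared. As a minimal sketch of what one such strategy could look like, the following uses least-confidence uncertainty sampling, a common choice that is assumed here purely for illustration; the sequence identifiers and probabilities are made up.

```python
def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_queries(pool, batch_size):
    """Pick the `batch_size` most uncertain sequences for manual annotation.

    pool: {sequence_id: [class probabilities from the current model]}
    """
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs for three unlabeled sequences.
pool = {
    "seq_a": [0.95, 0.03, 0.02],
    "seq_b": [0.40, 0.35, 0.25],
    "seq_c": [0.60, 0.30, 0.10],
}
picked = select_queries(pool, 2)
```

    Sequences the model is already confident about (here `seq_a`) are left for later, which is the intuition behind the annotation-time savings reported above.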

  19. Monitoring Moving Queries inside a Safe Region

    PubMed Central

    Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
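
    The idea of following the query with a linear function can be sketched as follows. This assumes a circular safe region and a constant velocity estimated from two consecutive position samples; the paper's actual region shape and update rule are not given in the abstract, and all names here are hypothetical.

```python
import math


def estimate_velocity(p_prev, p_curr, dt):
    """Velocity (units per second) from two position samples taken dt seconds apart."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def time_to_exit(position, velocity, center, radius):
    """Seconds until a linearly moving point leaves a circular safe region.

    Returns 0.0 if the point is already outside and None if it is not moving.
    Solves |position + velocity * t - center| = radius for the positive root.
    """
    dx, dy = position[0] - center[0], position[1] - center[1]
    vx, vy = velocity
    c = dx * dx + dy * dy - radius * radius
    if c > 0:
        return 0.0
    a = vx * vx + vy * vy
    if a == 0:
        return None
    b = 2 * (dx * vx + dy * vy)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


# Two periodic samples one second apart, safe region of radius 10 around the origin.
v = estimate_velocity((0.0, 0.0), (2.0, 0.0), dt=1.0)
t_exit = time_to_exit((2.0, 0.0), v, center=(0.0, 0.0), radius=10.0)
```

    With such an estimate the client could contact the server only around the predicted exit time instead of at every position update.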

  20. SensorDB: a virtual laboratory for the integration, visualization and analysis of varied biological sensor data.

    PubMed

    Salehi, Ali; Jimenez-Berni, Jose; Deery, David M; Palmer, Doug; Holland, Edward; Rozas-Larraondo, Pablo; Chapman, Scott C; Georgakopoulos, Dimitrios; Furbank, Robert T

    2015-01-01

    To our knowledge, there is no software or database solution that supports large volumes of biological time series sensor data efficiently and enables data visualization and analysis in real time. Existing solutions for managing data typically use unstructured file systems or relational databases. These systems are not designed to provide instantaneous response to user queries. Furthermore, they do not support rapid data analysis and visualization to enable interactive experiments. In large scale experiments, this behaviour slows research discovery, discourages the widespread sharing and reuse of data that could otherwise inform critical decisions in a timely manner and encourage effective collaboration between groups. In this paper we present SensorDB, a web based virtual laboratory that can manage large volumes of biological time series sensor data while supporting rapid data queries and real-time user interaction. SensorDB is sensor agnostic and uses web-based, state-of-the-art cloud and storage technologies to efficiently gather, analyse and visualize data. Collaboration and data sharing between different agencies and groups is thereby facilitated. SensorDB is available online at http://sensordb.csiro.au.

  1. Minimizing Statistical Bias with Queries.

    DTIC Science & Technology

    1995-09-14

    method for optimally selecting these points would offer enormous savings in time and money. An active learning system will typically attempt to select data...research in active learning assumes that the second term of Equation 2 is approximately zero, that is, that the learner is unbiased. If this is the case...outperforms the variance-minimizing algorithm and random exploration. and effective strategy for active learning. I have given empirical evidence that, with

  2. MorphoSaurus--design and evaluation of an interlingua-based, cross-language document retrieval engine for the medical domain.

    PubMed

    Markó, K; Schulz, S; Hahn, U

    2005-01-01

    We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain. Documents, as well as queries, are mapped to a language-independent conceptual layer on which retrieval operations are performed. We contrast this approach with the direct translation of German queries to English ones which, subsequently, are matched against English documents. We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. A substantial benefit for interlingua-based document retrieval using German queries on English texts is found, which amounts to 93% of the (monolingual) English baseline. Most state-of-the-art cross-language information retrieval systems translate user queries to the language(s) of the target documents. In contra-distinction to this approach, translating both documents and user queries into a language-independent, concept-like representation format is more beneficial to enhance cross-language retrieval performance.
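
    A toy version of retrieval on a language-independent concept layer is sketched below. The lexicon, concept identifiers and documents are invented for illustration, and whole-word lookup is a simplification; the abstract does not describe how MorphoSaurus segments or maps terms.

```python
# Hypothetical lexicon mapping surface words to shared concept identifiers.
LEXICON = {
    "de": {"herz": "C_HEART", "entzündung": "C_INFLAMM", "niere": "C_KIDNEY"},
    "en": {
        "heart": "C_HEART",
        "cardiac": "C_HEART",
        "inflammation": "C_INFLAMM",
        "kidney": "C_KIDNEY",
        "renal": "C_KIDNEY",
    },
}


def to_concepts(text, lang):
    """Map a text to the set of concept identifiers found in the lexicon."""
    lexicon = LEXICON[lang]
    return {lexicon[token] for token in text.lower().split() if token in lexicon}


def search(query, query_lang, docs, doc_lang):
    """Rank documents by the number of concepts shared with the query."""
    query_concepts = to_concepts(query, query_lang)
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_concepts & to_concepts(text, doc_lang))
        if overlap:
            scored.append((overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


docs = {
    "d1": "cardiac inflammation after surgery",
    "d2": "renal failure",
    "d3": "heart rate",
}
hits = search("Herz Entzündung", "de", docs, "en")
```

    Because both sides are reduced to concepts, the German query matches "cardiac" as well as "heart" without any query translation step.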

  3. Clustering and Flow Conservation Monitoring Tool for Software Defined Networks.

    PubMed

    Puente Fernández, Jesús Antonio; García Villalba, Luis Javier; Kim, Tai-Hoon

    2018-04-03

    Prediction systems present challenges on two fronts: the relation between video quality and observed session features and, on the other hand, dynamic changes in the video quality. Software Defined Networking (SDN) is a new concept of network architecture that separates the control plane (controller) from the data plane (switches) in network devices. Due to the existence of the southbound interface, it is possible to deploy monitoring tools to obtain the network status and retrieve a statistics collection. Therefore, achieving the most accurate statistics depends on the strategy used for monitoring and for requesting information from network devices. In this paper, we propose an enhanced algorithm for requesting statistics to measure the traffic flow in SDN networks. The algorithm groups network switches into clusters according to their number of ports in order to apply different monitoring techniques. This grouping avoids monitoring queries to network switches with common characteristics and thereby omits redundant information. In this way, the present proposal decreases the number of monitoring queries to switches, improving the network traffic and preventing switch overload. We have tested our optimization in a video streaming simulation using different types of videos. The experiments and the comparison with traditional monitoring techniques demonstrate the feasibility of our proposal, which maintains similar values while decreasing the number of queries to the switches.
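
    The grouping idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the switch names, the port counts, and the rule of polling only one representative switch per port-count cluster are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_port_count(switches):
    """Group switch IDs by their number of ports.

    `switches` maps a (hypothetical) switch ID to its port count.
    """
    clusters = defaultdict(list)
    for switch_id, port_count in switches.items():
        clusters[port_count].append(switch_id)
    return dict(clusters)


def select_switches_to_poll(clusters):
    """Pick one representative per cluster.

    Assumption: switches with the same port count are treated as
    redundant, so only the lowest ID in each cluster is polled.
    """
    return sorted(min(members) for members in clusters.values())


# Hypothetical topology: switch ID -> number of ports.
switches = {"s1": 4, "s2": 4, "s3": 8, "s4": 24, "s5": 8}

clusters = cluster_by_port_count(switches)
to_poll = select_switches_to_poll(clusters)

print(f"polling {len(to_poll)} of {len(switches)} switches: {to_poll}")
```

    In this toy topology three statistics requests replace five, which is the kind of reduction in monitoring queries the abstract describes.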

  4. Rethinking the lecture: the application of problem based learning methods to atypical contexts.

    PubMed

    Rogal, Sonya M M; Snider, Paul D

    2008-05-01

    Problem based learning is a teaching and learning strategy that uses a problematic stimulus as a means of motivating and directing students to develop and acquire knowledge. Problem based learning is a strategy that is typically used with small groups attending a series of sessions. This article describes the principles of problem based learning and its application in atypical contexts; large groups attending discrete, stand-alone sessions. The principles of problem based learning are based on Socratic teaching, constructivism and group facilitation. To demonstrate the application of problem based learning in an atypical setting, this article focuses on the graduate nurse intake from a teaching hospital. The groups are relatively large and meet for single day sessions. The modified applications of problem based learning to meet the needs of atypical groups are described. This article contains a step by step guide of constructing a problem based learning package for large, single session groups. Nurse educators facing similar groups will find they can modify problem based learning to suit their teaching context.

  5. Executing SPARQL Queries over the Web of Linked Data

    NASA Astrophysics Data System (ADS)

    Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

    The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.

  6. Context-Aware Online Commercial Intention Detection

    NASA Astrophysics Data System (ADS)

    Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng

    With more and more commercial activities moving onto the Internet, people tend to purchase what they need through the Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like common Web search queries, queries with commercial intention are usually very short. Distinguishing queries with commercial intention from common queries will help search engines provide proper search results and advertisements, help Web users obtain the information they desire and help advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different backgrounds and interests. The intentions can even be different for the same user when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm's performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm improves the F1 score by more than 10% over previous algorithms on commercial intention detection.

  7. Influenza-like illness surveillance on Twitter through automated learning of naïve language.

    PubMed

    Gesualdo, Francesco; Stilo, Giovanni; Agricola, Eleonora; Gonfiantini, Michaela V; Pandolfi, Elisabetta; Velardi, Paola; Tozzi, Alberto E

    2013-01-01

    Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.
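
    A Boolean case-definition query of the kind described here can be sketched in Python as a conjunction of symptom groups, where each group is a disjunction of a technical term and its everyday synonyms. The symptom groups and terms below are hypothetical placeholders, not the case definition or jargon lists used in the study.

```python
import re

# Hypothetical case definition: every group must be satisfied (AND),
# and any one term inside a group satisfies that group (OR).
CASE_DEFINITION = [
    {"fever", "high temperature"},
    {"cough", "sore throat"},
]


def matches_case_definition(text, case_definition=CASE_DEFINITION):
    """Return True if the text mentions at least one term from every group."""
    lowered = text.lower()

    def has_term(term):
        # Whole-word match so that "fever" does not match "feverish".
        return re.search(r"\b" + re.escape(term) + r"\b", lowered) is not None

    return all(any(has_term(term) for term in group) for group in case_definition)


tweets = [
    "Stuck in bed with a high temperature and a sore throat",
    "Bad cough today, otherwise fine",
]
flagged = [tweet for tweet in tweets if matches_case_definition(tweet)]
print(flagged)
```

    Only the first example tweet is flagged, because the second mentions a term from just one of the two symptom groups.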

  8. Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language

    PubMed Central

    Gesualdo, Francesco; Stilo, Giovanni; Agricola, Eleonora; Gonfiantini, Michaela V.; Pandolfi, Elisabetta; Velardi, Paola; Tozzi, Alberto E.

    2013-01-01

    Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems. PMID:24324799

  9. Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman

    2014-01-01

    Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us to understand users' “information need” and how they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for regarding CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Signs’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380

  10. VIGOR: Interactive Visual Exploration of Graph Query Results.

    PubMed

    Pienta, Robert; Hohman, Fred; Endert, Alex; Tamersoy, Acar; Roundy, Kevin; Gates, Chris; Navathe, Shamkant; Chau, Duen Horng

    2018-01-01

    Finding patterns in graphs has become a vital challenge in many domains, from biological systems and network security to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts' sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results, or starts with only the structure (i.e., without node value constraints) and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

  11. Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries.

    PubMed

    Hoijemberg, Pablo A; Pelczer, István

    2018-01-05

    A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. 
    POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.

  12. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    NASA Astrophysics Data System (ADS)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping "cells" by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce "kernels" that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we swept through 1.1 Grows of PanSTARRS+SDSS data (220GB) in less than 15 minutes on a dual CPU machine. In a cluster environment, we achieved bandwidths of 17Gbits/sec (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.
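
    The horizontal partitioning by position and time, followed by a per-cell kernel, can be illustrated with a short Python sketch. This does not use the LSD API: the function names, the 10-degree and 30-day cell sizes, and the sample rows are assumptions, and the partial overlap between neighbouring cells mentioned in the abstract is deliberately left out.

```python
import math
from collections import defaultdict


def cell_id(lon, lat, t, cell_deg=10.0, cell_days=30.0):
    """Map a (lon, lat, t) coordinate to a discrete cell index.

    Assumption: a plain rectangular grid in degrees and days,
    without the overlap margins a real system would add.
    """
    return (
        math.floor((lon % 360.0) / cell_deg),
        math.floor((lat + 90.0) / cell_deg),
        math.floor(t / cell_days),
    )


def partition(rows):
    """Group rows into cells so each cell can be processed independently."""
    cells = defaultdict(list)
    for row in rows:
        cells[cell_id(*row)].append(row)
    return dict(cells)


def count_kernel(cell_rows):
    """A trivial per-cell 'map' kernel: count the rows in one cell."""
    return len(cell_rows)


# Hypothetical rows: (lon in degrees, lat in degrees, t in days).
rows = [(10.5, -5.0, 12.0), (11.0, -4.0, 15.0), (200.0, 45.0, 40.0)]

cells = partition(rows)
counts = {cid: count_kernel(cell_rows) for cid, cell_rows in cells.items()}
print(counts)
```

    Because each cell is handled on its own, the per-cell kernels could be scheduled on different workers, which is the property the abstract relies on for data and task distribution.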

  13. Bilastine in allergic rhinoconjunctivitis and urticaria: a practical approach to treatment decisions based on queries received by the medical information department

    PubMed Central

    Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis

    2017-01-01

    Background: Bilastine is a safe and effective commonly prescribed non-sedating H1-antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. Data sources and methods: Queries received by the Medical Information Department and the responses provided to senders of these queries. Results: The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions with bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable since it undergoes negligible metabolism and is almost exclusively eliminated via renal excretion, and it neither induces nor inhibits the activity of several isoenzymes from the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Conclusions: Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in the decision-making, but more research is needed to provide evidence-based answers in certain circumstances. PMID:28210286

  14. Bilastine in allergic rhinoconjunctivitis and urticaria: a practical approach to treatment decisions based on queries received by the medical information department.

    PubMed

    Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis

    2017-01-01

    Bilastine is a safe and effective commonly prescribed non-sedating H1-antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. Queries received by the Medical Information Department and the responses provided to senders of these queries. The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions with bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable since it undergoes negligible metabolism and is almost exclusively eliminated via renal excretion, and it neither induces nor inhibits the activity of several isoenzymes from the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in the decision-making, but more research is needed to provide evidence-based answers in certain circumstances.

  15. Indexing and retrieval of multimedia objects at different levels of granularity

    NASA Astrophysics Data System (ADS)

    Faudemay, Pascal; Durand, Gwenael; Seyrat, Claude; Tondre, Nicolas

    1998-10-01

    Intelligent access to multimedia databases for the 'naive user' should probably be based on query formulation by 'intelligent agents'. These agents should 'understand' the semantics of the contents, learn user preferences and deliver to the user a subset of the source contents, for further navigation. The goal of such systems should be to enable 'zero-command' access to the contents, while preserving the user's freedom of choice. Such systems should interpret multimedia contents in terms of multiple audiovisual objects (from video to visual or audio object), and in terms of actions and scenarios.

  16. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  17. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719

  18. The Development and Preliminary Application of Plant Quarantine Remote Teaching System in China

    NASA Astrophysics Data System (ADS)

    Wu, Zhigang; Li, Zhihong; Yang, Ding; Zhang, Guozhen

    With the development of modern information technology, the traditional teaching mode has become increasingly inadequate for the requirements of modern education. Plant Quarantine has been accepted as a common course at agricultural universities in China since China's entry into the WTO. However, the teaching resources for this course are insufficient, especially at universities with a weak resource base. E-learning is regarded as one way to solve the problem of scarce teaching resources. PQRTS (Plant Quarantine Remote Teaching System) was designed and developed with JSP (Java Server Pages), MySQL and Tomcat in this study. The system included many kinds of plant quarantine teaching resources, such as an international glossary, regulations and standards, multimedia information on quarantine processes and pests, PPT teaching files, and training exercises. The system prototype implemented the functions of remote learning, querying, management, examination and remote discussion. It could be a tool for teaching, teaching assistance and online learning.

  19. Collaborative mining of graph patterns from multiple sources

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Colonna-Romanoa, John

    2016-05-01

    Intelligence analysts require automated tools to mine multi-source data, including answering queries, learning patterns of life, and discovering malicious or anomalous activities. Graph mining algorithms have recently attracted significant attention in intelligence community, because the text-derived knowledge can be efficiently represented as graphs of entities and relationships. However, graph mining models are limited to use-cases involving collocated data, and often make restrictive assumptions about the types of patterns that need to be discovered, the relationships between individual sources, and availability of accurate data segmentation. In this paper we present a model to learn the graph patterns from multiple relational data sources, when each source might have only a fragment (or subgraph) of the knowledge that needs to be discovered, and segmentation of data into training or testing instances is not available. Our model is based on distributed collaborative graph learning, and is effective in situations when the data is kept locally and cannot be moved to a centralized location. Our experiments show that proposed collaborative learning achieves learning quality better than aggregated centralized graph learning, and has learning time comparable to traditional distributed learning in which a knowledge of data segmentation is needed.

  20. NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

    PubMed Central

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425

  1. A New Publicly Available Chemical Query Language, CSRML ...

    EPA Pesticide Factsheets

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan

  2. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data.
BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.

  3. Flipped classroom instructional approach in undergraduate medical education

    PubMed Central

    Fatima, Syeda Sadia; Arain, Fazal Manzoor; Enam, Syed Ather

    2017-01-01

    Objective: In this study we implemented the “flipped classroom” model to enhance active learning in medical students taking the neurosciences module at Aga Khan University, Karachi. Methods: Ninety-eight undergraduate medical students participated in this study. The study was conducted from January to March 2017. Study material was provided to students in the form of video lectures and reading material for the non-face-to-face sitting, while face-to-face time was spent on activities such as case solving, group discussions, and quizzes to consolidate learning under the supervision of faculty. To ensure deeper learning, we used pre- and post-class quizzes, worksheets and blog posts for each session. Student feedback was recorded via a Likert scale survey. Results: Eighty-four percent of students gave positive responses towards the utility of the flipped classroom in terms of being highly interactive, thought-provoking and activity-led learning. Seventy-five percent of the class completed the pre-session preparation. Students reported that their queries and misconceptions were cleared in a much better way in the face-to-face session as compared to the traditional setting (4.09 ± 1.04). Conclusion: Flipped classroom (FCR) teaching and learning pedagogy is an effective way of enhancing student engagement and active learning. Thus, this pedagogy can be used as an effective tool in medical schools. PMID:29492071

  4. Flipped classroom instructional approach in undergraduate medical education.

    PubMed

    Fatima, Syeda Sadia; Arain, Fazal Manzoor; Enam, Syed Ather

    2017-01-01

    In this study we implemented the "flipped classroom" model to enhance active learning in medical students taking the neurosciences module at Aga Khan University, Karachi. Ninety-eight undergraduate medical students participated in this study. The study was conducted from January to March 2017. Study material was provided to students in the form of video lectures and reading material for the non-face-to-face sitting, while face-to-face time was spent on activities such as case solving, group discussions, and quizzes to consolidate learning under the supervision of faculty. To ensure deeper learning, we used pre- and post-class quizzes, worksheets and blog posts for each session. Student feedback was recorded via a Likert scale survey. Eighty-four percent of students gave positive responses towards the utility of the flipped classroom in terms of being highly interactive, thought-provoking and activity-led learning. Seventy-five percent of the class completed the pre-session preparation. Students reported that their queries and misconceptions were cleared in a much better way in the face-to-face session as compared to the traditional setting (4.09 ± 1.04). Flipped classroom (FCR) teaching and learning pedagogy is an effective way of enhancing student engagement and active learning. Thus, this pedagogy can be used as an effective tool in medical schools.

  5. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.

  6. Multi-Bit Quantum Private Query

    NASA Astrophysics Data System (ADS)

    Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

    2015-09-01

    Most of the existing Quantum Private Queries (QPQ) protocols provide only single-bit queries service, thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain, which is still restricted in the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity, implementability and can achieve a considerable level of security.

  7. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories.

    PubMed

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.

  8. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

    PubMed Central

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239

  9. A trial of team-based versus small-group learning for second-year medical students: does the size of the small group make a difference?

    PubMed

    Willett, Laura Rees; Rosevear, G Craig; Kim, Sarang

    2011-01-01

    Team-based learning is a large-group instructional modality intended to provide active learning with modest faculty resources. The goal is to determine if team-based learning could be substituted for small-group learning in case sessions without compromising test performance or satisfaction. One hundred and sixty-seven students were assigned to team-based or small-group learning for 6 case discussion sessions. Examination scores and student satisfaction were compared. Instruction modality had no meaningful effect on examination score, 81.7% team based versus 79.7% small-group, p=.56 after multivariate adjustment. Student satisfaction was lower with team-based learning, 2.45 versus 3.74 on a 5-point scale, p<.001. Survey responses suggested that the very small size (8-10 students) of our small groups influenced the preference for small-group learning. Team-based learning does not adversely affect examination performance. However, student satisfaction may be inferior, especially if compared to instruction in very small groups of 10 or fewer students.

  10. Effects of College Students' Characteristics, Culture, and Language on Using E-Texts in Distance Learning

    ERIC Educational Resources Information Center

    Ainsa, Patricia

    2015-01-01

    E-texts have become a main venue but research has not provided much guidance for practical adaptation, yet. This research query started in the spring of 2014 when an e-text was adopted for an undergraduate distance learning class. The change created some unexpected influence in the students' experiences. It was necessary to assess their…

  11. Allie: a database and a search service of abbreviations and long forms

    PubMed Central

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548

  12. Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2017-02-01

    With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulty forming appropriate queries to articulate their inquiries, and their search queries are imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.

  13. Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

    PubMed Central

    2013-01-01

    Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556

  14. Enhanced Approximate Nearest Neighbor via Local Area Focused Search.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gonzales, Antonio; Blazier, Nicholas Paul

    Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space-partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Focused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses on a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.

  15. Problem-based learning in comparison with lecture-based learning among medical students.

    PubMed

    Faisal, Rizwan; Bahadur, Sher; Shinwari, Laiyla

    2016-06-01

    To compare performance of medical students exposed to problem-based learning and lecture-based learning. The descriptive study was conducted at Rehman Medical College, Peshawar, Pakistan from May 20 to September 20, 2014, and comprised 146 students of 3rd year MBBS who were randomised into two equal groups. One group was taught by the traditional lecture based learning, while problem-based learning was conducted for the other group on the same topic. At the end of sessions, the performance of the two groups was evaluated by one-best type of 50 multiple choice questions. Total marks were 100, with each question carrying 2 marks. SPSS 15 was used for statistical analysis. There were 146 students who were divided into two equal groups of 73(50%) each. The mean score in the group exposed to problem-based learning was 3.2 ± 0.8 while those attending lecture-based learning was 2.7±0.8 (p= 0.0001). Problem-based learning was more effective than lecture based learning in the academic performance of medical students.

  16. Shark: SQL and Rich Analytics at Scale

    DTIC Science & Technology

    2012-11-26

    learning programs up to 100× faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a...Shark to run SQL queries up to 100× faster than Apache Hive, and machine learning programs up to 100× faster than Hadoop. Unlike previous systems, Shark...so using a runtime that is optimized for such workloads and a programming model that is designed to express machine learning algorithms.

  17. WellnessRules: A Web 3.0 Case Study in RuleML-Based Prolog-N3 Profile Interoperation

    NASA Astrophysics Data System (ADS)

    Boley, Harold; Osmun, Taylor Michael; Craig, Benjamin Larry

    An interoperation study, WellnessRules, is described, where rules about wellness opportunities are created by participants in rule languages such as Prolog and N3, and translated within a wellness community using RuleML/XML. The wellness rules are centered around participants, as profiles, encoding knowledge about their activities conditional on the season, the time-of-day, the weather, etc. This distributed knowledge base extends FOAF profiles with a vocabulary and rules about wellness group networking. The communication between participants is organized through Rule Responder, permitting wellness-profile translation and distributed querying across engines. WellnessRules interoperates between rules and queries in the relational (Datalog) paradigm of the pure-Prolog subset of POSL and in the frame (F-logic) paradigm of N3. An evaluation of Rule Responder instantiated for WellnessRules revealed acceptable Web response times.

  18. Effectiveness of national evidence-based medicine competition in Taiwan

    PubMed Central

    2013-01-01

    Background Competition and education are intimately related and can be combined in many ways. The role of competition in medical education of evidence-based medicine (EBM) has not been investigated. In order to enhance the dissemination and implementation of EBM in Taiwan, EBM competitions have been established among healthcare professionals. This study was to evaluate the impact of competition in EBM learning. Methods The EBM competition used PICO (patient, intervention, comparison, and outcome) queries to examine participants’ skills in framing an answerable question, literature search, critical appraisal and clinical application among interdisciplinary teams. A structured questionnaire survey was conducted to investigate EBM among participants in the years of 2009 and 2011. Participants completed a baseline questionnaire survey at three months prior to the competition and finished the same questionnaire right after the competition. Results Valid questionnaires were collected from 358 participants, included 162 physicians, 71 nurses, 101 pharmacists, and 24 other allied healthcare professionals. There were significant increases in participants’ knowledge of and skills in EBM (p < 0.001). Their barriers to literature searching and forming answerable questions significantly decreased (p < 0.01). Furthermore, there were significant increases in their access to the evidence-based retrieval databases, including the Cochrane Library (p < 0.001), MD Consult (p < 0.001), ProQuest (p < 0.001), UpToDate (p = 0.001), CINAHL (p = 0.001), and MicroMedex (p = 0.024). Conclusions The current study demonstrates a method that successfully enhanced the knowledge of, skills in, and behavior of EBM. The data suggest competition using PICO queries may serve as an effective way to facilitate the learning of EBM. PMID:23651869

  19. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link.
The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.
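    The "37% of the total 3-month pool" figure in this abstract can be checked directly from the two counts it reports. The short sketch below is only an arithmetic check. The variable names are chosen for this example, and the numbers are the ones stated in the abstract.

```python
# Counts reported in the abstract.
total_instances = 204165   # cancer query instances found for June-August 2001
represented = 76077        # queries represented by the 7500 sampled questions

# Share of the 3-month pool covered by the sample, as a fraction.
share = represented / total_instances
print(f"{share:.1%}")
```

    The ratio rounds to the 37% quoted in the abstract.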

  20. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. 
The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659

  1. Minimization of annotation work: diagnosis of mammographic masses via active learning

    NASA Astrophysics Data System (ADS)

    Zhao, Yu; Zhang, Jingyang; Xie, Hongzhi; Zhang, Shuyang; Gu, Lixu

    2018-06-01

    The prerequisite for establishing an effective prediction system for mammographic diagnosis is the annotation of each mammographic image. The manual annotation work is time-consuming and laborious, which becomes a great hindrance for researchers. In this article, we propose a novel active learning algorithm that can adequately address this problem, leading to the minimization of the labeling costs on the premise of guaranteed performance. Our proposed method is different from the existing active learning methods designed for the general problem as it is specifically designed for mammographic images. Through its modified discriminant functions and improved sample query criteria, the proposed method can fully utilize the pairing of mammographic images and select the most valuable images from both the mediolateral and craniocaudal views. Moreover, in order to extend active learning to the ordinal regression problem, which has no precedent in existing studies, but is essential for mammographic diagnosis (mammographic diagnosis is not only a classification task, but also an ordinal regression task for predicting an ordinal variable, viz. the malignancy risk of lesions), multiple sample query criteria need to be taken into consideration simultaneously. We formulate it as a criteria integration problem and further present an algorithm based on self-adaptive weighted rank aggregation to achieve a good solution. The efficacy of the proposed method was demonstrated on thousands of mammographic images from the digital database for screening mammography. The labeling costs of obtaining optimal performance in the classification and ordinal regression task respectively fell to 33.8 and 19.8 percent of their original costs. The proposed method also generated 1228 wins, 369 ties and 47 losses for the classification task, and 1933 wins, 258 ties and 185 losses for the ordinal regression task compared to the other state-of-the-art active learning algorithms. 
By taking into account the particularities of mammographic images, the proposed AL method can indeed reduce the manual annotation work to a great extent without sacrificing the performance of the prediction system for mammographic diagnosis.
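    To make the idea of combining several sample query criteria more concrete, the sketch below shows a plain weighted rank aggregation over two criteria. It is a generic illustration, not the authors' self-adaptive algorithm: the weights are fixed rather than adapted, and the criterion names (`uncertainty`, `diversity`), the scores, and the helper functions are all hypothetical.

```python
def ranks(scores):
    """Rank candidates by descending score; the best candidate gets rank 0."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order):
        result[idx] = position
    return result


def aggregate(criteria_scores, weights):
    """Weighted sum of per-criterion ranks for each candidate (lower is better)."""
    n = len(criteria_scores[0])
    total = [0.0] * n
    for scores, weight in zip(criteria_scores, weights):
        for i, rank in enumerate(ranks(scores)):
            total[i] += weight * rank
    return total


def select(criteria_scores, weights, k):
    """Return the indices of the k candidates with the lowest aggregated rank."""
    total = aggregate(criteria_scores, weights)
    return sorted(range(len(total)), key=lambda i: total[i])[:k]


# Hypothetical scores for four unlabeled images under two query criteria.
uncertainty = [0.9, 0.2, 0.6, 0.4]
diversity = [0.1, 0.8, 0.7, 0.3]
picked = select([uncertainty, diversity], weights=[0.5, 0.5], k=2)
print(picked)
```

    Aggregating ranks rather than raw scores avoids having to put criteria with different scales on a common footing. A self-adaptive scheme such as the one the abstract describes would adjust the weights rather than fix them in advance.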

  2. Minimization of annotation work: diagnosis of mammographic masses via active learning.

    PubMed

    Zhao, Yu; Zhang, Jingyang; Xie, Hongzhi; Zhang, Shuyang; Gu, Lixu

    2018-05-22

    The prerequisite for establishing an effective prediction system for mammographic diagnosis is the annotation of each mammographic image. The manual annotation work is time-consuming and laborious, which becomes a great hindrance for researchers. In this article, we propose a novel active learning algorithm that can adequately address this problem, leading to the minimization of the labeling costs on the premise of guaranteed performance. Our proposed method is different from the existing active learning methods designed for the general problem as it is specifically designed for mammographic images. Through its modified discriminant functions and improved sample query criteria, the proposed method can fully utilize the pairing of mammographic images and select the most valuable images from both the mediolateral and craniocaudal views. Moreover, in order to extend active learning to the ordinal regression problem, which has no precedent in existing studies, but is essential for mammographic diagnosis (mammographic diagnosis is not only a classification task, but also an ordinal regression task for predicting an ordinal variable, viz. the malignancy risk of lesions), multiple sample query criteria need to be taken into consideration simultaneously. We formulate it as a criteria integration problem and further present an algorithm based on self-adaptive weighted rank aggregation to achieve a good solution. The efficacy of the proposed method was demonstrated on thousands of mammographic images from the digital database for screening mammography. The labeling costs of obtaining optimal performance in the classification and ordinal regression task respectively fell to 33.8 and 19.8 percent of their original costs. The proposed method also generated 1228 wins, 369 ties and 47 losses for the classification task, and 1933 wins, 258 ties and 185 losses for the ordinal regression task compared to the other state-of-the-art active learning algorithms. 
By taking the particularities of mammographic images into account, the proposed AL method can indeed reduce the manual annotation work to a great extent without sacrificing the performance of the prediction system for mammographic diagnosis.
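
    The criteria integration step described above can be illustrated with a small rank aggregation sketch. This is not the authors' self-adaptive algorithm: it swaps in a plain weighted Borda count with fixed weights, and the criterion names (`uncertainty`, `diversity`), image ids, and weight values are all assumptions made for illustration.

```python
def weighted_borda(rankings, weights):
    """Combine several rankings of the same items into one ordering.

    rankings: list of lists, each ordering the same item ids from most to
              least valuable under one sample query criterion.
    weights:  one non-negative weight per ranking (fixed here; the abstract
              describes self-adaptive weights, which this sketch omits).
    """
    scores = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, item in enumerate(ranking):
            # Borda points: n - 1 for the top item, 0 for the last one.
            scores[item] = scores.get(item, 0.0) + weight * (n - 1 - position)
    # Highest score first; ties broken by item id so the result is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings of three unlabeled images under two criteria.
uncertainty = ["img3", "img1", "img2"]
diversity = ["img1", "img2", "img3"]
combined = weighted_borda([uncertainty, diversity], weights=[0.7, 0.3])
print(combined)
```

    The first entries of `combined` would be the images sent to the annotator next; an adaptive scheme would additionally update `weights` between query rounds.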

  3. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    PubMed Central

    Sadesh, S.; Suganthe, R. C.

    2015-01-01

    The web holds a tremendous volume of information from which results are retrieved for user queries. Despite the rapid growth of web page recommendation, results retrieved with data mining techniques have not achieved a high filtering rate, because the relationships between user profiles and queries have not been analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate under user behavior that changes over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, filtering and switching. The data-mining-based filtering uses the Hamilton filtering framework to filter user results based on personalized information from automatically updated profiles obtained through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching process performs accurate filtering on the updated profiles using regime switching. Updating on profile changes (i.e., regime switches) in the HFRS-UQP framework identifies the second- and higher-order associations of query results with the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626

  4. Effects of case-based learning on communication skills, problem-solving ability, and learning motivation in nursing students.

    PubMed

    Yoo, Moon-Sook; Park, Hyung-Ran

    2015-06-01

    The purpose of this study was to explore the effects of case-based learning on communication skills, problem-solving ability, and learning motivation in sophomore nursing students. In this prospective, quasi-experimental study, we compared the pretest and post-test scores of an experimental group and a nonequivalent, nonsynchronized control group. Both groups were selected using convenience sampling, and consisted of students enrolled in a health communication course in the fall semesters of 2011 (control group) and 2012 (experimental group) at a nursing college in Suwon, South Korea. The two courses covered the same material, but in 2011 the course was lecture-based, while in 2012, lectures were replaced by case-based learning comprising five authentic cases of patient-nurse communication. At post-test, the case-based learning group showed significantly greater communication skills, problem-solving ability, and learning motivation than the lecture-based learning group. This finding suggests that case-based learning is an effective learning and teaching method. © 2014 Wiley Publishing Asia Pty Ltd.

  5. A User-Centered Approach to Adaptive Hypertext Based on an Information Relevance Model

    NASA Technical Reports Server (NTRS)

    Mathe, Nathalie; Chen, James

    1994-01-01

    Rapid and effective access to information in large electronic documentation systems can be facilitated if information relevant in an individual user's context can be automatically supplied to this user. However, most of this knowledge on contextual relevance is not found within the contents of documents; rather, it is established incrementally by users during information access. We propose a new model for interactively learning contextual relevance during information retrieval, and incrementally adapting retrieved information to individual user profiles. The model, called a relevance network, records the relevance of references based on user feedback for specific queries and user profiles. It also generalizes such knowledge to later derive relevant references for similar queries and profiles. The relevance network lets users filter information by context of relevance. Compared to other approaches, it requires neither prior knowledge nor training. More importantly, our approach to adaptivity is user-centered. It facilitates acceptance and understanding by users by giving them shared control over the adaptation without disturbing their primary task. Users easily control when to adapt and when to use the adapted system. Lastly, the model is independent of the particular application used to access information, and supports sharing of adaptations among users.

  6. Design Recommendations for Query Languages

    DTIC Science & Technology

    1980-09-01

    DESIGN RECOMMENDATIONS FOR QUERY LANGUAGES S.L. Ehrenreich Submitted by: Stanley M. Halpin, Acting Chief HUMAN FACTORS TECHNICAL AREA Approved by: Edgar ...respond to queries that it recognizes as faulty. Codd (1974) states that in designing a natural query language, attention must be given to dealing...impaired. Codd (1974) also regarded the user’s perception of the data base to be of critical importance in properly designing a query language system

  7. Agent-Based Framework for Discrete Entity Simulations

    DTIC Science & Technology

    2006-11-01

    Postgres database server for environment queries of neighbors and continuum data. As expected for raw database queries (no database optimizations in...form. Eventually the code was ported to GNU C++ on the same single Intel Pentium 4 CPU running RedHat Linux 9.0 and Postgres database server...Again Postgres was used for environmental queries, and the tool remained relatively slow because of the immense number of queries necessary to assess

  8. Using a data base management system for modelling SSME test history data

    NASA Technical Reports Server (NTRS)

    Abernethy, K.

    1985-01-01

    The usefulness of a data base management system (DBMS) for modelling historical test data for the complete series of static test firings for the Space Shuttle Main Engine (SSME) was assessed. From an analysis of user data base query requirements, it became clear that a relational DBMS which included a relationally complete query language would permit a model satisfying the query requirements. Representative models and sample queries are discussed. A list of environment-particular evaluation criteria for the desired DBMS was constructed; these criteria include requirements in the areas of user-interface complexity, program independence, flexibility, modifiability, and output capability. The evaluation process included the construction of several prototype data bases for user assessment. The systems studied, representing the three major DBMS conceptual models, were: MIRADS, a hierarchical system; DMS-1100, a CODASYL-based network system; ORACLE, a relational system; and DATATRIEVE, a relational-type system.

  9. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building an ontology in Web Ontology Language (OWL) format and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. In contrast to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be developed with an ontology, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study that uses SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates that constructing a geo-spatial semantic query system has positive effects on forming spatial queries and on retrieval.

  10. ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP

    NASA Astrophysics Data System (ADS)

    Chatziantoniou, Damianos; Sotiropoulos, Yannis

    Modern data analysis has given birth to numerous grouping constructs and programming paradigms, way beyond the traditional group by. Applications such as data warehousing, web log analysis, streams monitoring and social networks understanding necessitated the use of data cubes, grouping variables, windows and MapReduce. In this paper we review the associated set (ASSET) concept and discuss its applicability in both continuous and traditional data settings. Given a set of values B, an associated set over B is just a collection of annotated data multisets, one for each b ∈ B. The goal is to efficiently compute aggregates over these data sets. An ASSET query consists of repeated definitions of associated sets and aggregates of these, possibly correlated, resembling a spreadsheet document. We review systems implementing ASSET queries both in continuous and persistent contexts and argue for associated sets' analytical abilities and optimization opportunities.

  11. Acceptability of Mental Health Stigma-Reduction Training and Initial Effects on Awareness Among Military Personnel

    DTIC Science & Technology

    2015-10-13

    attitudes. The primary limitation in this study was the one-group pretest–posttest design for the assessment of change in stigma awareness. The results...participated in a pretest, 2-h stigma-reduction training and immediate posttest. Acceptability of the training was measured by querying participants about...SpringerPlus (2015) 4:606 concern and how to conduct a group discussion about confidentiality and treatment options based on a segment from the

  12. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We will describe the design of the query system, provide details of the language components and an overview of how this component fits into the overall data discovery system architecture.
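
    The idea of discovering join conditions from a graph representation of the schema can be sketched as a shortest-path search. The snippet below is only an illustration under stated assumptions: the table names, keys, and the `SCHEMA` dictionary are hypothetical and are not the actual DBS schema, and breadth-first search stands in for whatever traversal the real query builder uses.

```python
from collections import deque

# Hypothetical schema graph: each edge carries the join condition that
# links two tables. These names are illustrative, not the real DBS tables.
SCHEMA = {
    "dataset": {"block": "block.dataset_id = dataset.id"},
    "block": {
        "dataset": "block.dataset_id = dataset.id",
        "file": "file.block_id = block.id",
    },
    "file": {"block": "file.block_id = block.id"},
}


def join_path(start, goal, schema=SCHEMA):
    """Breadth-first search for the shortest chain of joins from start to goal.

    Returns a list of (table, join condition) steps, or None if unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for neighbor, condition in schema[table].items():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, steps + [(neighbor, condition)]))
    return None


def build_sql(column, start, goal):
    """Assemble a SELECT statement without the user naming any keys."""
    steps = join_path(start, goal)
    if steps is None:
        raise ValueError(f"no join path between {start} and {goal}")
    joins = "".join(f" JOIN {table} ON {cond}" for table, cond in steps)
    return f"SELECT {column} FROM {start}{joins}"


sql = build_sql("file.name", "dataset", "file")
print(sql)
```

    A user-facing query would only mention the entities of interest; the intermediate `block` table and both join conditions are filled in by the graph search.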

  13. Distributed XQuery-Based Integration and Visualization of Multimodality Brain Mapping Data

    PubMed Central

    Detwiler, Landon T.; Suciu, Dan; Franklin, Joshua D.; Moore, Eider B.; Poliakov, Andrew V.; Lee, Eunjung S.; Corina, David P.; Ojemann, George A.; Brinkley, James F.

    2008-01-01

    This paper addresses the need for relatively small groups of collaborating investigators to integrate distributed and heterogeneous data about the brain. Although various national efforts facilitate large-scale data sharing, these approaches are generally too “heavyweight” for individual or small groups of investigators, with the result that most data sharing among collaborators continues to be ad hoc. Our approach to this problem is to create a “lightweight” distributed query architecture, in which data sources are accessible via web services that accept arbitrary query languages but return XML results. A Distributed XQuery Processor (DXQP) accepts distributed XQueries in which subqueries are shipped to the remote data sources to be executed, with the resulting XML integrated by DXQP. A web-based application called DXBrain accesses DXQP, allowing a user to create, save and execute distributed XQueries, and to view the results in various formats including a 3-D brain visualization. Example results are presented using distributed brain mapping data sources obtained in studies of language organization in the brain, but any other XML source could be included. The advantage of this approach is that it is very easy to add and query a new source, the tradeoff being that the user needs to understand XQuery and the schemata of the underlying sources. For small numbers of known sources this burden is not onerous for a knowledgeable user, leading to the conclusion that the system helps to fill the gap between ad hoc local methods and large scale but complex national data sharing efforts. PMID:19198662

  14. Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.

    PubMed Central

    Combi, C.; Missora, L.; Pinciroli, F.

    1996-01-01

    Due to the ubiquitous and special nature of time, especially in clinical databases, there is a need for particular temporal data and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates that allow complex temporal queries to be defined. PMID:8947722

  15. WATCHMAN: A Data Warehouse Intelligent Cache Manager

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Shim, Junho; Vingralek, Radek

    1996-01-01

    Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
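
    A profit-driven replacement policy over whole retrieved sets can be sketched as follows. The abstract names the three inputs (average rate of reference, size, execution cost) but not the formula, so the form used here, rate times cost divided by size, is an assumption, as are the class name `RetrievedSet`, the function `choose_victims`, and the sample numbers. The admission algorithm is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RetrievedSet:
    query_id: str
    size: int              # cache space occupied by the stored result
    cost: float            # cost of re-executing the associated query
    reference_rate: float  # average references per unit of time

    @property
    def profit(self):
        # Assumed form: expected saved execution cost per unit of space.
        return self.reference_rate * self.cost / self.size


def choose_victims(cache, needed_space):
    """Pick entire retrieved sets to evict, lowest profit first."""
    victims, freed = [], 0
    for entry in sorted(cache, key=lambda e: e.profit):
        if freed >= needed_space:
            break
        victims.append(entry)
        freed += entry.size
    return victims


cache = [
    RetrievedSet("q1", size=100, cost=50.0, reference_rate=0.2),
    RetrievedSet("q2", size=10, cost=80.0, reference_rate=0.5),
    RetrievedSet("q3", size=200, cost=20.0, reference_rate=0.1),
]
evicted = [entry.query_id for entry in choose_victims(cache, needed_space=250)]
print(evicted)
```

    Unlike LRU, a small, expensive, frequently referenced result such as `q2` stays cached even if it was not the most recently used entry.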

  16. An index-based algorithm for fast on-line query processing of latent semantic analysis

    PubMed Central

    Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot respond efficiently to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the aim of efficiently searching for documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo-document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo-document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping operations that make little contribution to similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747

  17. An index-based algorithm for fast on-line query processing of latent semantic analysis.

    PubMed

    Zhang, Mingxi; Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot respond efficiently to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the aim of efficiently searching for documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo-document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo-document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping operations that make little contribution to similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm.
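
    The thresholded index and the accumulation step can be illustrated with a simplified sketch. It is not the ILSA algorithm itself: the partial similarities are supplied as ready-made numbers rather than derived from an LSA decomposition, and the function names, document ids, and the value of the threshold are assumptions made for illustration.

```python
from collections import defaultdict


def build_partial_index(partial_sims, theta):
    """Keep only partial similarities that reach the threshold theta.

    partial_sims: {dimension id: {doc id: partial similarity}}.
    Entries below theta are skipped, which shrinks the search space at
    the price of an approximate final score.
    """
    return {
        dim: {doc: value for doc, value in docs.items() if value >= theta}
        for dim, docs in partial_sims.items()
    }


def answer_query(index, pseudo_doc):
    """Accumulate partial similarities over the non-zero query entries."""
    scores = defaultdict(float)
    for dim, weight in pseudo_doc.items():
        if weight == 0:
            continue  # zero entries never touch the index
        for doc, value in index.get(dim, {}).items():
            scores[doc] += weight * value
    # Best match first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical precomputed partial similarities for three documents.
partial_sims = {
    0: {"d1": 0.5, "d2": 0.05},
    1: {"d1": 0.25, "d3": 0.5},
}
index = build_partial_index(partial_sims, theta=0.1)
ranking = answer_query(index, pseudo_doc={0: 1.0, 1: 1.0, 2: 0.0})
print(ranking)
```

    Document `d2` never appears as a candidate because its only partial similarity falls below the threshold, which is the pruning effect the abstract describes.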

  18. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .

  19. A Distributed Multi-Agent System for Collaborative Information Management and Learning

    NASA Technical Reports Server (NTRS)

    Chen, James R.; Wolfe, Shawn R.; Wragg, Stephen D.; Koga, Dennis (Technical Monitor)

    2000-01-01

    In this paper, we present DIAMS, a system of distributed, collaborative agents to help users access, manage, share and exchange information. A DIAMS personal agent helps its owner find information most relevant to current needs. It provides tools and utilities for users to manage their information repositories with dynamic organization and virtual views. Flexible hierarchical display is integrated with indexed query search to support effective information access. Automatic indexing methods are employed to support user queries and communication between agents. Contents of a repository are kept in object-oriented storage to facilitate information sharing. Collaboration between users is aided by easy sharing utilities as well as automated information exchange. Matchmaker agents are designed to establish connections between users with similar interests and expertise. DIAMS agents provide needed services for users to share and learn information from one another on the World Wide Web.

  20. SU-E-T-170: Characterization of the Location, Extent, and Proximity to Critical Structures of Target Volumes Provides Detail for Improved Outcome Predictions Among Pancreatic Cancer Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, Z; Moore, J; Rosati, L

    Purpose: In radiotherapy, size, location and proximity of the target to critical structures influence treatment decisions. It has been shown that proximity of the target predicts dosimetric sparing of critical structures. In addition to dosimetry, precise location of disease has further implications such as tumor invasion, or proximity to major arteries that inhibit surgery. Knowledge of which patients can be converted to surgical candidates by radiation may have high impact on future treat/no-treat decisions. We propose a method to improve our characterization of the location of pancreatic cancer and treatment volume extent with respect to nearby arteries with the goal of developing features to improve clinical predictions and decisions. Methods: Oncospace is a local learning health system that systematically captures clinical outcomes and all aspects of radiotherapy treatment plans, including overlap volume histograms (OVH) – a measure of spatial relationships between two structures. Minimum and maximum distances of PTV and OARs based on OVH, PTV volume, anatomic location by ICD-9 code, and surgical outcome were queried. Normalized distance to center from the left and right kidney was calculated to indicate tumor location and laterality. Distance to critical arteries (celiac, superior mesenteric, common hepatic) is validated by surgical status (borderline resectable, locally advanced converted to resectable). Results: There were 205 pancreas stereotactic body radiotherapy patients treated from 2009–2015 queried. Location/laterality of tumor based on kidney OVH show strong trends between location by OVH and by ICD-9. Compared to the locally advanced group, the borderline resectable group showed larger geometrical distance from critical arteries (p=0.03). Conclusion: Our platform enabled analysis of shape/size-location relationships. 
These data suggest that PTV volume and attention to distance between PTVs and surrounding OARs and major arteries may be promising for improving characterization of treatment anatomy that can refine our ability for outcome predictions and decision making. Elekta, Toshiba.

  1. Group-oriented coordination models for distributed client-server computing

    NASA Technical Reports Server (NTRS)

    Adler, Richard M.; Hughes, Craig S.

    1994-01-01

    This paper describes group-oriented control models for distributed client-server interactions. These models transparently coordinate requests for services that involve multiple servers, such as queries across distributed databases. Specific capabilities include: decomposing and replicating client requests; dispatching request subtasks or copies to independent, networked servers; and combining server results into a single response for the client. The control models were implemented by combining request broker and process group technologies with an object-oriented communication middleware tool. The models are illustrated in the context of a distributed operations support application for space-based systems.
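
    The replicate, dispatch, and combine pattern described above can be sketched in a few lines. This is a generic scatter-gather illustration, not the request broker and process group middleware from the paper: the servers are hypothetical in-process functions rather than networked services, and a thread pool stands in for the communication layer.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(request, servers, combine):
    """Send a copy of one request to every server and merge the replies."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # Executor.map returns results in the same order as `servers`.
        replies = list(pool.map(lambda server: server(request), servers))
    return combine(replies)


# Hypothetical stand-ins for independent database servers.
def server_a(query):
    return [row for row in ["apollo", "gemini"] if query in row]


def server_b(query):
    return [row for row in ["mercury", "skylab"] if query in row]


def merge_sorted(replies):
    """Flatten the per-server result lists into one sorted response."""
    return sorted(row for reply in replies for row in reply)


merged = scatter_gather("e", [server_a, server_b], combine=merge_sorted)
print(merged)
```

    The client sees a single response and never learns how many servers took part, which is the transparency the control models aim for.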

  2. CAPRI: A Geometric Foundation for Computational Analysis and Design

    NASA Technical Reports Server (NTRS)

    Haimes, Robert

    2006-01-01

    CAPRI is a software building tool-kit that refers to two ideas: (1) a simplified, object-oriented, hierarchical view of a solid part integrating both geometry and topology definitions, and (2) programming access to this part or assembly and any attached data. A complete definition of the geometry and application programming interface can be found in the document CAPRI: Computational Analysis PRogramming Interface appended to this report. In summary the interface is subdivided into the following functional components: 1. Utility routines -- These routines include the initialization of CAPRI, loading CAD parts and querying the operational status as well as closing the system down. 2. Geometry data-base queries -- This group of functions allows all top level applications to figure out and get detailed information on any geometric component in the Volume definition. 3. Point queries -- These calls allow grid generators, or solvers doing node adaptation, to snap points directly onto geometric entities. 4. Calculated or geometrically derived queries -- These entry points calculate data from the geometry to aid in grid generation. 5. Boundary data routines -- This part of CAPRI allows general data to be attached to Boundaries so that the boundary conditions can be specified and stored within CAPRI's data-base. 6. Tag based routines -- This part of the API allows the specification of properties associated with either the Volume (material properties) or Boundary (surface properties) entities. 7. Geometry based interpolation routines -- This part of the API facilitates Multi-disciplinary coupling and allows zooming through Boundary Attachments. 8. Geometric creation and manipulation -- These calls facilitate constructing simple solid entities and perform the Boolean solid operations. Geometry constructed in this manner has the advantage that the data is kept consistent with the CAD package, so a new design can be incorporated directly and is manufacturable. 9. Master Model access -- This addition to the API allows for the querying of the parameters and dimensions of the model. The feature tree is also exposed so it is easy to see where the parameters are applied. Calls exist to allow for the modification of the parameters and the suppression/unsuppression of nodes in the tree. Part regeneration is performed by a single API call and a new part becomes available within CAPRI (if the regeneration was successful). This is described in a separate document. Components 1-7 are considered the CAPRI base level reader.

  3. Autocorrelation and Regularization of Query-Based Information Retrieval Scores

    DTIC Science & Technology

    2008-02-01

    of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a...a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation

  4. Query by example video based on fuzzy c-means initialized by fixed clustering center

    NASA Astrophysics Data System (ADS)

    Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

    2012-04-01

    Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot can be represented by a key frame, and then we used video processing techniques to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to the initializations, here we initialized the cluster center by the shots of query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and the video in database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.
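
    The initialization idea in this abstract can be sketched in plain Python: the cluster centers start at the query video's shot features, so each cluster is anchored to one query shot. This is only an illustration under stated assumptions, not the authors' implementation. The 2-D feature vectors, the fuzzifier m = 2, the fixed iteration count, and all function names are hypothetical.

```python
import math

def memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership of each point in each cluster."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [v / total for v in row]
        else:
            row = [1.0 / sum((d / other) ** power for other in dists) for d in dists]
        table.append(row)
    return table

def update_centers(points, u, m=2.0):
    """Recompute each center as the membership-weighted mean of all points."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ))
    return centers

def fcm(points, initial_centers, m=2.0, iterations=20):
    """Run a fixed number of FCM iterations from the given initial centers."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

def hard_labels(u):
    """Assign each point to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]

# Hypothetical shot features: two query shots seed the two clusters.
QUERY_SHOTS = [(0.0, 0.0), (10.0, 10.0)]
DATABASE_SHOTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
```

    With these toy values, `fcm(DATABASE_SHOTS, QUERY_SHOTS)` returns the final centers and a membership table whose hard labels tie each database shot to the query shot that seeded its cluster.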

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Wu, Kesheng

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
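
    To make the bitmap-index idea concrete, the following is a minimal sketch, assuming uncompressed bitmaps stored as Python integers and no dictionary encoding (the paper uses compressed bitmaps and two dictionaries). The triples, predicate names, and function names are hypothetical. It shows how a triple pattern reduces to a bitwise AND and how a join on a shared variable reduces to a set intersection.

```python
from collections import defaultdict

def build_bitmap_index(triples):
    """Map each (column, value) pair to an int whose set bits mark matching rows."""
    index = defaultdict(int)
    for row, (s, p, o) in enumerate(triples):
        index[("s", s)] |= 1 << row
        index[("p", p)] |= 1 << row
        index[("o", o)] |= 1 << row
    return index

def rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    out, row = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(row)
        bitmap >>= 1
        row += 1
    return out

def match(index, triples, p, o):
    """Subjects satisfying the pattern (?x, p, o), found with one bitwise AND."""
    hits = index.get(("p", p), 0) & index.get(("o", o), 0)
    return sorted({triples[r][0] for r in rows(hits)})

# Hypothetical toy data set.
TRIPLES = [
    ("alice", "type", "Protein"),
    ("bob", "type", "Gene"),
    ("alice", "locatedIn", "nucleus"),
    ("carol", "type", "Protein"),
    ("carol", "locatedIn", "membrane"),
]
INDEX = build_bitmap_index(TRIPLES)

def proteins_in(location):
    """Join two patterns on ?x by intersecting their subject sets."""
    typed = set(match(INDEX, TRIPLES, "type", "Protein"))
    located = set(match(INDEX, TRIPLES, "locatedIn", location))
    return sorted(typed & located)
```

    Here `proteins_in("nucleus")` plays the role of a two-pattern SPARQL query; a production system would additionally compress the bitmaps and map strings to integer identifiers.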

  6. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  7. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
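
    The quadratic alignment cost mentioned in this abstract can be illustrated with dynamic time warping (DTW), a common alignment technique for time series. The abstract does not name the alignment algorithm actually used, so DTW is an assumption here, as are the pitch values (MIDI note numbers) and the function name.

```python
def dtw_distance(query, target):
    """Dynamic time warping cost between two pitch sequences.

    Fills an (n+1) x (m+1) table, so time and memory grow with
    len(query) * len(target), which is the quadratic cost at issue.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

    Because warping absorbs tempo differences, a query sung at half speed, such as `[60, 60, 62, 62, 64]` against the target `[60, 62, 64]`, still aligns at zero cost. Shorter representations shrink the table, which is why the dimension matters for rapid retrieval.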

  8. Concept-based query language approach to enterprise information systems

    NASA Astrophysics Data System (ADS)

    Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo

    2014-01-01

    In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.

  9. Retrieval feedback in MEDLINE.

    PubMed Central

    Srinivasan, P

    1996-01-01

    OBJECTIVE: To investigate a new approach for query expansion based on retrieval feedback. The first objective in this study was to examine alternative query-expansion methods within the same retrieval-feedback framework. The three alternatives proposed are: expansion on the MeSH query field alone, expansion on the free-text field alone, and expansion on both the MeSH and the free-text fields. The second objective was to gain further understanding of retrieval feedback by examining possible dependencies on relevant documents during the feedback cycle. DESIGN: Comparative study of retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a MEDLINE test collection of 75 queries and 2,334 MEDLINE citations. MEASUREMENTS: Retrieval effectivenesses of the original unexpanded and the alternative expanded queries were compared using 11-point-average precision scores (11-AvgP). These are averages of precision scores obtained at 11 standard recall points. RESULTS: All three expansion strategies significantly improved the original queries in terms of retrieval effectiveness. Expansion on MeSH alone was equivalent to expansion on both MeSH and the free-text fields. Expansion on the free-text field alone improved the queries significantly less than did the other two strategies. The second part of the study indicated that retrieval-feedback-based expansion yields significant performance improvements independent of the availability of relevant documents for feedback information. CONCLUSIONS: Retrieval feedback offers a robust procedure for query expansion that is most effective for MEDLINE when applied to the MeSH field. PMID:8653452
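
    The 11-point average precision measure used in this study can be sketched as follows. The sketch assumes the conventional interpolated form (precision at a recall level is the best precision achieved at that recall or higher); the abstract does not state whether interpolation was used, and the function name and document identifiers are hypothetical.

```python
def eleven_point_average_precision(ranked_ids, relevant_ids):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    points = []  # (recall, precision) recorded at each relevant hit
    hits = 0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for k in range(11):
        level = k / 10
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11
```

    Comparing this score for an unexpanded query and for each expanded variant over the same collection mirrors the comparison described in the abstract.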

  10. Navigation as a New Form of Search for Agricultural Learning Resources in Semantic Repositories

    NASA Astrophysics Data System (ADS)

    Cano, Ramiro; Abián, Alberto; Mena, Elena

    Education is essential when it comes to raising public awareness of the environmental and economic benefits of organic agriculture and agroecology (OA & AE). Organic.Edunet, an EU-funded project, aims at providing a freely available portal where learning contents on OA & AE can be published and accessed through specialized technologies. This paper describes a novel mechanism for providing semantic capabilities (such as semantic navigational queries) to an arbitrary set of agricultural learning resources, in the context of the Organic.Edunet initiative.

  11. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    PubMed

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities > or = 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.

  12. A New Publicly Available Chemical Query Language, CSRML, to support Chemotype Representations for Application to Data-Mining and Modeling

    EPA Science Inventory

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transfor...

  13. Standard biological parts knowledgebase.

    PubMed

    Galdzicki, Michal; Rodriguez, Cesar; Chandran, Deepak; Sauro, Herbert M; Gennari, John H

    2011-02-24

    We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publicly accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data, our methods for transformation to SBPkb, and finally, we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate "promoter" parts that are known to be both negatively and positively regulated. This method provides new web-based data access for performing searches for parts that are not currently possible.

  14. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech.

    PubMed

    Toth, Laszlo; Hoffmann, Ildiko; Gosztolya, Gabor; Vincze, Veronika; Szatloczki, Greta; Banreti, Zoltan; Pakaski, Magdolna; Kalman, Janos

    2018-01-01

    Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method which is based on the analysis of spontaneous speech production during performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and to answer one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process - that is, using the ASR-based features in combination with machine learning - was able to separate the two classes with an F1-score of 78.8%. The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. Copyright© Bentham Science Publishers.

  15. An organizational framework and strategic implementation for system-level change to enhance research-based practice: QUERI Series

    PubMed Central

    Stetler, Cheryl B; McQueen, Lynn; Demakis, John; Mittman, Brian S

    2008-01-01

    Background: The continuing gap between available evidence and current practice in health care reinforces the need for more effective solutions, in particular related to organizational context. Considerable advances have been made within the U.S. Veterans Health Administration (VA) in systematically implementing evidence into practice. These advances have been achieved through a system-level program focused on collaboration and partnerships among policy makers, clinicians, and researchers. The Quality Enhancement Research Initiative (QUERI) was created to generate research-driven initiatives that directly enhance health care quality within the VA and, simultaneously, contribute to the field of implementation science. This paradigm-shifting effort provided a natural laboratory for exploring organizational change processes. This article describes the underlying change framework and implementation strategy used to operationalize QUERI. Strategic approach to organizational change: QUERI used an evidence-based organizational framework focused on three contextual elements: 1) cultural norms and values, in this case related to the role of health services researchers in evidence-based quality improvement; 2) capacity, in this case among researchers and key partners to engage in implementation research; and 3) supportive infrastructures to reinforce expectations for change and to sustain new behaviors as part of the norm. As part of a QUERI Series in Implementation Science, this article describes the framework's application in an innovative integration of health services research, policy, and clinical care delivery. Conclusion: QUERI's experience and success provide a case study in organizational change. It demonstrates that progress requires a strategic, systems-based effort. QUERI's evidence-based initiative involved a deliberate cultural shift, requiring ongoing commitment in multiple forms and at multiple levels. VA's commitment to QUERI came in the form of visionary leadership, targeted allocation of resources, infrastructure refinements, innovative peer review and study methods, and direct involvement of key stakeholders. Stakeholders included both those providing and managing clinical care, as well as those producing relevant evidence within the health care system. The organizational framework and related implementation interventions used to achieve contextual change resulted in engaged investigators and enhanced uptake of research knowledge. QUERI's approach and progress provide working hypotheses for others pursuing similar system-wide efforts to routinely achieve evidence-based care. PMID:18510750

  16. Clustering and Flow Conservation Monitoring Tool for Software Defined Networks

    PubMed Central

    Puente Fernández, Jesús Antonio

    2018-01-01

    Prediction systems present challenges on two fronts: the relation between video quality and observed session features and, on the other hand, dynamic changes in the video quality. Software Defined Networks (SDN) is a new concept of network architecture that provides the separation of control plane (controller) and data plane (switches) in network devices. Due to the existence of the southbound interface, it is possible to deploy monitoring tools to obtain the network status and retrieve a statistics collection. Therefore, achieving the most accurate statistics depends on a strategy of monitoring and information requests of network devices. In this paper, we propose an enhanced algorithm for requesting statistics to measure the traffic flow in SDN networks. The algorithm groups network switches into clusters according to their number of ports in order to apply different monitoring techniques. This grouping avoids monitoring queries to network switches with common characteristics and thereby omits redundant information. In this way, the present proposal decreases the number of monitoring queries to switches, improving the network traffic and preventing switch overload. We have tested our optimization in a video streaming simulation using different types of videos. The experiments and comparison with traditional monitoring techniques demonstrate the feasibility of our proposal, which maintains similar values while decreasing the number of queries to the switches. PMID:29614049

  17. Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval

    DTIC Science & Technology

    2014-11-01

    the same score, another signal will be used to rank these documents to break the ties, but the relative orders of other documents against these...documents remain the same. The tie-breaking step above is repeatedly applied to further break ties until all candidate signals are applied and the ranking...searched it on the Yahoo! search engine, which returned some query suggestions for the query. The original queries as well as their query suggestions

  18. Developing A Web-based User Interface for Semantic Information Retrieval

    NASA Technical Reports Server (NTRS)

    Berrios, Daniel C.; Keller, Richard M.

    2003-01-01

    While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still unclear. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting semantic query generation for SemanticOrganizer, a tool used by scientists and engineers at NASA to construct networks of knowledge and data. Through this interface users can select node types, node attributes and node links to build ad-hoc semantic queries for searching the SemanticOrganizer network.

  19. Privacy-Preserving Location-Based Services

    ERIC Educational Resources Information Center

    Chow, Chi Yin

    2010-01-01

    Location-based services (LBS for short) providers require users' current locations to answer their location-based queries, e.g., range and nearest-neighbor queries. Revealing personal location information to potentially untrusted service providers could create privacy risks for users. To this end, our objective is to design a privacy-preserving…

  20. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
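
    The biobjective formulation can be illustrated with Pareto dominance, the comparison on which NSGA-II's non-dominated sorting is built. This is a minimal sketch of that single building block, not of NSGA-II itself or of the authors' algorithm. Each plan is represented as a hypothetical (LPC, CC) cost pair, and the function names and values are made up for illustration.

```python
def dominates(a, b):
    """True if plan a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(costs):
    """Keep the plans that no other plan dominates (both objectives minimized)."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]

# Hypothetical (local processing cost, communication cost) pairs for five plans.
PLANS = [(10, 5), (8, 7), (12, 4), (9, 9), (10, 6)]
```

    For these values, `pareto_front(PLANS)` keeps `(10, 5)`, `(8, 7)`, and `(12, 4)`: each trades LPC against CC differently, while `(9, 9)` and `(10, 6)` are each beaten on both counts by another plan. A single-objective approach that sums the two costs would instead collapse this trade-off into one number.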

  1. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513

  2. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

    PubMed

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2013-11-01

    The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.

  3. Use of controlled vocabularies to improve biomedical information retrieval tasks.

    PubMed

    Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian

    2013-01-01

    The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has great potential to improve search effectiveness; therefore, fine-tuning of query expansion strategies could help improve the performance of search engines.

  4. Content-aware network storage system supporting metadata retrieval

    NASA Astrophysics Data System (ADS)

    Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun

    2008-12-01

    Content-based network storage has become a hot research topic in academia and industry [1]. To solve the problem of hit-rate decline caused by migration and to support content-based queries, we develop a new content-aware storage system that supports metadata retrieval to improve query performance. First, we extend the SCSI command descriptor block to enable the system to understand self-defined query requests. Second, the extracted metadata is encoded in Extensible Markup Language to improve universality. Third, according to the demands of information lifecycle management (ILM), we store data at different storage levels and use corresponding query strategies to retrieve them. Fourth, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through a friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm enhance retrieval efficiency and precision.

  5. Merged data models for multi-parameterized querying: Spectral data base meets GIS-based map archive

    NASA Astrophysics Data System (ADS)

    Naß, A.; D'Amore, M.; Helbert, J.

    2017-09-01

    Current and upcoming planetary missions deliver a huge amount of different data (remote sensing data, in-situ data, and derived products). In this contribution we present how different data (bases) can be managed and merged to enable multi-parameterized querying based on the constant spatial context.

  6. Web Searching: A Process-Oriented Experimental Study of Three Interactive Search Paradigms.

    ERIC Educational Resources Information Center

    Dennis, Simon; Bruza, Peter; McArthur, Robert

    2002-01-01

    Compares search effectiveness when using query-based Internet search via the Google search engine, directory-based search via Yahoo, and phrase-based query reformulation-assisted search via the Hyperindex browser by means of a controlled, user-based experimental study of undergraduates at the University of Queensland. Discusses cognitive load,…

  7. A fully automatic end-to-end method for content-based image retrieval of CT scans with similar liver lesion annotations.

    PubMed

    Spanier, A B; Caplan, N; Sosna, J; Acar, B; Joskowicz, L

    2018-01-01

    The goal of medical content-based image retrieval (M-CBIR) is to assist radiologists in the decision-making process by retrieving medical cases similar to a given image. One of the key interests of radiologists is lesions and their annotations, since the patient treatment depends on the lesion diagnosis. Therefore, a key feature of M-CBIR systems is the retrieval of scans with the most similar lesion annotations. To be of value, M-CBIR systems should be fully automatic to handle large case databases. We present a fully automatic end-to-end method for the retrieval of CT scans with similar liver lesion annotations. The input is a database of abdominal CT scans labeled with liver lesions, a query CT scan, and optionally one radiologist-specified lesion annotation of interest. The output is an ordered list of the database CT scans with the most similar liver lesion annotations. The method starts by automatically segmenting the liver in the scan. It then extracts a histogram-based feature vector from the segmented region, learns the features' relative importance, and ranks the database scans according to the relative importance measure. The main advantages of our method are that it fully automates the end-to-end querying process, that it uses simple and efficient techniques that are scalable to large datasets, and that it produces quality retrieval results using an unannotated CT scan. Our experimental results on 9 CT queries on a dataset of 41 volumetric CT scans from the 2014 Image CLEF Liver Annotation Task yield an average retrieval accuracy (Normalized Discounted Cumulative Gain index) of 0.77 and 0.84 without/with annotation, respectively. Fully automatic end-to-end retrieval of similar cases based on image information alone, rather than on disease diagnosis, may help radiologists to better diagnose liver lesions.
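
    The Normalized Discounted Cumulative Gain index reported here can be sketched as follows, assuming the common formulation with linear gains and a log2 rank discount. The abstract does not specify which NDCG variant or which graded similarity scale was used, so both the formula variant and the function names are assumptions.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG of the returned order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

    Here `gains` lists the graded similarity of each retrieved scan in ranked order. A ranking that already places the most similar scans first scores 1.0, and pushing a relevant scan down the list lowers the score.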

  8. Directory of selected forestry-related bibliographic data bases

    Treesearch

    Peter A. Evans

    1979-01-01

    This compilation lists 117 bibliographic data bases maintained by scientists of the Forest Service, U.S. Department of Agriculture. For each data base, the following information is provided: name of the data base; originator; date started; coverage by subject; geographic area, and size of collection; base format; retrieval format; ways to query; who to query; and...

  9. Cumulative query method for influenza surveillance using search engine data.

    PubMed

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation sets.
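
    The selection-and-accumulation procedure described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that "cumulative query method n" means summing the search volumes of the n best-correlated queries, and the query names and weekly counts are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cumulative_query_methods(query_series, ili, threshold=0.7):
    """Return [(n, r)] where r is the correlation between ILI data and the
    summed volume of the n queries with the highest individual correlation."""
    scored = [(pearson(series, ili), name) for name, series in query_series.items()]
    kept = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    running = [0.0] * len(ili)
    results = []
    for n, (_, name) in enumerate(kept, start=1):
        running = [total + v for total, v in zip(running, query_series[name])]
        results.append((n, pearson(running, ili)))
    return results


# Hypothetical weekly ILI rates and search volumes.
ili = [1, 2, 4, 8, 5, 2]
queries = {
    "flu": [2, 4, 8, 16, 10, 4],
    "fever": [1, 3, 4, 7, 6, 2],
    "weather": [5, 5, 4, 5, 4, 6],  # poorly correlated, dropped by the threshold
}
results = cumulative_query_methods(queries, ili)
for n, r in results:
    print(f"cumulative method {n}: r = {r:.3f}")
```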

  10. Automatically finding relevant citations for clinical guideline development.

    PubMed

    Bui, Duy Duc An; Jonnalagadda, Siddhartha; Del Fiol, Guilherme

    2015-10-01

    Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. In the age of information technology, the process of literature search is still conducted manually; it is therefore costly, slow and subject to human error. In this research, we sought to improve the traditional search approach using innovative query expansion and citation ranking approaches. We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated over the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with a small loss in precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p<0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall@k +21.1%, p<0.001), PubMed's rank by "relevance" (average precision +6.1%, recall@k +14.8%, p<0.001), and the machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall@k +4.2%, p<0.001). Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding is promising to augment the traditional literature search. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Video-based face recognition via convolutional neural networks

    NASA Astrophysics Data System (ADS)

    Bao, Tianlong; Ding, Chunhui; Karmoshi, Saleem; Zhu, Ming

    2017-06-01

    Face recognition has been widely studied recently while video-based face recognition still remains a challenging task because of the low quality and large intra-class variation of video captured face images. In this paper, we focus on two scenarios of video-based face recognition: 1) Still-to-Video (S2V) face recognition, i.e., querying a still face image against a gallery of video sequences; 2) Video-to-Still (V2S) face recognition, in contrast to the S2V scenario. A novel method was proposed in this paper to transfer still and video face images to a Euclidean space by a carefully designed convolutional neural network, then Euclidean metrics are used to measure the distance between still and video images. Identities of still and video images that group as pairs are used as supervision. In the training stage, a joint loss function that measures the Euclidean distance between the predicted features of training pairs and expanding vectors of still images is optimized to minimize the intra-class variation while the inter-class variation is guaranteed due to the large margin of still images. Transferred features are finally learned via the designed convolutional neural network. Experiments are performed on the COX face dataset. Experimental results show that our method achieves reliable performance compared with other state-of-the-art methods.

  12. Secure Nearest Neighbor Query on Crowd-Sensing Data

    PubMed Central

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-01-01

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes. PMID:27669253

  13. Secure Nearest Neighbor Query on Crowd-Sensing Data.

    PubMed

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-09-22

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.

  14. Ontology-based geospatial data query and integration

    USGS Publications Warehouse

    Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

    2008-01-01

    Geospatial data sharing is an increasingly important subject as large amounts of data are produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data formats such as GML and data access protocols such as Web Feature Service (WFS). While these standards help enable client applications to gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology query on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten to WFS getFeature requests and SQL queries to database. The method also has the benefits of being able to utilize existing tools of databases, WFS, and GML while enabling query based on ontology semantics. © 2008 Springer-Verlag Berlin Heidelberg.
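
    To make the rewriting idea above concrete, the sketch below maps an ontology class to a WFS GetFeature request and to an SQL query. It is only an illustration under stated assumptions: the class-to-source mapping, the feature type `hydro:rivers`, the table name, and the endpoint URL are all hypothetical, and the request uses the WFS 1.1.0 key-value parameters (`typeName`, `maxFeatures`). The paper's actual rewriting rules are not given in the abstract.

```python
from urllib.parse import urlencode

# Hypothetical mapping from ontology classes to concrete data sources.
ONTOLOGY_MAP = {
    "River": {"wfs_type": "hydro:rivers", "table": "rivers"},
}


def rewrite_to_wfs(base_url, ontology_class, max_features=10):
    """Build a WFS 1.1.0 GetFeature URL for the given ontology class."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": ONTOLOGY_MAP[ontology_class]["wfs_type"],
        "maxFeatures": max_features,
    }
    return f"{base_url}?{urlencode(params)}"


def rewrite_to_sql(ontology_class):
    """Build the equivalent SQL query against the mapped database table."""
    table = ONTOLOGY_MAP[ontology_class]["table"]
    return f"SELECT * FROM {table}"


wfs_url = rewrite_to_wfs("https://example.org/wfs", "River")
sql = rewrite_to_sql("River")
print(wfs_url)
print(sql)
```

    Because both targets are generated from the same mapping, no ontology instances need to be materialized, which matches the abstract's point about avoiding data replication.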

  15. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services.

    PubMed

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider.

  16. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services

    PubMed Central

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider. PMID:27571421

  17. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  18. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
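
    The workload model described above (three factor scores summed into a workload score, then related to answering time by linear regression) can be sketched roughly as follows. The factor scores and times are invented for illustration and the scoring scale is an assumption; only the structure, a summed score fed to a simple least-squares fit with an adjusted R2, follows the abstract.

```python
def workload_score(drug_score, search_score, judgmental_score):
    """Workload is the sum of the three factor scores."""
    return drug_score + search_score + judgmental_score


def fit_simple_regression(x, y):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept, r2, adjusted_r2).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 with p = 1 predictor: 1 - (1 - R2)(n - 1)/(n - p - 1).
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adjusted_r2


# Hypothetical (drugs, literature search, judgmental) scores and minutes spent.
factor_scores = [(1, 1, 0), (2, 1, 1), (1, 3, 1), (3, 2, 1), (2, 3, 2)]
minutes = [25, 40, 70, 65, 95]
workloads = [workload_score(*scores) for scores in factor_scores]
slope, intercept, r2, adjusted_r2 = fit_simple_regression(workloads, minutes)
print(f"time = {intercept:.1f} + {slope:.1f} * workload, adjusted R2 = {adjusted_r2:.2f}")
```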

  19. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-08-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have shown that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.

  20. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-01-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have shown that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive. PMID:24187650

  1. Earth-Base: A Free And Open Source, RESTful Earth Sciences Platform

    NASA Astrophysics Data System (ADS)

    Kishor, P.; Heim, N. A.; Peters, S. E.; McClennen, M.

    2012-12-01

    This presentation describes the motivation, concept, and architecture behind Earth-Base, a web-based, RESTful data-management, analysis and visualization platform for earth sciences data. Traditionally web applications have been built directly accessing data from a database using a scripting language. While such applications are great at bringing results to a wide audience, they are limited in scope to the imagination and capabilities of the application developer. Earth-Base decouples the data store from the web application by introducing an intermediate "data application" tier. The data application's job is to query the data store using self-documented, RESTful URIs, and send the results back formatted as JavaScript Object Notation (JSON). Decoupling the data store from the application allows virtually limitless flexibility in developing applications, both web-based for human consumption or programmatic for machine consumption. It also allows outside developers to use the data in their own applications, potentially creating applications that the original data creator and app developer may not have even thought of. Standardized specifications for URI-based querying and JSON-formatted results make querying and developing applications easy. URI-based querying also allows utilizing distributed datasets easily. Companion mechanisms for querying data snapshots aka time-travel, usage tracking and license management, and verification of semantic equivalence of data are also described. The latter promotes the "What You Expect Is What You Get" (WYEIWYG) principle that can aid in data citation and verification.
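
    The "data application" tier described above takes a URI, queries the data store, and returns JSON. A minimal sketch of that idea follows. The URI layout, the collection name `samples`, its fields, and the in-memory store are all hypothetical stand-ins; Earth-Base's real URI scheme is not given in the abstract.

```python
import json
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory stand-in for the data store.
DATA_STORE = {
    "samples": [
        {"id": 1, "rock": "basalt", "age_ma": 12.5},
        {"id": 2, "rock": "granite", "age_ma": 310.0},
    ],
}


def handle_uri(uri):
    """Resolve a RESTful URI against the data store and return a JSON string.

    The last path segment names the collection; query parameters are
    treated as exact-match filters on record fields.
    """
    parsed = urlparse(uri)
    collection = parsed.path.strip("/").split("/")[-1]
    filters = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    rows = [
        record for record in DATA_STORE.get(collection, [])
        if all(str(record.get(key)) == value for key, value in filters.items())
    ]
    return json.dumps({"collection": collection, "count": len(rows), "results": rows})


response = handle_uri("https://example.org/api/samples?rock=basalt")
print(response)
```

    Any client that can issue an HTTP request and parse JSON could consume such a response, which is the decoupling the abstract argues for.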

  2. Irrelevance Reasoning in Knowledge Based Systems

    NASA Technical Reports Server (NTRS)

    Levy, A. Y.

    1993-01-01

    This dissertation considers the problem of reasoning about irrelevance of knowledge in a principled and efficient manner. Specifically, it is concerned with two key problems: (1) developing algorithms for automatically deciding what parts of a knowledge base are irrelevant to a query and (2) the utility of relevance reasoning. The dissertation describes a novel tool, the query-tree, for reasoning about irrelevance. Based on the query-tree, we develop several algorithms for deciding what formulas are irrelevant to a query. Our general framework sheds new light on the problem of detecting independence of queries from updates. We present new results that significantly extend previous work in this area. The framework also provides a setting in which to investigate the connection between the notion of irrelevance and the creation of abstractions. We propose a new approach to research on reasoning with abstractions, in which we investigate the properties of an abstraction by considering the irrelevance claims on which it is based. We demonstrate the potential of the approach for the cases of abstraction of predicates and projection of predicate arguments. Finally, we describe an application of relevance reasoning to the domain of modeling physical devices.

  3. The role of organizational research in implementing evidence-based practice: QUERI Series

    PubMed Central

    Yano, Elizabeth M

    2008-01-01

    Background Health care organizations exert significant influence on the manner in which clinicians practice and the processes and outcomes of care that patients experience. A greater understanding of the organizational milieu into which innovations will be introduced, as well as the organizational factors that are likely to foster or hinder the adoption and use of new technologies, care arrangements and quality improvement (QI) strategies are central to the effective implementation of research into practice. Unfortunately, much implementation research seems to not recognize or adequately address the influence and importance of organizations. Using examples from the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI), we describe the role of organizational research in advancing the implementation of evidence-based practice into routine care settings. Methods Using the six-step QUERI process as a foundation, we present an organizational research framework designed to improve and accelerate the implementation of evidence-based practice into routine care. Specific QUERI-related organizational research applications are reviewed, with discussion of the measures and methods used to apply them. We describe these applications in the context of a continuum of organizational research activities to be conducted before, during and after implementation. Results Since QUERI's inception, various approaches to organizational research have been employed to foster progress through QUERI's six-step process. We report on how explicit integration of the evaluation of organizational factors into QUERI planning has informed the design of more effective care delivery system interventions and enabled their improved "fit" to individual VA facilities or practices. We examine the value and challenges in conducting organizational research, and briefly describe the contributions of organizational theory and environmental context to the research framework. 
    Conclusion Understanding the organizational context of delivering evidence-based practice is a critical adjunct to efforts to systematically improve quality. Given the size and diversity of VA practices, coupled with unique organizational data sources, QUERI is well-positioned to make valuable contributions to the field of implementation science. More explicit accommodation of organizational inquiry into implementation research agendas has helped QUERI researchers to better frame and extend their work as they move toward regional and national spread activities. PMID:18510749

  4. DBPQL: A view-oriented query language for the Intel Data Base Processor

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (DBPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  5. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

    PubMed Central

    2012-01-01

    Background With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). 
    GO2PUB and GoPubMed performances were similar to those of the first queries. Conclusions We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that both tools are complementary. The analysis of the randomly-generated queries suggests that the results obtained about lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org. PMID:22958570
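
    The semantic expansion described in this record (collect genes annotated by a GO term or any of its descendants, then add them to the PubMed query) can be sketched as below. The term identifiers, gene names, and the exact shape of the Boolean query are placeholders chosen for illustration; they are not taken from GO2PUB itself.

```python
def descendants(term, children):
    """Return the term plus every term reachable through child links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen


def build_query(term, keywords, children, annotations):
    """Combine genes annotated by the term or its descendants with keywords."""
    genes = sorted({gene
                    for t in descendants(term, children)
                    for gene in annotations.get(t, [])})
    gene_clause = " OR ".join(f'"{gene}"' for gene in genes)
    keyword_clause = " AND ".join(keywords)
    if keyword_clause:
        return f"({gene_clause}) AND {keyword_clause}"
    return f"({gene_clause})"


# Hypothetical GO graph fragment and gene annotations.
children = {"GO:parent": ["GO:child"], "GO:child": ["GO:grandchild"]}
annotations = {
    "GO:parent": ["GENE1"],
    "GO:child": ["GENE2", "GENE2 alias"],
    "GO:grandchild": ["GENE3"],
    "GO:unrelated": ["GENE9"],
}
query = build_query("GO:parent", ["lipid", "metabolism"], children, annotations)
print(query)
```

    Genes attached only to unrelated terms (here `GENE9`) never enter the query, since expansion follows the inheritance links of the selected term alone.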

  6. The Effectiveness of the Game-Based Learning System for the Improvement of American Sign Language Using Kinect

    ERIC Educational Resources Information Center

    Kamnardsiri, Teerawat; Hongsit, Ler-on; Khuwuthyakorn, Pattaraporn; Wongta, Noppon

    2017-01-01

    This paper investigated students' achievement for learning American Sign Language (ASL), using two different methods. There were two groups of samples. The first experimental group (Group A) was the game-based learning for ASL, using Kinect. The second control learning group (Group B) was the traditional face-to-face learning method, generally…

  7. A comparison of problem-based learning and conventional teaching in nursing ethics education.

    PubMed

    Lin, Chiou-Fen; Lu, Meei-Shiow; Chung, Chun-Chih; Yang, Che-Ming

    2010-05-01

    The aim of this study was to compare the learning effectiveness of peer tutored problem-based learning and conventional teaching of nursing ethics in Taiwan. The study adopted an experimental design. The peer tutored problem-based learning method was applied to an experimental group and the conventional teaching method to a control group. The study sample consisted of 142 senior nursing students who were randomly assigned to the two groups. All the students were tested for their nursing ethical discrimination ability both before and after the educational intervention. A learning satisfaction survey was also administered to both groups at the end of each course. After the intervention, both groups showed a significant increase in ethical discrimination ability. There was a statistically significant difference between the ethical discrimination scores of the two groups (P < 0.05), with the experimental group on average scoring higher than the control group. There were significant differences in satisfaction with self-motivated learning and critical thinking between the groups. Peer tutored problem-based learning and lecture-type conventional teaching were both effective for nursing ethics education, but problem-based learning was shown to be more effective. Peer tutored problem-based learning has the potential to enhance the efficacy of teaching nursing ethics in situations in which there are personnel and resource constraints.

  8. Quantum Private Query Based on Bell State and Single Photons

    NASA Astrophysics Data System (ADS)

    Gao, Xiang; Chang, Yan; Zhang, Shi-Bin; Yang, Fan; Zhang, Yan

    2018-03-01

    Quantum private query (QPQ) can protect both user's and database holder's privacy. In this paper, we propose a novel quantum private query protocol based on Bell state and single photons. As far as we know, no one has ever proposed the QPQ based on Bell state. By using the decoherence-free (DF) states, our protocol can resist the collective noise. Besides that, our protocol is a one-way quantum protocol, which can resist the Trojan horse attack and reduce the communication complexity. Our protocol can not only guarantee the participants' privacy but also stand against an external eavesdropper.

  9. Content-based retrieval of historical Ottoman documents stored as textual images.

    PubMed

    Saykol, Ediz; Sinop, Ali Kemal; Güdükbay, Ugur; Ulusoy, Ozgür; Cetin, A Enis

    2004-03-01

    There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
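    The codebook idea in this abstract can be illustrated with a minimal sketch. It assumes symbols are already extracted and comparable as plain strings (the paper extracts them from image regions using wavelet and spatial features, which is not reproduced here). The function names `compress` and `find_query` are hypothetical and chosen only for illustration. The point shown is that a query can be matched against the pointer sequence without rebuilding the original document.

```python
from typing import Dict, List, Tuple


def compress(symbols: List[str]) -> Tuple[List[str], List[int]]:
    """Build a codebook of distinct symbols and replace each symbol by a pointer into it."""
    codebook: List[str] = []
    index: Dict[str, int] = {}
    pointers: List[int] = []
    for s in symbols:
        if s not in index:
            index[s] = len(codebook)
            codebook.append(s)
        pointers.append(index[s])
    return codebook, pointers


def find_query(codebook: List[str], pointers: List[int], query: List[str]) -> List[int]:
    """Return start positions of the query, comparing pointers only (no decompression)."""
    if not query:
        return []
    lookup = {s: i for i, s in enumerate(codebook)}
    if any(s not in lookup for s in query):
        return []  # a query symbol absent from the codebook cannot occur in the document
    q = [lookup[s] for s in query]
    n = len(q)
    return [i for i in range(len(pointers) - n + 1) if pointers[i:i + n] == q]
```

    Translating the query into codebook indices first means each document position is compared by integer equality, which is the property that lets the search skip decompression.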

  10. Random and Directed Walk-Based Top-k Queries in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    In wireless sensor networks, filter-based top-k query approaches are the state-of-the-art solutions and have been extensively researched in the literature; however, they are very sensitive to the network parameters, including the size of the network, dynamics of the sensors’ readings and declines in the overall range of all the readings. In this work, a random walk-based top-k query approach called RWTQ and a directed walk-based top-k query approach called DWTQ are proposed. At the beginning of a top-k query, one or several tokens are sent to the specific node(s) in the network by the base station. Then, each token walks in the network independently to record and process the readings in a random or directed way. A strategy of choosing the “right” way in DWTQ is carefully designed for the token(s) to arrive at the high-value regions as soon as possible. When designing the walking strategy for DWTQ, the spatial correlations of the readings are also considered. Theoretical analysis and simulation results indicate that both RWTQ and DWTQ are very robust against the parameters discussed above. In addition, DWTQ outperforms TAG, FILA and EXTOK in transmission cost, energy consumption and network lifetime. PMID:26016914
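    The token-walk idea can be sketched in a few lines. This is not the authors' RWTQ algorithm, whose details are not given in the abstract; it only illustrates a single token that hops between neighbouring nodes, records the readings it sees, and reports the k largest. The graph encoding, the fixed hop budget and the name `random_walk_top_k` are assumptions made for the example.

```python
import heapq
import random
from typing import Dict, Hashable, List, Tuple


def random_walk_top_k(
    neighbors: Dict[Hashable, List[Hashable]],
    readings: Dict[Hashable, float],
    start: Hashable,
    k: int,
    hops: int,
    seed: int = 0,
) -> List[Tuple[float, Hashable]]:
    """Walk `hops` random steps from `start` and return the k largest (reading, node) pairs seen."""
    rng = random.Random(seed)  # seeded so that repeated runs are reproducible
    visited = {start}
    node = start
    for _ in range(hops):
        node = rng.choice(neighbors[node])  # uniform choice among adjacent nodes
        visited.add(node)
    return heapq.nlargest(k, ((readings[n], n) for n in visited))
```

    A directed variant would replace the uniform `rng.choice` with a rule that prefers neighbours likely to hold high readings; the abstract does not specify that rule, so it is left out.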

  11. Defining Service and Education in Pediatrics.

    PubMed

    Boyer, Debra; Gagne, Josh; Kesselheim, Jennifer C

    2017-11-01

    Program directors (PDs) and trainees are often queried regarding the balance of service and education during pediatric residency training. We aimed to use qualitative methods to learn how pediatric residents and PDs define service and education and to identify activities that exemplify these concepts. Focus groups of pediatric residents and PDs were performed and the data qualitatively analyzed. Thematic analysis revealed 4 themes from focus group data: (1) misalignment of the perceived definition of service; (2) agreement about the definition of education; (3) overlapping perceptions of the value of service to training; and (4) additional suggestions for improved integration of education and service. Pediatric residents hold positive definitions of service and believe that service adds value to their education. Importantly, the discovery of heterogeneous definitions of service between pediatric residents and PDs warrants further investigation and may have ramifications for Accreditation Council for Graduate Medical Education and those responsible for residency curricula.

  12. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

    PubMed

    Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

    2017-11-05

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary depending on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment, central Stockholm, in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints.

  13. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

    PubMed Central

    Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

    2017-01-01

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary depending on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment, central Stockholm, in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints. PMID:29113073

  14. SEQUOIA: significance enhanced network querying through context-sensitive random walk and minimization of network conductance.

    PubMed

    Jeong, Hyundoo; Yoon, Byung-Jun

    2017-03-14

    Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. Performance assessment based on real PPI networks and known molecular complexes show that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA .
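    For readers unfamiliar with network conductance, the following sketch computes the standard graph-theoretic quantity for a node subset S of an unweighted, undirected graph: the number of edges leaving S divided by the smaller of the total degrees inside and outside S. Whether SEQUOIA uses exactly this variant is an assumption; the abstract only states that conductance is minimized. The function name and adjacency-set encoding are illustrative.

```python
from typing import Dict, Hashable, Set


def conductance(adj: Dict[Hashable, Set[Hashable]], subset: Set[Hashable]) -> float:
    """Conductance of `subset`: cut edges / min(volume inside, volume outside)."""
    inside = set(subset)
    outside = set(adj) - inside
    # Each edge crossing the boundary is counted once, from its inside endpoint.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in outside)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom
```

    A low value means the subset is densely connected internally and only loosely attached to the rest of the network, which is why extending a candidate match so as to reduce conductance favours cohesive modules.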

  15. Permanent Base Groups in Broadcast Journalism Instruction.

    ERIC Educational Resources Information Center

    Boldoc, William J.

    An instructor increases student performance, participation, and motivation in broadcast journalism and other media courses through the base group cooperative learning model. Base groups are a specific form of cooperative learning which enables students to become actively involved in small-group discussions and learning with a permanent group of…

  16. Social media and the classroom?

    NASA Astrophysics Data System (ADS)

    2015-02-01

    Many years ago, I learned (through eavesdropping on a conversation during lab) that my students had set up their own Facebook group. They told me they were using it to help each other with homework assignments. This year, my daughter took physics at a university. She and her friends were struggling a bit with the online quizzes. I suggested that she set up a Facebook community and add me as a member. I would answer questions and help the group study. My daughter's group used Facebook to get answers to specific questions from the quizzes. They often ended up helping each other because the questions were posed quite late in the evening. Questions ranged from exact copies of the original queries to "Does anyone know what equation to use for this?" I began to think that, although their grades were improving on the quizzes, they were not gaining any content knowledge. To combat this, I made and posted a few short video clips reteaching the content.

  17. Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters.

    PubMed

    Lokker, Cynthia; Haynes, R Brian; Wilczynski, Nancy L; McKibbon, K Ann; Walter, Stephen D

    2011-01-01

    Clinical Queries filters were developed to improve the retrieval of high-quality studies in searches on clinical matters. The study objective was to determine the yield of relevant citations and physician satisfaction while searching for diagnostic and treatment studies using the Clinical Queries page of PubMed compared with searching PubMed without these filters. Forty practicing physicians, presented with standardized treatment and diagnosis questions and one question of their choosing, entered search terms which were processed in a random, blinded fashion through PubMed alone and PubMed Clinical Queries. Participants rated search retrievals for applicability to the question at hand and satisfaction. For treatment, the primary outcome of retrieval of relevant articles was not significantly different between the groups, but a higher proportion of articles from the Clinical Queries searches met methodologic criteria (p=0.049), and more articles were published in core internal medicine journals (p=0.056). For diagnosis, the filtered results returned more relevant articles (p=0.031) and fewer irrelevant articles (overall retrieval less, p=0.023); participants needed to screen fewer articles before arriving at the first relevant citation (p<0.05). Relevance was also influenced by content terms used by participants in searching. Participants varied greatly in their search performance. Clinical Queries filtered searches returned more high-quality studies, though the retrieval of relevant articles was only statistically different between the groups for diagnosis questions. Retrieving clinically important research studies from Medline is a challenging task for physicians. Methodological search filters can improve search retrieval.

  18. A survey of context recognition in surgery.

    PubMed

    Pernek, Igor; Ferscha, Alois

    2017-10-01

    With the introduction of operating rooms of the future, context awareness has gained importance in the surgical environment. This paper organizes and reviews different approaches for recognition of context in surgery. Major electronic research databases were queried to obtain relevant publications submitted between the years 2010 and 2015. Three different types of context were identified: (i) the surgical workflow context, (ii) surgeon's cognitive and (iii) technical state context. A total of 52 relevant studies were identified and grouped based on the type of context detected and sensors used. Different approaches were summarized to provide recommendations for future research. There is still room for improvement in terms of methods used and evaluations performed. Machine learning should be used more extensively to uncover hidden relationships between different properties of the surgeon's state, particularly when performing cognitive context recognition. Furthermore, validation protocols should be improved by performing more evaluations in situ and with a higher number of unique participants. The paper also provides a structured outline of recent context recognition methods to facilitate development of new generation context-aware surgical support systems.

  19. Successful Group Work: Using Cooperative Learning and Team-Based Learning in the Classroom

    ERIC Educational Resources Information Center

    Grant-Vallone, E. J.

    2011-01-01

    This research study examined student perceptions of group experiences in the classroom. The author used cooperative learning and team-based learning to focus on three characteristics that are critical for the success of groups: structure of activities, relationships of group members, and accountability of group members. Results indicated that…

  20. Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

    PubMed Central

    Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm for protecting user privacy in distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users. PMID:24790579

  1. Constraint-based Data Mining

    NASA Astrophysics Data System (ADS)

    Boulicaut, Jean-Francois; Jeudy, Baptiste

    Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this to be essentially a querying process. It is enabled by a query language which can deal either with raw data or patterns which hold in the data. Mining patterns turns out to be the so-called inductive query evaluation process for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints, and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints and it points out the current directions of research as well.
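    As an illustration of what a declarative inductive query might look like in the simplest case, the sketch below evaluates "all itemsets with support at least `min_support` that also satisfy a user predicate" by brute-force enumeration. This is a toy under stated assumptions, not one of the constraint-based algorithms the chapter surveys; those exploit constraint properties to prune the search, which this version does not. The name `inductive_query` and its parameters are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set


def inductive_query(
    transactions: List[Set[str]],
    min_support: int,
    constraint: Callable[[FrozenSet[str]], bool] = lambda s: True,
    max_size: int = 3,
) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for itemsets meeting the support threshold and the constraint."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if not constraint(itemset):
                continue  # the declarative constraint filters candidate patterns
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result
```

    The caller states only what is wanted (a threshold and a predicate); how the patterns are found is left to the evaluator, which is the separation the inductive database framework aims for.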

  2. Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

    PubMed

    Zhong, Cheng; Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm for protecting user privacy in distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users.

  3. Profile-IQ: Web-based data query system for local health department infrastructure and activities.

    PubMed

    Shah, Gulzar H; Leep, Carolyn J; Alexander, Dayna

    2014-01-01

    To demonstrate the use of National Association of County & City Health Officials' Profile-IQ, a Web-based data query system, and how policy makers, researchers, the general public, and public health professionals can use the system to generate descriptive statistics on local health departments. This article is a descriptive account of an important health informatics tool based on information from the project charter for Profile-IQ and the authors' experience and knowledge in design and use of this query system. Profile-IQ is a Web-based data query system that is based on open-source software: MySQL 5.5, Google Web Toolkit 2.2.0, Apache Commons Math library, Google Chart API, and Tomcat 6.0 Web server deployed on an Amazon EC2 server. It supports dynamic queries of National Profile of Local Health Departments data on local health department finances, workforce, and activities. Profile-IQ's customizable queries provide a variety of statistics not available in published reports and support the growing information needs of users who do not wish to work directly with data files for lack of staff skills or time, or to avoid a data use agreement. Profile-IQ also meets the growing demand of public health practitioners and policy makers for data to support quality improvement, community health assessment, and other processes associated with voluntary public health accreditation. It represents a step forward in the recent health informatics movement of data liberation and use of open source information technology solutions to promote public health.

  4. An Evaluation of the Interactive Query Expansion in an Online Library Catalogue with a Graphical User Interface.

    ERIC Educational Resources Information Center

    Hancock-Beaulieu, Micheline; And Others

    1995-01-01

    An online library catalog was used to evaluate an interactive query expansion facility based on relevance feedback for the Okapi, probabilistic, term weighting, retrieval system. A graphical user interface allowed searchers to select candidate terms extracted from relevant retrieved items to reformulate queries. Results suggested that the…

  5. A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805

    ERIC Educational Resources Information Center

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2011-01-01

    Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…

  6. Developing a kidney and urinary pathway knowledge base

    PubMed Central

    2011-01-01

    Background: Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. Results: We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. Conclusions: The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain’s ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. Availability: The KUPKB may be accessed via http://www.e-lico.eu/kupkb. PMID:21624162

  7. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
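    The precision gain from constraining a search to part of a document can be shown with a small sketch using the limited XPath support in Python's standard `xml.etree.ElementTree`. The XML layout, element names and the `search` helper are invented for illustration and do not reflect the authors' EHR schema or their Search Ontology extension.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical, simplified record; real archetype-based EHRs are far richer.
EHR_XML = """
<ehr>
  <section name="diagnosis"><entry>carcinoma of the colon</entry></section>
  <section name="history"><entry>mother had carcinoma</entry></section>
</ehr>
"""


def search(xml_text: str, term: str, section: Optional[str] = None) -> List[str]:
    """Return entry texts containing `term`, optionally restricted to one named section."""
    root = ET.fromstring(xml_text.strip())
    if section is None:
        path = ".//entry"  # free-text style search over every entry
    else:
        path = f".//section[@name='{section}']/entry"  # search constrained by XPath
    return [e.text for e in root.findall(path) if term in (e.text or "")]
```

    Searching for "carcinoma" everywhere also returns the family-history entry, a false positive for a question about the patient's own diagnosis; restricting the path to the diagnosis section removes it.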

  8. Geographic Video 3d Data Model And Retrieval

    NASA Astrophysics Data System (ADS)

    Han, Z.; Cui, C.; Kong, Y.; Wu, H.

    2014-04-01

    Geographic video includes both spatial and temporal geographic features acquired through ground-based or non-ground-based cameras. With the popularity of video capture devices such as smartphones, the volume of user-generated geographic video clips has grown significantly and the trend of this growth is quickly accelerating. Such a massive and increasing volume poses a major challenge to efficient video management and query. Most of today's video management and query techniques are based on signal level content extraction. They are not able to fully utilize the geographic information of the videos. This paper aimed to introduce a geographic video 3D data model based on spatial information. The main idea of the model is to utilize the location, trajectory and azimuth information acquired by sensors such as GPS receivers and 3D electronic compasses in conjunction with video contents. The raw spatial information is synthesized to point, line, polygon and solid according to the camcorder parameters such as focal length and angle of view. With the video segment and video frame, we defined three categories of geometry object using the geometry model of the OGC Simple Features Specification for SQL. We can query video by computing the spatial relation between query objects and the three categories of geometry object, such as VFLocation, VSTrajectory, VSFOView and VFFovCone. We designed the query methods using the structured query language (SQL) in detail. The experiment indicates that the model is a multi-objective, integrated, loosely coupled, flexible and extensible data model for the management of geographic stereo video.

  9. Using search engine query data to track pharmaceutical utilization: a study of statins.

    PubMed

    Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F

    2010-08-01

    To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.

  10. QBIC project: querying images by content, using color, texture, and shape

    NASA Astrophysics Data System (ADS)

    Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel

    1993-04-01

    In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.

  11. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    NASA Astrophysics Data System (ADS)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile applications allow many users to access data from the application without being limited by space and time. Over time the data population of this application will increase. Data access time will cause problems if the data record has reached tens of thousands to millions of records. The objective of this research is to maintain the performance of data execution for large data records. One effort to maintain data access time performance is to apply a query optimization method. The optimization used in this research is the heuristic query optimization method. The built application is a mobile-based financial application using a MySQL database with stored procedures therein. This application is used by more than one business entity in one database, thus enabling rapid data growth. In this stored procedure there is an optimized query using the heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clauses. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time calculation is also performed as the data population in the database increases. The evaluation results show that data execution time with heuristic query optimization is relatively faster than data execution time without query optimization.

  12. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

    PubMed Central

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2016-01-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325

  13. Group-Based Active Learning of Classification Models.

    PubMed

    Luo, Zhipeng; Hauskrecht, Milos

    2017-05-01

    Learning of classification models from real-world data often requires additional human expert effort to annotate the data. However, this process can be rather costly, and finding ways of reducing the human annotation effort is critical for this task. The objective of this paper is to develop and study new ways of providing human feedback for efficient learning of classification models by labeling groups of examples. Briefly, unlike traditional active learning methods that seek feedback on individual examples, we develop a new group-based active learning framework that solicits label information on groups of multiple examples. In order to describe groups in a user-friendly way, conjunctive patterns are used to compactly represent groups. Our empirical study on 12 UCI data sets demonstrates the advantages of our approach over both classic instance-based active learning and existing group-based active learning methods.
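
    The notion of a group described by a conjunctive pattern can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the feature names, the toy records, and the helpers `matches` and `select_group` are hypothetical, and the annotator is simulated by hidden labels. The form of the feedback (the fraction of positive examples in the group) is also an assumption made for the sketch.

```python
from typing import Dict, List

Record = Dict[str, str]

def matches(record: Record, pattern: Dict[str, str]) -> bool:
    """True if the record satisfies every feature=value condition in the pattern."""
    return all(record.get(feature) == value for feature, value in pattern.items())

def select_group(pool: List[Record], pattern: Dict[str, str]) -> List[int]:
    """Indices of the pool records covered by the conjunctive pattern."""
    return [i for i, record in enumerate(pool) if matches(record, pattern)]

# Hypothetical unlabeled pool (feature names and values are made up).
pool = [
    {"age": "old", "smoker": "yes"},
    {"age": "old", "smoker": "no"},
    {"age": "young", "smoker": "yes"},
    {"age": "old", "smoker": "yes"},
]
# Hidden labels, used here only to simulate the human annotator.
hidden_labels = [1, 0, 0, 1]

# Conjunctive pattern: "age = old AND smoker = yes".
pattern = {"age": "old", "smoker": "yes"}
group = select_group(pool, pattern)

# Assumed feedback form: the fraction of positive examples in the group.
group_feedback = sum(hidden_labels[i] for i in group) / len(group)
```

    A single answer about the pattern covers every record it matches, which is where the saving in annotation effort would come from.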

  14. Discriminative Multi-View Interactive Image Re-Ranking.

    PubMed

    Li, Jun; Xu, Chang; Yang, Wankou; Sun, Changyin; Tao, Dacheng

    2017-07-01

    Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose a discriminative multi-view interactive image re-ranking (DMINTIR), which integrates user relevance feedback capturing users' intentions and multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated in the multi-view learning scheme to exploit their complementarities. In addition, a discriminatively learned weight vector is obtained to reassign updated scores and target images for re-ranking. Compared with other multi-view learning techniques, our scheme not only generates a compact representation in the latent space from the redundant multi-view features but also maximally preserves the discriminative information in feature encoding by the large-margin principle. Furthermore, the generalization error bound of the proposed algorithm is theoretically analyzed and shown to be improved by the interactions between the latent space and discriminant function learning. Experimental results on two benchmark data sets demonstrate that our approach boosts baseline retrieval quality and is competitive with the other state-of-the-art re-ranking strategies.

  15. Remote file inquiry (RFI) system

    NASA Technical Reports Server (NTRS)

    1975-01-01

    System interrogates and maintains user-definable data files from remote terminals, using English-like, free-form query language easily learned by persons not proficient in computer programming. System operates in asynchronous mode, allowing any number of inquiries within limitation of available core to be active concurrently.

  16. Implementation of CUAHSI-HIS Community Project Components in a Local Observatory

    NASA Astrophysics Data System (ADS)

    Muste, M.; Arnold, N.; Kim, D.

    2008-12-01

    The deployment of the eleven WATERS Network local observatories using CUAHSI-HIS project products showed that water observations data collected by academic investigators could be stored, published on the Internet, federated with water observations data published by water agencies, and searched using a concept framework that connects with variables in each individual data source. For many within the water resources community, the CUAHSI-HIS community project represents a new opportunity to approach the management, publication, and analysis of their data systematically - i.e., moving from collections of ASCII text or spreadsheet files to relational data models. This research describes the initial efforts carried out by a University of Iowa research group during the component implementation of a hydrologic community project in a local CI-based digital watershed (DW). The goal was to test what types of data query the DW can handle and see how it performs in use cases where data streams are coupled with models for continuous forecasting. This paper also discusses the general context for the DW development and summarizes the lessons learned by the group during this initial developmental stage. Given the uniform and scalable nature of the community project components, it is expected that the workflows presented herein are transferable to other users and other watersheds.

  17. Regular paths in SparQL: querying the NCI Thesaurus.

    PubMed

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have led to a highly computable language, this has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.
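
    What a regular path expression adds over a plain triple pattern can be sketched with a "one or more steps" path over a small set of triples. This is only an illustration of the idea and does not use GLEEN's syntax or API; the function `reachable` and the class names in the toy triples are hypothetical and are not taken from the NCI Thesaurus.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def reachable(triples: Iterable[Triple], start: str, predicate: str) -> Set[str]:
    """Nodes reachable from `start` by following one or more `predicate` edges."""
    edges: Dict[str, List[str]] = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            edges.setdefault(subject, []).append(obj)
    seen: Set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already expanded; also guards against cycles
        seen.add(node)
        queue.extend(edges.get(node, []))
    return seen

# Hypothetical triples for illustration only.
triples = [
    ("Melanoma", "subClassOf", "SkinNeoplasm"),
    ("SkinNeoplasm", "subClassOf", "Neoplasm"),
    ("Neoplasm", "subClassOf", "Disease"),
    ("Melanoma", "hasSite", "Skin"),
]
ancestors = reachable(triples, "Melanoma", "subClassOf")
```

    A single triple pattern would return only the direct superclass; the path form returns the whole chain, which is the kind of query a simplified ontology view needs.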

  18. Innovations in individual feature history management - The significance of feature-based temporal model

    USGS Publications Warehouse

    Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

    2008-01-01

    A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds a new explicit temporal relationship structure that stores temporal topological relationships with the ISO's temporal primitives of a feature in order to keep track of feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during query processing. Further, a prototype system has been developed to test the proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of temporal queries on individual feature history shows the efficiency of the explicit temporal relationship structure. © Springer Science+Business Media, LLC 2007.

  19. VisGets: coordinated visualizations for web-based information exploration and discovery.

    PubMed

    Dörk, Marian; Carpendale, Sheelagh; Collins, Christopher; Williamson, Carey

    2008-01-01

    In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets--interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and use it to visually explore news items from online RSS feeds.

  20. Using AberOWL for fast and scalable reasoning over BioPortal ontologies.

    PubMed

    Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert

    2016-08-08

    Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.

  1. Visual analytics for semantic queries of TerraSAR-X image content

    NASA Astrophysics Data System (ADS)

    Espinoza-Molina, Daniela; Alonso, Kevin; Datcu, Mihai

    2015-10-01

    With the continuous image product acquisition of satellite missions, the size of the image archives is increasing considerably every day, as are the variety and complexity of their content, surpassing the end-user's capacity to analyse and exploit them. Advances in the image retrieval field have contributed to the development of tools for interactive exploration and extraction of images from huge archives using different parameters such as metadata, keywords, and basic image descriptors. Even though we can count on more powerful tools for automated image retrieval and data analysis, we still face the problem of understanding and analyzing the results. Thus, a systematic computational analysis of these results is required in order to provide the end-user with a summary of the archive content in comprehensible terms. In this context, visual analytics combines automated analysis with interactive visualization and analysis techniques for effective understanding, reasoning and decision making on the basis of very large and complex datasets. Moreover, several current research efforts focus on associating the content of the images with semantic definitions, describing the data in a format that is easily understood by the end-user. In this paper, we present our approach for computing visual analytics and semantically querying the TerraSAR-X archive. Our approach is mainly composed of four steps: 1) the generation of a data model, formed by primitive descriptors and metadata entries, that explains the information contained in a TerraSAR-X product; 2) the storage of this model in a database system; 3) the semantic definition of the image content based on machine learning algorithms and relevance feedback; and 4) querying the image archive using semantic descriptors as query parameters and computing the statistical analysis of the query results. The experimental results show that, with the help of visual analytics and semantic definitions, we are able to explain the image content using semantic terms and the relations between them, answering questions such as: what is the percentage of urban area in a region, or what is the distribution of water bodies in a city?

  2. Smart Point Cloud: Definition and Remaining Challenges

    NASA Astrophysics Data System (ADS)

    Poux, F.; Hallot, P.; Neuville, R.; Billen, R.

    2016-10-01

    Dealing with coloured point clouds acquired from terrestrial laser scanners, this paper identifies remaining challenges for a new data structure: the smart point cloud. This concept arises from the observation that massive and discretized spatial information from active remote sensing technology is often underused due to data mining limitations. The generalisation of point cloud data associated with the heterogeneity and temporality of such datasets is the main issue regarding structure, segmentation, classification, and interaction for an immediate understanding. We propose to use both point cloud properties and human knowledge through machine learning to rapidly extract pertinent information, using user-centered information (smart data) rather than raw data. A review of feature detection, machine learning frameworks and database systems indexed both for mining queries and data visualisation is presented. Based on existing approaches, we propose a new 3-block flexible framework around device expertise, analytic expertise and domain base reflexion. This contribution serves as the first step for the realisation of a comprehensive smart point cloud data structure.

  3. Sleep-wake time perception varies by direct or indirect query.

    PubMed

    Alameddine, Y; Ellenbogen, J M; Bianchi, M T

    2015-01-15

    The diagnosis of insomnia rests on self-report of difficulty initiating or maintaining sleep. However, subjective reports may be unreliable, and possibly may vary by the method of inquiry. We investigated this possibility by comparing within-individual response to direct versus indirect time queries after overnight polysomnography. We obtained self-reported sleep-wake times via morning questionnaires in 879 consecutive adult diagnostic polysomnograms. Responses were compared within subjects (direct versus indirect query) and across groups defined by apnea-hypopnea index and by self-reported insomnia symptoms in pre-sleep questionnaires. Direct queries required a time duration response, while indirect queries required clock times from which we calculated time durations. Direct and indirect queries of sleep latency were the same in only 41% of cases, and total sleep time queries matched in only 5.4%. For both latency and total sleep, the most common discrepancy involved the indirect value being larger than the direct response. The discrepancy between direct and indirect queries was not related to objective sleep metrics. The degree of discrepancy was not related to the presence of insomnia symptoms, although patients reporting insomnia symptoms showed underestimation of total sleep duration by direct response. Self-reported sleep latency and total sleep time are often internally inconsistent when comparing direct and indirect survey queries of each measure. These discrepancies represent substantive challenges to effective clinical practice, particularly when diagnosis and management depend on self-reported sleep patterns, as with insomnia. Although self-reported sleep-wake times remain fundamental to clinical practice, objective measures provide clinically relevant adjunctive information. © 2015 American Academy of Sleep Medicine.
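
    The difference between the two query types is simple arithmetic: a direct query yields a duration, while an indirect query yields two clock times whose difference must be computed, including the roll-over past midnight. A minimal sketch follows; the questionnaire answers and the helper `minutes_between` are hypothetical and are not taken from the study.

```python
from datetime import datetime, timedelta

def minutes_between(start_clock: str, end_clock: str) -> int:
    """Minutes from start to end clock time (HH:MM), rolling past midnight if needed."""
    fmt = "%H:%M"
    start = datetime.strptime(start_clock, fmt)
    end = datetime.strptime(end_clock, fmt)
    if end < start:
        # The end time falls on the next calendar day.
        end += timedelta(days=1)
    return int((end - start).total_seconds() // 60)

# Hypothetical answers from one morning questionnaire.
direct_total_sleep_min = 360  # direct query: "I slept 6 hours"
indirect_total_sleep_min = minutes_between("23:15", "06:30")  # fell asleep / woke up

# Positive value: the indirect estimate is larger than the direct response.
discrepancy_min = indirect_total_sleep_min - direct_total_sleep_min
```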

  4. Robust hashing with local models for approximate similarity search.

    PubMed

    Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang

    2014-07-01

    Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets that have hash codes similar to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
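
    The search step described above (hash the query, then explore buckets with similar codes) can be sketched as follows. The sketch deliberately swaps in fixed hyperplanes with sign bits, as in classic LSH, in place of RHLM's learned hash functions, and it interprets "similar hash codes" as codes within a small Hamming radius; both are assumptions. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]

def hash_code(point: Sequence[float], hyperplanes: Sequence[Sequence[float]]) -> Code:
    """One bit per hyperplane: 1 if the point lies on its non-negative side."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, point)) >= 0) for plane in hyperplanes
    )

def nearby_codes(code: Code, radius: int) -> List[Code]:
    """All codes within the given Hamming radius of `code`, closest first."""
    codes = [code]
    for r in range(1, radius + 1):
        for positions in combinations(range(len(code)), r):
            flipped = list(code)
            for pos in positions:
                flipped[pos] ^= 1
            codes.append(tuple(flipped))
    return codes

def search(query: Sequence[float], hyperplanes, buckets: Dict[Code, List[int]],
           radius: int = 1) -> List[int]:
    """Candidate item ids from buckets whose codes are close to the query code."""
    candidates: List[int] = []
    for code in nearby_codes(hash_code(query, hyperplanes), radius):
        candidates.extend(buckets.get(code, []))
    return candidates

# Stand-in hash functions (fixed hyperplanes) and a toy 2-D database.
hyperplanes = [(1.0, 0.0), (0.0, 1.0)]
database = [(2.0, 3.0), (-1.0, 4.0), (-2.0, -2.0)]

# Index: put each database item into the bucket named by its hash code.
buckets: Dict[Code, List[int]] = {}
for item_id, point in enumerate(database):
    buckets.setdefault(hash_code(point, hyperplanes), []).append(item_id)

exact = search((1.5, 0.5), hyperplanes, buckets, radius=0)
near = search((1.5, 0.5), hyperplanes, buckets, radius=1)
```

    Widening the radius trades efficiency for recall: more buckets are probed, so more candidates come back for exact re-ranking.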

  5. Self-adaptive relevance feedback based on multilevel image content analysis

    NASA Astrophysics Data System (ADS)

    Gao, Yongying; Zhang, Yujin; Fu, Yu

    2001-01-01

    In current content-based image retrieval systems, it is generally accepted that obtaining high-level image features is key to improving querying. Among the related techniques, relevance feedback has become an active research topic because it uses information from the user to refine the query results. In practice, many methods have been proposed to achieve the goal of relevance feedback. In this paper, a new scheme for relevance feedback is proposed. Unlike previous methods for relevance feedback, our scheme provides a self-adaptive operation. First, based on multi-level image content analysis, the relevant images from the user can be automatically analyzed at different levels and the query can be modified according to the different analysis results. Second, to make it more convenient for the user, the relevance feedback procedure can be run with or without memory. To test the performance of the proposed method, a practical semantic-based image retrieval system has been established, and the query results obtained with our self-adaptive relevance feedback are given.

  6. Self-adaptive relevance feedback based on multilevel image content analysis

    NASA Astrophysics Data System (ADS)

    Gao, Yongying; Zhang, Yujin; Fu, Yu

    2000-12-01

    In current content-based image retrieval systems, it is generally accepted that obtaining high-level image features is key to improving querying. Among the related techniques, relevance feedback has become an active research topic because it uses information from the user to refine the query results. In practice, many methods have been proposed to achieve the goal of relevance feedback. In this paper, a new scheme for relevance feedback is proposed. Unlike previous methods for relevance feedback, our scheme provides a self-adaptive operation. First, based on multi-level image content analysis, the relevant images from the user can be automatically analyzed at different levels and the query can be modified according to the different analysis results. Second, to make it more convenient for the user, the relevance feedback procedure can be run with or without memory. To test the performance of the proposed method, a practical semantic-based image retrieval system has been established, and the query results obtained with our self-adaptive relevance feedback are given.

  7. 37: COMPARISON OF TWO METHODS: TBL-BASED AND LECTURE-BASED LEARNING IN NURSING CARE OF PATIENTS WITH DIABETES IN NURSING STUDENTS

    PubMed Central

    Khodaveisi, Masoud; Qaderian, Khosro; Oshvandi, Khodayar; Soltanian, Ali Reza; Vardanjani, Mehdi molavi

    2017-01-01

    Background and aims: Learning plays an important role in developing nursing skills and proper care. The present study aims to evaluate two learning methods, team-based learning and lecture-based learning, for teaching nursing students the care of patients with diabetes. Method: In this quasi-experimental study, 64 term-4 students in the nursing college of Bukan and Miandoab were included. A knowledge and performance questionnaire, comprising 15 knowledge questions and 5 performance questions on the care of patients with diabetes, was used as the data collection tool; its reliability was confirmed by the researcher using Cronbach's alpha (r=0.83). The paired t-test was used to compare the mean knowledge and performance scores of each group between the pre-test and post-test steps, and the independent t-test was used to compare the mean scores of the control and intervention groups. Results: There was no statistically significant difference between the two groups in the pre-test in terms of knowledge and performance scores (p=0.784). There was a significant difference in the post-test mean knowledge and diabetes performance scores between the team-based learning group and the lecture-based learning group (p=0.001). There was a significant difference between the pre-test and post-test mean scores of knowledge of diabetes care in the learning groups (p=0.001). Conclusion: Both the team-based and lecture-based learning approaches improved students' learning, but the improvement was greater with team-based learning than with lecture-based learning, and it is recommended that this method be used in the higher education of students.

  8. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations, including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation, and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes, while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores, while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and scalability of DG-SPARQL on querying massive attributed graph datasets, in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.

  9. Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags

    NASA Astrophysics Data System (ADS)

    Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji

    We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query also has the same form of search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags that are similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine if the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection are different among all of the peers. The simulation results show that the Kansei query forwarding method and a random-walk-based query forwarding method, for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query forwarding method is shown, through simulations, to be superior to or equal to the random-walk based one in terms of search speed.
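
    The forwarding rule described above (dot-product similarity, probabilistic choice of the next peer) can be sketched as follows. This is a simplified illustration rather than the paper's method: it assumes each neighbor is summarized by a single profile vector, clips negative similarities to zero, and normalizes the scores into probabilities. The peer names, vectors, and function names are hypothetical.

```python
import random
from typing import Dict, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def forwarding_probabilities(
    query: Sequence[float], neighbor_profiles: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Normalize non-negative dot-product similarities into forwarding probabilities."""
    scores = {
        peer: max(dot(query, profile), 0.0)
        for peer, profile in neighbor_profiles.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No informative neighbor: fall back to a uniform choice.
        return {peer: 1.0 / len(scores) for peer in scores}
    return {peer: score / total for peer, score in scores.items()}

def pick_next_peer(query, neighbor_profiles, rng: random.Random) -> str:
    """Draw the next hop at random, weighted by similarity to the query."""
    probs = forwarding_probabilities(query, neighbor_profiles)
    peers = list(probs)
    return rng.choices(peers, weights=[probs[p] for p in peers], k=1)[0]

# Hypothetical 3-dimensional Kansei vectors.
query = (1.0, 0.0, 0.5)
neighbors = {
    "peer_a": (0.8, 0.1, 0.4),
    "peer_b": (0.0, 1.0, 0.0),
    "peer_c": (0.2, 0.0, 0.0),
}
probs = forwarding_probabilities(query, neighbors)
next_peer = pick_next_peer(query, neighbors, random.Random(0))
```

    With uniform weights the same loop degenerates into a random walk, the baseline the abstract compares against.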

  10. Web-Based Learning in a Geometry Course

    ERIC Educational Resources Information Center

    Chan, Hsungrow; Tsai, Pengheng; Huang, Tien-Yu

    2006-01-01

    This study concerns applying Web-based learning with learner controlled instructional materials in a geometry course. The experimental group learned in a Web-based learning environment, and the control group learned in a classroom. We observed that the learning method accounted for a total variation in learning effect of 19.1% in the 3rd grade and…

  11. A Probabilistic Feature Map-Based Localization System Using a Monocular Camera.

    PubMed

    Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun

    2015-08-31

    Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in a simulation and real environments.
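
    The role of the Mahalanobis distance in such an error term can be illustrated with a small 2-D example: the same pixel offset counts for less along a direction in which the map feature is more uncertain. The function below and the pixel values and covariance are hypothetical; the sketch shows only the distance itself, not the paper's pose optimization.

```python
import math
from typing import Sequence

def mahalanobis_2d(
    observed: Sequence[float],
    predicted: Sequence[float],
    cov: Sequence[Sequence[float]],
) -> float:
    """Mahalanobis distance between two 2-D points for a 2x2 covariance matrix."""
    dx = observed[0] - predicted[0]
    dy = observed[1] - predicted[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    quad = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(quad)

# Hypothetical projected map feature, more uncertain in x (variance 4) than in y (variance 1).
predicted_px = (100.0, 50.0)
covariance = ((4.0, 0.0), (0.0, 1.0))

# Two observations with the same 2-pixel Euclidean error.
error_along_x = mahalanobis_2d((102.0, 50.0), predicted_px, covariance)
error_along_y = mahalanobis_2d((100.0, 52.0), predicted_px, covariance)
```

    Here the offset along the uncertain x direction yields a distance of 1, while the same offset along y yields 2, so a pose that explains the well-constrained direction is favored.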

  12. A Probabilistic Feature Map-Based Localization System Using a Monocular Camera

    PubMed Central

    Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun

    2015-01-01

    Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in a simulation and real environments. PMID:26404284

  13. Content-based image retrieval on mobile devices

    NASA Astrophysics Data System (ADS)

    Ahmad, Iftikhar; Abdullah, Shafaq; Kiranyaz, Serkan; Gabbouj, Moncef

    2005-03-01

    Content-based image retrieval possesses tremendous potential for exploration and utilization by researchers and people in industry alike, owing to its promising results. Expeditious retrieval of desired images requires indexing of the content in large-scale databases along with extraction of low-level features based on the content of these images. With the recent advances in wireless communication technology and the availability of multimedia-capable phones, it has become vital to enable query operations in image databases and retrieve results based on the image content. In this paper we present a content-based image retrieval system for mobile platforms, providing the capability of content-based query to any mobile device that supports the Java platform. The system consists of a lightweight client application running on a Java-enabled device and a server containing a servlet running inside a Java-enabled web server. The server responds to an image query using efficient native code from the selected image database. The client application, running on a mobile phone, is able to initiate a query request, which is handled by a servlet in the server to find the closest match to the queried image. The retrieved results are transmitted over the mobile network and the images are displayed on the mobile phone. We conclude that such a system serves as a basis for content-based information retrieval on wireless devices and needs to cope with factors such as the constraints of hand-held devices and the reduced network bandwidth available in mobile environments.

  14. The Impact of Team-Based Learning on Nervous System Examination Knowledge of Nursing Students.

    PubMed

    Hemmati Maslakpak, Masomeh; Parizad, Naser; Zareie, Farzad

    2015-12-01

    Team-based learning is one of the active learning approaches in which independent learning is combined with small group discussion in the class. This study aimed to determine the impact of team-based learning on the nervous system examination knowledge of nursing students. This quasi-experimental study was conducted on 3rd grade nursing students, including the 5th semester (intervention group) and the 6th semester (control group). The traditional lecture method and the team-based learning method were used for educating the examination of the nervous system for intervention and control groups, respectively. The data were collected by a 40-question test (multiple choice, matching, gap-filling and descriptive questions) before and after the intervention in both groups. The Individual Readiness Assurance Test (RAT) and Group Readiness Assurance Test (GRAT) were used to collect data in the intervention group. In the end, the collected data were analyzed by SPSS ver. 13 using descriptive and inferential statistical tests. In the team-based learning group, the mean (standard deviation) score was 13.39 (4.52) before the intervention and increased to 31.07 (3.20) after the intervention; this increase was statistically significant. There was also a statistically significant difference between the RAT and GRAT scores in the team-based learning group. Using the team-based learning approach resulted in much better improvement and stability in the nervous system examination knowledge of nursing students compared to the traditional lecture method; therefore, this method could be used as an effective educational approach in nursing education.

  15. Architecture for knowledge-based and federated search of online clinical evidence.

    PubMed

    Coiera, Enrico; Walther, Martin; Nguyen, Ken; Lovell, Nigel H

    2005-10-24

    It is increasingly difficult for clinicians to keep up-to-date with the rapidly growing biomedical literature. Online evidence retrieval methods are now seen as a core tool to support evidence-based health practice. However, standard search engine technology is not designed to manage the many different types of evidence sources that are available or to handle the very different information needs of various clinical groups, who often work in widely different settings. The objectives of this paper are (1) to describe the design considerations and system architecture of a wrapper-mediator approach to federated search system design, including the use of knowledge-based, meta-search filters, and (2) to analyze the implications of system design choices on performance measurements. A trial was performed to evaluate the technical performance of a federated evidence retrieval system, which provided access to eight distinct online resources, including e-journals, PubMed, and electronic guidelines. The Quick Clinical system architecture utilized a universal query language to reformulate queries internally and utilized meta-search filters to optimize search strategies across resources. We recruited 227 family physicians from across Australia who used the system to retrieve evidence in a routine clinical setting over a 4-week period. The total search time for a query was recorded, along with the duration of individual queries sent to different online resources. Clinicians performed 1662 searches over the trial. The average search duration was 4.9 +/- 3.2 s (N = 1662 searches). Mean search duration to the individual sources was between 0.05 s and 4.55 s. Average system time (ie, system overhead) was 0.12 s. The relatively small system overhead compared to the average time it takes to perform a search for an individual source shows that the system achieves a good trade-off between performance and reliability.
Furthermore, despite the additional effort required to incorporate the capabilities of each individual source (to improve the quality of search results), system maintenance requires only a small additional overhead.

  16. Private database queries based on counterfactual quantum key distribution

    NASA Astrophysics Data System (ADS)

    Zhang, Jia-Li; Guo, Fen-Zhuo; Gao, Fei; Liu, Bin; Wen, Qiao-Yan

    2013-08-01

    Based on the fundamental concept of quantum counterfactuality, we propose a protocol to achieve quantum private database queries, which is a theoretical study of how counterfactuality can be employed beyond counterfactual quantum key distribution (QKD). By adding crucial detecting apparatus to the device of QKD, the privacy of both the distrustful user and the database owner can be guaranteed. Furthermore, the proposed private-database-query protocol makes full use of the low efficiency in the counterfactual QKD, and by adjusting the relevant parameters, the protocol obtains excellent flexibility and extensibility.

  17. A Hybrid Spatio-Temporal Data Indexing Method for Trajectory Databases

    PubMed Central

    Ke, Shengnan; Gong, Jun; Li, Songnian; Zhu, Qing; Liu, Xintao; Zhang, Yeting

    2014-01-01

    In recent years, there has been tremendous growth in the field of indoor and outdoor positioning sensors continuously producing huge volumes of trajectory data that has been used in many fields such as location-based services or location intelligence. Trajectory data is massively increased and semantically complicated, which poses a great challenge on spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method, named HBSTR-tree, which is a hybrid index structure comprising spatio-temporal R-tree, B*-tree and Hash table. To improve the index generation efficiency, rather than directly inserting trajectory points, we group consecutive trajectory points as nodes according to their spatio-temporal semantics and then insert them into spatio-temporal R-tree as leaf nodes. Hash table is used to manage the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize spatio-temporal R-tree structures. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is also proposed for the purpose of cloud storage. Experimental results prove that HBSTR-tree outperforms TB*-tree in some aspects such as generation efficiency, query performance and query type. PMID:25051028

  18. A hybrid spatio-temporal data indexing method for trajectory databases.

    PubMed

    Ke, Shengnan; Gong, Jun; Li, Songnian; Zhu, Qing; Liu, Xintao; Zhang, Yeting

    2014-07-21

    In recent years, there has been tremendous growth in the field of indoor and outdoor positioning sensors continuously producing huge volumes of trajectory data that has been used in many fields such as location-based services or location intelligence. Trajectory data is massively increased and semantically complicated, which poses a great challenge on spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method, named HBSTR-tree, which is a hybrid index structure comprising spatio-temporal R-tree, B*-tree and Hash table. To improve the index generation efficiency, rather than directly inserting trajectory points, we group consecutive trajectory points as nodes according to their spatio-temporal semantics and then insert them into spatio-temporal R-tree as leaf nodes. Hash table is used to manage the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize spatio-temporal R-tree structures. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is also proposed for the purpose of cloud storage. Experimental results prove that HBSTR-tree outperforms TB*-tree in some aspects such as generation efficiency, query performance and query type.
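
    Illustrative sketch (not from the paper): the abstract describes grouping consecutive trajectory points into nodes before inserting them into the R-tree as leaf nodes. The Python below shows one way such grouping could work. The time-gap and step-distance thresholds, the function names, and the (t, x, y) point layout are all assumptions made for illustration; the paper's actual criterion is based on "spatio-temporal semantics" that the abstract does not spell out.

```python
from math import hypot


def group_points(points, max_gap_s=60.0, max_step_m=50.0):
    """Split a time-ordered list of (t, x, y) points into runs of
    consecutive points. A new run starts when the time gap or the
    distance to the previous point exceeds its (hypothetical) threshold."""
    groups = []
    for point in points:
        if groups:
            t0, x0, y0 = groups[-1][-1]
            t, x, y = point
            if t - t0 <= max_gap_s and hypot(x - x0, y - y0) <= max_step_m:
                groups[-1].append(point)
                continue
        groups.append([point])
    return groups


def bounding_box(group):
    """Spatio-temporal bounding box (t_min, x_min, y_min, t_max, x_max, y_max)
    that a leaf node holding this group would be indexed by."""
    ts, xs, ys = zip(*group)
    return (min(ts), min(xs), min(ys), max(ts), max(xs), max(ys))
```

    Each group yields one bounding box, so the index would receive one insertion per group rather than one per point, which is the source of the generation-efficiency gain the abstract claims.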

  19. Web page sorting algorithm based on query keyword distance relation

    NASA Astrophysics Data System (ADS)

    Yang, Han; Cui, Hong Gang; Tang, Hao

    2017-08-01

    To improve web page ranking, this paper proposes a query keyword clustering idea based on how the search keywords relate to one another within a web page. The relationship is converted into the degree of aggregation of the search keywords in the page. Building on the PageRank algorithm, a clustering-degree factor for the query keywords is added so that it takes part in the quantitative calculation. The result is an improved PageRank algorithm based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.
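
    Illustrative sketch (not from the paper): the abstract does not give the formula for the keyword aggregation factor or say how it enters PageRank. The Python below assumes one plausible choice: the factor is the number of (distinct) query keywords divided by the length of the shortest token window containing all of them, and it is blended linearly with a precomputed PageRank score. The function names, the `weight` parameter, and the linear blend are hypothetical.

```python
from itertools import product


def keyword_aggregation(tokens, keywords):
    """Return a score in (0, 1] for distinct keywords: 1.0 when they are
    adjacent, smaller as the shortest window containing all of them grows.
    Returns 0.0 if any keyword is missing from the page."""
    positions = [[i for i, t in enumerate(tokens) if t == k] for k in keywords]
    if any(not p for p in positions):
        return 0.0
    # Shortest span covering one occurrence of every keyword (brute force).
    span = min(max(combo) - min(combo) + 1 for combo in product(*positions))
    return len(keywords) / span


def rerank(pages, pagerank, keywords, weight=0.5):
    """Blend a precomputed PageRank score with the aggregation factor
    and return (page, score) pairs sorted from best to worst."""
    scored = []
    for name, text in pages.items():
        agg = keyword_aggregation(text.lower().split(), keywords)
        scored.append((name, (1 - weight) * pagerank[name] + weight * agg))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

    With this blend, a page whose query keywords sit close together can outrank a page with a higher link-based score but widely scattered keywords.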

  20. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

    Efficiently extracting information from high-dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed conveniently, but they suffer from efficiency problems due to their contiguous storage structure. Others propose databases as an alternative, citing advantages such as native functionality for manipulating multidimensional (MD) arrays, smart caching strategies and scalability. In this research, NetCDF file-based solutions and the multidimensional array database management system (DBMS) SciDB, which applies a chunked storage structure, are benchmarked to determine the best solution for storing and querying a large 5D hydrologic modelling dataset. The effect of data storage configurations, including chunk size, dimension order and compression, on query performance is explored. Results indicate that the dimension order used to organize storage of 5D data has a significant influence on query performance if the chunk size is very large, but the effect becomes insignificant when the chunk size is properly set. Compression in SciDB mostly has a negative influence on query performance. Caching is an advantage but may be influenced by the execution of different query processes. On the whole, the NetCDF solution without compression is in general more efficient than the SciDB DBMS.
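
    Illustrative sketch (not from the paper): one reason chunk size and shape affect query performance in a chunked array store is the number of chunks a query has to read. The Python below counts the chunks intersected by a hyper-rectangular query on a regularly chunked array. The function name and the example dimension layout are assumptions; the benchmark's real dimensions and chunk settings are not given in the abstract.

```python
from math import prod


def chunks_touched(start, stop, chunk_shape):
    """Count the chunks intersected by the hyper-rectangle [start, stop)
    when the array is split into regular chunks of shape chunk_shape."""
    per_dim = []
    for lo, hi, size in zip(start, stop, chunk_shape):
        first = lo // size          # index of the first chunk hit
        last = (hi - 1) // size     # index of the last chunk hit
        per_dim.append(last - first + 1)
    return prod(per_dim)


# Hypothetical 5D layout: (time, run, variable, y, x).
# Query: a 1000-step time series at a single grid cell.
start = (0, 0, 0, 10, 10)
stop = (1000, 1, 1, 11, 11)
time_major = chunks_touched(start, stop, (1000, 1, 1, 1, 1))
space_major = chunks_touched(start, stop, (1, 1, 1, 100, 100))
```

    Here `time_major` is 1 and `space_major` is 1000: the same query reads a single chunk under one layout and a thousand under the other, which is the kind of sensitivity to storage configuration the study measures.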

  1. Accessing the public MIMIC-II intensive care relational database for clinical research.

    PubMed

    Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G

    2013-01-10

    The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.

  2. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. 
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895

  3. Improve Biomedical Information Retrieval using Modified Learning to Rank Methods.

    PubMed

    Xu, Bo; Lin, Hongfei; Lin, Yuan; Ma, Yunlong; Yang, Liang; Wang, Jian; Yang, Zhihao

    2016-06-14

    In recent years, the number of biomedical articles has increased exponentially, making it difficult for biologists to capture all the needed information manually. Information retrieval technologies, as the core of search engines, can deal with the problem automatically, providing users with the needed information. However, it is a great challenge to apply these technologies directly to biomedical retrieval because of the abundance of domain-specific terminology. To enhance biomedical retrieval, we propose a novel framework based on learning to rank. Learning to rank is a family of state-of-the-art information retrieval techniques and has been proved effective in many information retrieval tasks. In the proposed framework, we attempt to tackle the problem of abundant terminology by constructing ranking models that focus not only on retrieving the most relevant documents but also on diversifying the search results to increase the completeness of the resulting list for a given query. For model training, we propose two novel document labeling strategies and combine several traditional retrieval models as learning features. We also investigate the usefulness of different learning-to-rank approaches in our framework. Experimental results on TREC Genomics datasets demonstrate the effectiveness of our framework for biomedical information retrieval.

  4. A semantic proteomics dashboard (SemPoD) for data management in translational research.

    PubMed

    Jayapandian, Catherine P; Zhao, Meng; Ewing, Rob M; Zhang, Guo-Qiang; Sahoo, Satya S

    2012-01-01

    One of the primary challenges in translational research data management is breaking down the barriers between the multiple data silos and the integration of 'omics data with clinical information to complete the cycle from the bench to the bedside. The role of contextual metadata, also called provenance information, is a key factor in effective data integration, reproducibility of results, correct attribution of original source, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). But, at present there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift in creating a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and allow integration with legacy data to support their ongoing research. The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPoD uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set using a "smart filtering" approach.
The SysPro ontology re-uses terms from the PROV-ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE) guidelines. SemPoD was evaluated in terms of both user feedback and system scalability. SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guidelines to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. The initial user feedback evaluating the usability and functionality of SemPoD has been very positive and it is being considered for wider deployment beyond the proteomics domain, and in other 'omics' centers.

  5. A Layered Searchable Encryption Scheme with Functional Components Independent of Encryption Methods

    PubMed Central

    Luo, Guangchun; Qin, Ke

    2014-01-01

    Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted server, which is especially suitable for protecting sensitive data in the cloud. However, various settings (based on symmetric or asymmetric encryption) and functionalities (ranked keyword query, range query, phrase query, etc.) are often realized by different methods with different searchable structures that are generally not compatible with each other, which limits the scope of application and hinders the functional extensions. We prove that asymmetric searchable structure could be converted to symmetric structure, and functions could be modeled separately apart from the core searchable structure. Based on this observation, we propose a layered searchable encryption (LSE) scheme, which provides compatibility, flexibility, and security for various settings and functionalities. In this scheme, the outputs of the core searchable component based on either symmetric or asymmetric setting are converted to some uniform mappings, which are then transmitted to loosely coupled functional components to further filter the results. In such a way, all functional components could directly support both symmetric and asymmetric settings. Based on LSE, we propose two representative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query (previously only available in asymmetric scheme). PMID:24719565

  6. Demonstration of quantum superiority in learning parity with noise with superconducting qubits

    NASA Astrophysics Data System (ADS)

    Ristè, Diego; da Silva, Marcus; Ryan, Colm; Cross, Andrew; Smolin, John; Gambetta, Jay; Chow, Jerry; Johnson, Blake

    A problem in machine learning is to identify the function programmed in an unknown device, or oracle, having only access to its output. In particular, a parity function computes the parity of a subset of a bit register. We implement an oracle executing parity functions in a five-qubit superconducting processor and compare the performance of a classical and a quantum learner. The classical learner reads the output of multiple oracle calls and uses the results to infer the hidden function. In addition to querying the oracle, the quantum learner can apply coherent rotations on the output register before the readout. We show that, given a target success probability, the quantum approach outperforms the classical one in the number of queries needed. Moreover, this gap increases with readout noise and with the size of the qubit register. This result shows that quantum advantage can already emerge in current systems with a few, noisy qubits. We acknowledge support from IARPA under Contract W911NF-10-1-0324.
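
    Illustrative sketch (not from the paper): the abstract describes a classical learner that reads the output of repeated oracle calls and infers the hidden parity function. The Python below simulates that setting in software. The brute-force "pick the subset that agrees with the most samples" strategy, the uniform random inputs, and all function names are assumptions made for illustration; the authors' classical learner and their noise model may differ.

```python
import random


def parity(mask, x):
    """Parity of the bits of x selected by mask."""
    return bin(mask & x).count("1") % 2


def make_oracle(hidden_mask, n_bits, noise, rng):
    """Oracle returning a random input and its parity, with the result
    flipped with probability `noise` to mimic readout errors."""
    def query():
        x = rng.randrange(1 << n_bits)
        y = parity(hidden_mask, x)
        if rng.random() < noise:
            y ^= 1
        return x, y
    return query


def classical_learner(samples, n_bits):
    """Return the mask that agrees with the most samples (brute force
    over all 2**n_bits candidate subsets)."""
    return max(range(1 << n_bits),
               key=lambda m: sum(parity(m, x) == y for x, y in samples))


rng = random.Random(0)
query = make_oracle(hidden_mask=0b1011, n_bits=4, noise=0.1, rng=rng)
samples = [query() for _ in range(200)]
guess = classical_learner(samples, n_bits=4)
```

    As the noise rate rises, a learner of this kind needs more samples before the correct subset reliably stands out, which is consistent with the abstract's observation that the query gap grows with readout noise.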

  7. Pedagogy of Work-Based Learning: The Role of the Learning Group

    ERIC Educational Resources Information Center

    Siebert, Sabina; Mills, Vince; Tuff, Caroline

    2009-01-01

    Purpose: The aim of this paper is to evaluate the role of learning from participation in a group of work-based learners. Design/methodology/approach: This study relies on qualitative data obtained from a survey of perspectives of students on two work-based learning programmes. A group of 16 undergraduate and seven postgraduate students…

  8. OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.

    PubMed

    Naderi, Nona; Kappler, Thomas; Baker, Christopher J O; Witte, René

    2011-10-01

    Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger. witte@semanticsoftware.info.

  9. Learning the preferences of physicians for the organization of result lists of medical evidence articles.

    PubMed

    O'Sullivan, D; Wilk, S; Michalowski, W; Slowinski, R; Thomas, R; Kadzinski, M; Farion, K

    2014-01-01

    Online medical knowledge repositories such as MEDLINE and The Cochrane Library are increasingly used by physicians to retrieve articles to aid with clinical decision making. The prevailing approach for organizing retrieved articles is in the form of a rank-ordered list, with the assumption that the higher an article is presented on a list, the more relevant it is. Despite this common list-based organization, it is seldom studied how physicians perceive the association between the relevance of articles and the order in which articles are presented. In this paper we describe a case study that captured physician preferences for 3-element lists of medical articles in order to learn how to organize medical knowledge for decision-making. Comprehensive relevance evaluations were developed to represent 3-element lists of hypothetical articles that may be retrieved from an online medical knowledge source such as MEDLINE or The Cochrane Library. Comprehensive relevance evaluations assess not only an article's relevance for a query, but also whether it has been placed on the correct list position. In other words, an article may be relevant and correctly placed on a result list (e.g. the most relevant article appears first in the result list), an article may be relevant for a query but placed on an incorrect list position (e.g. the most relevant article appears second in a result list), or an article may be irrelevant for a query yet still appear in the result list. The relevance evaluations were presented to six senior physicians who were asked to express their preferences for an article's relevance and its position on a list by pairwise comparisons representing different combinations of 3-element lists. The elicited preferences were assessed using a novel GRIP (Generalized Regression with Intensities of Preference) method and represented as an additive value function. Value functions were derived for individual physicians as well as the group of physicians.
The results show that physicians assign significant value to the 1st position on a list and they expect that the most relevant article is presented first. Whilst physicians still prefer obtaining a correctly placed article on position 2, they are also quite satisfied with a misplaced relevant article. Low consideration of the 3rd position was uniformly confirmed. Our findings confirm the importance of placing the most relevant article in the 1st position on a list and show that the importance paid to position on a list diminishes significantly after the 2nd position. The derived value functions may be used by developers of clinical decision support applications to decide how best to organize medical knowledge for decision making and to create personalized evaluation measures that can augment typical measures used to evaluate information retrieval systems.

  10. LDRD final report :

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brost, Randolph C.; McLendon, William Clarence,

    2013-01-01

    Modeling geospatial information with semantic graphs enables search for sites of interest based on relationships between features, without requiring strong a priori models of feature shape or other intrinsic properties. Geospatial semantic graphs can be constructed from raw sensor data with suitable preprocessing to obtain a discretized representation. This report describes initial work toward extending geospatial semantic graphs to include temporal information, and initial results applying semantic graph techniques to SAR image data. We describe an efficient graph structure that includes geospatial and temporal information, which is designed to support simultaneous spatial and temporal search queries. We also report a preliminary implementation of feature recognition, semantic graph modeling, and graph search based on input SAR data. The report concludes with lessons learned and suggestions for future improvements.

  11. The Effects of Preference for Information on Consumers’ Online Health Information Search Behavior

    PubMed Central

    2013-01-01

    Background Preference for information is a personality trait that affects people’s tendency to seek information in health-related situations. Prior studies have focused primarily on investigating its impact on patient-provider communication and on the implications for designing information interventions that prepare patients for medical procedures. Few studies have examined its impact on general consumers’ interactions with Web-based search engines for health information or the implications for designing more effective health information search systems. Objective This study intends to fill this gap by investigating the impact of preference for information on the search behavior of general consumers seeking health information, their perceptions of search tasks (representing information needs), and user experience with search systems. Methods Forty general consumers who had previously searched for health information online participated in the study in our usability lab. Preference for information was measured using Miller’s Monitor-Blunter Style Scale (MBSS) and the Krantz Health Opinion Survey-Information Scale (KHOS-I). Each participant completed four simulated health information search tasks: two look-up (fact-finding) and two exploratory. Their behaviors while interacting with the search systems were automatically logged and ratings of their perceptions of tasks and user experience with the systems were collected using Likert-scale questionnaires. Results The MBSS showed low reliability with the participants (Monitoring subscale: Cronbach alpha=.53; Blunting subscale: Cronbach alpha=.35). Thus, no further analyses were performed based on the scale. KHOS-I had sufficient reliability (Cronbach alpha=.77). Participants were classified into low- and high-preference groups based on their KHOS-I scores. The high-preference group submitted significantly shorter queries when completing the look-up tasks (P=.02). 
The high-preference group made a significantly higher percentage of parallel movements in query reformulation than did the low-preference group (P=.04), whereas the low-preference group made a significantly higher percentage of new concept movements than the high-preference group when completing the exploratory tasks (P=.01). The high-preference group found the exploratory tasks to be significantly more difficult (P=.05) and the systems to be less useful (P=.04) than did the low-preference group. Conclusions Preference for information has an impact on the search behavior of general consumers seeking health information. Those with a high preference were more likely to use more general queries when searching for specific factual information and to develop more complex mental representations of health concerns of an exploratory nature and try different combinations of concepts to explore these concerns. High-preference users were also more demanding on the system. Health information search systems should be tailored to fit individuals’ information preferences. PMID:24284061

  12. The effects of preference for information on consumers' online health information search behavior.

    PubMed

    Zhang, Yan

    2013-11-26

    Preference for information is a personality trait that affects people's tendency to seek information in health-related situations. Prior studies have focused primarily on investigating its impact on patient-provider communication and on the implications for designing information interventions that prepare patients for medical procedures. Few studies have examined its impact on general consumers' interactions with Web-based search engines for health information or the implications for designing more effective health information search systems. This study intends to fill this gap by investigating the impact of preference for information on the search behavior of general consumers seeking health information, their perceptions of search tasks (representing information needs), and user experience with search systems. Forty general consumers who had previously searched for health information online participated in the study in our usability lab. Preference for information was measured using Miller's Monitor-Blunter Style Scale (MBSS) and the Krantz Health Opinion Survey-Information Scale (KHOS-I). Each participant completed four simulated health information search tasks: two look-up (fact-finding) and two exploratory. Their behaviors while interacting with the search systems were automatically logged and ratings of their perceptions of tasks and user experience with the systems were collected using Likert-scale questionnaires. The MBSS showed low reliability with the participants (Monitoring subscale: Cronbach alpha=.53; Blunting subscale: Cronbach alpha=.35). Thus, no further analyses were performed based on the scale. KHOS-I had sufficient reliability (Cronbach alpha=.77). Participants were classified into low- and high-preference groups based on their KHOS-I scores. The high-preference group submitted significantly shorter queries when completing the look-up tasks (P=.02). 
The high-preference group made a significantly higher percentage of parallel movements in query reformulation than did the low-preference group (P=.04), whereas the low-preference group made a significantly higher percentage of new concept movements than the high-preference group when completing the exploratory tasks (P=.01). The high-preference group found the exploratory tasks to be significantly more difficult (P=.05) and the systems to be less useful (P=.04) than did the low-preference group. Preference for information has an impact on the search behavior of general consumers seeking health information. Those with a high preference were more likely to use more general queries when searching for specific factual information and to develop more complex mental representations of health concerns of an exploratory nature and try different combinations of concepts to explore these concerns. High-preference users were also more demanding on the system. Health information search systems should be tailored to fit individuals' information preferences.
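
    The reliability figures quoted above (Cronbach alpha for the MBSS subscales and KHOS-I) follow the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch, assuming each respondent's answers arrive as a list of k numeric item scores and that sample variance is used throughout; the function name and data layout are illustrative, not taken from the study:

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of respondents, each a list of k numeric item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    # (must be non-zero, otherwise alpha is undefined).
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

    Items that move together push the value toward 1; a low value, as reported for the MBSS subscales, indicates that the items do not vary consistently across respondents.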

  13. Standard Biological Parts Knowledgebase

    PubMed Central

    Galdzicki, Michal; Rodriguez, Cesar; Chandran, Deepak; Sauro, Herbert M.; Gennari, John H.

    2011-01-01

    We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publicly accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data and our methods for transformation to SBPkb, and we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate “promoter” parts that are known to be both negatively and positively regulated. This method provides new web-based data access for performing searches for parts that are not currently possible. PMID:21390321
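
    The sample query described above (promoter parts that are both negatively and positively regulated) amounts to intersecting three triple patterns. A rough sketch in plain Python rather than SPARQL, assuming a tiny in-memory set of (subject, predicate, object) triples; every identifier below is invented for illustration and is not SBOL-semantic vocabulary:

```python
# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = {
    ("part:P1", "type", "Promoter"),
    ("part:P1", "regulation", "positive"),
    ("part:P1", "regulation", "negative"),
    ("part:P2", "type", "Promoter"),
    ("part:P2", "regulation", "negative"),
    ("part:T1", "type", "Terminator"),
    ("part:T1", "regulation", "positive"),
    ("part:T1", "regulation", "negative"),
}

def subjects(triples, predicate, obj):
    """Return all subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}

def dual_regulated_promoters(triples):
    """Promoters that carry both a positive and a negative regulation triple."""
    return sorted(
        subjects(triples, "type", "Promoter")
        & subjects(triples, "regulation", "positive")
        & subjects(triples, "regulation", "negative")
    )
```

    In a real RDF store the same intersection would be written as a SPARQL basic graph pattern with three triple patterns sharing one variable.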

  14. A weight based genetic algorithm for selecting views

    NASA Astrophysics Data System (ADS)

    Talebian, Seyed H.; Kareem, Sameem A.

    2013-03-01

    A data warehouse is a technology designed to support decision making. It is built by extracting large amounts of data from different operational systems, transforming the data into a consistent form, and loading it into a central repository. The queries issued in a data warehouse environment differ from those in operational systems: analytical queries summarize large volumes of data and therefore normally take a long time to answer. At the same time, these queries must be answered quickly so that managers can make decisions in as short a time as possible. Improving query performance is therefore an essential need in this environment. One of the most popular methods is to use pre-computed query results. In this method, whenever the user submits a new query, the pre-computed results, or views, are used to answer it instead of calculating the query on the fly over a large underlying database. Although the ideal option would be to pre-compute and save all possible views, in practice this is not feasible because of disk space constraints and the overhead of view updates. Therefore, a subset of the possible views must be selected to save on disk. Selecting the right subset of views is considered an important challenge in data warehousing. In this paper we suggest a Weighted Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
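
    The abstract does not name the two objectives or the genetic operators, so the following is only a sketch of how a weighted-sum genetic algorithm for view selection might look. It assumes the objectives are query benefit (maximized) and maintenance cost (minimized) under a disk budget; the view data, weights, and operator choices (binary tournament, one-point crossover, bit-flip mutation, elitism) are all hypothetical and not the authors' WBGA:

```python
import random

# Hypothetical candidate views: (query_benefit, maintenance_cost, disk_size).
VIEWS = [(40, 10, 30), (25, 5, 20), (60, 30, 50), (15, 2, 10), (35, 12, 25)]
SPACE_LIMIT = 70  # assumed disk budget

def fitness(bits, w_benefit=0.7, w_cost=0.3):
    """Weighted sum of the two objectives; over-budget selections are rejected."""
    chosen = [view for view, bit in zip(VIEWS, bits) if bit]
    if sum(size for _, _, size in chosen) > SPACE_LIMIT:
        return float("-inf")
    benefit = sum(b for b, _, _ in chosen)
    cost = sum(c for _, c, _ in chosen)
    return w_benefit * benefit - w_cost * cost

def tournament(pop, rng):
    """Binary tournament: the fitter of two random individuals wins."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def select_views(generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(VIEWS)
    # Seed the population with the empty (always feasible) selection.
    pop = [[0] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randint(1, n - 1)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            children.append([bit ^ 1 if rng.random() < mutation_rate else bit
                             for bit in child])  # bit-flip mutation
        pop = children
    return max(pop, key=fitness)
```

    Each individual is a bitstring with one bit per candidate view. Changing `w_benefit` and `w_cost` shifts the trade-off between faster queries and cheaper view maintenance.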

  15. Evaluation of an ontological resource for pharmacovigilance.

    PubMed

    Jaulent, Marie-Christine; Alecu, Iulian

    2009-01-01

    In this work, we present a methodology for evaluating an ontology designed in a previous study to describe adverse drug reactions. We evaluate it in terms of its fitness for grouping cases in pharmacovigilance. We define as gold standard the Standardized MedDRA Queries (SMQs) developed manually to group terms representing similar medical conditions. We perform an automatic search in the ontology in order to retrieve concepts related to the medical conditions. An optimal query is built for each medical condition. The evaluation relies on the comparison between the terms in the SMQ and the terms subsumed by the query. The result is quantified by sensitivity and specificity. We applied this methodology to 24 SMQs and obtained a mean sensitivity of 0.82. This work validates the semantic resource and provides, in perspective, tools to maintain the ontology as knowledge evolves.
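
    The comparison between SMQ terms and query-subsumed terms can be expressed with ordinary set operations. A minimal sketch, assuming the terms are plain strings, that both sets are drawn from a known universe of candidate terms, and that neither the gold standard nor its complement is empty; the function and argument names are illustrative:

```python
def sensitivity_specificity(retrieved, gold, universe):
    """Compare query-subsumed terms with gold-standard terms.

    retrieved: terms subsumed by the ontology query
    gold:      terms in the gold-standard grouping (e.g. one SMQ)
    universe:  all candidate terms (assumed to contain both sets)
    """
    retrieved, gold, universe = set(retrieved), set(gold), set(universe)
    tp = len(retrieved & gold)             # correctly retrieved
    fn = len(gold - retrieved)             # missed gold terms
    fp = len(retrieved - gold)             # retrieved but not in gold
    tn = len(universe - retrieved - gold)  # correctly left out
    return tp / (tp + fn), tn / (tn + fp)
```

    Averaging the first return value over all evaluated groupings gives a mean sensitivity of the kind reported above.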

  16. A Text Knowledge Base from the AI Handbook.

    ERIC Educational Resources Information Center

    Simmons, Robert F.

    1987-01-01

    Describes a prototype natural language text knowledge system (TKS) that was used to organize 50 pages of a handbook on artificial intelligence as an inferential knowledge base with natural language query and command capabilities. Representation of text, database navigation, query systems, discourse structuring, and future research needs are…

  17. Hybrid Filtering in Semantic Query Processing

    ERIC Educational Resources Information Center

    Jeong, Hanjo

    2011-01-01

    This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…

  18. A novel content-based medical image retrieval method based on query topic dependent image features (QTDIF)

    NASA Astrophysics Data System (ADS)

    Xiong, Wei; Qiu, Bo; Tian, Qi; Mueller, Henning; Xu, Changsheng

    2005-04-01

    Medical image retrieval is still mainly a research domain with a large variety of applications and techniques. With the ImageCLEF 2004 benchmark, an evaluation framework has been created that includes a database, query topics and ground truth data. Eleven systems (with a total of more than 50 runs) compared their performance in various configurations. The results show that there is not any one feature that performs well on all query tasks. Key to successful retrieval is rather the selection of features and feature weights based on a specific set of input features, thus on the query task. In this paper we propose a novel method based on query topic dependent image features (QTDIF) for content-based medical image retrieval. These feature sets are designed to capture both inter-category and intra-category statistical variations to achieve good retrieval performance in terms of recall and precision. We have used Gaussian Mixture Models (GMM) and blob representation to model medical images and construct the proposed novel QTDIF for CBIR. Finally, trained multi-class support vector machines (SVM) are used for image similarity ranking. The proposed methods have been tested over the Casimage database with around 9000 images, for the given 26 image topics, used for ImageCLEF 2004. The retrieval performance has been compared with the medGIFT system, which is based on the GNU Image Finding Tool (GIFT). The experimental results show that the proposed QTDIF-based CBIR can provide significantly better performance than systems based on general features only.

  19. Efficient processing of multiple nested event pattern queries over multi-dimensional event streams based on a triaxial hierarchical model.

    PubMed

    Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong

    2016-09-01

    For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems and support for decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of the multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used to support the monitoring of patients' conditions. Experiments with varying stream input rates can correspond to changes in the number of patients that a system should manage, whereas burst input rates can correspond to sudden rushes of patients to be taken care of. The experimental results show that our proposal improves throughput by about 4 and 2 times, respectively, compared with the related works in Workload 1; by about 3 and 2 times, respectively, in Workload 2; and by about 6 times compared with the related work in Workload 3. 
The experimental results demonstrated that our proposal was able to process complex queries efficiently which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Efficient hemodynamic event detection utilizing relational databases and wavelet analysis

    NASA Technical Reports Server (NTRS)

    Saeed, M.; Mark, R. G.

    2001-01-01

    Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
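
    The abstract does not say which wavelet family was stored in the database. As an illustration of why wavelet coefficients give a compact description of a trend, here is a minimal sketch of an orthonormal Haar decomposition, chosen only because it is the simplest case; the function names are hypothetical and the input length is assumed to be a power of two:

```python
import math

def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal):
    """Full decomposition: [coarsest approximation] + details, coarse to fine."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details = detail + details  # prepend so coarser levels come first
    return approx + details
```

    A flat trend produces detail coefficients of zero, while an abrupt change shows up as a large detail coefficient at the matching position and scale, which is the kind of descriptor a query could filter on.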

  1. A Fuzzy Query Mechanism for Human Resource Websites

    NASA Astrophysics Data System (ADS)

    Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

    Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storage mechanism is proposed to store fuzzy data in conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. A user's fuzzy requirement can be expressed by a fuzzy query, which consists of a set of fuzzy conditions. Third, each fuzzy condition is associated with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to the user's preference.
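
    The aggregation step can be illustrated with a simplified, crisp version of the weighted average. This sketch assumes each condition has already been reduced to a numeric importance and a numeric degree of matching in [0, 1]; a full fuzzy weighted average would operate on fuzzy numbers instead, and the names below are illustrative:

```python
def weighted_match(conditions):
    """conditions: list of (importance, matching_degree) pairs, each in [0, 1]."""
    total_weight = sum(w for w, _ in conditions)
    if total_weight == 0:
        raise ValueError("at least one condition needs non-zero importance")
    return sum(w * m for w, m in conditions) / total_weight

def rank_candidates(candidates):
    """candidates: dict mapping a candidate name to its list of conditions."""
    return sorted(candidates,
                  key=lambda name: weighted_match(candidates[name]),
                  reverse=True)
```

    Because the score is an average, a strong match on one condition can compensate for a weak match on another, which corresponds to the mutual compensation mentioned above.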

  2. Inquiry-Based Integrated Science Education: Implementation of Local Content “Soil Washing” Project To Improve Junior High School Students’ Environmental Literacy

    NASA Astrophysics Data System (ADS)

    Syifahayu

    2017-02-01

    The study was motivated by teaching and learning problems caused by the conventional method used in science learning. That method gave students few opportunities to develop their competence and thinking skills. Consequently, the process of learning science was neglected, and students did not have the opportunity to improve their critical attitude and creative thinking skills. To address this problem, the study used the Project-Based Learning model through inquiry-based science education about the environment. The study also used an approach called Sains Lingkungan and Teknologi masyarakat - “Saling Temas” (Environmental science and Technology in Society), which promoted local content in Lampung as a theme in integrated science teaching and learning. The study was quasi-experimental with a pretest-posttest control group design. Initially, the subjects were given a pre-test. The experimental group received the inquiry learning method while the control group received conventional learning. After the learning process, the subjects of both groups were given a post-test. Quantitative analysis was performed using the Mann-Whitney U-test, along with a qualitative descriptive analysis. The environmental literacy skills of students who received the inquiry learning strategy, with the project-based learning model on the theme of soil washing, differed significantly from those of the control group, with the experimental group performing better. Data analysis showed a p-value, or sig. (2-tailed), of 0.000 < α = 0.05, with an average N-gain of 34.72 for the experimental group and 16.40 for the control group. In addition, the learning process became more meaningful.

  3. Sensitivity and Predictive Value of 15 PubMed Search Strategies to Answer Clinical Questions Rated Against Full Systematic Reviews

    PubMed Central

    Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-01-01

    Background: Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective: To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods: We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results: The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions: The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. 
These results can help clinicians apply effective strategies to answer their questions at the point of care. PMID:22693047
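
    The two measures used in this evaluation, sensitivity (recall) and positive predictive value (precision) on the first 2 pages, can be computed from a ranked result list and a set of known relevant trials. A minimal sketch, assuming results are identified by unique IDs and that 40 articles make up the first 2 pages as stated above; the function and parameter names are illustrative:

```python
def evaluate_search(ranked_ids, relevant_ids, cutoff=40):
    """Return (sensitivity, positive predictive value) for the top `cutoff` results.

    ranked_ids:   result IDs in the order the search returned them (unique)
    relevant_ids: IDs of the trials known to be relevant
    """
    relevant = set(relevant_ids)
    top = ranked_ids[:cutoff]
    hits = sum(1 for item in top if item in relevant)
    sensitivity = hits / len(relevant) if relevant else 0.0
    ppv = hits / len(top) if top else 0.0
    return sensitivity, ppv
```

    Passing `cutoff=len(ranked_ids)` gives the same measures on the complete search output.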

  4. Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

    PubMed

    Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-06-12

    Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. 
These results can help clinicians apply effective strategies to answer their questions at the point of care.

  5. The Pathway Active Learning Environment: An interactive web-based tool for physics education

    NASA Astrophysics Data System (ADS)

    Nakamura, Christopher Matthew

    The work described here represents an effort to design, construct, and test an interactive online multimedia learning environment that can provide physics instruction to students in their homes. The system was designed with one-on-one human tutoring in mind as the mode of instruction. The system uses an original combination of a video-based tutor that incorporates natural language processing, video-centered lessons, and additional illustrative multimedia. Our Synthetic Interview (SI) tutor provides pre-recorded video answers from expert physics instructors in response to students' typed natural language questions. Our lessons cover Newton's laws and provide a context for the tutoring interaction to occur, connect physics ideas to real-world behavior of mechanical systems, and allow for quantitative testing of physics. Additional multimedia can be used to supplement the SI tutors' explanations and illustrate the physics of interest. The system is targeted at students of algebra-based and concept-based physics at the college and high school level. The system logs queries to the SI tutor, responses to lesson questions and several other interactions with the system, tagging those interactions with a username and timestamp. We have provided several groups of students with access to our system under several different conditions ranging from the controlled conditions of our interview facility to the naturalistic conditions of use at home. In total nearly two hundred students have accessed the system. To gain insight into the ways students might use the system and understand the utility of its various components we analyzed qualitative interview data collected with 22 algebra-based physics students who worked with our system in our interview facility. We also performed a descriptive analysis of data from the system's log of user interactions. 
Finally we explored the use of machine learning to explore the possibility of using automated assessment to augment the interactive capabilities of the system as well as to identify productive and unproductive use patterns. This work establishes a proof-of-concept level demonstration of the feasibility of deploying this type of system. The impact of this work and the possibility of future research efforts are discussed in the context of Internet technologies that are changing rapidly.

  6. An XML-Based Manipulation and Query Language for Rule-Based Information

    NASA Astrophysics Data System (ADS)

    Mansour, Essam; Höpfner, Hagen

    Rules are used to assist the monitoring process required in activities such as disease management and customer relationship management. These rules are specified according to the application best practices. Most research efforts emphasize the specification and execution of these rules. Few focus on managing these rules as one object that has a management life-cycle. This paper presents our manipulation and query language, developed to facilitate the maintenance of this object during its life-cycle and to query the information contained in it. This language is based on an XML-based model. Furthermore, we evaluate the model and language using a prototype system applied to a clinical case study.

  7. Knowledge Acquisition of Generic Queries for Information Retrieval

    PubMed Central

    Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.

    2002-01-01

    Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.

  8. Active Learning Using Hint Information.

    PubMed

    Li, Chun-Liang; Ferng, Chun-Sung; Lin, Hsuan-Tien

    2015-08-01

    The abundance of real-world data and limited labeling budget calls for active learning, an important learning paradigm for reducing human labeling efforts. Many recently developed active learning algorithms consider both uncertainty and representativeness when making querying decisions. However, exploiting representativeness with uncertainty concurrently usually requires tackling sophisticated and challenging learning tasks, such as clustering. In this letter, we propose a new active learning framework, called hinted sampling, which takes both uncertainty and representativeness into account in a simpler way. We design a novel active learning algorithm within the hinted sampling framework with an extended support vector machine. Experimental results validate that the novel active learning algorithm can result in a better and more stable performance than that achieved by state-of-the-art algorithms. We also show that the hinted sampling framework allows improving another active learning algorithm designed from the transductive support vector machine.
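
    For readers unfamiliar with the uncertainty criterion mentioned above, the following sketch shows plain uncertainty sampling for a binary classifier. It is not the hinted sampling algorithm of the paper and ignores representativeness entirely; it assumes the current model has already produced a predicted probability of the positive class for each unlabeled example, and the names are illustrative:

```python
def most_uncertain(probabilities, batch_size=1):
    """Pick the unlabeled examples whose predictions are closest to 0.5.

    probabilities: dict mapping an example ID to the predicted probability
                   of the positive class.
    """
    # Distance from 0.5 measures confidence; the smallest distance is queried first.
    ranked = sorted(probabilities, key=lambda ex: abs(probabilities[ex] - 0.5))
    return ranked[:batch_size]
```

    The selected examples are sent to a human labeler, the model is retrained, and the loop repeats until the labeling budget is spent.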

  9. Supervised multimedia categorization

    NASA Astrophysics Data System (ADS)

    Aldershoff, Frank; Salden, Alfons H.; Iacob, Sorin M.; Kempen, Masja

    2003-01-01

    Static multimedia on the Web can already hardly be structured manually. Although unavoidable and necessary, manual annotation of dynamic multimedia becomes even less feasible when multimedia quickly changes in complexity, i.e. in volume, modality, and usage context. The latter context could be set by learning or other purposes of the multimedia material. This multimedia dynamics calls for categorisation systems that index, query and retrieve multimedia objects on the fly in a similar way as a human expert would. We present and demonstrate such a supervised dynamic multimedia object categorisation system. Our categorisation system comes about by continuously gauging it to a group of human experts who annotate raw multimedia for a certain domain ontology given a usage context. Thus effectively our system learns the categorisation behaviour of human experts. By inducing supervised multi-modal content and context-dependent potentials our categorisation system associates field strengths of raw dynamic multimedia object categorisations with those human experts would assign. After a sufficiently long period of supervised machine learning we arrive at automated robust and discriminative multimedia categorisation. We demonstrate the usefulness and effectiveness of our multimedia categorisation system in retrieving semantically meaningful soccer-video fragments, in particular by taking advantage of multimodal and domain specific information and knowledge supplied by human experts.

  10. Exploring Meteorology Education in Community College: Lecture-based Instruction and Dialogue-based Group Learning

    NASA Astrophysics Data System (ADS)

    Finley, Jason Paul

    This study examined the impact of dialogue-based group instruction on student learning and engagement in community college meteorology education. A quasi-experimental design was used to compare lecture-based instruction with dialogue-based group instruction during two class sessions at one community college in southern California. Pre- and post-tests were used to measure learning and interest, while surveys were conducted two days after the learning events to assess engagement, perceived learning, and application of content. The results indicated that the dialogue-based group instruction was more successful in helping students learn than the lecture-based instruction. Each question that assessed learning had a higher score for the dialogue group that was statistically significant (alpha < 0.05) compared to the lecture group. The survey questions about perceived learning and application of content also exhibited higher scores that were statistically significant for the dialogue group. The qualitative portion of these survey questions supported the quantitative results and showed that the dialogue students were able to remember more concepts and apply these concepts to their lives. Dialogue students were also more engaged, as three out of the five engagement-related survey questions revealed statistically significantly higher scores for them. The qualitative data also supported increased engagement for the dialogue students. Interest in specific meteorological topics did not change significantly for either group of students; however, interest in learning about severe weather was higher for the dialogue group. Neither group found the learning events markedly meaningful, although more students from the dialogue group found pronounced meaning centered on applying severe weather knowledge to their lives. Active engagement in the dialogue approach kept these students from becoming distracted and allowed them to become absorbed in the learning event. 
This higher engagement most likely contributed to the resulting higher learning. Together, these results indicate that dialogue education, especially compared to lecture methods, has a great potential for helping students learn meteorology. Dialogue education can also help students engage in weather-related concepts and potentially develop better-informed citizens in a world with a changing climate.

  11. Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries

    ERIC Educational Resources Information Center

    Lazarinis, Fotis

    2008-01-01

    Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…

  12. Development of a Web-based Glaucoma Registry at King Khaled Eye Specialist Hospital, Saudi Arabia: A Cost-Effective Methodology

    PubMed Central

    Zaman, Babar; Khandekar, Rajiv; Al Shahwan, Sami; Song, Jonathan; Al Jadaan, Ibrahim; Al Jiasim, Leyla; Owaydha, Ohood; Asghar, Nasira; Hijazi, Amar; Edward, Deepak P.

    2014-01-01

    In this brief communication, we present the steps used to establish a web-based congenital glaucoma registry at our institution. The contents of a case report form (CRF) were developed by a group of glaucoma subspecialists. Information Technology (IT) specialists used Lime Survey software™ to create an electronic CRF. A MySQL (My Structured Query Language) server was used as a database with a virtual machine operating system. Two ophthalmologists and 2 IT specialists worked for 7 hours, and a biostatistician and a data registrar worked for 24 hours each to establish the electronic CRF. Using the CRF, which was transferred to the Lime Survey tool, and the MySQL server application, data could be directly stored in spreadsheet programs that included Microsoft Excel, SPSS, and R and queried in real-time. In a pilot test, clinical data from 80 patients with congenital glaucoma were entered into the registry, and successful descriptive analysis and data entry validation were performed. A web-based disease registry was established in a short period of time in a cost-efficient manner using available resources and a team-based approach. PMID:24791112

  13. Development of a web-based glaucoma registry at King Khaled Eye Specialist Hospital, Saudi Arabia: a cost-effective methodology.

    PubMed

    Zaman, Babar; Khandekar, Rajiv; Al Shahwan, Sami; Song, Jonathan; Al Jadaan, Ibrahim; Al Jiasim, Leyla; Owaydha, Ohood; Asghar, Nasira; Hijazi, Amar; Edward, Deepak P

    2014-01-01

    In this brief communication, we present the steps used to establish a web-based congenital glaucoma registry at our institution. The contents of a case report form (CRF) were developed by a group of glaucoma subspecialists. Information Technology (IT) specialists used Lime Survey software™ to create an electronic CRF. A MySQL (My Structured Query Language) server was used as a database with a virtual machine operating system. Two ophthalmologists and 2 IT specialists worked for 7 hours, and a biostatistician and a data registrar worked for 24 hours each to establish the electronic CRF. Using the CRF, which was transferred to the Lime Survey tool, and the MySQL server application, data could be directly stored in spreadsheet programs that included Microsoft Excel, SPSS, and R and queried in real-time. In a pilot test, clinical data from 80 patients with congenital glaucoma were entered into the registry, and successful descriptive analysis and data entry validation were performed. A web-based disease registry was established in a short period of time in a cost-efficient manner using available resources and a team-based approach.

  14. Using background knowledge for picture organization and retrieval

    NASA Astrophysics Data System (ADS)

    Quintana, Yuri

    1997-01-01

    A picture knowledge base management system is described that is used to represent, organize and retrieve pictures from a frame knowledge base. Experiments with human test subjects were conducted to obtain further descriptions of pictures from news magazines. These descriptions were used to represent the semantic content of pictures in frame representations. A conceptual clustering algorithm is described which organizes pictures not only on the observable features, but also on implicit properties derived from the frame representations. The algorithm uses inheritance reasoning to take into account background knowledge in the clustering. The algorithm creates clusters of pictures using a group similarity function that is based on the gestalt theory of picture perception. For each cluster created, a frame is generated which describes the semantic content of pictures in the cluster. Clustering and retrieval experiments were conducted with and without background knowledge. The paper shows how the use of background knowledge and semantic similarity heuristics improves the speed, precision, and recall of queries processed. The paper concludes with a discussion of how natural language processing can be used to assist in the development of knowledge bases and the processing of user queries.

  15. An Automated Approach to Reasoning Under Multiple Perspectives

    NASA Technical Reports Server (NTRS)

    deBessonet, Cary

    2004-01-01

    This is the final report with emphasis on research during the last term. The context for the research has been the development of an automated reasoning technology for use in SMS (Symbolic Manipulation System), a system used to build and query knowledge bases (KBs) using a special knowledge representation language SL (Symbolic Language). SMS interprets assertive SL input and enters the results as components of its universe. The system operates in two basic modes: 1) constructive mode (for building KBs); and 2) query/search mode (for querying KBs). Query satisfaction consists of matching query components with KB components. The system allows "penumbral matches," that is, matches that do not exactly meet the specifications of the query, but which are deemed relevant for the conversational context. If the user wants to know whether SMS has information that holds, say, for "any chow," the scope of relevancy might be set so that the system would respond based on a finding that it has information that holds for "most dogs," although this is not exactly what was called for by the query. The response would be qualified accordingly, as would normally be the case in ordinary human conversation. The general goal of the research was to develop an approach by which assertive content could be interpreted from multiple perspectives so that reasoning operations could be successfully conducted over the results. The interpretation of an SL statement such as, "{person believes [captain (asserted (perhaps)) (astronaut saw (comet (bright)))]}," which in English would amount to asserting something to the effect that, "Some person believes that a captain perhaps asserted that an astronaut saw a bright comet," would require the recognition of multiple perspectives, including some that are: a) epistemically-based (focusing on "believes"); b) assertion-based (focusing on "asserted"); c) perception-based (focusing on "saw"); d) adjectivally-based (focusing on "bright"); and e) modally-based (focusing on "perhaps"). Any conclusion reached under a line of reasoning that employs such an assertion or its associated implications should somehow reflect the employed perspectives. The investigators made significant progress in developing an approach that would enable a system to conduct reasoning operations over assertions of this kind while maintaining consistency in its knowledge bases. Significant accomplishments were made in the areas of: 1) integration and inferencing; 2) generation of perspectives, including wholistic and composite views; and 3) consistency maintenance.

  16. Small numbers, disclosure risk, security, and reliability issues in Web-based data query systems.

    PubMed

    Rudolph, Barbara A; Shah, Gulzar H; Love, Denise

    2006-01-01

    This article describes the process for developing consensus guidelines and tools for releasing public health data via the Web and highlights approaches leading agencies have taken to balance disclosure risk with public dissemination of reliable health statistics. An agency's choice of statistical methods for improving the reliability of released data for Web-based query systems is based upon a number of factors, including query system design (dynamic analysis vs preaggregated data and tables), population size, cell size, data use, and how data will be supplied to users. The article also describes those efforts that are necessary to reduce the risk of disclosure of an individual's protected health information.
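
    As an illustration of how cell size can drive a disclosure-control rule in a Web-based query system, the following is a minimal sketch of small-cell suppression. It is not taken from the article: the function name, the threshold of 5, and the choice to leave true zeros visible are all assumptions made for the example, and agencies differ on each of these points.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Hide counts that are nonzero but below `threshold`.

    Suppressed cells are returned as None so the caller can render
    them as, for example, "<5". The threshold value and the treatment
    of zero counts are assumptions for illustration only.
    """
    return {
        cell: (None if 0 < n < threshold else n)
        for cell, n in counts.items()
    }


# Hypothetical query result: case counts by county.
released = suppress_small_cells({"County A": 3, "County B": 12, "County C": 0})
```

    A production system would also need complementary suppression so that a hidden cell cannot be recovered from row or column totals; that step is omitted here.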

  17. G-Bean: an ontology-graph based web tool for biomedical literature retrieval

    PubMed Central

    2014-01-01

    Background: Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently. Methods: G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. PMID:25474588

  18. G-Bean: an ontology-graph based web tool for biomedical literature retrieval.

    PubMed

    Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S

    2014-01-01

    Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently. G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
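
    The concept-relevance step in innovation (2) can be pictured with a generic Personalized PageRank computed by power iteration. The sketch below is not G-Bean's implementation: the function name, the damping factor of 0.85, the iteration count, and the tiny concept graph are assumptions chosen for illustration, and the TF-IDF re-ranking and top-500 cutoff are not shown.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: List[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Rank nodes by relevance to `seeds` using power iteration.

    `graph` maps every node to its outgoing neighbours; every
    neighbour must also appear as a key. Random-walk restarts jump
    only to the seed nodes, which is what makes the ranking
    "personalized" to the query concepts.
    """
    seed_set = set(seeds)
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in graph:
                    nxt[s] += damping * rank[node] * restart[s]
        rank = nxt
    return rank


# Hypothetical ontology fragment; edges stand for concept relations.
concepts = {
    "asthma": ["lung disease"],
    "lung disease": ["asthma", "copd"],
    "copd": ["lung disease"],
    "fracture": [],
}
scores = personalized_pagerank(concepts, seeds=["asthma"])
expansion = sorted(scores, key=scores.get, reverse=True)[:2]
```

    In this toy graph, concepts connected to the seed receive nonzero scores while the unconnected "fracture" node receives none, so only related concepts would be candidates for expanding the query.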

  19. An efficient approach for video information retrieval

    NASA Astrophysics Data System (ADS)

    Dong, Daoguo; Xue, Xiangyang

    2005-01-01

    Today, more and more video information can be accessed through the internet, satellite, etc. Retrieving specific video information from large-scale video databases has become an important and challenging research topic in the area of multimedia information retrieval. In this paper, we introduce a new and efficient index structure, OVA-File, which is a variant of VA-File. In OVA-File, the approximations close to each other in data space are stored in close positions of the approximation file. The benefit is that only the part of the approximations close to the query vector needs to be visited to get the query result. Both a shot query algorithm and a video clip algorithm are proposed to support video information retrieval efficiently. The experimental results showed that the queries based on OVA-File were much faster than those based on VA-File, with a small loss of result quality.
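
    For readers unfamiliar with the underlying VA-File idea, the following is a minimal filter-and-refine sketch: each vector is replaced by a coarse grid-cell approximation, and a lower bound computed from the approximation decides whether the full vector has to be read. It illustrates the plain VA-File principle only; the OVA-File ordering of the approximation file is not reproduced, and the function names, the 2-bit grid, and the assumption that coordinates lie in [0, 1] are choices made for the example.

```python
import math
from typing import List, Sequence, Tuple


def quantize(vec: Sequence[float], bits: int = 2) -> Tuple[int, ...]:
    """Map each coordinate in [0, 1] to one of 2**bits grid cells."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in vec)


def lower_bound(query: Sequence[float], approx: Sequence[int], bits: int = 2) -> float:
    """Smallest possible Euclidean distance from `query` to the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c * width, (c + 1) * width
        gap = lo - q if q < lo else (q - hi if q > hi else 0.0)
        total += gap * gap
    return math.sqrt(total)


def nearest(
    query: Sequence[float], vectors: List[Sequence[float]], bits: int = 2
) -> Tuple[int, int]:
    """Return (index of nearest vector, number of full vectors read)."""
    approximations = [quantize(v, bits) for v in vectors]
    best_index, best_dist, visited = -1, math.inf, 0
    for i, approx in enumerate(approximations):
        if lower_bound(query, approx, bits) >= best_dist:
            continue  # the cell cannot contain a closer vector
        visited += 1
        d = math.dist(query, vectors[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, visited


# Hypothetical 2-D feature vectors standing in for video shot features.
data = [(0.1, 0.1), (0.9, 0.9), (0.12, 0.15), (0.5, 0.5)]
index, reads = nearest((0.1, 0.12), data)
```

    Here the scan still touches every approximation; the abstract's point is that storing nearby approximations close together lets a query visit only part of the approximation file.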

  20. A SQL-Database Based Meta-CASE System and its Query Subsystem

    NASA Astrophysics Data System (ADS)

    Eessaar, Erki; Sgirka, Rünno

    Meta-CASE systems simplify the creation of CASE (Computer Aided System Engineering) systems. In this paper, we present a meta-CASE system that provides a web-based user interface and uses an object-relational database system (ORDBMS) as its basis. The use of ORDBMSs allows us to integrate different parts of the system and simplify the creation of meta-CASE and CASE systems. ORDBMSs provide a powerful query mechanism. The proposed system allows developers to use queries to evaluate and gradually improve artifacts and calculate values of software measures. We illustrate the use of the system with the SimpleM modeling language and discuss the use of SQL in the context of queries about artifacts. We have created a prototype of the meta-CASE system by using the PostgreSQL™ ORDBMS and the PHP scripting language.
