Secure Skyline Queries on Cloud Platform.
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-04-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
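At the core of the proposed protocol is a dominance test: a point p dominates q if p is at least as good in every attribute and strictly better in at least one, and the skyline is the set of non-dominated points. The sketch below only illustrates this relation in plain Python on plaintext tuples; it is not the paper's secure protocol, which evaluates the same predicate over semantically-secure ciphertexts without revealing the points, the query, or the outcome. The hotel example data are assumptions for illustration.

```python
# Plaintext sketch of the dominance relation and skyline definition that the
# secure protocol evaluates over encrypted data (illustration only; the actual
# protocol never reveals p, q, or the comparison outcome).

def dominates(p, q):
    """True if p dominates q: no worse in every dimension (smaller is better)
    and strictly better in at least one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Points not dominated by any other point (the Pareto-optimal set)."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

# Example: (price, distance) tuples, both to be minimized.
hotels = [(100, 5.0), (80, 7.5), (120, 2.0), (110, 6.0)]
print(skyline(hotels))   # [(100, 5.0), (80, 7.5), (120, 2.0)]; (110, 6.0) is dominated
```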
Camera Geolocation From Mountain Images
2015-09-17
…be reliably extracted from query images. However, in real-life scenarios the skyline in a query image may be blurred or invisible due to occlusions. …extracted from multiple mountain ridges is critical to reliably geolocating challenging real-world query images with blurred or invisible mountain skylines.
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.
Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen
2017-09-25
One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ), which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominance tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
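The Z-Skyline refinement mentioned in the abstract builds on a Z-order (Morton) space-filling curve, which linearizes a multi-dimensional grid while keeping nearby cells close in the key order. The sketch below shows only the standard bit-interleaving construction; DISPAQ's actual key layout, bit widths, and pruning rules are not specified in the abstract, and the 2-D, 16-bit setting here is an assumption for illustration.

```python
# Minimal sketch of Z-order (Morton) encoding, the space-filling curve that a
# Z-Skyline style extension is built on (2-D, 16-bit keys assumed).

def z_order(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# Sorting 2-D grid cells by Morton code keeps spatially close cells close in
# the sorted order, which lets a skyline scan prune whole runs of dominated cells.
cells = [(3, 1), (0, 2), (2, 2), (1, 0)]
print(sorted(cells, key=lambda c: z_order(*c)))
```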
Honeybees use the skyline in orientation.
Towne, William F; Ritrovato, Antoinette E; Esposto, Antonina; Brown, Duncan F
2017-07-01
In view-based navigation, animals acquire views of the landscape from various locations and then compare the learned views with current views in order to orient in certain directions or move toward certain destinations. One landscape feature of great potential usefulness in view-based navigation is the skyline, the silhouette of terrestrial objects against the sky, as it is distant, relatively stable and easy to detect. The skyline has been shown to be important in the view-based navigation of ants, but no flying insect has yet been shown definitively to use the skyline in this way. Here, we show that honeybees do indeed orient using the skyline. A feeder was surrounded with an artificial replica of the natural skyline there, and the bees' departures toward the nest were recorded from above with a video camera under overcast skies (to eliminate celestial cues). When the artificial skyline was rotated, the bees' departures were rotated correspondingly, showing that the bees oriented by the artificial skyline alone. We discuss these findings in the context of the likely importance of the skyline in long-range homing in bees, the likely importance of altitude in using the skyline, the likely role of ultraviolet light in detecting the skyline, and what we know about the bees' ability to resolve skyline features. © 2017. Published by The Company of Biologists Ltd.
Application of a fast skyline computation algorithm for serendipitous searching problems
NASA Astrophysics Data System (ADS)
Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary
2018-02-01
Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information of non-skyline entries must be stored since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm, called jointed rooted-tree (JR-tree), manages entries using a rooted tree structure. JR-tree delays extending the tree to deep levels to accelerate tree construction and traversal. In this study, we present the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
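The JR-tree structure itself is not described in enough detail here to reproduce, but reason (1) above can be made concrete with a minimal sketch of the insertion half of continuous skyline maintenance: a newly arrived entry may evict current skyline entries, and the evicted (dominated) entries must be retained because a later deletion could promote them back. The names and the smaller-is-better convention below are assumptions for illustration, not the paper's algorithm.

```python
# Minimal sketch of handling an arriving entry in continuous skyline
# computation (insertion case only). Deletions are not shown; dominated
# entries are kept because a deletion could promote them into the skyline.

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def insert(skyline, dominated, p):
    """Update (skyline, dominated) when point p appears."""
    if any(dominates(s, p) for s in skyline):
        dominated.append(p)                      # p may re-enter the skyline later
        return
    evicted = [s for s in skyline if dominates(p, s)]
    skyline[:] = [s for s in skyline if not dominates(p, s)] + [p]
    dominated.extend(evicted)

sky, dom = [], []
for pt in [(4, 4), (2, 5), (3, 3), (1, 1)]:
    insert(sky, dom, pt)
print(sky, dom)   # [(1, 1)] plus the three points it dominates
```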
Programs for skyline planning.
Ward W. Carson
1975-01-01
This paper describes four computer programs for the logging engineer's use in planning log harvesting by skyline systems. One program prepares terrain profile plots from maps mounted on a digitizer; the other programs prepare load-carrying capability and other information for single and multispan standing skylines and single span running skylines. In general, the...
Skyline: an open source document editor for creating and analyzing targeted proteomics experiments.
MacLean, Brendan; Tomazela, Daniela M; Shulman, Nicholas; Chambers, Matthew; Finney, Gregory L; Frewen, Barbara; Kern, Randall; Tabb, David L; Liebler, Daniel C; MacCoss, Michael J
2010-04-01
Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools. Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.
Efficiently Selecting the Best Web Services
NASA Astrophysics Data System (ADS)
Goncalves, Marlene; Vidal, Maria-Esther; Regalado, Alfredo; Yacoubi Ayadi, Nadia
Emerging technologies and linked data initiatives have motivated the publication of a large number of datasets, and provide the basis for publishing Web services and tools to manage the available data. This wealth of resources opens a world of possibilities to satisfy user requests. However, Web services may have similar functionality but different performance; therefore, it is necessary to identify, among the Web services that satisfy a user request, the ones with the best quality. In this paper we propose a hybrid approach that combines reasoning tasks with ranking techniques to select the Web services that best implement a user request. Web service functionalities are described in terms of input and output attributes annotated with existing ontologies, non-functional properties are represented as Quality of Service (QoS) parameters, and user requests correspond to conjunctive queries whose sub-goals impose restrictions on the functionality and quality of the services to be selected. The ontology annotations are used in different reasoning tasks to infer implicit service properties and to augment the size of the service search space. Furthermore, QoS parameters are considered by a ranking metric to classify the services according to how well they meet a user's non-functional condition. We assume that all the QoS parameters of the non-functional condition are equally important, and apply the Top-k Skyline approach to select the k services that best meet this condition. Our proposal relies on a two-fold solution which fires a deductive engine that performs different reasoning tasks to discover the services that satisfy the requested functionality, and an efficient implementation of the Top-k Skyline approach to compute the top-k services that meet the majority of the QoS constraints. Our Top-k Skyline solution exploits the properties of the Skyline Frequency metric and identifies the top-k services by analyzing only a subset of the services that meet the non-functional condition. We report on the effects of the proposed reasoning tasks, the quality of the top-k services selected by the ranking metric, and the performance of the proposed ranking techniques. Our results suggest that the number of services can be augmented by up to two orders of magnitude. In addition, our ranking techniques are able to identify services that have the best values in at least half of the QoS parameters, while improving performance.
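To make the Skyline Frequency metric concrete: for each service it counts the non-empty subspaces of QoS parameters in which the service is non-dominated, and the k services with the highest counts are returned. The exhaustive sketch below enumerates all subspaces and is only an illustration of the metric; the paper's contribution is an efficient implementation that analyzes only a subset of the candidate services. The example QoS tuples (latency, cost, load) and the smaller-is-better convention are assumptions.

```python
# Brute-force sketch of the Skyline Frequency metric used for Top-k Skyline
# selection (illustration only; not the paper's efficient implementation).
from itertools import combinations

def dominates(p, q, dims):
    return (all(p[d] <= q[d] for d in dims)
            and any(p[d] < q[d] for d in dims))

def skyline_frequency(points):
    """For each point, count the non-empty dimension subsets in which it is non-dominated."""
    n_dims = len(points[0])
    freq = [0] * len(points)
    for r in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), r):
            for i, p in enumerate(points):
                if not any(dominates(q, p, dims)
                           for j, q in enumerate(points) if j != i):
                    freq[i] += 1
    return freq

def top_k_skyline(points, k):
    freq = skyline_frequency(points)
    order = sorted(range(len(points)), key=lambda i: -freq[i])
    return [points[i] for i in order[:k]]

services = [(0.2, 300, 0.9), (0.1, 500, 0.8), (0.3, 200, 0.7)]  # (latency, cost, load), assumed
print(top_k_skyline(services, k=2))
```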
Modeling and query the uncertainty of network constrained moving objects based on RFID data
NASA Astrophysics Data System (ADS)
Han, Liang; Xie, Kunqing; Ma, Xiujun; Song, Guojie
2007-06-01
The management of network-constrained moving objects is increasingly practical, especially in intelligent transportation systems. In the past, the location information of moving objects on a network was collected by GPS, which is costly and has problems with frequent updates and privacy. RFID (Radio Frequency IDentification) devices are increasingly used to collect location information. They are cheaper, require fewer updates, and intrude less on privacy. They detect the id of the object and the time when the moving object passed a node of the network. They do not detect the object's exact movement inside an edge, which leads to a problem of uncertainty. How to model and query the uncertainty of network-constrained moving objects based on RFID data therefore becomes a research issue. In this paper, a model is proposed to describe the uncertainty of network-constrained moving objects. A two-level index is presented to provide efficient access to the network and the movement data. The processing of imprecise time-slice queries and spatio-temporal range queries is studied. The processing includes four steps: spatial filter, spatial refinement, temporal filter, and probability calculation. Finally, experiments are conducted on simulated data; they study the performance of the index, define the precision and recall of the result set, and discuss how the query arguments affect that precision and recall.
A hydraulic assist for a manual skyline lock
Cleveland J. Biller
1977-01-01
A hydraulic locking mechanism was designed to replace the manual skyline lock on a small standing skyline with gravity carriage. It improved the efficiency of the operation by reducing setup and takedown times and reduced the hazard to the crew.
48. VIEW OF SKYLINE DRIVE FROM THE ROCKY PEAK OF ...
48. VIEW OF SKYLINE DRIVE FROM THE ROCKY PEAK OF STONY MAN MOUNTAIN (EL. 4,011). LOOKING NORTHEAST. STONY MAN OVERLOOK VISIBLE IN THE DISTANCE. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
Conceptions of Height and Verticality in the History of Skyscrapers and Skylines
NASA Astrophysics Data System (ADS)
Maslovskaya, Oksana; Ignatov, Grigoriy
2018-03-01
The main goal of this article is to reveal the significance of height and verticality in the history of skyscrapers and skylines. The objectives are as follows: 1. trace the origin of design concepts related to the skyscraper; 2. discuss the perceived experience of the cultural aspects of skyscrapers and skylines; 3. describe the differences and similarities of the profiles of cities with comparable skylines. The methodology of the study is designed to explore the theory and principles of the skyscraper and skyline development phenomenon and its key features. The skyscraper reveals an assertive creative form of vertical design. Skyscraper construction also relates to ancient cultural symbolism, in which the dominant vertical element is a main feature of an ordered space. The historical idea of height reaches back to the earliest civilizations, such as the Tower of Babel. Philosophical approaches, such as elements of post-structuralism, have been included in the study of the skyscraper phenomenon. Skyscrapers and their resulting skylines are examined to show the connection of their origins with concepts of height and verticality. From the historical perspective, cities with skyscrapers and a skyline turn out to be an assertive manifestation of common ideas of height and verticality.
Skyline Harvesting in Appalachia
J. N. Kochenderfer; G. W. Wendel
1978-01-01
The URUS, a small standing skyline system, was tested in the Appalachian Mountains of north-central West Virginia. Some problems encountered with this small, mobile system are discussed. From the results of this test and observation of skyline systems used in the western United States, the authors suggest some machine characteristics that would be desirable for use in...
Operational test of the prototype peewee yarder.
Charles N. Mann; Ronald W. Mifflin
1979-01-01
An operational test of a small, prototype running skyline yarder was conducted early in 1978. Test results indicate that this yarder concept promises a low cost, high performance system for harvesting small logs where skyline methods are indicated. Timber harvest by thinning took place on 12 uphill and 2 downhill skyline roads, and clearcut harvesting was performed on...
The SKYTOWER and SKYMOBILE programs for locating and designing skyline harvest units.
R.H. Twito; R.J. McGaughey; S.E. Reutebuch
1988-01-01
PLANS, a software package for integrated timber-harvest planning, uses digital terrain models to provide the topographic data needed to fit harvest and transportation designs to specific terrain. SKYTOWER and SKYMOBILE are integral programs in the PLANS package and are used to design the timber-harvest units for skyline systems. SKYTOWER determines skyline payloads and...
Optimization of the Controlled Evaluation of Closed Relational Queries
NASA Astrophysics Data System (ADS)
Biskup, Joachim; Lochner, Jan-Hendrik; Sonntag, Sebastian
For relational databases, controlled query evaluation is an effective inference control mechanism preserving confidentiality regarding a previously declared confidentiality policy. Implementations of controlled query evaluation usually lack efficiency due to costly theorem prover calls. Suitably constrained controlled query evaluation can be implemented efficiently, but is not flexible enough from the perspective of database users and security administrators. In this paper, we propose an optimized framework for controlled query evaluation in relational databases, being efficiently implementable on the one hand and relaxing the constraints of previous approaches on the other hand.
Applying Wave (registered trademark) to Build an Air Force Community of Interest Shared Space
2007-08-01
Performance. It is essential that an inverse transform be defined for every transform, or else the query mediator must be smart enough to figure out how…to invert it. Without an inverse transform, if an incoming query constrains on the transformed attribute, the query mediator might generate a query…plan that is horribly inefficient. If you must code a custom transformation function, you must also code the inverse transform. Putting the
An analysis of running skyline load path.
Ward W. Carson; Charles N. Mann
1971-01-01
This paper is intended for those who wish to prepare an algorithm to determine the load path of a running skyline. The mathematics of a simplified approach to this running skyline design problem are presented. The approach employs assumptions which reduce the complexity of the problem to the point where it can be solved on desk-top computers of limited capacities. The...
Panorama: A Targeted Proteomics Knowledge Base
2015-01-01
Panorama is a web application for storing, sharing, analyzing, and reusing targeted assays created and refined with Skyline, an increasingly popular Windows client software tool for targeted proteomics experiments. Panorama allows laboratories to store and organize curated results contained in Skyline documents with fine-grained permissions, which facilitates distributed collaboration and secure sharing of published and unpublished data via a web-browser interface. It is fully integrated with the Skyline workflow and supports publishing a document directly to a Panorama server from the Skyline user interface. Panorama captures the complete Skyline document information content in a relational database schema. Curated results published to Panorama can be aggregated and exported as chromatogram libraries. These libraries can be used in Skyline to pick optimal targets in new experiments and to validate peak identification of target peptides. Panorama is open-source and freely available. It is distributed as part of LabKey Server, an open source biomedical research data management system. Laboratories and organizations can set up Panorama locally by downloading and installing the software on their own servers. They can also request freely hosted projects on https://panoramaweb.org, a Panorama server maintained by the Department of Genome Sciences at the University of Washington. PMID:25102069
Registration of Panoramic/Fish-Eye Image Sequence and LiDAR Points Using Skyline Features
Zhu, Ningning; Jia, Yonghong; Ji, Shunping
2018-01-01
We propose utilizing a rigorous registration model and a skyline-based method for automatic registration of LiDAR points and a sequence of panoramic/fish-eye images in a mobile mapping system (MMS). This method can automatically optimize original registration parameters and avoid the use of manual interventions in control point-based registration methods. First, the rigorous registration model between the LiDAR points and the panoramic/fish-eye image was built. Second, skyline pixels from panoramic/fish-eye images and skyline points from the MMS’s LiDAR points were extracted, relying on the difference in the pixel values and the registration model, respectively. Third, a brute force optimization method was used to search for optimal matching parameters between skyline pixels and skyline points. In the experiments, the original registration method and the control point registration method were used to compare the accuracy of our method with a sequence of panoramic/fish-eye images. The result showed: (1) the panoramic/fish-eye image registration model is effective and can achieve high-precision registration of the image and the MMS’s LiDAR points; (2) the skyline-based registration method can automatically optimize the initial attitude parameters, realizing a high-precision registration of a panoramic/fish-eye image and the MMS’s LiDAR points; and (3) the attitude correction values of the sequences of panoramic/fish-eye images are different, and the values must be solved one by one. PMID:29883431
Schilling, Birgit; Rardin, Matthew J; MacLean, Brendan X; Zawadzka, Anna M; Frewen, Barbara E; Cusack, Michael P; Sorensen, Dylan J; Bereman, Michael S; Jing, Enxuan; Wu, Christine C; Verdin, Eric; Kahn, C Ronald; Maccoss, Michael J; Gibson, Bradford W
2012-05-01
Despite advances in metabolic and postmetabolic labeling methods for quantitative proteomics, there remains a need for improved label-free approaches. This need is particularly pressing for workflows that incorporate affinity enrichment at the peptide level, where isobaric chemical labels such as isobaric tags for relative and absolute quantitation and tandem mass tags may prove problematic or where stable isotope labeling with amino acids in cell culture labeling cannot be readily applied. Skyline is a freely available, open source software tool for quantitative data processing and proteomic analysis. We expanded the capabilities of Skyline to process ion intensity chromatograms of peptide analytes from full scan mass spectral data (MS1) acquired during HPLC MS/MS proteomic experiments. Moreover, unlike existing programs, Skyline MS1 filtering can be used with mass spectrometers from four major vendors, which allows results to be compared directly across laboratories. The new quantitative and graphical tools now available in Skyline specifically support interrogation of multiple acquisitions for MS1 filtering, including visual inspection of peak picking and both automated and manual integration, key features often lacking in existing software. In addition, Skyline MS1 filtering displays retention time indicators from underlying MS/MS data contained within the spectral library to ensure proper peak selection. The modular structure of Skyline also provides well defined, customizable data reports and thus allows users to directly connect to existing statistical programs for post hoc data analysis. To demonstrate the utility of the MS1 filtering approach, we have carried out experiments on several MS platforms and have specifically examined the performance of this method to quantify two important post-translational modifications: acetylation and phosphorylation, in peptide-centric affinity workflows of increasing complexity using mouse and human models.
NASA Technical Reports Server (NTRS)
Page, Lance; Shen, C. N.
1991-01-01
This paper describes skyline-based terrain matching, a new method for locating the vantage point of laser range-finding measurements on a global map previously prepared by satellite or aerial mapping. Skylines can be extracted from the range-finding measurements and modelled from the global map, and are represented in parametric, cylindrical form with azimuth angle as the independent variable. The three translational parameters of the vantage point are determined with a three-dimensional matching of these two sets of skylines.
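As a rough illustration of the matching idea (not the paper's three-parameter estimation method), the toy sketch below predicts a skyline, i.e. the maximum elevation angle as a function of azimuth, from candidate vantage points on a digital elevation model by ray marching, and picks the candidate whose predicted skyline is closest to the observed one in a least-squares sense. The grid resolution, ray-marching scheme, and synthetic terrain are assumptions for illustration.

```python
# Toy sketch of skyline-based vantage-point localization over a DEM.
import numpy as np

def predicted_skyline(dem, cell, x, y, z, n_az=72, max_range=200):
    """Skyline elevation angle versus azimuth, by ray marching over the DEM."""
    az = np.linspace(0, 2 * np.pi, n_az, endpoint=False)
    sky = np.full(n_az, -np.pi / 2)
    for k, a in enumerate(az):
        for r in range(1, max_range):
            i = int(round(x + r * np.cos(a)))
            j = int(round(y + r * np.sin(a)))
            if not (0 <= i < dem.shape[0] and 0 <= j < dem.shape[1]):
                break
            ang = np.arctan2(dem[i, j] - z, r * cell)
            sky[k] = max(sky[k], ang)
    return sky

def locate(dem, cell, observed, candidates, eye_height=2.0):
    """Candidate (x, y) whose predicted skyline best matches the observed one (L2)."""
    costs = [np.sum((predicted_skyline(dem, cell, cx, cy, dem[cx, cy] + eye_height)
                     - observed) ** 2)
             for cx, cy in candidates]
    return candidates[int(np.argmin(costs))]

# Tiny synthetic demo on random terrain: the true vantage point should win.
rng = np.random.default_rng(0)
dem = rng.uniform(0.0, 50.0, size=(60, 60))
observed = predicted_skyline(dem, 30.0, 30, 30, dem[30, 30] + 2.0)
candidates = [(cx, cy) for cx in range(20, 41, 5) for cy in range(20, 41, 5)]
print(locate(dem, 30.0, observed, candidates))   # expected: (30, 30)
```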
3D exploitation of large urban photo archives
NASA Astrophysics Data System (ADS)
Cho, Peter; Snavely, Noah; Anderson, Ross
2010-04-01
Recent work in computer vision has demonstrated the potential to automatically recover camera and scene geometry from large collections of uncooperatively-collected photos. At the same time, aerial ladar and Geographic Information System (GIS) data are becoming more readily accessible. In this paper, we present a system for fusing these data sources in order to transfer 3D and GIS information into outdoor urban imagery. Applying this system to 1000+ pictures shot of the lower Manhattan skyline and the Statue of Liberty, we present two proof-of-concept examples of geometry-based photo enhancement which are difficult to perform via conventional image processing: feature annotation and image-based querying. In these examples, high-level knowledge projects from 3D world-space into georegistered 2D image planes and/or propagates between different photos. Such automatic capabilities lay the groundwork for future real-time labeling of imagery shot in complex city environments by mobile smart phones.
3. ENVIRONMENT, FROM NORTH, SHOWING RICHMOND SKYLINE, BRIDGE DECK AND ...
3. ENVIRONMENT, FROM NORTH, SHOWING RICHMOND SKYLINE, BRIDGE DECK AND ROADWAY, AND NORTH APPROACH - Fifth Street Viaduct, Spanning Bacon's Quarter Branch Valley on Fifth Street, Richmond, Independent City, VA
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
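A minimal sketch of the kind of genetic algorithm RCQ-GA describes, searching over join orders with a permutation encoding, order crossover, and swap mutation, is given below. The cost model (a left-deep plan over a chain of relations with assumed cardinalities and per-edge selectivities, falling back to a Cartesian product when the next relation is not yet connected) is an assumption for illustration and is not the cost model or parameter setting used in the paper.

```python
# Sketch of a genetic algorithm over join orders (toy cost model, assumed data).
import random

CARD = [1000, 200, 5000, 300, 800]            # relation cardinalities (assumed)
SEL = {(i, i + 1): 0.01 for i in range(4)}    # chain-query join selectivities (assumed)

def plan_cost(order):
    """Sum of intermediate result sizes of the left-deep plan that joins in `order`."""
    joined = {order[0]}
    size, cost = float(CARD[order[0]]), 0.0
    for r in order[1:]:
        sel = 1.0
        for j in joined:
            sel *= SEL.get((min(j, r), max(j, r)), 1.0)   # 1.0 means Cartesian product
        size = size * CARD[r] * sel
        cost += size
        joined.add(r)
    return cost

def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the remaining positions in p2's order."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    segment = p1[a:b]
    rest = [g for g in p2 if g not in segment]
    return rest[:a] + segment + rest[a:]

def ga_join_order(n_rel, pop_size=30, generations=100, mutation_rate=0.2, seed=1):
    rng = random.Random(seed)
    pop = [rng.sample(range(n_rel), n_rel) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=plan_cost)
        parents = pop[: pop_size // 2]            # keep the cheaper half
        children = []
        while len(parents) + len(children) < pop_size:
            child = order_crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < mutation_rate:      # swap mutation
                i, j = rng.sample(range(n_rel), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = parents + children
    return min(pop, key=plan_cost)

print(ga_join_order(len(CARD)))   # a low-cost join order for the assumed chain query
```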
2018-05-03
The Tsugaru Iwaki Skyline is a toll road in northern Japan, which partially ascends Mount Iwaki stratovolcano, and is notable for its steep gradient and 69 hairpin turns. The road ascends 806 meters over an average gradient of 8.66%, with some sections going up to 10%. The Tsugaru Iwaki Skyline has been considered one of the most dangerous mountain roads in the world. (Wikipedia) The image was acquired May 26, 2015, and is located at 40.6 degrees north, 140.3 degrees east. https://photojournal.jpl.nasa.gov/catalog/PIA22385
2010-05-01
Skyline Algorithms, 2.2.1 Block-Nested Loops. A simple way to find the skyline is to use the block-nested loops (BNL) algorithm [3], which is the algorithm…by an NDS member are discarded. After every individual has been compared with the NDS, the NDS is the dataset's skyline. In the best case for BNL… (SFS) algorithm [4] is a variation on BNL that first introduces the idea of initially ordering the individuals by a monotonically increasing scoring…
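The excerpt above describes BNL in terms of a non-dominated set (NDS) against which every individual is compared; the sketch below implements that loop directly, assuming the window always fits in memory (the full BNL algorithm additionally spills overflowing tuples to temporary files, which is omitted here).

```python
# Simplified block-nested-loops (BNL) skyline: keep a non-dominated window,
# compare each arriving tuple against it, and drop whichever side is dominated.

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def bnl_skyline(points):
    nds = []                                  # current non-dominated window
    for p in points:
        if any(dominates(w, p) for w in nds):
            continue                          # p is dominated: discard it
        nds = [w for w in nds if not dominates(p, w)]  # p evicts dominated entries
        nds.append(p)
    return nds

print(bnl_skyline([(5, 4), (3, 6), (2, 2), (4, 1), (3, 3)]))   # [(2, 2), (4, 1)]
```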
77 FR 15118 - Buy American Exceptions Under the American Recovery and Reinvestment Act of 2009
Federal Register 2010, 2011, 2012, 2013, 2014
2012-03-14
... heat pumps for the Skyline Crest Sustainability Upgrade project. FOR FURTHER INFORMATION CONTACT... Skyline Crest Sustainability Upgrade project. The exception was granted by HUD on the basis that the...
Lapierre, Marguerite; Blin, Camille; Lambert, Amaury; Achaz, Guillaume; Rocha, Eduardo P C
2016-07-01
Recent studies have linked demographic changes and epidemiological patterns in bacterial populations using coalescent-based approaches. We identified 26 studies using skyline plots and found that 21 inferred overall population expansion. This surprising result led us to analyze the impact of natural selection, recombination (gene conversion), and sampling biases on demographic inference using skyline plots and site frequency spectra (SFS). Forward simulations based on biologically relevant parameters from Escherichia coli populations showed that theoretical arguments on the detrimental impact of recombination and especially natural selection on the reconstructed genealogies cannot be ignored in practice. In fact, both processes systematically lead to spurious interpretations of population expansion in skyline plots (and in SFS for selection). Weak purifying selection, and especially positive selection, had important effects on skyline plots, showing patterns akin to those of population expansions. State-of-the-art techniques to remove recombination further amplified these biases. We simulated three common sampling biases in microbiological research: uniform, clustered, and mixed sampling. Alone, or together with recombination and selection, they further mislead demographic inferences, producing almost any possible skyline shape or SFS. Interestingly, sampling sub-populations also affected skyline plots and SFS, because the coalescent rates of populations and their sub-populations had different distributions. This study suggests that extreme caution is needed to infer demographic changes solely based on reconstructed genealogies. We suggest that the development of novel sampling strategies and the joint analysis of diverse population genetic methods are strictly necessary to estimate demographic changes in populations where selection, recombination, and biased sampling are present. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Incremental Query Rewriting with Resolution
NASA Astrophysics Data System (ADS)
Riazanov, Alexandre; Aragão, Marcelo A. T.
We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.
IJA: an efficient algorithm for query processing in sensor networks.
Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa
2011-01-01
One of the main features of sensor networks is the ability to process real-time state information after gathering the needed data from many domains. The component technologies of each node, called a sensor node, including physical sensors, processors, actuators, and power, have advanced significantly over the last decade. Thanks to this advanced technology, sensor networks have over time been adopted across industries for sensing physical phenomena. However, sensor nodes are considerably constrained: with their limited energy and memory resources, they have a very limited ability to process information compared to conventional computer systems, so query processing over the nodes must respect these limitations. For this reason, join operations in sensor networks are typically processed in a distributed manner over a set of nodes, and this has been studied. For example, while simple queries, such as select and aggregate queries, have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) for sensor networks to reduce the overhead caused by moving a join pair to the final join node and to minimize the communication cost, which is the main consumer of the battery when processing distributed queries in sensor network environments. Simulation results show that the proposed IJA algorithm significantly reduces the number of bytes to be moved to join nodes compared to the popular synopsis join algorithm.
Fragger: a protein fragment picker for structural queries.
Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J
2017-01-01
Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
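Fragger ranks matching fragments by distance to the query and supports high-throughput one-versus-many backbone RMSD computation. The sketch below shows one plausible way to compute such an RMSD, superposing the two coordinate sets with the Kabsch algorithm before measuring the deviation; whether Fragger superposes fragments first, and its exact distance definition, are not stated in the abstract, so this is an assumption-laden illustration rather than the tool's implementation.

```python
# Sketch: backbone RMSD after optimal rigid superposition (Kabsch), plus a
# simple one-versus-many ranking within a distance threshold.
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between corresponding point sets P and Q (N x 3) after superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against an improper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def rank_fragments(query, fragments, threshold):
    """(index, rmsd) pairs for fragments within the threshold, best match first."""
    hits = [(i, kabsch_rmsd(query, f)) for i, f in enumerate(fragments)]
    return sorted((h for h in hits if h[1] <= threshold), key=lambda h: h[1])

# Toy usage: a short backbone (12 atoms) against noisy copies of itself.
rng = np.random.default_rng(0)
query = rng.normal(size=(12, 3))
fragments = [query + rng.normal(scale=0.1, size=query.shape) for _ in range(5)]
print(rank_fragments(query, fragments, threshold=0.5))
```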
Semantic integration of information about orthologs and diseases: the OGO system.
Miñarro-Gimenez, Jose Antonio; Egaña Aranguren, Mikel; Martínez Béjar, Rodrigo; Fernández-Breis, Jesualdo Tomás; Madrid, Marisa
2011-12-01
Semantic Web technologies like RDF and OWL are currently applied in life sciences to improve knowledge management by integrating disparate information. Many of the systems that perform such tasks, however, only offer a SPARQL query interface, which is difficult for life scientists to use. We present the OGO system, which consists of a knowledge base that integrates information on orthologous sequences and genetic diseases, providing an easy-to-use, ontology-constraint-driven query interface. This interface allows users to define SPARQL queries through a graphical process, therefore not requiring SPARQL expertise. Copyright © 2011 Elsevier Inc. All rights reserved.
Block 2. Photograph represents general view taken from the north/west ...
Block 2. Photograph represents general view taken from the north/west region of the May D & F Tower. Photograph shows the main public gathering space for Skyline Park and depicts a light feature and an Information sign - Skyline Park, 1500-1800 Arapaho Street, Denver, Denver County, CO
ERIC Educational Resources Information Center
Skyline Coll., San Bruno, CA.
A joint project was conducted between Toyota Motor Sales and Skyline College (in the San Francisco, California, area) to create an automotive technician training program that would serve the needs of working adults. During the project, a model high technology curriculum suitable for adults was developed, the quality of instruction available for…
A Skyline Plugin for Pathway-Centric Data Browsing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Degan, Michael G.; Ryadinskiy, Lillian; Fujimoto, Grant M.
For targeted proteomics to be broadly adopted in biological laboratories as a routine experimental protocol, wet-bench biologists must be able to approach SRM assay design in the same way they approach biological experimental design. Most often, biological hypotheses are envisioned in a set of protein interactions, networks and pathways. We present a plugin for the popular Skyline tool that presents public mass spectrometry data in a pathway-centric view to assist users in browsing available data and determining how to design quantitative experiments. Selected proteins and their underlying mass spectra are imported to Skyline for further assay design (transition selection). The same plugin can be used for hypothesis-driven DIA data analysis, again utilizing the pathway view to help narrow down the set of proteins which will be investigated. The plugin is backed by the PNNL Biodiversity Library, a corpus of 3 million peptides from >100 organisms, and the draft human proteome. Users can upload personal data to the plugin to use the pathway navigation prior to importing their own data into Skyline.
A Skyline Plugin for Pathway-Centric Data Browsing
NASA Astrophysics Data System (ADS)
Degan, Michael G.; Ryadinskiy, Lillian; Fujimoto, Grant M.; Wilkins, Christopher S.; Lichti, Cheryl F.; Payne, Samuel H.
2016-11-01
For targeted proteomics to be broadly adopted in biological laboratories as a routine experimental protocol, wet-bench biologists must be able to approach selected reaction monitoring (SRM) and parallel reaction monitoring (PRM) assay design in the same way they approach biological experimental design. Most often, biological hypotheses are envisioned in a set of protein interactions, networks, and pathways. We present a plugin for the popular Skyline tool that presents public mass spectrometry data in a pathway-centric view to assist users in browsing available data and determining how to design quantitative experiments. Selected proteins and their underlying mass spectra are imported to Skyline for further assay design (transition selection). The same plugin can be used for hypothesis-driven data-independent acquisition (DIA) data analysis, again utilizing the pathway view to help narrow down the set of proteins that will be investigated. The plugin is backed by the Pacific Northwest National Laboratory (PNNL) Biodiversity Library, a corpus of 3 million peptides from >100 organisms, and the draft human proteome. Users can upload personal data to the plugin to use the pathway navigation prior to importing their own data into Skyline.
Spectral Skyline Separation: Extended Landmark Databases and Panoramic Imaging
Differt, Dario; Möller, Ralf
2016-01-01
Evidence from behavioral experiments suggests that insects use the skyline as a cue for visual navigation. However, changes of lighting conditions, over hours, days or possibly seasons, significantly affect the appearance of the sky and ground objects. One possible solution to this problem is to extract the “skyline” by an illumination-invariant classification of the environment into two classes, ground objects and sky. In a previous study (Insect models of illumination-invariant skyline extraction from UV (ultraviolet) and green channels), we examined the idea of using two different color channels available for many insects (UV and green) to perform this segmentation. We found out that for suburban scenes in temperate zones, where the skyline is dominated by trees and artificial objects like houses, a “local” UV segmentation with adaptive thresholds applied to individual images leads to the most reliable classification. Furthermore, a “global” segmentation with fixed thresholds (trained on an image dataset recorded over several days) using UV-only information is only slightly worse compared to using both the UV and green channel. In this study, we address three issues: First, to enhance the limited range of environments covered by the dataset collected in the previous study, we gathered additional data samples of skylines consisting of minerals (stones, sand, earth) as ground objects. We could show that also for mineral-rich environments, UV-only segmentation achieves a quality comparable to multi-spectral (UV and green) segmentation. Second, we collected a wide variety of ground objects to examine their spectral characteristics under different lighting conditions. On the one hand, we found that the special case of diffusely-illuminated minerals increases the difficulty to reliably separate ground objects from the sky. On the other hand, the spectral characteristics of this collection of ground objects covers well with the data collected in the skyline databases, increasing, due to the increased variety of ground objects, the validity of our findings for novel environments. Third, we collected omnidirectional images, as often used for visual navigation tasks, of skylines using an UV-reflective hyperbolic mirror. We could show that “local” separation techniques can be adapted to the use of panoramic images by splitting the image into segments and finding individual thresholds for each segment. Contrarily, this is not possible for ‘global’ separation techniques. PMID:27690053
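A minimal sketch of the "local" UV-only segmentation idea, classifying each pixel of a single image as sky or ground by an adaptive threshold on its UV intensity and then reading the skyline off the mask, is given below. Using Otsu's method as the adaptive threshold, and taking the highest ground pixel per column as the skyline, are assumptions for illustration; the study's exact thresholding rule is not given in the abstract.

```python
# Sketch of per-image ("local") UV skyline extraction via an adaptive threshold.
import numpy as np

def otsu_threshold(values, bins=256):
    """Threshold maximizing between-class variance of a 1-D intensity sample."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                                  # weight of the "dark" class
    w1 = 1.0 - w0
    cum_mean = np.cumsum(p * centers)
    mu0 = cum_mean / np.where(w0 > 0, w0, 1.0)
    mu1 = (cum_mean[-1] - cum_mean) / np.where(w1 > 0, w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[int(np.argmax(between))]

def sky_mask(uv_image):
    """Boolean mask, True where UV intensity exceeds the per-image threshold (sky)."""
    return uv_image > otsu_threshold(uv_image.ravel())

def skyline_row(mask):
    """Per column: row index of the highest ground pixel (row 0 = top);
    columns with no visible ground get the image height."""
    ground = ~mask
    return np.where(ground.any(axis=0), ground.argmax(axis=0), mask.shape[0])

# Toy usage: bright synthetic sky above row 40, darker ground below.
img = np.vstack([np.full((40, 64), 0.8), np.full((24, 64), 0.2)])
img += np.random.default_rng(0).normal(scale=0.02, size=img.shape)
print(skyline_row(sky_mask(img)))   # about 40 in every column
```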
J. E. Baumgras; C. B. LeDoux; J. R. Sherar
1993-01-01
To evaluate the potential for moderating the visual impact and soil disturbance associated with timber harvesting on steep-slope hardwood sites, thinning and shelterwood harvests were conducted with a skyline yarding system. Operations were monitored to document harvesting production, residual stand damage, soil disturbance, and visual quality. Yarding costs for...
Skyline Gathers K-12 Together Under One Roof.
ERIC Educational Resources Information Center
American School Board Journal, 1968
1968-01-01
Skyline School is a flexible and economical elementary and high school design for 400 pupils. The library, a large resource center serving all ages, and the administration offices are accented by landscaped courts. There are two instructional material centers per grade grouping of K-6 and 7-12. Grades 1-6 surround the kindergarten, which has…
Hardwood silviculture and skyline yarding on steep slopes: economic and environmental impacts
John E. Baumgras; Chris B. LeDoux
1995-01-01
Ameliorating the visual and environmental impact associated with harvesting hardwoods on steep slopes will require the efficient use of skyline yarding along with silvicultural alternatives to clearcutting. In evaluating the effects of these alternatives on harvesting revenue, results of field studies and computer simulations were used to estimate costs and revenue for...
ERIC Educational Resources Information Center
Burns, Robert J.
The major purpose of this evaluation report is to scrutinize the Skyline Wide Educational Plan (SWEP) research methods and analytical schemes and to communicate the project's constituency priorities relative to the educational programs and processes of the future. A Delphi technique was used as the primary mechanism for gathering and scrutinizing…
Tree damage from skyline logging in a western larch/Douglas-fir stand
Robert E. Benson; Michael J. Gonsior
1981-01-01
Damage to shelterwood leave trees and to understory trees in shelterwood and clearcut logging units logged with skyline yarders was measured, and related to stand conditions, harvesting specifications, and yarding system-terrain interactions. About 23 percent of the marked leave trees in the shelterwood units were killed in logging, and about 10 percent had moderate to...
Mountain Logging Symposium Proceedings Held in West Virginia on Jun 5-7, 1984
1984-06-07
…and board" analysis (Lysons and Mann 1967) provided a method to make skyline payload determination feasible using topographic maps or field run… Lysons, Hilton H.; Mann, Charles N. Skyline tension and deflection handbook. Res. Pap. PNW-39. Portland, OR: U.S. Department of Agriculture, Forest… those described by Mifflin and Lysons (1978) and Miyata (1980). The estimated cost for the Clearwater Yarder and a four-man crew was $48.27 per…
ERIC Educational Resources Information Center
Dallas Independent School District, TX. Dept. of Research and Evaluation.
This volume consists of a number of appendixes containing data and analyses that were compiled to aid administrators of the Skyline Wide Educational Plan (SWEP) in their efforts to develop a comprehensive secondary school plan for the Dallas-Fort Worth metroplex in the 1970's. Much of the volume is devoted to various facility considerations…
Using Data Warehouses to extract knowledge from Agro-Hydrological simulations
NASA Astrophysics Data System (ADS)
Bouadi, Tassadit; Gascuel-Odoux, Chantal; Cordier, Marie-Odile; Quiniou, René; Moreau, Pierre
2013-04-01
In recent years, simulation models have been used more and more in hydrology to test the effect of scenarios and help stakeholders in decision making. Agro-hydrological models have guided agricultural water management by testing the effect of landscape structure and farming system changes on water and chemical emissions in rivers. Such models generate a large amount of data, yet only a few outputs, such as daily concentrations at the outlet of the catchment or annual budgets for soil, water and atmosphere emissions, are stored and analyzed. Thus, a great amount of information is lost from the simulation process. This is due to the large volumes of simulated data, but also to the difficulty of analyzing the data and transforming them into usable information. In this talk we present a data warehouse built to store and manage simulation data from the agro-hydrological model TNT (Topography-based nitrogen transfer and transformations; Beaujouan et al., 2002). This model simulates the transfer and transformation of nitrogen in agricultural catchments. TNT was run over 10 years on the Yar catchment (western France), a 50 km2 area for which a detailed data set is available and which faces an environmental issue (coastal eutrophication). 44 key simulated output variables are stored at a daily time step, i.e., 8 GB of storage, which allows users to explore nitrogen emissions in space and time, to quantify all the transfer and transformation processes with respect to the cropping systems, their location within the catchment, and the emissions to water and atmosphere, and finally to gain new knowledge and support specific, detailed decisions in space and time. We present the dimensional modeling process of the Nitrogen in Catchment data warehouse (i.e., the snowflake model). After identifying the set of multilevel dimensions with complex hierarchical structures and relationships among related dimension levels, we chose the snowflake model to design our agri-environmental data warehouse. The snowflake schema is required for flexible querying of complex dimension relationships. We designed the Nitrogen in Catchment data warehouse using the open source Business Intelligence platform Pentaho, version 3.5. We use online analytical processing (OLAP) to access and exploit, intuitively and quickly, the multidimensional and aggregated data from the data warehouse. We illustrate how the data warehouse can be used efficiently to explore spatio-temporal dimensions, to discover new knowledge, and to enrich the exploitation of simulations. We show how the OLAP tool can provide the user with the ability to synthesize environmental information and to understand nitrate emissions in surface water through comparative, personalized views of historical data. To perform advanced analyses aimed at finding meaningful patterns and relationships in the data, the Nitrogen in Catchment data warehouse should be extended with data mining or information retrieval methods such as Skyline queries (Bouadi et al., 2012). (Beaujouan et al., 2002) Beaujouan, V., Durand, P., Ruiz, L., Aurousseau, P., and Cotteret, G. (2002). A hydrological model dedicated to topography-based simulation of nitrogen transfer and transformation: rationale and application to the geomorphology-denitrification relationship. Hydrological Processes, pages 493-507. (Bouadi et al., 2012) Bouadi, T., Cordier, M., and Quiniou, R. (2012). Incremental computation of skyline queries with dynamic preferences. In DEXA (1), pages 219-233.
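To make the OLAP roll-up idea above concrete, here is a minimal sketch in Python of an aggregation over a toy snowflake schema; it is not the Pentaho-based implementation described above, and all table and column names are invented for illustration.

# Minimal sketch (not the described warehouse): a toy snowflake schema in SQLite
# illustrating the kind of roll-up an OLAP tool performs over simulated nitrogen
# data. All table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_year (year_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT,
                       year_id INTEGER REFERENCES dim_year(year_id));
CREATE TABLE dim_subcatchment (sub_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_plot (plot_id INTEGER PRIMARY KEY, crop TEXT,
                       sub_id INTEGER REFERENCES dim_subcatchment(sub_id));
CREATE TABLE fact_nitrogen (date_id INTEGER, plot_id INTEGER,
                            n_leached_kg REAL, n_denitrified_kg REAL);
""")
cur.executemany("INSERT INTO dim_year VALUES (?, ?)", [(1, 2001), (2, 2002)])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(1, "2001-01-15", 1), (2, "2002-01-15", 2)])
cur.executemany("INSERT INTO dim_subcatchment VALUES (?, ?)",
                [(1, "upstream"), (2, "outlet")])
cur.executemany("INSERT INTO dim_plot VALUES (?, ?, ?)",
                [(1, "maize", 1), (2, "wheat", 2)])
cur.executemany("INSERT INTO fact_nitrogen VALUES (?, ?, ?, ?)",
                [(1, 1, 3.2, 0.4), (1, 2, 1.1, 0.2), (2, 1, 2.7, 0.5)])

# Roll up daily, per-plot fluxes to annual totals per sub-catchment.
for row in cur.execute("""
    SELECT y.year, s.name, SUM(f.n_leached_kg) AS n_leached
    FROM fact_nitrogen f
    JOIN dim_date d ON f.date_id = d.date_id
    JOIN dim_year y ON d.year_id = y.year_id
    JOIN dim_plot p ON f.plot_id = p.plot_id
    JOIN dim_subcatchment s ON p.sub_id = s.sub_id
    GROUP BY y.year, s.name"""):
    print(row)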
Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples
Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav
2018-01-01
Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Availability and implementation: Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28968689
Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples.
Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav; Langmead, Ben
2018-01-01
As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
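The query model described above can be illustrated with a small, hypothetical sketch in Python. It is not Snaptron's code or its web API, and the junction fields and thresholds below are invented, but it shows the two kinds of constraints being combined: a genomic region filter and a per-junction sample filter.

# Illustrative sketch only (not Snaptron's implementation): filtering summarized
# splice junctions by genomic region and by a per-junction sample count.
from dataclasses import dataclass

@dataclass
class Junction:
    chrom: str
    start: int
    end: int
    samples_count: int      # number of samples in which the junction appears

junctions = [
    Junction("chr1", 100_100, 100_900, 12),
    Junction("chr1", 100_200, 101_500, 480),
    Junction("chr2", 500_000, 500_800, 75),
]

def query(junctions, chrom, start, end, min_samples):
    """Return junctions overlapping [start, end] on chrom seen in >= min_samples."""
    return [j for j in junctions
            if j.chrom == chrom
            and j.start <= end and j.end >= start
            and j.samples_count >= min_samples]

for j in query(junctions, "chr1", 100_000, 102_000, min_samples=100):
    print(j)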
Extract useful knowledge from agro-hydrological simulations data for decision making
NASA Astrophysics Data System (ADS)
Gascuel-Odoux, C.; Bouadi, T.; Cordier, M.; Quiniou, R.
2013-12-01
In recent years, models have been developed and used to test the effect of scenarios and help stakeholders in decision making. Agro-hydrological models have guided agricultural water management by testing the effect of landscape structure and farming system changes on water quantity and quality. Such models generate a large amount of data, but little of it is stored and it is often not customized for stakeholders, so that a great amount of information is lost from the simulation process or is never transformed into a usable format. A first approach, already published (Trepos et al., 2012), was developed to identify object-oriented tree patterns, representing surface flow and pollutant pathways from plot to plot, involved in water pollution by herbicides. A simulation model (Gascuel-Odoux et al., 2009) predicted the herbicide transfer rate, defined as the proportion of applied herbicide that reaches water courses. The predictions were used as a set of learning examples for symbolic learning techniques to induce rules based on qualitative and quantitative attributes and explain two extreme classes of transfer rate. Two automatic symbolic learning techniques were used: an inductive logic programming approach to induce spatial tree patterns, and an attribute-value method to induce aggregated attributes of the trees. A visualization interface allows users to identify rules explaining contamination and mitigation measures improving the current situation. A second approach has recently been developed to analyse the simulated data directly (Bouadi et al., submitted). A data warehouse called N-Catch has been built to store and manage simulation data from the agro-hydrological model TNT2 (Beaujouan et al., 2002). 44 key simulated output variables are stored per plot at a daily time step over a 50 km2 area, i.e., 8 GB of storage. After identifying the set of multilevel dimensions integrating hierarchical structures and relationships among related dimension levels, N-Catch was designed using the open source Business Intelligence platform Pentaho. We show how to use online analytical processing (OLAP) to access and exploit, intuitively and quickly, the multidimensional and aggregated data from the N-Catch data warehouse. We illustrate how the data warehouse can be used to explore spatio-temporal dimensions efficiently and to discover new knowledge at multiple levels of simulation. The OLAP tool can be used to synthesize environmental information and understand nitrogen emissions in water bodies by generating comparative and personalized views of historical data. This data warehouse is currently being extended with data mining and information retrieval methods such as Skyline queries to perform advanced analyses (Bouadi et al., 2012). (Bouadi et al., submitted) Bouadi et al. N-Catch: A Data Warehouse for Multilevel Analysis of Simulated Nitrogen Data from an Agro-hydrological Model. Submitted. (Bouadi et al., 2012) Bouadi, T., Cordier, M., and Quiniou, R. (2012). Incremental computation of skyline queries with dynamic preferences. In DEXA (1), pages 219-233. (Trepos et al., 2012) Trepos et al. (2012). Mining simulation data by rule induction to determine critical source areas of stream water pollution by herbicides. Computers and Electronics in Agriculture 86, 75-88.
Querying archetype-based EHRs by search ontology-based XPath engineering.
Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich
2018-05-11
Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free-text searches, which lead to false positive results, precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts of the XML document structure. This query specification method is introduced and evaluated in practice by applying concrete research questions, formulated in natural language, to a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the use of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content are necessary for further improvements in precision and recall. The key limitation is that applying the introduced process requires skills in ontology and software development. In future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects search terms to certain parts of XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by supporting the specification of complex XPath expressions without deep knowledge of XPath syntax.
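A minimal sketch of the idea, assuming a hypothetical EHR fragment and concept-to-path mapping; it is not the system described above, but it shows how tying a search concept to a document path constrains an XPath-style query instead of searching free text.

# Minimal sketch (not the described system): mapping a search concept to a part of
# an XML document and evaluating the resulting XPath-like expression with the
# standard library. The document structure and concept mapping are hypothetical.
import xml.etree.ElementTree as ET

ehr_xml = """
<ehr>
  <section code="diagnoses">
    <entry><code system="ICD-10">C34.1</code><text>Lung carcinoma</text></entry>
  </section>
  <section code="medications">
    <entry><text>Cisplatin</text></entry>
  </section>
</ehr>
"""

# A toy "search ontology": each search concept is tied to a path in the documents,
# so a query is constrained to the relevant part instead of a free-text search.
concept_to_path = {
    "Diagnosis": ".//section[@code='diagnoses']/entry/text",
    "Medication": ".//section[@code='medications']/entry/text",
}

root = ET.fromstring(ehr_xml)
for node in root.findall(concept_to_path["Diagnosis"]):
    print(node.text)          # -> Lung carcinoma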
KBGIS-II: A knowledge-based geographic information system
NASA Technical Reports Server (NTRS)
Smith, Terence; Peuquet, Donna; Menon, Sudhakar; Agarwal, Pankaj
1986-01-01
The architecture and working of a recently implemented Knowledge-Based Geographic Information System (KBGIS-II), designed to satisfy several general criteria for the GIS, are described. The system has four major functions including query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is performing all its designated tasks successfully. Future reports will relate performance characteristics of the system.
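A minimal sketch, not KBGIS-II itself, of why a quadtree representation supports efficient spatial search: homogeneous blocks are stored once, and whole subtrees outside the query window are skipped. The grid, classes, and window below are invented.

# Toy region quadtree over a categorical raster, queried for cells of a given
# class inside a search window.
def build(grid, x=0, y=0, size=None):
    """Return a leaf for homogeneous blocks, otherwise recurse into quadrants."""
    if size is None:
        size = len(grid)
    vals = {grid[j][i] for j in range(y, y + size) for i in range(x, x + size)}
    if len(vals) == 1:
        return {"x": x, "y": y, "size": size, "value": vals.pop()}
    h = size // 2
    return {"x": x, "y": y, "size": size, "children": [
        build(grid, x, y, h), build(grid, x + h, y, h),
        build(grid, x, y + h, h), build(grid, x + h, y + h, h)]}

def find(node, target, qx0, qy0, qx1, qy1, out):
    """Collect leaves of class `target` intersecting the half-open query window."""
    x0, y0 = node["x"], node["y"]
    x1, y1 = x0 + node["size"], y0 + node["size"]
    if x0 >= qx1 or x1 <= qx0 or y0 >= qy1 or y1 <= qy0:
        return out                      # block entirely outside the window: skip subtree
    if "value" in node:
        if node["value"] == target:
            out.append((x0, y0, node["size"]))
        return out
    for child in node["children"]:
        find(child, target, qx0, qy0, qx1, qy1, out)
    return out

# 4x4 toy land-cover raster: 0 = water, 1 = forest.
grid = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 0]]
print(find(build(grid), 1, 0, 0, 3, 3, []))   # -> [(2, 0, 2), (0, 2, 2), (2, 2, 1)]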
Rapid Assessment of Contaminants and Interferences in Mass Spectrometry Data Using Skyline
NASA Astrophysics Data System (ADS)
Rardin, Matthew J.
2018-04-01
Proper sample preparation in proteomic workflows is essential to the success of modern mass spectrometry experiments. Complex workflows often require reagents which are incompatible with MS analysis (e.g., detergents) necessitating a variety of sample cleanup procedures. Efforts to understand and mitigate sample contamination are a continual source of disruption with respect to both time and resources. To improve the ability to rapidly assess sample contamination from a diverse array of sources, I developed a molecular library in Skyline for rapid extraction of contaminant precursor signals using MS1 filtering. This contaminant template library is easily managed and can be modified for a diverse array of mass spectrometry sample preparation workflows. Utilization of this template allows rapid assessment of sample integrity and indicates potential sources of contamination.
NASA Astrophysics Data System (ADS)
Stockdale, James; Ineson, Philip
2016-04-01
Modelled predictions of the response of terrestrial systems to climate change are highly variable, yet the response of net ecosystem exchange (NEE) is a vital ecosystem behaviour to understand due to its inherent feedback to the carbon cycle. The establishment and subsequent monitoring of replicated experimental manipulations are a direct method to reveal these responses, yet are difficult to achieve as they are typically resource-heavy and labour-intensive. We actively manipulated the temperature at three agricultural grasslands in southern England and deployed novel 'SkyLine' systems, recently developed at the University of York, to continuously monitor GHG fluxes. Each 'SkyLine' is a low-cost and fully autonomous technology, yet produces fluxes at a near-continuous temporal frequency and across a wide spatial area. The results produced by 'SkyLine' allow the detailed response of each system to increased temperature to be resolved over diurnal and seasonal timescales. Unexpected differences in NEE are shown between superficially similar ecosystems which, upon investigation, suggest that interactions between a variety of environmental variables are key and that knowledge of pre-existing environmental conditions helps to predict a system's response to future climate. For example, the prevailing hydrological conditions at each site appear to affect its response to changing temperature. The high-frequency data shown here, combined with the fully replicated experimental design, reveal complex interactions which must be understood to improve predictions of ecosystem response to a changing climate.
2015-01-01
Food consumption is an important behavior that is regulated by an intricate array of neuropeptides (NPs). Although many feeding-related NPs have been identified in mammals, precise mechanisms are unclear and difficult to study in mammals, as current methods are not highly multiplexed and require extensive a priori knowledge about analytes. New advances in data-independent acquisition (DIA) MS/MS and the open-source quantification software Skyline have opened up the possibility to identify hundreds of compounds and quantify them from a single DIA MS/MS run. An untargeted DIA MS(E) quantification method using Skyline software for multiplexed, discovery-driven quantification was developed and found to produce linear calibration curves for peptides at physiologically relevant concentrations using a protein digest as internal standard. By using this method, preliminary relative quantification of the crab Cancer borealis neuropeptidome (<2 kDa, 137 peptides from 18 families) was possible in microdialysates from 8 replicate feeding experiments. Of these NPs, 55 were detected with an average mass error below 10 ppm. The time-resolved profiles of relative concentration changes for 6 are shown, and there is great potential for the use of this method in future experiments to aid in correlation of NP changes with behavior. This work presents an unbiased approach to winnowing candidate NPs related to a behavior of interest in a functionally relevant manner, and demonstrates the success of such a UPLC-MS(E) quantification method using the open source software Skyline. PMID:25552291
Schmerberg, Claire M; Liang, Zhidan; Li, Lingjun
2015-01-21
Food consumption is an important behavior that is regulated by an intricate array of neuropeptides (NPs). Although many feeding-related NPs have been identified in mammals, precise mechanisms are unclear and difficult to study in mammals, as current methods are not highly multiplexed and require extensive a priori knowledge about analytes. New advances in data-independent acquisition (DIA) MS/MS and the open-source quantification software Skyline have opened up the possibility to identify hundreds of compounds and quantify them from a single DIA MS/MS run. An untargeted DIA MS(E) quantification method using Skyline software for multiplexed, discovery-driven quantification was developed and found to produce linear calibration curves for peptides at physiologically relevant concentrations using a protein digest as internal standard. By using this method, preliminary relative quantification of the crab Cancer borealis neuropeptidome (<2 kDa, 137 peptides from 18 families) was possible in microdialysates from 8 replicate feeding experiments. Of these NPs, 55 were detected with an average mass error below 10 ppm. The time-resolved profiles of relative concentration changes for 6 are shown, and there is great potential for the use of this method in future experiments to aid in correlation of NP changes with behavior. This work presents an unbiased approach to winnowing candidate NPs related to a behavior of interest in a functionally relevant manner, and demonstrates the success of such a UPLC-MS(E) quantification method using the open source software Skyline.
Towards ontology-driven navigation of the lipid bibliosphere
Baker, Christopher JO; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R
2008-01-01
Background The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. Results We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated to custom designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. Conclusion As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology. PMID:18315858
Towards ontology-driven navigation of the lipid bibliosphere.
Baker, Christopher Jo; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R
2008-01-01
The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated to custom designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology.
8. Engineering Drawing of Panama Gun Mount by U.S. Engineering ...
8. Engineering Drawing of Panama Gun Mount by U.S. Engineering Office, San Francisco, California - Fort Funston, Panama Mounts for 155mm Guns, Skyline Boulevard & Great Highway, San Francisco, San Francisco County, CA
76 FR 49753 - Privacy Act of 1974; System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-11
... Defense. DHA 14 System name: Computer/Electronics Accommodations Program for People with Disabilities... with ``Computer/Electronic Accommodations Program.'' System location: Delete entry and replace with ``Computer/Electronic Accommodations Program, Skyline 5, Suite 302, 5111 Leesburg Pike, Falls Church, VA...
The exhibit is a 10'x10' skyline truss which will be used to highlight the activities of the U.S.-German Bilateral Working Group in the area of brownfields revitalization. The U.S. product, Sustainable Management Approaches and Revitalization Tools - electronic (SMARTe) will be d...
Implementation of precast concrete deck system NUDECK (2nd generation).
DOT National Transportation Integrated Search
2013-12-01
The first generation of the precast concrete deck system NUDECK, developed by the University of Nebraska-Lincoln (UNL) for the Nebraska Department of Roads (NDOR), was implemented on the Skyline Bridge, Omaha, NE, in 2004. The project was highly successful ...
2009-03-15
STS119-S-025 (15 March 2009) --- The setting sun paints the clouds over NASA's Kennedy Space Center in Florida before the launch of Space Shuttle Discovery on the STS-119 mission. Liftoff is scheduled for 7:43 p.m. (EDT) on March 15, 2009.
4. A river level view of the Broad Street bridge ...
4. A river level view of the Broad Street bridge and Columbus skyline from the railroad truss north of the bridge. - Broad Street Bridge, Spanning Scioto River at U.S. Route 40 (Broad Street), Columbus, Franklin County, OH
Deadly Everest Avalanche Site Spotted by NASA Spacecraft
2014-04-28
On Friday, April 26, 2014, an avalanche on Mount Everest killed at least 13 Sherpa guides. NASA Terra spacecraft looked toward the northeast, with Mount Everest center, and Lhotse, the fourth-highest mountain on Earth, on the skyline to right center.
101. Catalog H-History 1, C.C.C., 34 Landscaping, Negative No. 1340 ...
101. Catalog H-History 1, C.C.C., 34 Landscaping, Negative No. 1340 (Photographer and date unknown) BANK BLENDING WORK BY CCC. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
98. Catalog H-History 1, C.C.C., 19 Tree Planting, Negative No. ...
98. Catalog H-History 1, C.C.C., 19 Tree Planting, Negative No. P 474c (Photographer and date unknown) TRANSPLANTING TREE. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
Hu, Weiming; Fan, Yabo; Xing, Junliang; Sun, Liang; Cai, Zhaoquan; Maybank, Stephen
2018-09-01
We construct a new efficient near duplicate image detection method using a hierarchical hash code learning neural network and load-balanced locality-sensitive hashing (LSH) indexing. We propose a deep constrained siamese hash coding neural network combined with deep feature learning. Our neural network is able to extract effective features for near duplicate image detection. The extracted features are used to construct an LSH-based index. We propose a load-balanced LSH method to produce load-balanced buckets in the hashing process. The load-balanced LSH significantly reduces the query time. Based on the proposed load-balanced LSH, we design an effective and feasible algorithm for near duplicate image detection. Extensive experiments on three benchmark data sets demonstrate the effectiveness of our deep siamese hash encoding network and load-balanced LSH.
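The indexing step can be illustrated with a small, hypothetical Python sketch; it is not the authors' network or their load-balancing scheme, but it shows how multiple LSH tables over binary codes return a candidate set for near-duplicate lookup.

# Illustrative sketch only (not the authors' method): index binary hash codes with
# LSH by hashing on random bit subsets, then query a candidate set from matching
# buckets. Code length and table count are arbitrary.
import random
from collections import defaultdict

random.seed(0)
CODE_BITS, N_TABLES, BITS_PER_TABLE = 32, 4, 8

def make_tables():
    # Each table hashes a different random subset of the code's bits.
    return [(random.sample(range(CODE_BITS), BITS_PER_TABLE), defaultdict(list))
            for _ in range(N_TABLES)]

def key(code, bits):
    return tuple((code >> b) & 1 for b in bits)

def insert(tables, image_id, code):
    for bits, buckets in tables:
        buckets[key(code, bits)].append(image_id)

def query(tables, code):
    candidates = set()
    for bits, buckets in tables:
        candidates.update(buckets.get(key(code, bits), []))
    return candidates

tables = make_tables()
codes = {i: random.getrandbits(CODE_BITS) for i in range(1000)}
for i, c in codes.items():
    insert(tables, i, c)

probe = codes[42] ^ 0b1            # near-duplicate: one bit flipped
# True unless every table happened to sample the flipped bit.
print(42 in query(tables, probe))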
KBGIS-2: A knowledge-based geographic information system
NASA Technical Reports Server (NTRS)
Smith, T.; Peuquet, D.; Menon, S.; Agarwal, P.
1986-01-01
The architecture and working of a recently implemented knowledge-based geographic information system (KBGIS-2) that was designed to satisfy several general criteria for the geographic information system are described. The system has four major functions that include query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial objects language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is currently performing all its designated tasks successfully, although currently implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various new extensions are planned in order to enhance the power of KBGIS-2.
Evaluating the constructability of NUDECK precast concrete deck panels for Kearney Bypass Project.
DOT National Transportation Integrated Search
2015-02-01
The first generation of the precast concrete deck system NUDECK was implemented on the Skyline Bridge, Omaha, NE, in 2004. The second generation of the NUDECK system was developed to further simplify the system and improve its constructability and durab...
66. BIG MEADOWS. VIEW OF PARKING AREA AT THE GATED ...
66. BIG MEADOWS. VIEW OF PARKING AREA AT THE GATED ENTRANCE TO RAPIDAN FIRE ROAD, THE ACCESS ROAD TO CAMP HOOVER. LOOKING SOUTH, MILE 51.3. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
2. VIEW OF PARK SIGNAGE AT FRONT ROYAL. SIGN SAYS: ...
2. VIEW OF PARK SIGNAGE AT FRONT ROYAL. SIGN SAYS: "NORTH ENTRANCE SHENANDOAH NATIONAL PARK." LOCATED ON EXIT SIDE OF ROAD. LOOKING SOUTHWEST, MILE 0.0. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
100. Catalog H-History 1, C.C.C., 34 Landscaping, Negative No. P ...
100. Catalog H-History 1, C.C.C., 34 Landscaping, Negative No. P 733c (Photographer and date unknown) SLOPE MAINTENANCE WORK BY CCC. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
99. Catalog H-History 1, C.C.C., 23 Guard Rail Construction, Negative ...
99. Catalog H-History 1, C.C.C., 23 Guard Rail Construction, Negative No. P455e (Photographer and date unknown) GUARD RAIL INSTALLATION. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
10. Detail of map showing Battery Davis and Panama Gun ...
10. Detail of map showing Battery Davis and Panama Gun Mounts at right, by U.S. Engineering Office, San Francisco, California, August 5, 1934. - Fort Funston, Panama Mounts for 155mm Guns, Skyline Boulevard & Great Highway, San Francisco, San Francisco County, CA
5. VIEW LOOKING NORTHEAST INTO CENTRAL COURTYARD OF TECHWOOD DORMITORY, ...
5. VIEW LOOKING NORTHEAST INTO CENTRAL COURTYARD OF TECHWOOD DORMITORY, SHOWING WEST FRONT OF CENTER WING AND PART OF SOUTH SIDE OF NORTH WING. MIDTOWN SKYLINE VISIBLE IN BACKGROUND. - Techwood Homes, McDaniel Dormitory, 581-587 Techwood Drive, Atlanta, Fulton County, GA
Rulon B. Gardner
1980-01-01
Larch-fir stands in northwest Montana were experimentally logged to determine the influence of increasingly intensive levels of utilization upon rates of yarding production, under three different silvicultural prescriptions. Variables influencing rate of production were also identified.
CARIBIAM: constrained Association Rules using Interactive Biological IncrementAl Mining.
Rahal, Imad; Rahhal, Riad; Wang, Baoying; Perrizo, William
2008-01-01
This paper analyses annotated genome data by applying a very central data-mining technique known as Association Rule Mining (ARM) with the aim of discovering rules and hypotheses capable of yielding deeper insights into this type of data. In the literature, ARM has been noted for producing an overwhelming number of rules. This work proposes a new technique capable of using domain knowledge in the form of queries in order to efficiently mine only the subset of the associations that are of interest to investigators in an incremental and interactive manner.
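A minimal sketch of constraint-driven rule mining under stated assumptions (toy transactions, made-up thresholds, and a simple constraint on the consequent); it is not the CARIBIAM algorithm, but it illustrates how a user query can prune the rule space during mining rather than after it.

# Mine association rules from toy gene-annotation transactions, but only keep
# rules whose consequent matches the user's query constraint.
from itertools import combinations

transactions = [
    {"kinase", "membrane", "signaling"},
    {"kinase", "signaling", "cancer"},
    {"membrane", "transport"},
    {"kinase", "signaling"},
]
MIN_SUPPORT, MIN_CONF = 0.5, 0.7
QUERY = {"signaling"}          # constraint: consequent must mention this item

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = set().union(*transactions)
rules = []
for k in (2, 3):
    for itemset in combinations(sorted(items), k):
        s = support(set(itemset))
        if s < MIN_SUPPORT:
            continue
        for r in range(1, k):
            for antecedent in combinations(itemset, r):
                consequent = set(itemset) - set(antecedent)
                if not (consequent & QUERY):
                    continue              # constraint pruning
                conf = s / support(set(antecedent))
                if conf >= MIN_CONF:
                    rules.append((set(antecedent), consequent, round(conf, 2)))

print(rules)   # -> [({'kinase'}, {'signaling'}, 1.0)]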
75 FR 5289 - Defense Health Board (DHB) Meeting
Federal Register 2010, 2011, 2012, 2013, 2014
2010-02-02
... DEPARTMENT OF DEFENSE Office of the Secretary Defense Health Board (DHB) Meeting AGENCY... announces that the Defense Health Board (DHB or Board) will meet on March 1-2, 2010, to address and.... Feeks, Executive Secretary, Defense Health Board, Five Skyline Place, 5111 Leesburg Pike, Suite 810...
3 CFR 8410 - Proclamation 8410 of September 3, 2009. National Days of Prayer and Remembrance, 2009
Code of Federal Regulations, 2010 CFR
2010-01-01
... struck the skyline of New York City, the structure of the Pentagon, and the grass of Pennsylvania. In the... world. They have left the safety of home so that our Nation might be more secure. They have endured...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-07
... commercial and noncommercial vegetation management and road system modifications and maintenance. DATES... stands and old forest habitat; (2) improve watershed conditions and reduce road- related impacts to... commercial timber harvest on about 3,265 acres utilizing tractor/off-road jammer (1,124 acres), skyline (926...
Economics of hardwood silviculture using skyline and conventional logging
John E. Baumgras; Gary W. Miller; Chris B. LeDoux
1995-01-01
Managing Appalachian hardwood forests to satisfy the growing and diverse demands on this resource will require alternatives to traditional silvicultural methods and harvesting systems. Determining the relative economic efficiency of these alternative methods and systems with respect to harvest cash flows is essential. The effects of silvicultural methods and roundwood...
Block 3. Central view of Block 3 observed from the ...
Block 3. Central view of Block 3 observed from the west to the east. This photograph reveals the alignment of trees within the central path of the park. In addition, this photograph exposes broken bricks aligning tree beds - Skyline Park, 1500-1800 Arapaho Street, Denver, Denver County, CO
SIMYAR: a cable-yarding simulation model.
R.J. McGaughey; R.H. Twito
1987-01-01
A skyline-logging simulation model designed to help planners evaluate potential yarding options and alternative harvest plans is presented. The model, called SIMYAR, uses information about the timber stand, yarding equipment, and unit geometry to estimate yarding cost and productivity for a particular operation. The costs of felling, bucking, loading, and hauling are...
Balloon logging with the inverted skyline
NASA Technical Reports Server (NTRS)
Mosher, C. F.
1975-01-01
There is a gap in aerial logging techniques that has to be filled. A simple, safe, sizeable system needs to be developed before aerial logging will become effective and accepted in the logging industry. This paper presents such a system designed on simple principles with realistic cost and ecological benefits.
97. Catalog B, Higher Plants, 200 2 American Chestnut Tree, ...
97. Catalog B, Higher Plants, 200 2 American Chestnut Tree, Negative No. 6032 (Photographer and date unknown) THIS GHOST FOREST OF BLIGHTED CHESTNUTS ONCE STOOD APPROXIMATELY AT THE LOCATION OF THE BYRD VISITOR CENTER. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
ERIC Educational Resources Information Center
Fedore, Heidi
2005-01-01
In 2002, with pressure on students and educators mounting regarding performance on standardized tests, the author, who is an assistant principal at Skyline High School in Issaquah, Washington, and some staff members decided to have a little fun in the midst of the preparation for the state's high-stakes test, the Washington Assessment of Student…
PHOTOGRAPH NUMBERS 40, 39, 38 FORM A 189 DEGREE PANORAMA ...
PHOTOGRAPH NUMBERS 40, 39, 38 FORM A 189 DEGREE PANORAMA FROM LEFT TO RIGHT. PHOTOGRAPH NUMBER 38 LOOKING NORTHEAST TO SKYLINE FROM ROOF OF POLSON BUILDING; PHOTOGRAPH NUMBER 39 VIEW NORTH; PHOTOGRAPH NUMBER 40 VIEW NORTHWEST. - Alaskan Way Viaduct and Battery Street Tunnel, Seattle, King County, WA
Federal Register 2010, 2011, 2012, 2013, 2014
2012-02-17
...) Multicolor Inc.; (7) Novelty Handicrafts Co., Ltd.; (8) Pacific Imports; (9) Papillon Ribbon & Bow (Canada... Lion Ribbon Company, Inc., for the following companies: (1) Apex Ribbon; (2) Apex Trimmings; (3) FinerRibbon.com ; (4) Hsien Chan Enterprise Co., Ltd.; (5) Hubschercorp; (6) Intercontinental Skyline; (7...
Nutrient losses from timber harvesting in a larch/ Douglas-fir forest
Nellie M. Stark
1979-01-01
Nutrient levels as a result of experimental clearcutting, shelterwood cutting, and group selection cutting - each with three levels of harvesting intensity - were studied in a larch-fir forest in northwest Montana, experimentally logged with a skyline system. None of the treatments altered nutrient levels in an intermittent stream, nor were excessive amounts of...
Trends in streamflow and suspended sediment after logging, North Fork Caspar Creek
Jack Lewis; Elizabeth T. Keppeler
2007-01-01
Streamflow and suspended sediment were intensively monitored at fourteen gaging stations before and after logging a second-growth redwood (Sequoia sempervirens) forest. About 50 percent of the watershed was harvested, primarily by clear-cutting with skyline-cable systems. New road construction and tractor skidding were restricted to gently-sloping...
102. Catalog H-History 1, C.C.C., 34 Landscaping, Negative No. 6040a ...
102. Catalog H-History 1, C.C.C., 34 Landscaping, Negative No. 6040a (Photographer and date unknown) BEAUTIFICATION PROGRAM STARTED AS SOON AS GRADING ALONG THE DRIVE WAS COMPLETED. CCC CAMP 3 SHOWN PLANTING LAUREL. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
Smith Assists in Superstorm Sandy Relief Efforts | Poster
By Cathy McClintock, Guest Writer It should have been routine by now for a 30-year volunteer firefighter/ emergency medical technician from Thurmont, Md., but it wasn’t. That first night, as Ross Smith, IT security, looked across the Hudson River from Jersey City, N.J., he saw an unusually dark New York skyline.
The Automatic Recognition of the Abnormal Sky-subtraction Spectra Based on Hadoop
NASA Astrophysics Data System (ADS)
An, An; Pan, Jingchang
2017-10-01
Skylines are superimposed on the target spectrum as a major source of noise. If the spectrum still contains a large number of strong skylight residuals after sky-subtraction processing, follow-up analysis of the target spectrum is hindered. At the same time, LAMOST observes a large quantity of spectroscopic data every night, so an efficient platform is needed to recognize the large numbers of abnormal sky-subtraction spectra quickly. Hadoop, as a distributed parallel data computing platform, can deal with large amounts of data effectively. In this paper, we first conduct continuum normalization and then present a simple and effective method to automatically recognize abnormal sky-subtraction spectra on the Hadoop platform. Experiments show that the Hadoop platform can carry out the recognition with greater speed and efficiency, and that the simple method can effectively recognize abnormal sky-subtraction spectra and find the abnormal skyline positions of different residual strengths; it can be applied to the automatic detection of abnormal sky subtraction in large numbers of spectra.
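A minimal sketch, not the paper's Hadoop implementation, of the per-spectrum step it describes: continuum-normalize the flux and flag strong residuals near known sky-line wavelengths. The wavelengths, window size, and threshold below are arbitrary choices for illustration.

# Per-spectrum "map" step: continuum-normalize a flux array and flag strong
# residuals near known night-sky emission lines.
import numpy as np

SKYLINES = np.array([5577.3, 6300.3, 6363.8])   # bright night-sky lines (Angstrom)

def continuum_normalize(flux, window=101):
    """Divide by a running-median estimate of the continuum."""
    pad = window // 2
    padded = np.pad(flux, pad, mode="edge")
    continuum = np.array([np.median(padded[i:i + window]) for i in range(flux.size)])
    return flux / np.maximum(continuum, 1e-8)

def abnormal_sky_subtraction(wave, flux, threshold=1.5, half_width=3.0):
    """Return sky-line wavelengths whose normalized residual exceeds the threshold."""
    norm = continuum_normalize(flux)
    flagged = []
    for line in SKYLINES:
        sel = np.abs(wave - line) < half_width
        if sel.any() and norm[sel].max() > threshold:
            flagged.append(float(line))
    return flagged

# Toy spectrum: flat continuum plus a leftover residual at 5577 A.
wave = np.linspace(5500, 6500, 2000)
flux = np.ones_like(wave) + 3.0 * np.exp(-0.5 * ((wave - 5577.3) / 1.0) ** 2)
print(abnormal_sky_subtraction(wave, flux))     # -> [5577.3]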
Model for Evaluating the Cost Consequences of Deferring New System Acquisition Through Upgrades
1999-07-01
Analysis & Evaluation The Pentagon Washington, DC 20301 Attn: Mr. Eric Coulter, Director Projection Forces Division, Room 2E314 Lt Col Kathleen Conley...1034 Office of the Air National Guard ANG/AQM 5109 Leesburg Pike Skyline VI, Suite 302A Falls Church, VA 22041-3201 Attn: Col Brent Marler 1 Lt Col
Installation and use of epoxy-grouted rock anchors for skyline logging in southeast Alaska.
W.L. Schroeder; D.N. Swanston
1992-01-01
Field tests of the load-carrying capacity of epoxy-grouted rock anchors in poor quality bedrock on Wrangell Island in southeast Alaska demonstrated the effectiveness of rock anchors as substitutes for stump anchors for logging system guylines. Ultimate capacity depends mainly on rock hardness or strength and length of the embedded anchor.
An earth anchor system: installation and design guide.
R.L. Copstead; D.D. Studier
1990-01-01
A system for anchoring the guylines and skylines of cable yarding equipment is presented. A description of three types of tipping plate anchors is given. Descriptions of the installation equipment and methods specific to each type are given. Procedures for determining the correct number of anchors to install are included, as are guidelines for installing the anchors so...
Production and cost of a live skyline cable yarder tested in Appalachia
Edward L. Fisher; Harry G. Gibson; Cleveland J. Biller
1980-01-01
Logging systems that are profitable and environmentally acceptable are needed in Appalachian hardwood forests. Small, mobile cable yarders show promise in meeting both economic and environmental objectives. One such yarder, the Ecologger, was tested on the Jefferson National Forest near Marion, Virginia. Production rates and costs are presented for the system along...
103. Catalog H-History 1, C.C.C., 58 Landscaping, Negative No. 870 ...
103. Catalog H-History 1, C.C.C., 58 Landscaping, Negative No. 870 10 ca. 1936 PROPAGATION AND PLANTING. ROOTED PLANTS TRANSPLANTED FROM HOT BEDS TO CANS TO SHADED BEDS IN PREPARATION FOR PLANTING ON ROAD SLOPES. NURSERY AT NORTH ENTRANCE. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
Cost and production analysis of the Bitterroot Miniyarder on an Appalachian hardwood site
John E. Baumgras; Penn A. Peters; Penn A. Peters
1985-01-01
An 18-horsepower skyline yarder was studied on a steep slope clearcut, yarding small hardwood trees uphill for fuelwood. Yarding cycle characteristics sampled include: total cycle time including delays, 5.20 minutes; yarding distance, 208 feet (350 feet maximum); turn volume, 11.6 cubic feet (24 cubic feet maximum); pieces per turn, 2.3. Cost analysis shows yarding...
Gary D. Falk
1981-01-01
A systematic procedure for predicting the payload capability of running, live, and standing skylines is presented. Three hand-held calculator programs are used to predict payload capability that includes the effect of partial suspension. The programs allow for predictions for downhill yarding and for yarding away from the yarder. The equations and basic principles...
A second look at cable logging in the Appalachians
Harry G. Gibson; Cleveland J. Biller
1975-01-01
Cable logging, once used extensively in the Appalachians, is being re-examined to see if smaller, more mobile systems can help solve some of the timber-management problems on steep slopes. A small Austrian skyline was tested in West Virginia to determine its feasibility for harvesting eastern hardwoods. The short-term test included both selection and clearcut harvesting...
Gods of the City? Reflecting on City Building Games as an Early Introduction to Urban Systems
ERIC Educational Resources Information Center
Bereitschaft, Bradley
2016-01-01
For millions of gamers and students alike, city building games (CBGs) like SimCity and the more recent Cities: Skylines present a compelling initial introduction to the world of urban planning and development. As such, these games have great potential to shape players' understanding and expectations of real urban patterns and processes. In this…
DNA breaks and end resection measured genome-wide by end sequencing | Center for Cancer Research
About the Cover The cover depicts a ribbon of DNA portrayed as a city skyline. The central gap in the landscape localizes to the precise site of the DNA break. The features surrounding the break denote the processing of DNA-end structures (end-resection) emanating from the break location. Cover artwork by Ethan Tyler, NIH. Abstract
ERIC Educational Resources Information Center
Yeager, Susan Cadavid
2017-01-01
This case study examined the implementation of a baccalaureate degree at Skyline Community College--one of the 15 California community colleges authorized to offer baccalaureate degrees established as part of a pilot program enacted by the California Legislature via Senate Bill 850 (2014). The study explored the policies and procedures in place at…
Ait Kaci Azzou, Sadoune; Larribe, Fabrice; Froda, Sorana
2015-01-01
The effective population size over time (demographic history) can be retraced from a sample of contemporary DNA sequences. In this paper, we propose a novel methodology based on importance sampling (IS) for exploring such demographic histories. Our starting point is the generalized skyline plot, with the main difference being that our procedure, the skywis plot, uses a large number of genealogies. The information provided by these genealogies is combined according to the IS weights. Thus, we compute a weighted average of the effective population sizes on specific time intervals (epochs), where the genealogies that agree more with the data are given more weight. We illustrate by a simulation study that the skywis plot correctly reconstructs the recent demographic history under the scenarios most commonly considered in the literature. In particular, our method can capture a change point in the effective population size, and its overall performance is comparable with that of the Bayesian skyline plot. We also introduce the case of serially sampled sequences and illustrate that it is possible to improve the performance of the skywis plot in the case of an exponential expansion of the effective population size. PMID:26300910
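The weighting step can be shown in a few lines; this is a hypothetical numerical illustration, not the skywis implementation, with invented weights and per-epoch estimates.

# Given per-genealogy importance-sampling weights and per-epoch effective
# population size estimates, compute the weighted average N_e per epoch.
import numpy as np

# weights[g]   : importance-sampling weight of genealogy g
# ne_est[g, k] : N_e estimated from genealogy g on epoch k
weights = np.array([0.5, 2.0, 1.0, 0.1])
ne_est = np.array([
    [1000.,  900., 400.],
    [1200., 1000., 500.],
    [ 800., 1100., 450.],
    [5000., 4000., 900.],   # genealogy that fits the data poorly gets low weight
])

skywis = (weights[:, None] * ne_est).sum(axis=0) / weights.sum()
print(skywis)               # one weighted N_e estimate per epoch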
NASA Astrophysics Data System (ADS)
MacDonald, B.; Finot, M.; Heiken, B.; Trowbridge, T.; Ackler, H.; Leonard, L.; Johnson, E.; Chang, B.; Keating, T.
2009-08-01
Skyline Solar Inc. has developed a novel silicon-based PV system to simultaneously reduce energy cost and improve scalability of solar energy. The system achieves high gain through a combination of high capacity factor and optical concentration. The design approach drives innovation not only into the details of the system hardware, but also into manufacturing and deployment-related costs and bottlenecks. The result of this philosophy is a modular PV system whose manufacturing strategy relies only on currently existing silicon solar cell, module, reflector and aluminum parts supply chains, as well as turnkey PV module production lines and metal fabrication industries that already exist at enormous scale. Furthermore, with a high gain system design, the generating capacity of all components is multiplied, leading to a rapidly scalable system. The product design and commercialization strategy cooperate synergistically to promise dramatically lower LCOE with substantially lower risk relative to materials-intensive innovations. In this paper, we will present the key design aspects of Skyline's system, including aspects of the optical, mechanical and thermal components, revealing the ease of scalability, low cost and high performance. Additionally, we will present performance and reliability results on modules and the system, using ASTM and UL/IEC methodologies.
Upscaling of greenhouse gas emissions in upland forestry following clearfell
NASA Astrophysics Data System (ADS)
Toet, Sylvia; Keane, Ben; Yamulki, Sirwan; Blei, Emanuel; Gibson-Poole, Simon; Xenakis, Georgios; Perks, Mike; Morison, James; Ineson, Phil
2016-04-01
Data on greenhouse gas (GHG) emissions caused by forest management activities are limited. Management such as clearfelling may, however, have major impacts on the GHG balance of forests through effects of soil disturbance, increased water table, and brash and root inputs. Besides carbon dioxide (CO2), the biogenic GHGs nitrous oxide (N2O) and methane (CH4) may also contribute to GHG emissions from managed forests. Accurate flux estimates of all three GHGs are therefore necessary, but, since GHG emissions usually show large spatial and temporal variability, in particular CH4 and N2O fluxes, high-frequency GHG flux measurements and better understanding of their controls are central to improve process-based flux models and GHG budgets at multiple scales. In this study, we determined CO2, CH4 and N2O emissions following felling in a mature Sitka spruce (Picea sitchensis) stand in an upland forest in northern England. High-frequency measurements were made along a transect using a novel, automated GHG chamber flux system ('SkyLine') developed at the University of York. The replicated, linear experiment aimed (1) to quantify GHG emissions from three main topographical features at the clearfell site, i.e. the ridges on which trees had been planted, the hollows in between and the drainage ditches, and (2) to determine the effects of the green-needle component of the discarded brash. We also measured abiotic soil and climatic factors alongside the 'SkyLine' GHG flux measurements to identify drivers of the observed GHG emissions. All three topographic features were overall sources of GHG emissions (in CO2 equivalents), and, although drainage ditches are often not included in studies, GHG emissions per unit area were highest from ditches, followed by ridges and lowest in hollows. The CO2 emissions were most important in the GHG balance of ridges and hollows, but CH4 emissions were very high from the drainage ditches, contributing to over 50% of their overall net GHG emissions. Ridges usually emitted N2O, whilst N2O emissions from hollows and ditches were very low. As much as 25% of the total GHG flux resulted from large intermittent emissions from the ditches following rainfall. Addition of green needles from the brash immediately increased soil respiration and reduced CH4 emission in comparison to controls. To upscale our high-frequency 'SkyLine' GHG flux measurements at the different topographic features to the field scale, we collected high resolution imagery from unmanned aerial vehicle (UAV) flights. We will compare results using this upscaling technique to GHG emissions simultaneously measured by eddy covariance with the 'SkyLine' system in the predominant footprint. This detailed knowledge of the spatial and temporal distribution of GHG emissions in an upland forest after felling and their drivers, and development of robust upscaling techniques can provide important tools to improve GHG flux models and to design appropriate management practices in upland forestry to mitigate GHG emissions following clearfell.
Predicting bunching costs for the Radio Horse 9 winch
Chris B. LeDoux; Bruce W. Kling; Patrice A. Harou; Patrice A. Harou
1987-01-01
Data from field studies and a prebunching cost simulator have been assembled and converted into a general equation that can be used to estimate the prebunching cost of the Radio Horse 9 winch. The methods can be used to estimate prebunching cost for bunching under the skyline corridor for swinging with cable systems, for bunching to skid trail edge to be picked up by a...
A topographic index to quantify the effect of mesoscale and form on site productivity
W. Henry McNab
1992-01-01
Landform is related to environmental factors that affect site productivity in mountainous areas. I devised a simple index of landform and tested this index as a predictor of site index in the Blue Ridge physiographic province. The landform index is the mean of eight slope gradients from plot center to skyline. A preliminary test indicated that the index was...
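A small worked example of that index under stated assumptions (invented elevations and horizontal distances): the slope gradient to the visible skyline is computed in eight compass directions and averaged.

# Landform index sketch: mean of eight plot-center-to-skyline slope gradients.
plot_elev_m = 850.0
# (skyline elevation in m, horizontal distance to skyline in m): N, NE, E, ... NW
skyline = [(1100, 1200), (1050, 900), (900, 600), (870, 500),
           (860, 800), (950, 700), (1200, 1500), (1150, 1100)]

gradients = [(elev - plot_elev_m) / dist for elev, dist in skyline]   # rise / run
landform_index = sum(gradients) / len(gradients)
# Higher values suggest a more sheltered, cove-like position.
print(round(landform_index, 3))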
76 FR 76684 - Idaho: Tentative Approval of State Underground Storage Tank Program
Federal Register 2010, 2011, 2012, 2013, 2014
2011-12-08
.... Skyline, Suite B, Idaho Falls, ID 83402 from 10 a.m. to 12 p.m. and 1 p.m. to 4 p.m.; and 6. IDEQ Lewiston... ENVIRONMENTAL PROTECTION AGENCY 40 CFR Part 281 [EPA-R10-UST-2011-0896; FRL-9502-6] Idaho...). ACTION: Proposed rule. SUMMARY: The State of Idaho has applied for final approval of its Underground...
104. Catalog H-History 1, C.C.C., 73 Picnic Furniture Construction, Negative ...
104. Catalog H-History 1, C.C.C., 73 Picnic Furniture Construction, Negative No. 8821 ca. 1936 WOOD UTILIZATION. COMPLETED RUSTIC BENCH MADE BY CCC ENROLLEES AT CAMP NP-3 FOR USE AT PARKING OVERLOOKS AND PICNIC GROUNDS. NOTE SAW IN BACKGROUND USED FOR HALVING CHESTNUT. - Skyline Drive, From Front Royal, VA to Rockfish Gap, VA , Luray, Page County, VA
Raza, Muhammad Taqi; Yoo, Seung-Wha; Kim, Ki-Hyung; Joo, Seong-Soon; Jeong, Wun-Cheol
2009-01-01
Web Portals function as a single point of access to information on the World Wide Web (WWW). The web portal always contacts the portal’s gateway for the information flow that causes network traffic over the Internet. Moreover, it provides real time/dynamic access to the stored information, but not access to the real time information. This inherent functionality of web portals limits their role for resource constrained digital devices in the Ubiquitous era (U-era). This paper presents a framework for the web portal in the U-era. We have introduced the concept of Local Regions in the proposed framework, so that the local queries could be solved locally rather than having to route them over the Internet. Moreover, our framework enables one-to-one device communication for real time information flow. To provide an in-depth analysis, firstly, we provide an analytical model for query processing at the servers for our framework-oriented web portal. At the end, we have deployed a testbed, as one of the world’s largest IP based wireless sensor networks testbed, and real time measurements are observed that prove the efficacy and workability of the proposed framework. PMID:22346693
Raza, Muhammad Taqi; Yoo, Seung-Wha; Kim, Ki-Hyung; Joo, Seong-Soon; Jeong, Wun-Cheol
2009-01-01
Web Portals function as a single point of access to information on the World Wide Web (WWW). The web portal always contacts the portal's gateway for the information flow that causes network traffic over the Internet. Moreover, it provides real time/dynamic access to the stored information, but not access to the real time information. This inherent functionality of web portals limits their role for resource constrained digital devices in the Ubiquitous era (U-era). This paper presents a framework for the web portal in the U-era. We have introduced the concept of Local Regions in the proposed framework, so that the local queries could be solved locally rather than having to route them over the Internet. Moreover, our framework enables one-to-one device communication for real time information flow. To provide an in-depth analysis, firstly, we provide an analytical model for query processing at the servers for our framework-oriented web portal. At the end, we have deployed a testbed, as one of the world's largest IP based wireless sensor networks testbed, and real time measurements are observed that prove the efficacy and workability of the proposed framework.
Cycle-time equation for the Koller K300 cable yarder operating on steep slopes in the Northeast
Neil K. Huyler; Chris B. LeDoux
1997-01-01
Describes a delay-free-cycle time equation for the Koller K300 skyline yarder operating on steep slopes in the Northeast. Using the equation, the average delay-free-cycle time was 5.72 minutes. This means that about 420 cubic feet of material per hour can be produced. The important variables used in the equation were slope yarding distance, lateral yarding distance,...
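A quick arithmetic check of the figures quoted above; the per-turn volume is not stated here, so it is derived as an implied value rather than taken from the study.

# Convert the quoted delay-free cycle time into turns per hour and the implied
# average volume per turn needed to reach about 420 cubic feet per hour.
delay_free_cycle_min = 5.72
turns_per_hour = 60 / delay_free_cycle_min          # about 10.5 turns per hour
implied_turn_volume = 420 / turns_per_hour          # about 40 cubic feet per turn
print(round(turns_per_hour, 1), round(implied_turn_volume, 1))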
Reliable Execution Based on CPN and Skyline Optimization for Web Service Composition
Ha, Weitao; Zhang, Guojun
2013-01-01
With the development of SOA, complex problems can be solved by combining available individual services and ordering them to best suit the user's requirements. Web service composition is widely used in business environments. Because component web services are inherently autonomous and heterogeneous, it is difficult to predict the behavior of the overall composite service. Therefore, transactional properties and nonfunctional quality of service (QoS) properties are crucial for selecting the web services to take part in the composition. Transactional properties ensure the reliability of the composite Web service, and QoS properties can identify the best candidate web services from a set of functionally equivalent services. In this paper we define a Colored Petri Net (CPN) model which involves transactional properties of web services in the composition process. To ensure reliable and correct execution, unfolding processes of the CPN are followed. The execution of a transactional composition Web service (TCWS) is formalized by CPN properties. To identify the services with the best QoS properties from the candidate service sets formed in the TCWS-CPN, we use skyline computation to retrieve the dominant Web services. This avoids the significant information loss that results from reducing individual scores to a single overall similarity. We evaluate our approach experimentally using both real and synthetically generated datasets. PMID:23935431
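A minimal sketch of the skyline (dominance) step in Python; it is not the paper's code, and the QoS attributes and candidate values are invented, but it shows how non-dominated services are retained without collapsing the criteria into a single score.

# Keep only candidate services that are not dominated on their QoS vector.
# Lower response time and price are better; higher availability is better.
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    response_ms: float
    price: float
    availability: float

def dominates(a, b):
    """a dominates b if it is at least as good on all criteria and better on one."""
    ge = (a.response_ms <= b.response_ms and a.price <= b.price
          and a.availability >= b.availability)
    gt = (a.response_ms < b.response_ms or a.price < b.price
          or a.availability > b.availability)
    return ge and gt

def skyline(candidates):
    return [s for s in candidates
            if not any(dominates(o, s) for o in candidates if o is not s)]

candidates = [Service("A", 120, 0.05, 0.99), Service("B", 200, 0.05, 0.99),
              Service("C", 90, 0.08, 0.97), Service("D", 150, 0.10, 0.95)]
print([s.name for s in skyline(candidates)])   # -> ['A', 'C']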
Maclean, Brendan; Tomazela, Daniela M; Abbatiello, Susan E; Zhang, Shucha; Whiteaker, Jeffrey R; Paulovich, Amanda G; Carr, Steven A; Maccoss, Michael J
2010-12-15
Proteomics experiments based on Selected Reaction Monitoring (SRM, also referred to as Multiple Reaction Monitoring or MRM) are being used to target large numbers of protein candidates in complex mixtures. At present, instrument parameters are often optimized for each peptide, a time and resource intensive process. Large SRM experiments are greatly facilitated by having the ability to predict MS instrument parameters that work well with the broad diversity of peptides they target. For this reason, we investigated the impact of using simple linear equations to predict the collision energy (CE) on peptide signal intensity and compared it with the empirical optimization of the CE for each peptide and transition individually. Using optimized linear equations, the difference between predicted and empirically derived CE values was found to be an average gain of only 7.8% of total peak area. We also found that existing commonly used linear equations fall short of their potential, and should be recalculated for each charge state and when introducing new instrument platforms. We provide a fully automated pipeline for calculating these equations and individually optimizing CE of each transition on SRM instruments from Agilent, Applied Biosystems, Thermo-Scientific and Waters in the open source Skyline software tool ( http://proteome.gs.washington.edu/software/skyline ).
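A minimal sketch of the kind of charge-state-specific linear CE equation discussed above; the slope and intercept values are placeholders, not the optimized coefficients reported for any particular instrument.

# Predict collision energy from precursor m/z with a per-charge linear equation.
CE_EQUATIONS = {          # charge -> (slope, intercept); placeholder values
    2: (0.030, 3.0),
    3: (0.038, 2.3),
}

def predict_ce(precursor_mz, charge):
    slope, intercept = CE_EQUATIONS[charge]
    return slope * precursor_mz + intercept

print(round(predict_ce(643.86, 2), 1))   # doubly charged precursor at m/z 643.86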
Implementation of statistical process control for proteomic experiments via LC MS/MS.
Bereman, Michael S; Johnson, Richard; Bollinger, James; Boss, Yuval; Shulman, Nick; MacLean, Brendan; Hoofnagle, Andrew N; MacCoss, Michael J
2014-04-01
Statistical process control (SPC) is a robust set of tools that aids in the visualization, detection, and identification of assignable causes of variation in any process that creates products, services, or information. A tool has been developed termed Statistical Process Control in Proteomics (SProCoP) which implements aspects of SPC (e.g., control charts and Pareto analysis) into the Skyline proteomics software. It monitors five quality control metrics in a shotgun or targeted proteomic workflow. None of these metrics require peptide identification. The source code, written in the R statistical language, runs directly from the Skyline interface, which supports the use of raw data files from several of the mass spectrometry vendors. It provides real time evaluation of the chromatographic performance (e.g., retention time reproducibility, peak asymmetry, and resolution), and mass spectrometric performance (targeted peptide ion intensity and mass measurement accuracy for high resolving power instruments) via control charts. Thresholds are experiment- and instrument-specific and are determined empirically from user-defined quality control standards that enable the separation of random noise and systematic error. Finally, Pareto analysis provides a summary of performance metrics and guides the user to metrics with high variance. The utility of these charts to evaluate proteomic experiments is illustrated in two case studies.
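A minimal sketch of the control-chart idea, not the SProCoP code: empirical thresholds are derived from user-defined QC runs, and later runs falling outside them are flagged. The retention-time values are invented.

# Derive +/- 3 sigma control limits from QC standard runs and flag later runs.
import statistics

qc_baseline_rt = [23.91, 23.88, 23.95, 23.90, 23.93, 23.89]   # minutes, QC standard
mean = statistics.mean(qc_baseline_rt)
sd = statistics.stdev(qc_baseline_rt)
lower, upper = mean - 3 * sd, mean + 3 * sd

new_runs = {"run_07": 23.92, "run_08": 24.35}
for run, rt in new_runs.items():
    status = "in control" if lower <= rt <= upper else "OUT OF CONTROL"
    print(run, rt, status)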
Reliable execution based on CPN and skyline optimization for Web service composition.
Chen, Liping; Ha, Weitao; Zhang, Guojun
2013-01-01
With the development of SOA, complex problems can be solved by combining available individual services and ordering them to best suit the user's requirements. Web services composition is widely used in business environments. Because component web services are inherently autonomous and heterogeneous, it is difficult to predict the behavior of the overall composite service. Therefore, transactional properties and nonfunctional quality of service (QoS) properties are crucial for selecting the web services to take part in the composition. Transactional properties ensure the reliability of the composite Web service, and QoS properties can identify the best candidate web services from a set of functionally equivalent services. In this paper we define a Colored Petri Net (CPN) model which incorporates the transactional properties of web services into the composition process. To ensure reliable and correct execution, unfolding processes of the CPN are followed. The execution of the transactional composite Web service (TCWS) is formalized by CPN properties. To identify the services with the best QoS properties from the candidate service sets formed in the TCWS-CPN, we use skyline computation to retrieve the dominant Web services. This overcomes the significant information loss caused by reducing individual scores to a single overall similarity score. We evaluate our approach experimentally using both real and synthetically generated datasets.
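The skyline step described above keeps only the candidate services that are not dominated in any QoS attribute. A minimal sketch of the dominance test and skyline filter follows, assuming all QoS attributes have been normalized so that smaller values are better; the attribute tuple (response time, cost, 1 - availability) is illustrative.

```python
def dominates(a, b):
    """Service a dominates b if it is no worse in every QoS attribute
    and strictly better in at least one (smaller is better here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(services):
    """Return the services not dominated by any other candidate."""
    return [s for s in services
            if not any(dominates(t, s) for t in services if t is not s)]

# (response_time, cost, 1 - availability) for functionally equivalent candidates
candidates = [(0.2, 5.0, 0.01), (0.3, 4.0, 0.02), (0.4, 6.0, 0.05)]
print(skyline(candidates))   # -> [(0.2, 5.0, 0.01), (0.3, 4.0, 0.02)]
```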
John E. Baumgras; Chris B. LeDoux
1986-01-01
Cable yarding can reduce the environmental impact of timber harvesting on steep slopes by increasing road spacing and reducing soil disturbance. To determine the cost of harvesting forest biomass with a small cable yarder, a 13.4 kW (18 hp) skyline yarder was tested on two southern Appalachian sites. At both sites, fuelwood was harvested from the boles of hardwood...
2. A panoramic view of the historical district as seen ...
2. A panoramic view of the historical district as seen from the top of the Waterford Towers. This picture shows the Town Street bridge in the foreground, the Broad Street bridge in the background, Central High School on the left and the Columbus skyline on the right (facing north), and Bicentennial Park just below. - Broad Street Bridge, Spanning Scioto River at U.S. Route 40 (Broad Street), Columbus, Franklin County, OH
Tan, Chee-Heng; Teh, Ying-Wah
2013-08-01
The main obstacles to mass adoption of cloud computing for database operations in healthcare organizations are data security and privacy issues. In this paper, it is shown that IT services, particularly hardware performance evaluation in a virtual machine, can be accomplished effectively without IT personnel gaining access to actual data for diagnostic and remediation purposes. The proposed mechanisms utilize hypothetical data from the TPC-H benchmark to achieve two objectives. First, the underlying hardware performance and consistency is monitored via a control system constructed using TPC-H queries. Second, a mechanism to construct stress-testing scenarios is envisaged in the host, using a single TPC-H query or a combination of them, so that the resource threshold point can be verified, i.e., whether the virtual machine is still capable of serving critical transactions at this constraining juncture. This threshold point uses server run queue size as an input parameter, and it serves two purposes: it provides the boundary threshold to the control system, so that periodic learning of the synthetic data sets for performance evaluation does not reach the host's constraint level; and, when the host undergoes hardware change, stress-testing scenarios are simulated in the host by loading up to this resource threshold level, for subsequent response-time verification from real and critical transactions.
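The mechanism above gates the synthetic TPC-H workload on a server run-queue threshold so that critical transactions are not starved. A minimal sketch of such a guard follows; the threshold value and the use of the 1-minute load average as a proxy for run-queue size are assumptions made for illustration.

```python
import os

RUN_QUEUE_THRESHOLD = 8   # assumed boundary verified during stress testing

def run_queue_size() -> float:
    """Approximate the run queue with the 1-minute load average (Unix-like hosts)."""
    return os.getloadavg()[0]

def may_run_synthetic_workload() -> bool:
    """Only schedule periodic TPC-H evaluation runs below the threshold,
    so the host can still serve critical transactions."""
    return run_queue_size() < RUN_QUEUE_THRESHOLD

if may_run_synthetic_workload():
    print("launch TPC-H probe queries")
else:
    print("defer synthetic workload; host is near its resource threshold")
```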
User-Driven Geolocation of Untagged Desert Imagery Using Digital Elevation Models (Open Access)
2013-09-12
IEEE International Conference on, pages 3677–3680. IEEE, 2011. [13] W. Zhang and J. Kosecka. Image based localization in urban environments. In 3D ...non-urban environments such as deserts. Our system generates synthetic skyline views from a DEM and extracts stable concavity-based features from these...fine as 100m2. 1. Introduction Automatic geolocation of imagery has many exciting use cases. For example, such a tool could semantically organize
User-Driven Geolocation of Untagged Desert Imagery Using Digital Elevation Models
2013-01-01
Conference on, pages 3677–3680. IEEE, 2011. [13] W. Zhang and J. Kosecka. Image based localization in urban environments. In 3D Data Processing...non-urban environments such as deserts. Our system generates synthetic skyline views from a DEM and extracts stable concavity-based features from these...fine as 100m2. 1. Introduction Automatic geolocation of imagery has many exciting use cases. For example, such a tool could semantically organize
Nasso, Sara; Goetze, Sandra; Martens, Lennart
2015-09-04
Selected reaction monitoring (SRM) MS is a highly selective and sensitive technique to quantify protein abundances in complex biological samples. To enhance the pace of large SRM studies, a validated, robust method to fully automate absolute quantification and to substitute for interactive evaluation would be valuable. To address this demand, we present Ariadne, a Matlab software tool. To quantify monitored targets, Ariadne exploits metadata imported from the transition lists, and targets can be filtered according to mProphet output. Signal processing and statistical learning approaches are combined to compute peptide quantifications. To robustly estimate absolute abundances, the external calibration curve method is applied, ensuring linearity over the measured dynamic range. Ariadne was benchmarked against mProphet and Skyline by comparing its quantification performance on three different dilution series, featuring either noisy/smooth traces without background or smooth traces with complex background. Results, evaluated as efficiency, linearity, accuracy, and precision of quantification, showed that Ariadne's performance is independent of data smoothness and the presence of complex background, that Ariadne outperforms mProphet on the noisier data set, and that it improves Skyline's accuracy and precision 2-fold for the lowest-abundance dilution with complex background. Remarkably, Ariadne could statistically distinguish all the different abundances from each other, discriminating dilutions as low as 0.1 and 0.2 fmol. These results suggest that Ariadne offers reliable and automated analysis of large-scale SRM differential expression studies.
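The external calibration curve method referenced above fits a line through the measured responses of a known dilution series and inverts it to estimate the absolute abundance of an unknown sample. A generic sketch follows (toy numbers, not Ariadne's implementation).

```python
import numpy as np

# Known dilution series: spiked amount (fmol) vs. measured peak area (toy values)
amounts = np.array([0.1, 0.2, 0.5, 1.0, 5.0, 10.0])
areas   = np.array([1.1e3, 2.0e3, 5.2e3, 9.8e3, 5.1e4, 1.0e5])

slope, intercept = np.polyfit(amounts, areas, 1)   # linear calibration curve

def estimate_amount(peak_area: float) -> float:
    """Invert the calibration curve to get an absolute abundance estimate."""
    return (peak_area - intercept) / slope

print(round(estimate_amount(2.6e4), 2))   # fmol, valid within the linear dynamic range
```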
NASA Technical Reports Server (NTRS)
Dunham, R. S.
1976-01-01
FORTRAN-coded out-of-core equation solvers that use direct methods to solve symmetric banded systems of simultaneous algebraic equations. Banded, frontal, and column (skyline) solvers were studied, as well as solvers that can partition the working area and thus fit into any available core. Comparison timings are presented for several typical two-dimensional and three-dimensional continuum-type grids of elements with and without midside nodes. Extensive conclusions are also given.
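A column (skyline) solver stores, for each column of the symmetric matrix, only the entries from the first nonzero row down to the diagonal, which is what makes banded and out-of-core processing practical. A minimal sketch of packing a matrix into that storage scheme follows; it is an illustration in Python, not the original FORTRAN code.

```python
import numpy as np

def skyline_store(A):
    """Pack a symmetric matrix into skyline form: one packed array of column
    profiles plus, for each column, the row index where its profile starts."""
    n = A.shape[0]
    starts, packed = [], []
    for j in range(n):
        nonzero = np.nonzero(A[:j + 1, j])[0]
        first = int(nonzero[0]) if nonzero.size else j
        starts.append(first)
        packed.extend(A[first:j + 1, j])
    return np.array(packed), starts

A = np.array([[4., 1., 0.],
              [1., 3., 2.],
              [0., 2., 5.]])
packed, starts = skyline_store(A)
print(packed, starts)   # -> [4. 1. 3. 2. 5.] [0, 0, 1]
```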
Parallel-Vector Algorithm For Rapid Structural Analysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
Lustration: Transitional Justice in Poland and Its Continuous Struggle to Make Means With the Past
2008-06-01
Warsaw, just as the secret police did over its citizens. The skyline of Warsaw, dominated by this building, offers a daily reminder of life under the...the communist regime (especially acts of collaboration with the secret police) and in turn disqualifying members of these groups from holding high...Ministry of Interior for their name to be vetted through the Secret Police files of the former regime.3 A similar approach was adopted in Poland, but due
Ancient Chinese Astronomy - An Overview
NASA Astrophysics Data System (ADS)
Shi, Yunli
Documentary and archaeological evidence testifies to the early origin and continuous development of ancient Chinese astronomy to meet both the ideological and practical needs of a society largely based on agriculture. There was a long period when the beginning of the year, month, and season was determined by direct observation of celestial phenomena, including their alignments with respect to the local skyline. As the need for more exact study arose, new instruments for more exact observation were invented and the system of calendrical astronomy became entirely mathematized.
An approach in building a chemical compound search engine in oracle database.
Wang, H; Volarath, P; Harrison, R
2005-01-01
Searching for or identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for working as a chemical compound search engine in an Oracle database is described. The framework is devoted to eliminating data type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.
Issues central to a useful image understanding environment
NASA Astrophysics Data System (ADS)
Beveridge, J. Ross; Draper, Bruce A.; Hanson, Allen R.; Riseman, Edward M.
1992-04-01
A recent DARPA initiative has sparked interest in software environments for computer vision. The goal is a single environment to support both basic research and technology transfer. This paper lays out six fundamental attributes such a system must possess: (1) support for both C and Lisp, (2) extensibility, (3) data sharing, (4) data query facilities tailored to vision, (5) graphics, and (6) code sharing. The first three attributes fundamentally constrain the system design. Support for both C and Lisp demands some form of database or data-store for passing data between languages. Extensibility demands that system support facilities, such as spatial retrieval of data, be readily extended to new user-defined datatypes. Finally, data sharing demands that data saved by one user, including data of a user-defined type, must be readable by another user.
Morris, Melody K; Shriver, Zachary; Sasisekharan, Ram; Lauffenburger, Douglas A
2012-03-01
Mathematical models have substantially improved our ability to predict the response of a complex biological system to perturbation, but their use is typically limited by difficulties in specifying model topology and parameter values. Additionally, incorporating entities across different biological scales, ranging from molecular to organismal, in the same model is not trivial. Here, we present a framework called "querying quantitative logic models" (Q2LM) for building and asking questions of constrained fuzzy logic (cFL) models. cFL is a recently developed modeling formalism that uses logic gates to describe influences among entities, with transfer functions to describe quantitative dependencies. Q2LM does not rely on dedicated data to train the parameters of the transfer functions, and it permits straightforward incorporation of entities at multiple biological scales. The Q2LM framework can be employed to ask questions such as: Which therapeutic perturbations accomplish a designated goal, and under what environmental conditions will these perturbations be effective? We demonstrate the utility of this framework for generating testable hypotheses in two examples: (i) an intracellular signaling network model; and (ii) a model for pharmacokinetics and pharmacodynamics of cell-cytokine interactions; in the latter, we validate hypotheses concerning the molecular design of granulocyte colony stimulating factor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
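In constrained fuzzy logic models, each influence carries a transfer function and logic gates combine incoming influences, commonly with AND as a minimum and OR as a maximum over normalized Hill curves. The sketch below shows those common conventions on toy values; it is not the Q2LM code, and the parameter choices are assumptions.

```python
def hill(x, k=0.5, n=3):
    """Normalized Hill transfer function mapping an input activity in [0, 1]
    to a downstream influence in [0, 1]."""
    return x ** n / (k ** n + x ** n)

def fuzzy_and(*inputs):   # fuzzy AND: the weakest incoming influence limits the node
    return min(inputs)

def fuzzy_or(*inputs):    # fuzzy OR: the strongest incoming influence drives the node
    return max(inputs)

# e.g. a node that requires both upstream signals a and b to be active
a, b = 0.9, 0.4
print(round(fuzzy_and(hill(a), hill(b)), 3))
```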
The connection between landscapes and the solar ephemeris in honeybees.
Towne, William F; Moscrip, Heather
2008-12-01
Honeybees connect the sun's daily pattern of azimuthal movement to some aspect of the landscape around their nests. In the present study, we ask which aspect of the landscape is used in this context: the entire landscape panorama or only sectors seen along familiar flight routes. Previous studies of the solar ephemeris memory in bees have generally used bees that had experience flying a specific route, usually along a treeline, to a feeder. When such bees were moved to a differently oriented treeline on overcast days, the bees oriented their communicative dances as if they were still at the first treeline, based on a memory of the sun's course in relation to some aspect of the site, possibly the familiar route along the treeline or possibly the entire landscape or skyline panorama. Our results show that bees lacking specific flight-route training can nonetheless recall the sun's compass bearing relative to novel flight routes in their natal landscape. Specifically, we moved a hive from one landscape to a differently oriented twin landscape, and only after transplantation under overcast skies did we move a feeder away from the hive. These bees nonetheless danced accurately by memory of the sun's course in relation to their natal landscape. The bees' knowledge of the relationship between the sun and landscape, therefore, is not limited to familiar flight routes and so may encompass, at least functionally, the entire panorama. Further evidence suggests that the skyline in particular may be the bees' preferred reference in this context.
Paraskevis, Dimitrios; Paraschiv, Simona; Sypsa, Vana; Nikolopoulos, Georgios; Tsiara, Chryssa; Magiorkinis, Gkikas; Psichogiou, Mina; Flampouris, Andreas; Mardarescu, Mariana; Niculescu, Iulia; Batan, Ionelia; Malliori, Meni; Otelea, Dan; Hatzakis, Angelos
2015-10-01
A significant increase in HIV-1 diagnoses was reported among Injecting Drug Users (IDUs) in the Athens (17-fold) and Bucharest (9-fold) metropolitan areas starting in 2011. Molecular analyses were conducted on HIV-1 sequences from IDUs comprising 51% and 20% of the cases diagnosed among IDUs during 2011-2013 for Greece and Romania, respectively. Phylodynamic analyses were performed using the newly developed birth-death serial skyline model, which allows estimation of important epidemiological parameters, as implemented in the BEAST programme. Most infections (>90%) occurred within four and three IDU local transmission networks in Athens and Bucharest, respectively. For all Romanian clusters, the viral strains originated from local circulating strains, whereas in Athens, the local strains seeded only two of the four sub-outbreaks. Birth-death skyline plots suggest a more explosive nature for sub-outbreaks in Bucharest than in Athens. In Athens, two sub-outbreaks had been controlled (Re<1.0) by 2013 and two appeared to be endemic (Re∼1). In Bucharest, one outbreak continued to expand (Re>1.0) and two had been controlled (Re<1.0). The lead times were shorter for the outbreak in Athens than in Bucharest. Enhanced molecular surveillance proved useful to gain information about the origin, causal pathways, dispersal patterns and transmission dynamics of the outbreaks that can be useful in a public health setting. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ashwood, Christopher; Lin, Chi-Hung; Thaysen-Andersen, Morten; Packer, Nicolle H.
2018-03-01
Profiling cellular protein glycosylation is challenging due to the presence of highly similar glycan structures that play diverse roles in cellular physiology. As the anomericity and the exact linkage type of a single glycosidic bond can influence glycan function, there is a demand for improved and automated methods to confirm detailed structural features and to discriminate between structurally similar isomers, overcoming a significant bottleneck in the analysis of data generated by glycomics experiments. We used porous graphitized carbon-LC-ESI-MS/MS to separate and detect released N- and O-glycan isomers from mammalian model glycoproteins using negative mode resonance activation CID-MS/MS. By interrogating similar fragment spectra from closely related glycan isomers that differ only in arm position and sialyl linkage, product fragment ions for discrimination between these features were discovered. Using the Skyline software, at least two diagnostic fragment ions of high specificity were validated for automated discrimination of sialylation and arm position in N-glycan structures, and sialylation in O-glycan structures, complementing existing structural diagnostic ions. These diagnostic ions were shown to be useful for isomer discrimination using both linear and 3D ion trap mass spectrometers when analyzing complex glycan mixtures from cell lysates. Skyline was found to serve as a useful tool for automated assessment of glycan isomer discrimination. This platform-independent workflow can potentially be extended to automate the characterization and quantitation of other challenging glycan isomers.
Jung, Daewui; Li, Qi; Kong, Ling-Feng; Ni, Gang; Nakano, Tomoyuki; Matsukuma, Akihiko; Kim, Sanghee; Park, Chungoo; Lee, Hyuk Je; Park, Joong-Ki
2015-01-01
The present-day genetic structure of a species reflects both historical demography and patterns of contemporary gene flow among populations. To precisely understand how these factors shape the current population structure of the northwestern (NW) Pacific marine gastropod, Thais clavigera, we determined the partial nucleotide sequences of the mitochondrial COI gene for 602 individuals sampled from 29 localities spanning almost the whole distribution of T. clavigera in the NW Pacific Ocean (~3,700 km). Results from population genetic and demographic analyses (AMOVA, ΦST-statistics, haplotype networks, Tajima’s D, Fu’s FS, mismatch distribution, and Bayesian skyline plots) revealed a lack of genealogical branches or geographical clusters, and a high level of genetic (haplotype) diversity within each of the studied populations. Nevertheless, low but significant genetic structuring was detected among some geographical populations separated by the Changjiang River, suggesting the presence of geographical barriers to larval dispersal around this region. Several lines of evidence, including significantly negative Tajima’s D and Fu’s FS statistics, the unimodally shaped mismatch distribution, and Bayesian skyline plots, suggest a population expansion at marine isotope stage 11 (MIS 11; 400 ka), the longest and warmest interglacial interval during the Pleistocene epoch. The lack of genetic structure among the great majority of the NW Pacific T. clavigera populations may be attributable to high gene flow by current-driven long-distance dispersal during the prolonged planktonic larval phase of this species. PMID:26171966
Query Auto-Completion Based on Word2vec Semantic Similarity
NASA Astrophysics Data System (ADS)
Shao, Taihua; Chen, Honghui; Chen, Wanyu
2018-04-01
Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
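FS-QAC blends query frequency with word2vec semantic similarity to the typed prefix. A minimal sketch of such a hybrid ranking follows; the mixing weight, the toy embeddings, and the candidate set are assumptions for illustration, not the paper's trained model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_score(candidate, prefix_vec, embeddings, freq, alpha=0.5):
    """Blend normalized popularity with semantic similarity to the prefix."""
    sim = cosine(embeddings[candidate], prefix_vec)
    return alpha * freq[candidate] + (1 - alpha) * sim

# Toy embeddings and normalized frequencies for three completion candidates
embeddings = {"flu symptoms": np.array([0.9, 0.1]),
              "flu shot":     np.array([0.8, 0.3]),
              "fluid ounces": np.array([0.1, 0.9])}
freq = {"flu symptoms": 0.7, "flu shot": 0.9, "fluid ounces": 0.4}
prefix_vec = np.array([0.85, 0.2])   # assumed vector for the typed prefix "flu"

ranked = sorted(freq, key=lambda c: hybrid_score(c, prefix_vec, embeddings, freq),
                reverse=True)
print(ranked)   # candidates ordered by the hybrid score
```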
EquiX-A Search and Query Language for XML.
ERIC Educational Resources Information Center
Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander
2002-01-01
Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
Spatial and symbolic queries for 3D image data
NASA Astrophysics Data System (ADS)
Benson, Daniel C.; Zick, Gregory L.
1992-04-01
We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.
GenoQuery: a new querying module for functional annotation in a genomic warehouse
Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine
2008-01-01
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting the complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
SPARK: Adapting Keyword Query to Semantic Search
NASA Astrophysics Data System (ADS)
Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong
Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J
2016-08-02
Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it against the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by the Orphanet expert query and/or the query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and the terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
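For reference, precision, recall and F-measure as used in the evaluation above can be computed as follows; the counts in the example are illustrative, not the study's data.

```python
def precision_recall_f(relevant_retrieved, retrieved, relevant_total):
    """Standard retrieval metrics: precision, recall, and their harmonic mean."""
    precision = relevant_retrieved / retrieved if retrieved else 0.0
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

# Illustrative counts for one disease: citations judged relevant by the reviewers
print(precision_recall_f(relevant_retrieved=12, retrieved=23, relevant_total=19))
```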
An advanced web query interface for biological databases
Latendresse, Mario; Karp, Peter D.
2010-01-01
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
SPARQL Query Re-writing Using Partonomy Based Transformation Rules
NASA Astrophysics Data System (ADS)
Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.
Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of the SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities, to address the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
The full proteomics analysis of a small tumor sample (similar in mass to a few grains of rice) produces well over 500 megabytes of unprocessed "raw" data when analyzed on a mass spectrometer (MS). Thus, for every proteomics experiment there is a vast amount of raw data that must be analyzed and interrogated in order to extract biological information. Moreover, the raw data output from different MS vendors are generally in different formats inhibiting the ability of labs to productively work together.
Space Shuttle Discovery DC Fly-Over
2012-04-17
Space shuttle Discovery, mounted atop a NASA 747 Shuttle Carrier Aircraft (SCA), flies over the Washington skyline as seen from a NASA T-38 aircraft, Tuesday, April 17, 2012. Discovery, the first orbiter retired from NASA’s shuttle fleet, completed 39 missions, spent 365 days in space, orbited the Earth 5,830 times, and traveled 148,221,675 miles. NASA will transfer Discovery to the National Air and Space Museum to begin its new mission to commemorate past achievements in space and to educate and inspire future generations of explorers. Photo Credit: (NASA/Robert Markowitz)
2006-06-01
SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient. These algorithms...memory as a treelike structure in order for the data to be queried. XML Query (XQuery) is the standard language used when querying XML
Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance
NASA Astrophysics Data System (ADS)
Wang, Chuan; Hao, Liang; Zhao, Lian-Jie
2011-08-01
We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit queries and the CNOT operation, the query process can be realized in a two-mode database. In the query process, data privacy is preserved, as the sender does not reveal any information about the database beyond the queried item, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers is constructed.
A study of medical and health queries to web search engines.
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
2004-03-01
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
Monitoring Moving Queries inside a Safe Region
Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan
2014-01-01
With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
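The monitoring method described above extrapolates the query's position with a linear function of its observed direction and speed, and only re-evaluates the query when the predicted position leaves the safe region. A minimal sketch follows, assuming for illustration a circular safe region and a constant-velocity motion model.

```python
import math

def predict_position(pos, velocity, dt):
    """Linear extrapolation of the moving query from its last known position."""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def leaves_safe_region(pos, velocity, dt, center, radius):
    """True if the predicted position falls outside a circular safe region,
    i.e. the client should contact the server to re-evaluate the query."""
    px, py = predict_position(pos, velocity, dt)
    return math.hypot(px - center[0], py - center[1]) > radius

# Query at (2, 3) moving at (1.5, 0) units/s; safe region: circle of radius 5 at (0, 0)
print(leaves_safe_region((2, 3), (1.5, 0), dt=2.0, center=(0, 0), radius=5.0))  # True
```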
RDF-GL: A SPARQL-Based Graphical Query Language for RDF
NASA Astrophysics Data System (ADS)
Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.
Cumulative query method for influenza surveillance using search engine data.
Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il
2014-12-16
Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development set 2 and 2011/12 for validation set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method, with n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, whereas only 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, whereas only 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than the combined queries in both the development and validation sets.
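The selection step above ranks combined queries by their Pearson correlation with ILI rates, keeps those with r ≥ .7, and accumulates their volumes in descending order of correlation. A minimal sketch on toy data follows; the query terms and weekly counts are invented for illustration, not the study's search logs.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two series."""
    return float(np.corrcoef(x, y)[0, 1])

ili = np.array([1.2, 1.5, 2.8, 4.1, 3.0, 1.8])          # weekly ILI rates (toy)
query_volumes = {                                        # weekly volumes per combined query (toy)
    "flu symptoms": np.array([10, 14, 30, 44, 33, 17]),
    "fever":        np.array([12, 13, 25, 35, 30, 20]),
    "cold remedy":  np.array([30, 28, 26, 29, 31, 27]),
}

# Keep queries with r >= 0.7, sort by correlation, then cumulate their volumes
selected = sorted((q for q, v in query_volumes.items() if pearson(v, ili) >= 0.7),
                  key=lambda q: pearson(query_volumes[q], ili), reverse=True)
cumulative = sum(query_volumes[q] for q in selected)
print(selected, pearson(cumulative, ili))
```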
A Query Integrator and Manager for the Query Web
Brinkley, James F.; Detwiler, Landon T.
2012-01-01
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
Using Generalized Annotated Programs to Solve Social Network Diffusion Optimization Problems
2013-01-01
as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP query i, let ki be the k for that query. For each query i, set ki... kall − 1. —Number each element of vi ∈ V such that gI(vi) and VC(vi) are true. For the ith SNDOP query, let vi be the corresponding element of V —Let...vertices of S. PROOF. We set up |V| SNDOP-queries as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP-query i, let ki be the k for that query. For each query i, set ki be
A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.
Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei
2013-10-08
Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.
Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen
2014-01-01
Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537
Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman
2014-07-04
The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.
Unique patellofemoral alignment in a patient with a symptomatic bipartite patella.
Ishikawa, Masakazu; Adachi, Nobuo; Deie, Masataka; Nakamae, Atsuo; Nakasa, Tomoyuki; Kamei, Goki; Takazawa, Kobun; Ochi, Mitsuo
2016-01-01
A symptomatic bipartite patella is rarely seen in athletic adolescents or young adults in daily clinical practice. To date, only a limited number of studies have focused on patellofemoral alignment. The current study revealed a unique patellofemoral alignment in a patient with a symptomatic bipartite patella. Twelve patients with 12 symptomatic bipartite patellae who underwent arthroscopic vastus lateralis release (VLR) were investigated (10 males and two females, age: 15.7±4.4years). The radiographic data of contralateral intact and affected knees were reviewed retrospectively. From the lateral- and skyline-view imaging, the following parameters were measured: the congruence angle (CA), the lateral patellofemoral angle (LPA), and the Caton-Deschamps index (CDI). As an additional parameter, the bipartite fragment angle (BFA) was evaluated against the main part of the patella in the skyline view. Compared with the contralateral side, the affected patellae were significantly medialized and laterally tilted (CA: P=0.019; LPA: P=0.016), although there was no significant difference in CDI (P=0.877). This patellar malalignment was found to significantly change after VLR (CA: P=0.001; LPA: P=0.003) and the patellar height was significantly lower than in the preoperative condition (P=0.016). In addition, the BFA significantly shifted to a higher degree after operation (P=0.001). Patients with symptomatic bipartite patellae presented significantly medialized and laterally tilted patellae compared with the contralateral intact side. This malalignment was corrected by VLR, and the alignment of the bipartite fragment was also significantly changed. Level IV, case series. Copyright © 2015 Elsevier B.V. All rights reserved.
SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory
NASA Astrophysics Data System (ADS)
Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.
2002-12-01
We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm performs a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.
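The cross-matching performed by SkyQuery is essentially a fuzzy spatial join: objects from two catalogs match when their angular separation is below a tolerance reflecting positional errors, and objects with no counterpart are dropouts. A simplified brute-force sketch follows (SkyQuery itself relies on HTM indexing for fast spatial lookups); the catalog positions and tolerance are illustrative.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d = math.sin((dec2 - dec1) / 2) ** 2 + \
        math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    return math.degrees(2 * math.asin(math.sqrt(d)))

def fuzzy_join(catalog_a, catalog_b, tol_deg=1.0 / 3600):   # 1 arcsec tolerance
    """Return (i, j) index pairs whose positions agree within the tolerance."""
    return [(i, j)
            for i, (ra1, dec1) in enumerate(catalog_a)
            for j, (ra2, dec2) in enumerate(catalog_b)
            if angular_sep_deg(ra1, dec1, ra2, dec2) <= tol_deg]

sdss    = [(180.0000, 2.0000), (181.5000, 2.5000)]   # (RA, Dec) in degrees, toy values
twomass = [(180.0001, 2.0001), (175.0000, -1.0000)]
print(fuzzy_join(sdss, twomass))   # -> [(0, 0)]; the second SDSS object is a dropout
```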
Teng, Rui; Leibnitz, Kenji; Miura, Ryu
2013-01-01
An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur in the query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes reporting the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery, and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable the scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then efficiently locally respond to the query losses. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172
Ontological Approach to Military Knowledge Modeling and Management
2004-03-01
federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL...ontologies as a common terminology – Unified query to perform federated search • Query processing – Ontology mapping to sources reformulate queries
NASA Astrophysics Data System (ADS)
Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.
2015-05-01
A method suitable for complex indoor semantic queries, taking into account the computation of indoor spatial relations, is provided according to the characteristics of indoor space. This paper designs an ontology model describing the space-related information of humans, events and indoor space objects (e.g. Storey and Room) as well as their relations, to meet the needs of indoor semantic querying. The ontology concepts are used in the IndoorSPARQL query language, which extends SPARQL syntax for representing and querying indoor space. Four specific primitives for indoor querying, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations. A method is also proposed to analyse the query language. Finally, this paper adopts this method to realize indoor semantic queries on the study area by constructing the ontology model for the study building. The experimental results show that the method proposed in this paper can effectively support complex indoor semantic queries.
VISAGE: Interactive Visual Graph Querying.
Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng
2016-06-01
Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.
VISAGE: Interactive Visual Graph Querying
Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng
2017-01-01
Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.
Jamil, Hasan M
2017-01-01
Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tool- or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages offers the opportunity to access these repositories in a better way, and to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database, for which PhyQL serves as the query language. We have implemented a visual interface that lets end users pose PhyQL queries using visual icons and drag-and-drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.
Which factors predict the time spent answering queries to a drug information centre?
Reppe, Linda A.; Spigset, Olav
2010-01-01
Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
Personalized query suggestion based on user behavior
NASA Astrophysics Data System (ADS)
Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui
Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantic similarity and co-occurrence, which reflects behavioral information from other users in web search. Regarding the current user’s preference for a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparsity problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in better performance than either plain approach.
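The toy sketch below illustrates the general shape of such a ranking: query-query relevance (semantic similarity plus co-occurrence) combined with a linear blend of the user's short-term and long-term preference. The component scores, weights, and combination rule are illustrative assumptions, not the paper's exact UB model.

```python
# Toy component scores in [0, 1]; a real system would learn these, e.g.
# co-occurrence from query logs and preferences via BPMF.
SIM = {("flu symptoms", "influenza treatment"): 0.7}
CO = {("flu symptoms", "influenza treatment"): 0.4}
SHORT = {("u1", "influenza treatment"): 0.8}
LONG = {("u1", "influenza treatment"): 0.3}

def suggestion_score(user, current_q, candidate, alpha=0.6, beta=0.5):
    """Blend query-query relevance with the user's short/long-term preference."""
    relevance = (alpha * SIM.get((current_q, candidate), 0.0)
                 + (1 - alpha) * CO.get((current_q, candidate), 0.0))
    preference = (beta * SHORT.get((user, candidate), 0.0)
                  + (1 - beta) * LONG.get((user, candidate), 0.0))
    return relevance * preference

candidates = ["influenza treatment", "flu shot locations"]
ranked = sorted(candidates,
                key=lambda q: suggestion_score("u1", "flu symptoms", q),
                reverse=True)
print(ranked)
```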
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-01-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-07-04
As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
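The sketch below shows the general two-stage pipeline described (Lasso-based feature selection over candidate query volumes followed by support vector regression), using scikit-learn and random stand-in data; the shapes, parameters, and library choice are assumptions, not the study's actual setup.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((156, 146))   # 156 weeks x 146 candidate influenza-related queries
y = rng.random(156)          # weekly influenza-like illness rate (stand-in values)

model = make_pipeline(
    # keep the 20 queries with the largest Lasso coefficients
    SelectFromModel(Lasso(alpha=0.001), max_features=20, threshold=-np.inf),
    SVR(kernel="rbf", C=1.0),
)
model.fit(X[:130], y[:130])
pred = model.predict(X[130:])
print(np.corrcoef(pred, y[130:])[0, 1])  # correlation with held-out weeks
```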
Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-01-01
Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323
Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas
2017-07-03
MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.
A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model
Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei
2013-01-01
Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078
Mining Longitudinal Web Queries: Trends and Patterns.
ERIC Educational Resources Information Center
Wang, Peiling; Berry, Michael W.; Yang, Yiheng
2003-01-01
Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…
Optimizing a Query by Transformation and Expansion.
Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank
2017-01-01
In the biomedical sector, not only is the amount of information produced and uploaded to the web enormous, but so is the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and obtain the best possible results. In this paper we introduce the concept of a workflow for optimizing queries in the medical and biological sector using a series of tools for query expansion and transformation. After the user defines attributes, the query string is compared to previous queries in order to add semantically co-occurring terms to the query. Additionally, the query is enlarged by including synonyms. Translation into database-specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed on several databases at once, the results are ranked and normalized in order to produce a comparable list of answers for a question.
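A minimal sketch of the expansion steps described above follows; the co-occurrence and synonym tables are toy stand-ins for the query history and curated resources a real workflow would consult.

```python
CO_OCCURRING = {"myocardial infarction": ["troponin", "statin"]}
SYNONYMS = {"myocardial infarction": ["heart attack", "MI"]}

def expand_query(query: str) -> str:
    """OR-combine the original term with co-occurring terms and synonyms."""
    terms = [query]
    terms += CO_OCCURRING.get(query.lower(), [])
    terms += SYNONYMS.get(query.lower(), [])
    # A real system would additionally translate each term into the target
    # database's ontology (e.g., MeSH) before submitting the query.
    return " OR ".join(f'"{t}"' for t in terms)

print(expand_query("Myocardial infarction"))
```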
WATCHMAN: A Data Warehouse Intelligent Cache Manager
NASA Technical Reports Server (NTRS)
Scheuermann, Peter; Shim, Junho; Vingralek, Radek
1996-01-01
Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
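The toy cache below sketches the spirit of such a profit-based admission and replacement policy (benefit per unit of cache space derived from reference rate, execution cost, and retrieved-set size); the exact formula and bookkeeping in WATCHMAN differ.

```python
class ResultCache:
    """Toy profit-based cache for query result sets."""

    def __init__(self, capacity):
        self.capacity = capacity      # total size budget (e.g., in pages)
        self.entries = {}             # query text -> [result, size, cost, refs]
        self.used = 0

    def profit(self, query):
        _, size, cost, refs = self.entries[query]
        return refs * cost / size     # benefit per unit of cache space

    def get(self, query):
        if query in self.entries:
            self.entries[query][3] += 1   # bump reference count
            return self.entries[query][0]
        return None                       # caller must run the query itself

    def admit(self, query, result, size, cost):
        if size > self.capacity:
            return False
        # Evict lowest-profit sets while the new set does not fit, but never
        # evict a set that is more profitable than the newcomer.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=self.profit)
            if self.profit(victim) >= cost / size:
                return False
            self.used -= self.entries.pop(victim)[1]
        self.entries[query] = [result, size, cost, 1]
        self.used += size
        return True

cache = ResultCache(capacity=100)
cache.admit("Q1", ["row"] * 10, size=40, cost=8.0)
cache.admit("Q2", ["row"] * 20, size=80, cost=2.0)   # rejected: lower profit than Q1
print(cache.get("Q1") is not None)
```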
Assisting Consumer Health Information Retrieval with Query Recommendations
Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily
2006-01-01
Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
PAQ: Persistent Adaptive Query Middleware for Dynamic Environments
NASA Astrophysics Data System (ADS)
Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin
Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.
NASA Astrophysics Data System (ADS)
Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee
2010-04-01
The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.
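The sketch below illustrates the underlying idea of discovering join conditions by searching a graph whose nodes are tables and whose edges are known key relationships, then emitting the corresponding SQL; the schema and column names are invented for illustration and are not the actual DBS schema.

```python
from collections import deque

SCHEMA = {  # table -> {neighbor_table: (column, neighbor_column)}
    "dataset": {"block": ("id", "dataset_id")},
    "block": {"dataset": ("dataset_id", "id"), "file": ("id", "block_id")},
    "file": {"block": ("block_id", "id")},
}

def join_path(start, goal):
    """Breadth-first search for a chain of join conditions between two tables."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        table, conds = queue.popleft()
        if table == goal:
            return conds
        for nbr, (col, nbr_col) in SCHEMA[table].items():
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, conds + [f"{table}.{col} = {nbr}.{nbr_col}"]))
    return None

conds = join_path("dataset", "file")
print("SELECT ... FROM dataset, block, file WHERE " + " AND ".join(conds))
```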
Spatial aggregation query in dynamic geosensor networks
NASA Astrophysics Data System (ADS)
Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun
2007-11-01
Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, research mainly aims at building sensor-network-based systems that deliver the sensed data to applications. However, existing work seldom exploits spatial aggregation queries in light of the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation queries over dynamic geosensor networks in which both the sink node and the sensor nodes are mobile, and we propose several novel improvements to the enabling techniques. The mobility of sensors makes existing routing protocols based on a fixed framework or neighborhood information infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by the query window, and a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation within the query window. Finally, considering the changing location of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.
Viking Lander imaging investigation: Picture catalog of primary mission experiment data record
NASA Technical Reports Server (NTRS)
Tucker, R. B.
1978-01-01
All the images returned by the two Viking Landers during the primary phase of the Viking Mission are presented. Listings of supplemental information which described the conditions under which the images were acquired are included together with skyline drawings which show where the images are positioned in the field of view of the cameras. Subsets of the images are listed in a variety of sequences to aid in locating images of interest. The format and organization of the digital magnetic tape storage of the images are described. The mission and the camera system are briefly described.
Shuttle Enterprise Flight to New York
2012-04-27
Space shuttle Enterprise, mounted atop a NASA 747 Shuttle Carrier Aircraft (SCA), is seen as it flies over the Manhattan Skyline with Freedom Tower in the background, Friday, April 27, 2012, in New York. Enterprise was the first shuttle orbiter built for NASA performing test flights in the atmosphere and was incapable of spaceflight. Originally housed at the Smithsonian's Steven F. Udvar-Hazy Center, Enterprise will be demated from the SCA and placed on a barge that will eventually be moved by tugboat up the Hudson River to the Intrepid Sea, Air & Space Museum in June. Photo Credit: (NASA/Robert Markowitz)
Shuttle Enterprise Flight to New York
2012-04-27
Space shuttle Enterprise, mounted atop a NASA 747 Shuttle Carrier Aircraft (SCA), is seen as it flies near the Statue of Liberty and the Manhattan skyline, Friday, April 27, 2012, in New York. Enterprise was the first shuttle orbiter built for NASA performing test flights in the atmosphere and was incapable of spaceflight. Originally housed at the Smithsonian's Steven F. Udvar-Hazy Center, Enterprise will be demated from the SCA and placed on a barge that will eventually be moved by tugboat up the Hudson River to the Intrepid Sea, Air & Space Museum in June. Photo Credit: (NASA/Robert Markowitz)
Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM
2008-01-01
Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving less than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391
PropBase Query Layer: a single portal to UK subsurface physical property databases
NASA Astrophysics Data System (ADS)
Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham
2013-04-01
Until recently, the delivery of geological information for industry and the public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical property data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires the capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived, data sources. These data can be voxelated to incorporate them into the models and demonstrate property variation within the subsurface geometry. All property data held by BGS have for many years been stored in relational databases to ensure their long-term continuity. However, these have, by necessity, complex structures; each database contains positional reference data and model information, as well as metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of the variability of the property under assessment and requires multiple queries to study related datasets, making the extraction of physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure of each relational database is denormalized into a generalised structure, so that all datasets can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which holds all of the data with the following attribution:
• a unique identifier
• the data source
• the unique identifier from the parent database, for traceability
• the 3D location
• the property type
• the property value
• the units
• necessary qualifiers
• precision information and an audit trail
Data sources, property types and units are constrained by dictionaries, a key component of the structure which defines what properties and inheritance hierarchies are to be coded and also guides what is extracted from the structure and how. Data types served by the Query Layer include site-investigation-derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs, as well as lithological and borehole metadata. The size and complexity of the datasets, with multiple parent structures, require a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) needed to keep the layer synchronised with the underlying databases, either as regularly scheduled jobs (weekly, monthly, etc.) or invoked on demand.
The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
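As a rough illustration only, the snippet below sketches what a denormalized property table along these lines might look like, derived solely from the attributes listed above; the column names and types are guesses, not BGS's actual PRB_DATA definition.

```python
# Hypothetical DDL for a PRB_DATA-style table, shown as a string; with a live
# Oracle connection it could be executed via a DB-API cursor, e.g.
# cursor.execute(PRB_DATA_DDL). Here we only print it.
PRB_DATA_DDL = """
CREATE TABLE PRB_DATA (
    id             NUMBER PRIMARY KEY,          -- unique identifier
    data_source    VARCHAR2(64) NOT NULL,       -- originating database
    source_id      VARCHAR2(64) NOT NULL,       -- parent-database identifier (traceability)
    x              NUMBER, y NUMBER, z NUMBER,  -- 3D location
    property_type  VARCHAR2(64) NOT NULL,       -- constrained by a dictionary table
    property_value NUMBER,
    units          VARCHAR2(32),                -- constrained by a dictionary table
    qualifier      VARCHAR2(64),
    precision_info VARCHAR2(64),
    audit_trail    VARCHAR2(256)
)
"""

print(PRB_DATA_DDL)
```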
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.
Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias
2018-03-01
In order to access and filter the content of life-science databases, full-text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspellings and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a query profile, such as research field, geographic location, co-authorship, or affiliation, require the user's registration and its public accessibility, which contradicts privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases that are to be queried, or, for general purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the subsequent computation of customized distributed word vectors, which are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was performed for plant science use cases. Local experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, carried out using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, with 34% scoring above 0.80. In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general purpose RESTful web service. The service uses open interfaces to be seamlessly embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable responses to web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in the Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
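The sketch below shows the general idea of suggesting alternative keywords from distributed word vectors trained on database text records, using gensim on a toy corpus; LAILAPS-QSM itself is a JAVA implementation and its training pipeline and parameters differ.

```python
from gensim.models import Word2Vec

# Toy stand-in for pre-processed text records (e.g., tokenized abstracts).
corpus = [
    ["drought", "tolerance", "barley", "yield"],
    ["drought", "stress", "gene", "expression", "barley"],
    ["salt", "stress", "tolerance", "arabidopsis"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

def suggest(keyword, topn=3):
    """Return related keywords by cosine similarity in the learned vector space."""
    return [w for w, _ in model.wv.most_similar(keyword, topn=topn)]

print(suggest("drought"))
```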
Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang
2017-01-01
To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.
Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang
2017-01-01
To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239
Improve Performance of Data Warehouse by Query Cache
NASA Astrophysics Data System (ADS)
Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod
2010-11-01
The primary goal of a data warehouse is to free the information locked up in the operational database so that decision makers and business analysts can run queries, analyses and planning regardless of the data changes in the operational database. As the number of queries is large, in certain cases there is a reasonable probability that the same query is submitted by one or multiple users at different times. Each time a query is executed, all the data in the warehouse is analyzed to generate the result of that query. In this paper we study how a query cache improves the performance of a data warehouse and examine the common problems faced by data warehouse administrators in minimizing response time and improving overall query efficiency, particularly when the data warehouse is updated at regular intervals.
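A toy sketch of the basic idea follows: cache result sets keyed by normalized SQL text and invalidate them when the warehouse is refreshed; the names and the invalidation policy are illustrative only.

```python
class QueryCache:
    def __init__(self, run_query):
        self.run_query = run_query   # callable that actually hits the warehouse
        self.cache = {}

    def execute(self, sql):
        key = " ".join(sql.lower().split())   # normalize whitespace and case
        if key not in self.cache:
            self.cache[key] = self.run_query(sql)
        return self.cache[key]

    def on_warehouse_refresh(self):
        self.cache.clear()           # cached results may be stale after each load

warehouse = QueryCache(run_query=lambda sql: f"rows for: {sql}")
print(warehouse.execute("SELECT region, SUM(sales) FROM facts GROUP BY region"))
print(warehouse.execute("select region, sum(sales) from facts group by region"))  # served from cache
```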
Safari, Leila; Patrick, Jon D
2018-06-01
This paper reports on a generic framework that provides clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries, which can have up to five levels of criteria, starting with a query to define the subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables, which all together simulate a real scientific experiment. Depending on the complexity of the research questions, the cascaded query model has the flexibility to merge some lower-level queries for simple research questions or to add a nested query at each level to compose more complex queries. Three different scenarios (one of them containing two studies) are described and used to evaluate the proposed solution. CliniDAL's complex analysis solution enables answering complex queries with time-event dependencies in at most a few hours, which would manually take many days. An evaluation of the results of the research studies, based on a comparison between the CliniDAL and SQL solutions, reveals the high usability and efficiency of CliniDAL's solution.
Evaluation of Sub Query Performance in SQL Server
NASA Astrophysics Data System (ADS)
Oktavia, Tanty; Sujarwo, Surya
2014-03-01
The paper explores several subquery methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each subquery method combined with an indexing strategy. The subquery methods consist of IN, EXISTS, a relational operator, and a relational operator combined with the TOP operator. The experiments show that using a relational operator combined with an indexing strategy in a subquery yields better performance than the same method without an indexing strategy, as well as the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
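The snippet below lists illustrative T-SQL variants of the four subquery methods over a made-up Orders/Customers schema; it is a sketch for comparison purposes, not the study's actual workload.

```python
variants = {
    "IN": """
        SELECT o.OrderID FROM Orders o
        WHERE o.CustomerID IN (SELECT c.CustomerID FROM Customers c WHERE c.Country = 'UK')
    """,
    "EXISTS": """
        SELECT o.OrderID FROM Orders o
        WHERE EXISTS (SELECT 1 FROM Customers c
                      WHERE c.CustomerID = o.CustomerID AND c.Country = 'UK')
    """,
    "relational operator": """
        SELECT o.OrderID FROM Orders o
        WHERE o.Amount > (SELECT AVG(Amount) FROM Orders)
    """,
    "relational operator + TOP": """
        SELECT o.OrderID FROM Orders o
        WHERE o.Amount > (SELECT TOP 1 Amount FROM Orders ORDER BY Amount DESC) / 2
    """,
}

for name, sql in variants.items():
    print(f"-- {name}{sql}")
```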
Distributed query plan generation using multiobjective genetic algorithm.
Panicker, Shina; Kumar, T V Vijay
2014-01-01
A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
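As a small illustration of the two objectives being traded off, the sketch below computes total LPC and total CC for a candidate plan that assigns each accessed relation to a site; the cost tables are invented, and the paper itself evaluates such objectives inside NSGA-II rather than in isolation.

```python
LPC = {"s1": 4, "s2": 2, "s3": 5}                          # per-relation processing cost at each site
CC = {("s1", "s2"): 3, ("s1", "s3"): 6, ("s2", "s3"): 7}   # site-to-site communication costs

def plan_objectives(plan):
    """plan: list of sites, one per relation accessed by the query."""
    total_lpc = sum(LPC[site] for site in plan)
    sites = sorted(set(plan))
    pairs = {(a, b) for i, a in enumerate(sites) for b in sites[i + 1:]}
    total_cc = sum(CC[pair] for pair in pairs)
    return total_lpc, total_cc   # both objectives are minimized simultaneously

print(plan_objectives(["s1", "s1", "s2"]))   # fewer sites -> lower communication cost
print(plan_objectives(["s1", "s2", "s3"]))
```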
Distributed Query Plan Generation Using Multiobjective Genetic Algorithm
Panicker, Shina; Vijay Kumar, T. V.
2014-01-01
A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.
Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed as spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
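An illustrative spatially extended SQL query of the kind demonstrated is sketched below as a plain string; the ST_* function names and table schema are assumptions rather than Hadoop-GIS's exact dialect.

```python
# Hypothetical spatial join counting markup objects per tile that fall inside
# tumor regions; in practice such a query would be submitted through the
# system's command-line or web interface.
spatial_join = """
SELECT a.tile_id, COUNT(*) AS nuclei
FROM   pathology_markup a JOIN tumor_regions b
WHERE  ST_INTERSECTS(a.boundary, b.polygon) = TRUE
GROUP BY a.tile_id
"""
print(spatial_join)
```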
Query Health: standards-based, cross-platform population health surveillance
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371
Query Health: standards-based, cross-platform population health surveillance.
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites.
Using search engine query data to track pharmaceutical utilization: a study of statins.
Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F
2010-08-01
To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.
CSRQ: Communication-Efficient Secure Range Queries in Two-Tiered Sensor Networks
Dai, Hua; Ye, Qingqun; Yang, Geng; Xu, Jia; He, Ruiliang
2016-01-01
In recent years, we have seen many applications of secure query in two-tiered wireless sensor networks. Storage nodes are responsible for storing data from nearby sensor nodes and answering queries from Sink. It is critical to protect data security from a compromised storage node. In this paper, the Communication-efficient Secure Range Query (CSRQ)—a privacy and integrity preserving range query protocol—is proposed to prevent attackers from gaining information of both data collected by sensor nodes and queries issued by Sink. To preserve privacy and integrity, in addition to employing the encoding mechanisms, a novel data structure called encrypted constraint chain is proposed, which embeds the information of integrity verification. Sink can use this encrypted constraint chain to verify the query result. The performance evaluation shows that CSRQ has lower communication cost than the current range query protocols. PMID:26907293
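The encrypted constraint chain is only named in the abstract, not specified. As a loose, simplified illustration of the underlying idea, chaining adjacent sorted values with an authentication tag so the Sink can detect dropped results, here is a minimal Python sketch; the key, the plain-value chaining, and the range test are assumptions for illustration only, and the real protocol additionally encrypts the data.

    import hmac, hashlib

    KEY = b"shared-sensor-key"  # hypothetical key shared by a sensor node and the Sink

    def mac(left, right):
        return hmac.new(KEY, f"{left}|{right}".encode(), hashlib.sha256).digest()

    def build_chain(values):
        """Sort the readings and tag each adjacent pair so missing items are detectable."""
        vs = sorted(values)
        bounds = [float("-inf")] + vs + [float("inf")]
        return [(a, b, mac(a, b)) for a, b in zip(bounds, bounds[1:])]

    def verify_range(links, lo, hi):
        """Check that returned links are authentic, contiguous, and cover [lo, hi]."""
        for a, b, tag in links:
            if not hmac.compare_digest(tag, mac(a, b)):
                return False
        for (_, b1, _), (a2, _, _) in zip(links, links[1:]):
            if b1 != a2:          # a gap means the storage node dropped a result
                return False
        return links[0][0] <= lo and links[-1][1] >= hi

    chain = build_chain([12.5, 9.1, 17.3, 14.0])
    answer = [lnk for lnk in chain if not (lnk[1] < 9 or lnk[0] > 15)]  # storage node's reply
    print(verify_range(answer, 9, 15))  # True; removing any link would make this False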
SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.
Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan
2014-08-15
Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
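As a rough illustration of the kind of query such a visual builder generates and executes, the sketch below uses the third-party SPARQLWrapper package; the endpoint URL, prefixes, and predicates are placeholders rather than SPARQLGraph's actual output.

    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party package

    # Endpoint URL and vocabulary below are illustrative placeholders.
    endpoint = SPARQLWrapper("https://example.org/biosparql")
    endpoint.setQuery("""
        PREFIX up: <http://purl.uniprot.org/core/>
        SELECT ?protein ?name WHERE {
            ?protein a up:Protein ;
                     up:recommendedName/up:fullName ?name .
        } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)
    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["protein"]["value"], row["name"]["value"])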
Improving accuracy for identifying related PubMed queries by an integrated approach.
Lu, Zhiyong; Wilbur, W John
2009-10-01
PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
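The paper's method combines lexical and contextual evidence, including the proposed PubMed distance; the sketch below shows only a naive lexical-overlap segmenter of the kind the lexical component generalizes, with an arbitrary threshold.

    def lexical_overlap(q1: str, q2: str) -> float:
        """Jaccard similarity over lowercase word tokens of two consecutive queries."""
        t1, t2 = set(q1.lower().split()), set(q2.lower().split())
        return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

    def segment_session(queries, threshold=0.2):
        """Start a new information-need segment whenever overlap drops below threshold."""
        segments, current = [], [queries[0]]
        for prev, nxt in zip(queries, queries[1:]):
            if lexical_overlap(prev, nxt) >= threshold:
                current.append(nxt)
            else:
                segments.append(current)
                current = [nxt]
        segments.append(current)
        return segments

    print(segment_session(["breast cancer brca1", "brca1 mutation screening", "flu vaccine side effects"]))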
Improving accuracy for identifying related PubMed queries by an integrated approach
Lu, Zhiyong; Wilbur, W. John
2009-01-01
PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232
Shao, Yuhao; Yin, Xiaoxi; Kang, Dian; Shen, Boyu; Zhu, Zhangpei; Li, Xinuo; Li, Haofeng; Xie, Lin; Wang, Guangji; Liang, Yan
2017-08-01
Liquid chromatography mass spectrometry (LC-MS) based methods provide powerful tools for protein analysis. Cytochromes P450 (CYPs), the most important drug-metabolizing enzymes, often exhibit sex-dependent expression patterns and metabolic activities. To date, mass spectrometry-based analysis of CYPs still faces critical technical challenges because of the complexity and diversity of CYP isoforms and the lack of corresponding standards. The aim of the present work was to develop a label-free qualitative and quantitative strategy for endogenous proteins and apply it to the study of gender differences in CYPs in rat liver microsomes (RLMs). Initially, trypsin-digested RLM specimens were analyzed by nanoLC-LTQ-Orbitrap MS/MS. Skyline, an open-source and freely available software package for targeted proteomics research, was then used to automatically screen the main CYP isoforms in RLMs against a series of criteria, and a total of 40 and 39 CYP isoforms were identified in male and female RLMs, respectively. More importantly, a robust quantitative method in tandem mass spectrometry-multiple reaction monitoring mode (MS/MS-MRM) was built and optimized with the help of Skyline and successfully applied to the CYP gender-difference study in RLMs. In this process, a simple and accurate approach named "Standard Curve Slope" (SCS) was established, based on the difference between the standard curve slopes of CYPs in female and male RLMs, to assess the gender difference of CYPs in RLMs. The methodology developed here could be widely used in protein regulation studies during drug pharmacological mechanism research. Copyright © 2017 Elsevier B.V. All rights reserved.
Godoy, Bibiane A; Gomes-Gouvêa, Michele S; Zagonel-Oliveira, Marcelo; Alvarado-Mora, Mónica V; Salzano, Francisco M; Pinho, João R R; Fagundes, Nelson J R
2016-09-01
Native American populations present the highest prevalence of Hepatitis B Virus (HBV) infection in the Americas, which may be associated with severe disease outcomes. Ten HBV genotypes (A–J) have been described, displaying a remarkable geographic structure, which most likely reflects historic patterns of human migrations. In this study, we characterize the HBV strains circulating in a historical sample of Native South Americans in order to reconstruct the historical viral dynamics in this population. The sample consisted of 1070 individuals belonging to 38 populations collected between 1965 and 1997. Presence of HBV DNA was checked by quantitative real-time PCR, and determination of HBV genotypes and subgenotypes was performed through sequencing and phylogenetic analysis of a fragment including part of the HBsAg and Pol coding regions (S/Pol). A Bayesian Skyline Plot analysis was performed to compare the viral population dynamics of HBV/A1 strains found in Native Americans and in the general Brazilian population. A total of 109 individuals were positive for HBV DNA (~10%), and 70 samples were successfully sequenced and genotyped. Subgenotype A1 (HBV/A1), related to African populations and the African slave trade, was the most prevalent (66–94%). The Skyline Plot analysis showed a marked population expansion of HBV/A1 in Native Americans occurring more recently (1945–1965) than in the general Brazilian population. Our results suggest that the historic processes that contributed to the formation of the HBV/A1 strains circulating in Native Americans are related to more recent migratory waves towards the Amazon basin, which generated different viral dynamics in this region.
NASA Astrophysics Data System (ADS)
Titus, Benjamin M.; Daly, Marymegan
2017-03-01
Specialist and generalist life histories are expected to result in contrasting levels of genetic diversity at the population level, and symbioses are expected to lead to patterns that reflect a shared biogeographic history and co-diversification. We test these assumptions using mtDNA sequencing and a comparative phylogeographic approach for six co-occurring crustacean species that are symbiotic with sea anemones on western Atlantic coral reefs, yet vary in their host specificities: four are host specialists and two are host generalists. We first conducted species discovery analyses to delimit cryptic lineages, followed by classic population genetic diversity analyses for each delimited taxon, and then reconstructed the demographic history for each taxon using traditional summary statistics, Bayesian skyline plots, and approximate Bayesian computation to test for signatures of recent and concerted population expansion. The genetic diversity values recovered here contravene the expectations of the specialist-generalist variation hypothesis and classic population genetics theory; all specialist lineages had greater genetic diversity than generalists. Demography suggests recent population expansions in all taxa, although Bayesian skyline plots and approximate Bayesian computation suggest the timing and magnitude of these events were idiosyncratic. These results do not meet the a priori expectation of concordance among symbiotic taxa and suggest that intrinsic aspects of species biology may contribute more to phylogeographic history than extrinsic forces that shape whole communities. The recovery of two cryptic specialist lineages adds an additional layer of biodiversity to this symbiosis and contributes to an emerging pattern of cryptic speciation in the specialist taxa. Our results underscore the differences in the evolutionary processes acting on marine systems from the terrestrial processes that often drive theory. Finally, we continue to highlight the Florida Reef Tract as an important biodiversity hotspot.
Multi-Bit Quantum Private Query
NASA Astrophysics Data System (ADS)
Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing
2015-09-01
Most existing Quantum Private Query (QPQ) protocols provide only a single-bit query service and thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimensional quantum key distribution system to support it, which is still restricted to the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity and implementability and can achieve a considerable level of security.
Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo
2015-09-18
A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.
Estimating Missing Features to Improve Multimedia Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bagherjeiran, A; Love, N S; Kamath, C
Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, all modalities of the data may not be available to form the query. The retrieval results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features. In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. Due to the large amounts of data and the single-thread processing method, the query speed cannot meet application requirements. On the other hand, query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method using a multi-thread parallel technique based on regional division and implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Tests and analysis on real datasets show that this method improves the query speed significantly.
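A minimal single-machine sketch of the divide/aggregate/integrate idea described above, assuming in-memory (x, y, t, value) records and arbitrary cube sizes; it is not the paper's server-side implementation.

    from concurrent.futures import ThreadPoolExecutor
    from collections import defaultdict

    def cube_key(rec, dx=10.0, dy=10.0, dt=3600):
        x, y, t, _ = rec
        return (int(x // dx), int(y // dy), int(t // dt))

    def aggregate_cube(records):
        """Aggregate one spatiotemporal cube (here: record count and sum of the value field)."""
        return len(records), sum(v for _, _, _, v in records)

    def parallel_aggregate(records, workers=4):
        cubes = defaultdict(list)
        for rec in records:
            cubes[cube_key(rec)].append(rec)                        # regional division
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(aggregate_cube, cubes.values()))  # per-cube aggregation
        count = sum(c for c, _ in partials)                         # integrate partial results
        total = sum(s for _, s in partials)
        return count, total

    print(parallel_aggregate([(3.2, 7.1, 100, 1.0), (15.9, 2.0, 4000, 2.5)]))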
A Framework for WWW Query Processing
NASA Technical Reports Server (NTRS)
Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)
2000-01-01
Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Submitting queries by anonymous users on the web is different in such a way that load balancing or DBMS' accessing control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).
QBIC project: querying images by content, using color, texture, and shape
NASA Astrophysics Data System (ADS)
Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel
1993-04-01
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.
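As a toy illustration of one of the attributes mentioned (color), the sketch below ranks images by the distance between coarse color histograms; the bin layout, the distance measure, and the tiny in-memory "database" are illustrative and do not reproduce QBIC's actual feature set.

    def color_histogram(pixels, bins=4):
        """Quantize (r, g, b) pixels in 0..255 into a normalized bins**3 histogram."""
        hist = [0.0] * (bins ** 3)
        step = 256 // bins
        for r, g, b in pixels:
            hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
        n = len(pixels)
        return [h / n for h in hist]

    def distance(h1, h2):
        return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

    def query_by_example(example_pixels, database):
        """Return database entries ranked by histogram similarity to the example image."""
        q = color_histogram(example_pixels)
        return sorted(database, key=lambda item: distance(q, color_histogram(item[1])))

    # Database entries are (image_id, pixel_list) pairs; the values are made up.
    db = [("sunset", [(250, 80, 30)] * 50), ("ocean", [(20, 60, 220)] * 50)]
    print([img_id for img_id, _ in query_by_example([(255, 90, 40)] * 50, db)])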
Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy
2014-01-01
Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries
NASA Astrophysics Data System (ADS)
Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.
2017-10-01
Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get a response in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of remote sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
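The distributed DKD-Tree is not reproduced here; the sketch below shows only the single-machine core it builds on, a k-d tree filter step that collects candidate records inside a query rectangle, after which an exact geometric refinement step would run as the second step.

    from collections import namedtuple

    Node = namedtuple("Node", "point payload left right")

    def build(points, depth=0):
        """points: list of ((x, y), payload); builds a 2-d k-d tree."""
        if not points:
            return None
        axis = depth % 2
        points = sorted(points, key=lambda p: p[0][axis])
        mid = len(points) // 2
        pt, payload = points[mid]
        return Node(pt, payload, build(points[:mid], depth + 1), build(points[mid + 1:], depth + 1))

    def range_search(node, lo, hi, depth=0, out=None):
        """Step one: collect candidates whose indexed point lies in the box [lo, hi]."""
        if out is None:
            out = []
        if node is None:
            return out
        axis = depth % 2
        if all(lo[i] <= node.point[i] <= hi[i] for i in (0, 1)):
            out.append(node.payload)
        if node.point[axis] >= lo[axis]:
            range_search(node.left, lo, hi, depth + 1, out)
        if node.point[axis] <= hi[axis]:
            range_search(node.right, lo, hi, depth + 1, out)
        return out

    tree = build([((1, 2), "scene-A"), ((8, 3), "scene-B"), ((5, 9), "scene-C")])
    candidates = range_search(tree, (0, 0), (6, 6))  # step two would test exact polygon geometry
    print(candidates)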
NASA Astrophysics Data System (ADS)
Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG
2018-01-01
Mobile applications allow many users to access data without being limited by place or time. Over time, the data population of such an application will grow, and data access time becomes a problem once tables reach tens of thousands to millions of records. The objective of this research is to maintain data-access performance for large numbers of records. One way to maintain access-time performance is to apply query optimization; the optimization used in this research is the heuristic query optimization method. The application built is a mobile-based financial application using a MySQL database with stored procedures. The application is used by more than one business entity in a single database, thus enabling rapid data growth. Within the stored procedures, queries are optimized using the heuristic method; optimization is performed on SELECT queries that involve more than one table and multiple clauses. Evaluation is done by comparing the average access time of optimized and unoptimized queries, and the measurement is repeated as the data population in the database grows. The evaluation results show that execution with heuristic query optimization is consistently faster than execution without query optimization.
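A small sketch of the heuristic being described, pushing restrictive selections below a join so fewer rows reach it; the schema and data are made up, and SQLite is used only so the example runs, so the timings illustrate the mechanics rather than the paper's MySQL results.

    import sqlite3, time

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE entity(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE txn(id INTEGER PRIMARY KEY, entity_id INTEGER, amount REAL, posted TEXT);
    """)
    con.executemany("INSERT INTO entity VALUES (?, ?)", [(i, f"biz{i}") for i in range(50)])
    con.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)",
                    [(i, i % 50, i * 1.0, "2018-01-01") for i in range(20000)])

    # Unoptimized form: join first, filter afterwards.
    plain = """SELECT e.name, SUM(t.amount)
               FROM txn t JOIN entity e ON e.id = t.entity_id
               WHERE t.posted >= '2018-01-01' AND e.id = 7
               GROUP BY e.name"""

    # Heuristic rewrite: push the restrictive selections inside, so the join sees fewer rows.
    pushed = """SELECT e.name, SUM(t.amount)
                FROM (SELECT entity_id, amount FROM txn
                      WHERE posted >= '2018-01-01' AND entity_id = 7) t
                JOIN (SELECT id, name FROM entity WHERE id = 7) e ON e.id = t.entity_id
                GROUP BY e.name"""

    for label, sql in [("plain", plain), ("pushed", pushed)]:
        start = time.perf_counter()
        rows = con.execute(sql).fetchall()
        print(label, rows, f"{time.perf_counter() - start:.4f}s")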
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2016-01-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
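Hadoop-GIS's exact query grammar is not reproduced here; the fragment below only suggests the general shape of a spatially extended SQL join (PostGIS-style ST_ functions) that such a system accepts, with hypothetical table names and a placeholder submission call.

    # Illustrative spatially extended SQL (PostGIS-style syntax, hypothetical table names);
    # a client would submit this text through the system's command-line or web interface.
    spatial_join = """
    SELECT n.nucleus_id, v.vessel_id
    FROM nuclei n JOIN vessels v
      ON ST_Intersects(n.boundary, ST_Buffer(v.boundary, 5.0))
    WHERE n.slide_id = 'TCGA-0001';
    """

    def submit(query: str, host: str = "localhost", port: int = 10000) -> None:
        # Placeholder for whatever client call the deployment exposes.
        print(f"submitting to {host}:{port}\n{query}")

    submit(spatial_join)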
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.
O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D
2015-04-01
The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
A high performance, ad-hoc, fuzzy query processing system for relational databases
NASA Technical Reports Server (NTRS)
Mansfield, William H., Jr.; Fleischman, Robert M.
1992-01-01
Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
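The Datacycle filtering hardware is out of scope here; the sketch below only illustrates how a fuzzy predicate grades every record during an exhaustive scan, using a triangular membership function and an arbitrary 0.5 acceptance threshold.

    def near_value(x, target, tolerance):
        """Triangular membership: 1 at the target, falling to 0 at +/- tolerance."""
        return max(0.0, 1.0 - abs(x - target) / tolerance)

    def fuzzy_select(rows, predicates, threshold=0.5):
        """Scan every row (no index), combine predicate grades with min(), keep good matches."""
        results = []
        for row in rows:
            grade = min(p(row) for p in predicates)
            if grade >= threshold:
                results.append((grade, row))
        return sorted(results, key=lambda t: t[0], reverse=True)

    flights = [{"price": 310, "duration": 5.5}, {"price": 480, "duration": 3.9}]
    query = [lambda r: near_value(r["price"], 300, 200),
             lambda r: near_value(r["duration"], 4.0, 3.0)]
    print(fuzzy_select(flights, query))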
Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo
2015-01-01
A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613
Systems and methods for an extensible business application framework
NASA Technical Reports Server (NTRS)
Bell, David G. (Inventor); Crawford, Michael (Inventor)
2012-01-01
Method and systems for editing data from a query result include requesting a query result using a unique collection identifier for a collection of individual files and a unique identifier for a configuration file that specifies a data structure for the query result. A query result is generated that contains a plurality of fields as specified by the configuration file, by combining each of the individual files associated with a unique identifier for a collection of individual files. The query result data is displayed with a plurality of labels as specified in the configuration file. Edits can be performed by querying a collection of individual files using the configuration file, editing a portion of the query result, and transmitting only the edited information for storage back into a data repository.
Analysis of Information Needs of Users of MEDLINEplus, 2002 – 2003
Scott-Wright, Alicia; Crowell, Jon; Zeng, Qing; Bates, David W.; Greenes, Robert
2006-01-01
We analyzed query logs from use of MEDLINEplus to answer the questions: Are consumers’ health information needs stable over time? and To what extent do users’ queries change over time? To determine log stability, we assessed an Overlap Rate (OR) defined as the number of unique queries common to two adjacent months divided by the total number of unique queries in those months. All exactly matching queries were considered as one unique query. We measured ORs for the top 10 and 100 unique queries of a month and compared these to ORs for the following month. Over ten months, users submitted 12,234,737 queries; only 2,179,571 (17.8%) were unique and these had a mean word count of 2.73 (S.D., 0.24); 121 of 137 (88.3%) unique queries each comprised of exactly matching search term(s) used at least 5000 times were of only one word. We could predict with 95% confidence that the monthly OR for the top 100 unique queries would lie between 67% – 87% when compared with the top 100 from the previous month. The mean month-to-month OR for top 10 queries was 62% (S.D., 20%) indicating significant variability; the lowest OR of 33% between the top 10 in Mar. compared to Apr. was likely due to “new” interest in information about SARS pneumonia in Apr. 2003. Consumers’ health information needs are relatively stable and the 100 most common unique queries are about 77% the same from month to month. Website sponsors should provide a broad range of information about a relatively stable number of topics. Analyses of log similarity may identify media-induced, cyclical, or seasonal changes in areas of consumer interest. PMID:17238431
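A minimal sketch of the Overlap Rate as defined above, read here as the queries shared by two adjacent months divided by the union of unique queries in those months; the query lists are invented.

    def overlap_rate(month_a, month_b):
        """Overlap Rate: unique queries common to two adjacent months divided by the
        total number of unique queries in those months (intersection over union)."""
        a, b = set(month_a), set(month_b)
        return len(a & b) / len(a | b)

    march = ["diabetes", "sars", "flu", "anthrax", "west nile virus"]
    april = ["diabetes", "sars", "flu", "smallpox", "lyme disease"]
    print(f"{overlap_rate(march, april):.0%}")  # 3 shared / 7 unique, about 43%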
Big Data and Dysmenorrhea: What Questions Do Women and Men Ask About Menstrual Pain?
Chen, Chen X; Groves, Doyle; Miller, Wendy R; Carpenter, Janet S
2018-04-30
Menstrual pain is highly prevalent among women of reproductive age. As the general public increasingly obtains health information online, Big Data from online platforms provide novel sources to understand the public's perspectives and information needs about menstrual pain. The study's purpose was to describe salient queries about dysmenorrhea using Big Data from a question and answer platform. We performed text-mining of 1.9 billion queries from ChaCha, a United States-based question and answer platform. Dysmenorrhea-related queries were identified by using keyword searching. Each relevant query was split into token words (i.e., meaningful words or phrases) and stop words (i.e., not meaningful functional words). Word Adjacency Graph (WAG) modeling was used to detect clusters of queries and visualize the range of dysmenorrhea-related topics. We constructed two WAG models respectively from queries by women of reproductive age and by men. Salient themes were identified through inspecting clusters of WAG models. We identified two subsets of queries: Subset 1 contained 507,327 queries from women aged 13-50 years. Subset 2 contained 113,888 queries from men aged 13 or above. WAG modeling revealed topic clusters for each subset. Between female and male subsets, topic clusters overlapped on dysmenorrhea symptoms and management. Among female queries, there were distinctive topics on approaching menstrual pain at school and menstrual pain-related conditions; while among male queries, there was a distinctive cluster of queries on menstrual pain from men's perspectives. Big Data mining of the ChaCha® question and answer service revealed a series of information needs among women and men on menstrual pain. Findings may be useful in structuring the content and informing the delivery platform for educational interventions.
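A compact sketch of the pipeline described above (keyword filtering, stop-word removal, and counting adjacent word pairs for a word-adjacency graph); the keyword and stop-word lists are tiny stand-ins for the study's actual lexicon. Requires Python 3.10+ for itertools.pairwise.

    from collections import Counter
    from itertools import pairwise  # Python 3.10+

    KEYWORDS = {"cramps", "period", "menstrual", "dysmenorrhea"}            # illustrative
    STOP_WORDS = {"do", "i", "my", "the", "for", "what", "why", "are", "so"}

    def relevant(query: str) -> bool:
        return any(k in query.lower().split() for k in KEYWORDS)

    def adjacency_edges(queries):
        """Count how often two meaningful words appear next to each other."""
        edges = Counter()
        for q in queries:
            tokens = [w for w in q.lower().split() if w not in STOP_WORDS]
            edges.update(pairwise(tokens))
        return edges

    log = ["why are my period cramps so bad", "what helps menstrual cramps", "score of the game"]
    dysmenorrhea_queries = [q for q in log if relevant(q)]
    print(adjacency_edges(dysmenorrhea_queries).most_common(3))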
Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.
ERIC Educational Resources Information Center
Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand
2003-01-01
Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…
Relational Algebra and SQL: Better Together
ERIC Educational Resources Information Center
McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart
2013-01-01
In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…
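Not the authors' materials; simply a toy pairing of relational-algebra operators written as composable functions with the equivalent SQL SELECT, to show what query programs of this kind look like.

    employees = [{"name": "Lee", "dept": "IT", "salary": 82000},
                 {"name": "Ng", "dept": "HR", "salary": 60000}]

    def select(relation, predicate):        # sigma
        return [row for row in relation if predicate(row)]

    def project(relation, attributes):      # pi
        return [{a: row[a] for a in attributes} for row in relation]

    # Query program: pi_{name}(sigma_{dept='IT'}(employees))
    print(project(select(employees, lambda r: r["dept"] == "IT"), ["name"]))

    # Equivalent SQL SELECT statement students would compare it with:
    sql = "SELECT name FROM employees WHERE dept = 'IT';"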
A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.
Khennak, Ilyes; Drias, Habiba
2016-11-01
The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in search queries has caused search systems to fail to retrieve the desired information. One of the most powerful and promising methods to overcome this shortcoming and improve the performance of search engines is query expansion, whereby the user's original query is augmented with new keywords that best characterize the user's information needs and produce a more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the length of the expanded query to be determined empirically. Experimental results on MEDLINE, the online medical information database, show that our proposed approach is more effective and efficient compared to the state of the art.
RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks
Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng
2017-01-01
Recently, data privacy in wireless sensor networks (WSNs) has been paid increased attention. The characteristics of WSNs determine that users’ queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts ring structure to perform aggregation. It uses pseudonym mechanism for anonymous communication and uses homomorphic encryption technique to add noise to the data easily to be disclosed. RiPPAS can handle both sum() queries and min()/max() queries, while the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, compared with the existing methods, RiPPAS has advantages in the aspects of privacy preservation and communication efficiency, which can be proved by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197
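RiPPAS's ring structure and encryption scheme are not reproduced here; the sketch below shows a generic additive-sharing way to hide individual readings in a sum() aggregate, where each node reports only a sum of random-looking shares and the true total is still recoverable. It is an illustration of the privacy goal, not the RiPPAS protocol.

    import random

    def share(value, n_shares, modulus=10**6):
        """Split a reading into n random shares that sum to the value (mod modulus)."""
        shares = [random.randrange(modulus) for _ in range(n_shares - 1)]
        shares.append((value - sum(shares)) % modulus)
        return shares

    def private_sum(readings, modulus=10**6):
        n = len(readings)
        # Each node sends one share to every node (including itself); no single node
        # ever sees another node's raw reading, only a random-looking share.
        received = [[] for _ in range(n)]
        for value in readings:
            for node, s in enumerate(share(value, n, modulus)):
                received[node].append(s)
        partials = [sum(r) % modulus for r in received]  # what each node reports
        return sum(partials) % modulus                   # aggregate recovered at the Sink

    readings = [17, 23, 5, 41]
    print(private_sum(readings), "==", sum(readings))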
Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes
NASA Astrophysics Data System (ADS)
Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel
RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has become recently a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent Relational Database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques like magic set rewriting our system remains competitive with state-of-the-art RDFS querying systems.
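The GiaBATA Datalog machinery is not shown here; the rdflib sketch below applies a single RDFS rule (type propagation along rdfs:subClassOf) before a SPARQL query, so the query also matches inferred triples. The vocabulary is illustrative.

    from rdflib import Graph, Namespace, RDF, RDFS  # third-party package

    EX = Namespace("http://example.org/")           # illustrative vocabulary
    g = Graph()
    g.add((EX.Skyscraper, RDFS.subClassOf, EX.Building))
    g.add((EX.ChryslerBuilding, RDF.type, EX.Skyscraper))

    def apply_rdfs_type_rule(graph):
        """rdfs9: if ?c subClassOf ?d and ?x type ?c, then ?x type ?d (one pass shown)."""
        for c, _, d in list(graph.triples((None, RDFS.subClassOf, None))):
            for x in list(graph.subjects(RDF.type, c)):
                graph.add((x, RDF.type, d))

    apply_rdfs_type_rule(g)
    results = g.query("SELECT ?b WHERE { ?b a <http://example.org/Building> }")
    print([str(row.b) for row in results])  # now includes the ChryslerBuilding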
Mining the SDSS SkyServer SQL queries log
NASA Astrophysics Data System (ADS)
Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani
2016-05-01
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.
Applying Query Structuring in Cross-language Retrieval.
ERIC Educational Resources Information Center
Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo
2003-01-01
Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper databases; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…
Querying and Ranking XML Documents.
ERIC Educational Resources Information Center
Schlieder, Torsten; Meuss, Holger
2002-01-01
Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
Advanced Query Formulation in Deductive Databases.
ERIC Educational Resources Information Center
Niemi, Timo; Jarvelin, Kalervo
1992-01-01
Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…
A Semantic Graph Query Language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaplan, I L
2006-10-16
Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.
Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.
2015-01-01
We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
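Not the i2b2 CRC generator itself; a sketch of the core mapping in which each concept becomes a common-table expression selecting matching patients and the Boolean operators become INTERSECT/UNION/EXCEPT over those CTEs. Table and column names are hypothetical.

    def boolean_to_sql(expr):
        """expr is a nested tuple, e.g. ("AND", "diabetes", ("NOT", "insulin")).
        Returns a single SQL statement built from common-table expressions."""
        ctes, counter = [], [0]

        def walk(node):
            counter[0] += 1
            name = f"q{counter[0]}"
            if isinstance(node, str):  # leaf concept
                body = f"SELECT patient_id FROM facts WHERE concept = '{node}'"
            else:
                op, *args = node
                parts = [walk(a) for a in args]
                if op == "NOT":
                    body = f"SELECT patient_id FROM patients EXCEPT SELECT patient_id FROM {parts[0]}"
                else:
                    glue = {"AND": " INTERSECT ", "OR": " UNION "}[op]
                    body = glue.join(f"SELECT patient_id FROM {p}" for p in parts)
            ctes.append(f"{name} AS ({body})")
            return name

        top = walk(expr)
        return "WITH " + ",\n     ".join(ctes) + f"\nSELECT patient_id FROM {top}"

    print(boolean_to_sql(("AND", "diabetes", ("NOT", "insulin"))))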
Query Language for Location-Based Services: A Model Checking Approach
NASA Astrophysics Data System (ADS)
Hoareau, Christian; Satoh, Ichiro
We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logicbased formulas. Our approach is unique to existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H
2012-11-06
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.
2013-01-01
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
Beyond context to the skyline: thinking in 3D.
Hoagwood, Kimberly; Olin, Serene; Cleek, Andrew
2013-01-01
Sweeping and profound structural, regulatory, and fiscal changes are rapidly reshaping the contours of health and mental health practice. The community-based practice contexts described in the excellent review by Garland and colleagues are being fundamentally altered with different business models, regional networks, accountability standards, and incentive structures. If community-based mental health services are to remain viable, the two-dimensional and flat research and practice paradigm has to be replaced with three-dimensional thinking. Failure to take seriously the changes that are happening to the larger healthcare context and respond actively through significant system redesign will lead to the demise of specialty mental health services.
NASA Technical Reports Server (NTRS)
Jones, K. L.; Henshaw, M.; Mcmenomy, C.; Robles, A.; Scribner, P. C.; Wall, S. D.; Wilson, J. W.
1981-01-01
Images returned by the two Viking landers during the extended and continuation automatic phases of the Viking Mission are presented. Information describing the conditions under which the images were acquired is included with skyline drawings showing the images positioned in the field of view of the cameras. Subsets of the images are listed in a variety of sequences to aid in locating images of interest. The format and organization of the digital magnetic tape storage of the images are described. A brief description of the mission and the camera system is also included.
NASA Technical Reports Server (NTRS)
Jones, K. L.; Henshaw, M.; Mcmenomy, C.; Robles, A.; Scribner, P. C.; Wall, S. D.; Wilson, J. W.
1981-01-01
All images returned by Viking Lander 1 during the extended and continuation automatic phases of the Viking Mission are presented. Listings of supplemental information which describe the conditions under which the images were acquired are included together with skyline drawings which show where the images are positioned in the field of view of the cameras. Subsets of the images are listed in a variety of sequences to aid in locating images of interest. The format and organization of the digital magnetic tape storage of the images are described as well as the mission and the camera system.
Query Expansion and Query Translation as Logical Inference.
ERIC Educational Resources Information Center
Nie, Jian-Yun
2003-01-01
Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…
End-User Use of Data Base Query Language: Pros and Cons.
ERIC Educational Resources Information Center
Nicholes, Walter
1988-01-01
Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)
Information Retrieval Using UMLS-based Structured Queries
Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith
2001-01-01
During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
A Relational Algebra Query Language for Programming Relational Databases
ERIC Educational Resources Information Center
McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole
2011-01-01
In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
An Ensemble Approach for Expanding Queries
2012-11-01
[Excerpt from a table of weighted expansion terms: Pain ... 0.39 (pain^0.39); Hospital, 15094, 0.82 (hospital^0.82); Miscarriage, 45, 3.35 (miscarriage^3.35); Radiotherapy, 53, 3.28 (radiotherapy^3.28); Hypoaldosteronism, 3...] A negated query is the expansion of the original query with negation terms preceding each word. For example, the negated version of "miscarriage"^3.35 includes "no miscarriage"^3.35 and "not miscarriage"^3.35. If a document is the result of both the original query and the negated query, its score is…
A novel adaptive Cuckoo search for optimal query plan generation.
Gomathi, Ramalingam; Sharmila, Dhandapani
2014-01-01
The rapid, day-by-day emergence of new web pages has driven the development of semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the Resource Description Framework (RDF). To improve the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization for semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying large RDF graphs and generating optimal query plans is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant improvements in terms of query execution time. The efficiency of the algorithm is tested and the results are documented.
Query-Based Outlier Detection in Heterogeneous Information Networks.
Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei
2015-03-01
Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.
Query-Based Outlier Detection in Heterogeneous Information Networks
Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei
2015-01-01
Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397
Querying and Extracting Timeline Information from Road Traffic Sensor Data
Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen
2016-01-01
The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900
Policy Compliance of Queries for Private Information Retrieval
2010-11-01
SPARQL, unfortunately, is not in RDF and so we had to develop tools to translate SPARQL queries into RDF to be used by our policy compliance prototype...policy-assurance/sparql2n3.py) that accepts SPARQL queries and returns the translated query in our simplified ontology. An example of a translated
Knowledge Query Language (KQL)
2016-02-12
Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions
NASA Astrophysics Data System (ADS)
Skotniczny, Zbigniew
1989-12-01
The Query by Forms (QbF) system is a user-oriented interactive tool for querying large relational databases with minimal query-definition cost. The system was designed under the assumption that the user's time and effort for defining needed queries is the most severe bottleneck. It may be applied to any Rdb/VMS database system and is recommended for the specific information systems of any project where end-user queries cannot be foreseen. The tool is dedicated to specialists in an application domain who have to analyze data maintained in a database from any needed point of view and who do not need to know commercial database languages. The paper presents the system as a compromise between functionality and usability. User-system communication via a menu-driven "tree-like" structure of screen forms, which produces a query definition and its execution, is discussed in detail. Output of query results (printed reports and graphics) is also discussed. Finally, the paper shows one application of QbF to the HERA project.
Hybrid ontology for semantic information retrieval model using keyword matching indexing system.
Uthayan, K R; Mala, G S Anandha
2015-01-01
Ontology is the process of growth and elucidation of the concepts of an information domain that are common to a group of users. Introducing ontology into information retrieval is a natural way to improve the retrieval of the relevant information users require. The keyword matching process against a historical or information domain is important in recent approaches for finding the best match for specific input queries. This research presents an improved querying mechanism for information retrieval that integrates ontology queries with keyword search. The ontology-based query is translated into a first-order predicate logic query, which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to study the semantic model of the query and the conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching instances extracted from the queries and the information field. The queries and information domain are focused on semantic matching, to discover the best match and to improve the execution process. In conclusion, the hybrid ontology in the semantic web is effective at retrieving documents when compared to a standard ontology.
Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System
Uthayan, K. R.; Anandha Mala, G. S.
2015-01-01
Ontology is the process of growth and elucidation of the concepts of an information domain that are common to a group of users. Introducing ontology into information retrieval is a natural way to improve the retrieval of the relevant information users require. The keyword matching process against a historical or information domain is important in recent approaches for finding the best match for specific input queries. This research presents an improved querying mechanism for information retrieval that integrates ontology queries with keyword search. The ontology-based query is translated into a first-order predicate logic query, which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to study the semantic model of the query and the conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching instances extracted from the queries and the information field. The queries and information domain are focused on semantic matching, to discover the best match and to improve the execution process. In conclusion, the hybrid ontology in the semantic web is effective at retrieving documents when compared to a standard ontology. PMID:25922851
Multidimensional indexing structure for use with linear optimization queries
NASA Technical Reports Server (NTRS)
Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)
2002-01-01
Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
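As a rough illustration of the layered ("onion") indexing idea described above, the following sketch peels convex-hull layers from a 2-D point set and answers a top-N maximization query by scanning layers from the outside in. It is an assumed, simplified rendering of the technique (the names and the use of scipy are illustrative), not the patented implementation.

```python
# Sketch of a layered convex-hull ("onion") index for linear top-N queries.
# Assumes 2-D numeric attributes and a maximization criterion; illustrative only.
import numpy as np
from scipy.spatial import ConvexHull

def build_onion_layers(points):
    """Peel convex hulls repeatedly; each layer stores original row indices."""
    remaining = np.arange(len(points))
    layers = []
    while len(remaining) > 2:
        hull = ConvexHull(points[remaining])
        layers.append(remaining[hull.vertices])
        remaining = np.delete(remaining, hull.vertices)
    if len(remaining):
        layers.append(remaining)            # innermost leftover points
    return layers

def top_n(points, layers, coeffs, n):
    """Scan layers outermost-first. The k-th best record for a linear
    maximization criterion lies within the first k layers, so the first
    n layers suffice for a top-n query."""
    candidates = []
    for layer in layers[:n]:
        scores = points[layer] @ coeffs
        candidates.extend(zip(scores, layer))
    candidates.sort(reverse=True)
    return [idx for _, idx in candidates[:n]]

rng = np.random.default_rng(0)
data = rng.random((1000, 2))
layers = build_onion_layers(data)
print(top_n(data, layers, coeffs=np.array([0.7, 0.3]), n=5))
```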
The role of economics in the QUERI program: QUERI Series
Smith, Mark W; Barnett, Paul G
2008-01-01
Background The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199
The role of economics in the QUERI program: QUERI Series.
Smith, Mark W; Barnett, Paul G
2008-04-22
The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.
Processing SPARQL queries with regular expressions in RDF databases
2011-01-01
Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225
Processing SPARQL queries with regular expressions in RDF databases.
Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon
2011-03-29
As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
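As a hedged illustration of the query pattern this work targets (not the authors' engine), the following runs a SPARQL query with a regex FILTER over a small in-memory RDF graph using rdflib; the namespace and data are made up.

```python
# Illustration of a SPARQL-with-regex query of the kind the paper optimises,
# executed here with rdflib rather than the authors' prototype engine.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.p1, RDF.type, EX.Protein))
g.add((EX.p1, EX.label, Literal("kinase inhibitor 1")))
g.add((EX.p2, RDF.type, EX.Protein))
g.add((EX.p2, EX.label, Literal("phosphatase 2")))

query = """
PREFIX ex: <http://example.org/>
SELECT ?s ?label WHERE {
    ?s a ex:Protein ;
       ex:label ?label .
    FILTER regex(?label, "kinase", "i")   # regular expression over literal values
}
"""
for row in g.query(query):
    print(row.s, row.label)
```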
Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L
2000-01-01
The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
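A minimal sketch of the entity-attribute-value shape and an attribute-centred query against it, using sqlite3; the schema, attributes, and values are illustrative stand-ins for the study's microbiology data, not its actual design.

```python
# Minimal EAV table and an attribute-centred query via a self-join.
# Schema and values are illustrative; sqlite3 stands in for the study's DBMS.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eav (
    entity_id  INTEGER,   -- e.g., a culture result
    attribute  TEXT,      -- e.g., 'organism', 'specimen_site'
    value      TEXT
);
INSERT INTO eav VALUES
    (1, 'organism', 'E. coli'),
    (1, 'specimen_site', 'urine'),
    (2, 'organism', 'S. aureus'),
    (2, 'specimen_site', 'blood');
""")

# Attribute-centred query: all entities whose organism is E. coli,
# pivoting a second attribute back in via a self-join on entity_id.
rows = conn.execute("""
    SELECT o.entity_id, s.value AS specimen_site
    FROM eav o
    JOIN eav s ON s.entity_id = o.entity_id AND s.attribute = 'specimen_site'
    WHERE o.attribute = 'organism' AND o.value = 'E. coli'
""").fetchall()
print(rows)   # [(1, 'urine')]
```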
Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo
2016-01-01
Background Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary. PMID:27391028
Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo
2016-01-01
Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary.
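A small sketch of the correlation analysis described above, computing Spearman and lagged Spearman coefficients with scipy on synthetic weekly series; the data merely stand in for ILI rates and search volumes and carry no empirical meaning.

```python
# Sketch of Spearman and lag correlation between a search-volume series and an
# ILI series; both series here are synthetic stand-ins for the study's data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
weeks = 208                                  # four epidemiological years of weekly data
ili = np.abs(np.sin(np.linspace(0, 8 * np.pi, weeks))) + rng.normal(0, 0.1, weeks)
search = np.roll(ili, -1) + rng.normal(0, 0.1, weeks)    # searches lead ILI by ~1 week

def lagged_spearman(x, y, max_lag=2):
    """Correlate x at week t against y at week t+lag, for lags 0..max_lag."""
    out = {0: spearmanr(x, y).correlation}
    for lag in range(1, max_lag + 1):
        out[lag] = spearmanr(x[:-lag], y[lag:]).correlation
    return out

print(lagged_spearman(search, ili))          # expect the strongest value at lag 1
```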
Searching for cancer information on the internet: analyzing natural language search queries.
Bader, Judith L; Theofanos, Mary Frances
2003-12-11
Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall, 78.37% of sampled cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.
Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries
Theofanos, Mary Frances
2003-01-01
Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659
Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.
ERIC Educational Resources Information Center
Choi, Youngok; Rasmussen, Edie M.
2003-01-01
Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principle categories of search terms.…
Searching and Filtering Tweets: CSIRO at the TREC 2012 Microblog Track
2012-11-01
stages. We first evaluate the effect of tweet corpus pre-processing in vanilla runs (no query expansion), and then assess the effect of query expansion...Effect of a vanilla run on D4 index (both realtime and non-real-time), and query expansion methods based on the submitted runs for two sets of queries
Knowledge Query Language (KQL)
2016-02-01
EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in
System, method and apparatus for conducting a keyterm search
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
System, method and apparatus for conducting a phrase search
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
Targeted exploration and analysis of large cross-platform human transcriptomic compendia
Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.
2016-01-01
We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801
An index-based algorithm for fast on-line query processing of latent semantic analysis
Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a query of keywords. Although LSA yields promising similarity results, the existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot respond efficiently to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the aim of efficiently finding the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called the partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponding to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping operations that contribute little to the similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
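The following is a heavily simplified sketch of the partial-index idea summarised above: the (unnormalised) inner-product similarity in the LSA latent space decomposes over query terms, so per-(term, document) partial similarities above a threshold θ can be precomputed and accumulated at query time. It is an assumed, toy-scale rendering, not the ILSA algorithm itself.

```python
# Toy partial index for LSA: precompute per-(term, doc) partial similarities
# above a threshold theta and accumulate them at query time. Simplified and
# unnormalised relative to the ILSA algorithm summarised above.
import numpy as np
from collections import defaultdict

def build_partial_index(term_doc, k=2, theta=0.01):
    """term_doc: terms x documents count matrix."""
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = U[:, :k] * S[:k]            # term representations in latent space
    doc_vecs = Vt[:k].T                     # document representations
    index = defaultdict(list)               # term id -> [(doc id, partial similarity)]
    for t, tv in enumerate(term_vecs):
        for d, p in enumerate(doc_vecs @ tv):
            if p >= theta:                  # skip negligible partial similarities
                index[t].append((d, p))
    return index

def query(index, query_terms, n_docs):
    """Accumulate stored partial similarities for the query's terms."""
    scores = np.zeros(n_docs)
    for t in query_terms:
        for d, p in index.get(t, []):
            scores[d] += p
    return np.argsort(-scores)              # documents ranked by accumulated score

td = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 0], [0, 0, 2]], dtype=float)
idx = build_partial_index(td)
print(query(idx, query_terms=[0, 2], n_docs=3))
```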
Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.
De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning
2015-10-01
Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data on reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.
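As a hedged sketch of the result-prediction idea mentioned above, the snippet below trains a toy classifier to flag queries likely to return empty result sets from character n-gram features; the queries, labels, and feature choice are illustrative, not the GoldMiner setup.

```python
# Toy predictor of empty result sets from query-term features (character n-grams
# help catch misspellings and gibberish). Data and features are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = ["chest xray pneumonia", "asdkjh lung", "mri brain tumor",
           "qwerty contrast", "fracture tibia", "zzzz radiograph"]
empty = [0, 1, 0, 1, 0, 1]                   # 1 = the query returned no results

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(queries, empty)
print(model.predict(["mri brain"]))          # expected: non-empty (0)
print(model.predict(["asdqw xxy"]))          # expected: empty (1)
```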
An index-based algorithm for fast on-line query processing of latent semantic analysis.
Zhang, Mingxi; Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.
Khennak, Ilyes; Drias, Habiba
2017-02-01
With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the length of the expanded query to be determined empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
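The sketch below is a heavily simplified, binary bat-inspired search over candidate expansion terms; the fitness function is a placeholder for a retrieval-effectiveness score (e.g., measured on a collection such as MEDLINE), and the update rules omit the loudness and pulse-rate dynamics of the full Bat Algorithm.

```python
# Simplified, bat-inspired search over candidate expansion terms. fitness() is a
# placeholder objective; loudness/pulse-rate dynamics are omitted for brevity.
import numpy as np

rng = np.random.default_rng(42)
candidates = ["myocardial", "infarction", "cardiac", "ischemia", "troponin", "angina"]

def fitness(mask):
    # Placeholder objective: prefer expansions of moderate length. A real system
    # would score the expanded query against a test collection.
    return -abs(int(mask.sum()) - 3)

n_bats, n_terms, iters = 10, len(candidates), 50
pos = rng.integers(0, 2, size=(n_bats, n_terms))     # binary term-selection vectors
vel = np.zeros((n_bats, n_terms))
best = pos[int(np.argmax([fitness(p) for p in pos]))].copy()

for _ in range(iters):
    freq = rng.random(n_bats)                        # random pulse frequencies
    for i in range(n_bats):
        vel[i] += (pos[i] - best) * freq[i]
        prob = 1.0 / (1.0 + np.exp(-vel[i]))         # sigmoid transfer to binary moves
        pos[i] = (rng.random(n_terms) < prob).astype(int)
        if fitness(pos[i]) > fitness(best):
            best = pos[i].copy()

print("expanded query terms:", [t for t, b in zip(candidates, best) if b])
```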
2013-01-01
Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556
Luo, Yuan; Szolovits, Peter
2016-01-01
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show its non-optimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, propose query reformulations, and introduce augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks for all of Allen's relations in logarithmic time, attaining the theoretical lower bound. Update time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.
Luo, Yuan; Szolovits, Peter
2016-01-01
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show its non-optimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, propose query reformulations, and introduce augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks for all of Allen's relations in logarithmic time, attaining the theoretical lower bound. Update time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379
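As a quick illustration of interval-based retrieval of stand-off annotations (not the authors' augmented structure), the snippet below uses the third-party intervaltree package to run a stabbing query at a character offset and an overlap query over a span; the annotations are made up.

```python
# Stabbing and overlap queries over stand-off annotations stored as intervals,
# using the third-party intervaltree package; annotations are illustrative.
from intervaltree import IntervalTree

tree = IntervalTree()
tree[0:12] = {"type": "Medication", "text": "atorvastatin"}
tree[5:12] = {"type": "Dose"}
tree[20:35] = {"type": "Problem", "text": "hyperlipidemia"}

print(tree[7])        # stabbing query: annotations covering character offset 7
print(tree[10:25])    # interval query: annotations overlapping the span [10, 25)
```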
Executing SPARQL Queries over the Web of Linked Data
NASA Astrophysics Data System (ADS)
Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph
The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
A Natural Language Interface Concordant with a Knowledge Base.
Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young
2016-01-01
The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
Saying What You're Looking For: Linguistics Meets Video Search.
Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark
2016-10-01
We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.
Context-Aware Online Commercial Intention Detection
NASA Astrophysics Data System (ADS)
Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng
With more and more commercial activities moving onto the Internet, people tend to purchase what they need through Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like the common Web search queries, the queries with commercial intention are usually very short. Recognizing the queries with commercial intention against the common queries will help search engines provide proper search results and advertisements, help Web users obtain the right information they desire and help the advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different background and interest. The intentions can even be different for the same user, when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm can improve more than 10% on F1 score than previous algorithms on commercial intention detection.
Research on presentation and query service of geo-spatial data based on ontology
NASA Astrophysics Data System (ADS)
Li, Hong-wei; Li, Qin-chao; Cai, Chang
2008-10-01
The paper analyzes the deficiencies in the presentation and querying of geo-spatial data in current GIS, and discusses the advantages of ontology for the formalization of geo-spatial data and the presentation of semantic granularity. Taking a land-use classification system as an example, it constructs a domain ontology described in OWL, and realizes grade-level and category presentation of land-use data drawing on the idea of vertical and horizontal navigation. It then discusses ontology-based query modes for geo-spatial data, including queries based on types and grade levels, queries based on instances and spatial relations, and synthetic queries based on types and instances. These methods enrich the query modes of current GIS and are a useful attempt. The paper points out that the key to ontology-based presentation and querying of spatial data is to construct a domain ontology that correctly reflects geo-concepts and their spatial relations and realizes a fine formal description of them.
In-context query reformulation for failing SPARQL queries
NASA Astrophysics Data System (ADS)
Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James
2017-05-01
Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state of the art, the presented approach produces in-context query reformulations.
Model-based query language for analyzing clinical processes.
Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris
2013-01-01
Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily perceptible also by non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.
AQBE — QBE Style Queries for Archetyped Data
NASA Astrophysics Data System (ADS)
Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash
Large-scale adoption of electronic healthcare applications requires semantic interoperability. The new proposals propose an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.
StarView: The object oriented design of the ST DADS user interface
NASA Technical Reports Server (NTRS)
Williams, J. D.; Pollizzi, J. A.
1992-01-01
StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS, which processes the requests and returns the data to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate ad hoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object-oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built from single or multiple forms before it is submitted to ST DADS. Supporting this concept is the ad hoc query generator, which allows the user to select and qualify an indeterminate number of attributes from the database. The user does not need any knowledge of how the joins across the various tables are to be resolved. The ad hoc generator calculates the joins automatically and generates the correct SQL query.
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen; Lueth, Christoph
2012-01-01
We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.
Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.
Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie
2017-03-01
Given a query photo issued by a user (q-user), landmark retrieval is to return a set of photos whose landmarks are similar to those of the query, while the existing studies on landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over social media communities may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with landmarks with low-quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely, multi-query expansions, to retrieve semantically robust landmarks in two steps. First, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible low-quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by the typical collaborative filtering methods, we propose collaborative deep networks to learn semantic, nonlinear, high-level features over the latent factors for landmark photos in the training set, which is formed by matrix factorization over the collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, meanwhile resulting in a compact multi-query set within this feature space. Then, the final ranking scores are calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world social media data with landmark photos together with their user information to show the superior performance over the existing methods, especially our recently proposed multi-query-based mid-level pattern representation method [1].
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, this approach has not yet been sufficiently explored in the literature for large genomic variant databases. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with a large generated set of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
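A hedged sketch of the kind of genome range query benchmarked above, written against Spark SQL over a Parquet-backed variant table; the file path, table layout, and column names are assumptions for illustration only.

```python
# Hedged sketch of a genomic range query over a Parquet-backed variant table
# with Spark SQL; path, schema, and coordinates are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("variant-range-query").getOrCreate()

variants = spark.read.parquet("/data/variants.parquet")   # assumed layout
variants.createOrReplaceTempView("variants")

# Simple genome range query: per-sample variant counts within a region of chr17,
# the kind of low-latency range query discussed in the comparison above.
result = spark.sql("""
    SELECT sample_id, COUNT(*) AS n_variants
    FROM variants
    WHERE chromosome = '17' AND position BETWEEN 43044295 AND 43125483
    GROUP BY sample_id
""")
result.show()
```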
CUFID-query: accurate network querying through random walk based network flow estimation.
Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun
2017-12-28
Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.
Improving overlay control through proper use of multilevel query APC
NASA Astrophysics Data System (ADS)
Conway, Timothy H.; Carlson, Alan; Crow, David A.
2003-06-01
Many state-of-the-art fabs are operating with increasingly diversified product mixes. For example, at Cypress Semiconductor, it is not unusual to be concurrently running multiple technologies and many devices within each technology. This diverse product mix significantly increases the difficulty of manually controlling overlay process corrections. As a result, automated run-to-run feedforward-feedback control has become a necessary and vital component of manufacturing. However, traditional run-to-run controllers rely on highly correlated historical events to forecast process corrections. For example, the historical process events typically are constrained to match the current event for exposure tool, device, process level and reticle ID. This narrowly defined process stream can result in insufficient data when applied to low-volume or new-release devices. The run-to-run controller implemented at Cypress utilizes a multi-level query (Level-N) correlation algorithm, where each subsequent level widens the search criteria for available historical data. The paper discusses how best to widen the search criteria and how to determine and apply a known bias to account for tool-to-tool and device-to-device differences. Specific applications include offloading lots from one tool to another when the first tool is down for preventive maintenance, utilizing related devices to determine a default feedback vector for new-release devices, and applying bias values to account for known reticle-to-reticle differences. In this study, we will show how historical data can be leveraged from related devices or tools to overcome the limitations of narrow process streams. In particular, this paper discusses how effectively handling narrow process streams allows Cypress to offload lots from a baseline tool to an alternate tool.
Querying graphs in protein-protein interactions networks using feedback vertex set.
Blin, Guillaume; Sikora, Florian; Vialette, Stéphane
2010-01-01
Recent techniques are rapidly increasing our knowledge of interactions between proteins. The interpretation of this new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interaction (PPI) networks. From an algorithmic point of view, this is a hard task since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees, in PPI networks. Such restriction leads to the development of fixed-parameter tractable (FPT) algorithms, which can be practicable for restricted query sizes. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence no FPT algorithm can be found when patterns are general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying patterns in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graph queries into trees without loss of information, we use a feedback vertex set coupled with a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded feedback vertex set size. It gives an alternative to the treewidth parameter, which can be better or worse for a given query. We provide a Python implementation which allows us to validate our approach on real data. In particular, we retrieve some human queries in the shape of graphs in the fly PPI network.
Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai
2017-03-01
The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
Occam's razor: supporting visual query expression for content-based image queries
NASA Astrophysics Data System (ADS)
Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.
2005-01-01
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Geometric Representations of Condition Queries on Three-Dimensional Vector Fields
NASA Technical Reports Server (NTRS)
Henze, Chris
1999-01-01
Condition queries on distributed data ask where particular conditions are satisfied. It is possible to represent condition queries as geometric objects by plotting field data in various spaces derived from the data, and by selecting loci within these derived spaces which signify the desired conditions. Rather simple geometric partitions of derived spaces can represent complex condition queries because much complexity can be encapsulated in the derived space mapping itself A geometric view of condition queries provides a useful conceptual unification, allowing one to intuitively understand many existing vector field feature detection algorithms -- and to design new ones -- as variations on a common theme. A geometric representation of condition queries also provides a simple and coherent basis for computer implementation, reducing a wide variety of existing and potential vector field feature detection techniques to a few simple geometric operations.
Occam's razor: supporting visual query expression for content-based image queries
NASA Astrophysics Data System (ADS)
Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.
2004-12-01
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Retrieval feedback in MEDLINE.
Srinivasan, P
1996-01-01
OBJECTIVE: To investigate a new approach for query expansion based on retrieval feedback. The first objective in this study was to examine alternative query-expansion methods within the same retrieval-feedback framework. The three alternatives proposed are: expansion on the MeSH query field alone, expansion on the free-text field alone, and expansion on both the MeSH and the free-text fields. The second objective was to gain further understanding of retrieval feedback by examining possible dependencies on relevant documents during the feedback cycle. DESIGN: Comparative study of retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a MEDLINE test collection of 75 queries and 2,334 MEDLINE citations. MEASUREMENTS: Retrieval effectivenesses of the original unexpanded and the alternative expanded queries were compared using 11-point-average precision scores (11-AvgP). These are averages of precision scores obtained at 11 standard recall points. RESULTS: All three expansion strategies significantly improved the original queries in terms of retrieval effectiveness. Expansion on MeSH alone was equivalent to expansion on both MeSH and the free-text fields. Expansion on the free-text field alone improved the queries significantly less than did the other two strategies. The second part of the study indicated that retrieval-feedback-based expansion yields significant performance improvements independent of the availability of relevant documents for feedback information. CONCLUSIONS: Retrieval feedback offers a robust procedure for query expansion that is most effective for MEDLINE when applied to the MeSH field. PMID:8653452
Query Expansion Using SNOMED-CT and Weighing Schemes
2014-11-01
For this research, we have used SNOMED-CT along with the UMLS Metathesaurus as our ontology in the medical domain to expand queries. Researchers at the University of the Basque Country discuss their findings on query expansion using external sources, headlined by the Unified Medical Language System (UMLS).
ERIC Educational Resources Information Center
Chung, EunKyung; Yoon, JungWon
2009-01-01
Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…
Design Recommendations for Query Languages
1980-09-01
S.L. Ehrenreich, Human Factors Technical Area. ... respond to queries that it recognizes as faulty. Codd (1974) states that in designing a natural query language, attention must be given to dealing ... impaired. Codd (1974) also regarded the user's perception of the data base to be of critical importance in properly designing a query language system
Agent-Based Framework for Discrete Entity Simulations
2006-11-01
Postgres database server for environment queries of neighbors and continuum data. As expected for raw database queries (no database optimizations in ... form. Eventually the code was ported to GNU C++ on the same single Intel Pentium 4 CPU running RedHat Linux 9.0 and a Postgres database server ... Again, Postgres was used for environmental queries, and the tool remained relatively slow because of the immense number of queries necessary to assess
Akce, Abdullah; Norton, James J S; Bretl, Timothy
2015-09-01
This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries-on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information to be received per unit of time, a measure of mutual information that we call information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
Query construction, entropy, and generalization in neural-network models
NASA Astrophysics Data System (ADS)
Sollich, Peter
1994-05-01
We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and ``noninvertible'' versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.
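As a rough illustration of entropy-driven query selection in a high-low-style setting, the following sketch picks the threshold query with the largest expected reduction in posterior entropy; the discrete posterior and the query family are invented for illustration and are not taken from the paper above.

```python
# Sketch of entropy-based query selection over a discrete hypothesis set:
# choose the threshold query whose answer is expected to shrink posterior
# entropy the most. Illustrative only.
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def best_threshold_query(posterior):
    """posterior[i] = probability that the hidden target equals i."""
    n = len(posterior)
    h0 = entropy(posterior)
    best_q, best_gain = None, -1.0
    for t in range(1, n):                       # query: "is the target < t ?"
        p_yes = sum(posterior[:t])
        if p_yes in (0.0, 1.0):
            continue
        post_yes = [p / p_yes for p in posterior[:t]]
        post_no = [p / (1 - p_yes) for p in posterior[t:]]
        expected_h = p_yes * entropy(post_yes) + (1 - p_yes) * entropy(post_no)
        gain = h0 - expected_h                  # expected information gain
        if gain > best_gain:
            best_q, best_gain = t, gain
    return best_q, best_gain

print(best_threshold_query([0.1, 0.4, 0.3, 0.2]))
```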
Spatial information semantic query based on SPARQL
NASA Astrophysics Data System (ADS)
Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang
2009-10-01
How can the efficiency of spatial information queries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building a Web Ontology Language (OWL) ontology and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. Compared to earlier keyword-based information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be developed with an ontology, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study using SPARQL to query geospatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the results' accuracy and applicability. The results also indicate that constructing a geospatial semantic query system has a positive effect on spatial query formulation and retrieval.
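A minimal sketch of the SPARQL-over-ontology idea, using rdflib with an invented namespace and an assumed adjacentTo property rather than the Wuhan ontology described above:

```python
# Query spatial relations with SPARQL over a tiny in-memory RDF graph.
# The example.org namespace and the adjacentTo property are placeholders.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/geo#")
g = Graph()
g.add((EX.DistrictA, EX.adjacentTo, EX.DistrictB))
g.add((EX.DistrictB, EX.adjacentTo, EX.DistrictC))

results = g.query("""
    PREFIX ex: <http://example.org/geo#>
    SELECT ?x ?y WHERE { ?x ex:adjacentTo ?y . }
""")
for x, y in results:
    print(x, "adjacentTo", y)
```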
Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung
2015-01-01
Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications where multiple-attribute sensory data are queried from the network continuously and periodically. For certain applications, some sensory data may not vary significantly within a given time duration. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the likelihood that sensory data will be of interest to forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that are less likely to be requested by forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered by composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
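The placement rule can be illustrated with a small sketch: attributes queried most often in recent time slots are kept at the sink, the rest at grid-cell head nodes. The capacity threshold and data structures are assumptions for illustration, not the mechanism's actual parameters.

```python
# Popularity-based cache placement: rank attributes by how often recent
# queries requested them, cache the hottest at the sink, the rest at cell heads.
from collections import Counter

def place_in_cache(recent_queries, sink_capacity=2):
    """recent_queries: list of attribute names requested in recent time slots."""
    popularity = Counter(recent_queries)
    ranked = [attr for attr, _ in popularity.most_common()]
    sink_cache = set(ranked[:sink_capacity])     # hottest data at the sink node
    head_cache = set(ranked[sink_capacity:])     # colder data at grid-cell heads
    return sink_cache, head_cache

queries = ["temperature", "humidity", "temperature", "light", "temperature", "humidity"]
print(place_in_cache(queries))
```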
VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans
NASA Astrophysics Data System (ADS)
Wang, Song; Gupta, Chetan; Mehta, Abhay
There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.
NASA Astrophysics Data System (ADS)
Warren, Z.; Shahriar, M. S.; Tripathi, R.; Pati, G. S.
2018-02-01
A repeated query technique has been demonstrated as a new interrogation method in pulsed coherent population trapping for producing single-peaked Ramsey interference with high contrast. This technique enhances the contrast of the central Ramsey fringe by nearly 1.5 times and significantly suppresses the side fringes by using more query pulses (>10) in the pulse cycle. Theoretical models have been developed to simulate Ramsey interference and analyze the characteristics of the Ramsey spectrum produced by the repeated query technique. Experiments have also been carried out employing a repeated query technique in a prototype rubidium clock to study its frequency stability performance.
Nadkarni, P M
1997-08-01
Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.
Evolution of Query Optimization Methods
NASA Astrophysics Data System (ADS)
Hameurlain, Abdelkader; Morvan, Franck
Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).
An alternative database approach for management of SNOMED CT and improved patient data queries.
Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R
2015-10-01
SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as medication data encoded with RxNorm, and complex queries were performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed.
This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
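The transitive-closure integrity check mentioned above can be illustrated with a short sketch over a toy IS_A hierarchy; the recursive computation and the example concept names are illustrative only, not the study's implementation.

```python
# Compute a transitive closure table (concept -> set of all ancestors)
# over a small IS_A hierarchy; SNOMED CT's hierarchy is acyclic, so simple
# memoized recursion suffices for the sketch.
def transitive_closure(is_a):
    """is_a: dict mapping a concept to the set of its direct parents."""
    closure = {}
    def ancestors(c):
        if c in closure:
            return closure[c]
        result = set()
        for parent in is_a.get(c, set()):
            result.add(parent)
            result |= ancestors(parent)
        closure[c] = result
        return result
    for concept in is_a:
        ancestors(concept)
    return closure

toy = {"myocardial_infarction": {"heart_disorder"},
       "heart_disorder": {"clinical_finding"},
       "clinical_finding": set()}
print(transitive_closure(toy))
```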
Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths
NASA Astrophysics Data System (ADS)
Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka
2010-11-01
Size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it becomes very difficult for information retrieval tools to fully utilize Web information. As a solution to this problem, XML pages come into play, which provide structural information to users to some extent. Without efficient indexes, query processing can be quite inefficient due to exhaustive traversal of XML data. In this paper, an improved content-centric approach to the Content-Aware DataGuide, an indexing technique for XML databases, is proposed that uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to changes in the query workload, and thus the overhead of reconstruction can be minimized. Frequently used paths are extracted by applying any sequential pattern mining algorithm to subsequent queries in the query workload. After this, the data structures are incrementally updated. This indexing technique proves to be efficient, as partial matching queries can be executed efficiently and users can now get more relevant documents in the results.
Autocorrelation and Regularization of Query-Based Information Retrieval Scores
2008-02-01
of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... [Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a... a document about "dogs", then the system will always miss this document when a user queries "dog". Salton recognized that a document's representation
Query Log Analysis of an Electronic Health Record Search Engine
Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.
2011-01-01
We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users' information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150
Efficient hemodynamic event detection utilizing relational databases and wavelet analysis
NASA Technical Reports Server (NTRS)
Saeed, M.; Mark, R. G.
2001-01-01
Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
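A minimal sketch of the underlying idea, assuming PyWavelets and SQLite with an invented table layout (not the MySQL schema of the study): store one compact wavelet descriptor per scale and query it directly.

```python
# Decompose a parameter trend into wavelet coefficients and store a compact
# per-scale descriptor (coefficient energy) in a relational table for querying.
import sqlite3
import numpy as np
import pywt

signal = np.sin(np.linspace(0, 20, 256)) + 0.1 * np.random.randn(256)
coeffs = pywt.wavedec(signal, "db4", level=3)      # multi-scale decomposition

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wavelet (parameter TEXT, level INTEGER, energy REAL)")
for level, c in enumerate(coeffs):
    conn.execute("INSERT INTO wavelet VALUES (?, ?, ?)",
                 ("heart_rate", level, float(np.sum(c ** 2))))

# Example query: scales whose coefficient energy exceeds a threshold.
print(conn.execute("SELECT level, energy FROM wavelet WHERE energy > 1.0").fetchall())
```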
A Fuzzy Query Mechanism for Human Resource Websites
NASA Astrophysics Data System (ADS)
Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih
Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storing mechanism is proposed to store fuzzy data in conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to pose fuzzy queries on fuzzy databases. A user's fuzzy requirement can be expressed by a fuzzy query consisting of a set of fuzzy conditions. Third, each fuzzy condition is associated with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to the user's preference.
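The aggregation step can be sketched as a fuzzy weighted average of (degree of matching, degree of importance) pairs; the membership degrees and weights below are invented examples, not the paper's formulation.

```python
# Aggregate fuzzy conditions: each condition contributes its degree of
# matching weighted by its degree of importance; the result ranks candidates.
def fuzzy_weighted_average(conditions):
    """conditions: list of (degree_of_matching, degree_of_importance), both in [0, 1]."""
    total_weight = sum(imp for _, imp in conditions)
    if total_weight == 0:
        return 0.0
    return sum(match * imp for match, imp in conditions) / total_weight

# A candidate matching "salary about 50k" well but "experience about 5 years" poorly.
candidate = [(0.9, 1.0),   # salary condition: strong match, high importance
             (0.3, 0.5)]   # experience condition: weak match, moderate importance
print(fuzzy_weighted_average(candidate))   # ranking score for this candidate
```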
Sampri, Alexia; Sypsa, Karla; Tsagarakis, Konstantinos P
2018-01-01
Background With the internet's penetration and use constantly expanding, this vast amount of information can be employed in order to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used in the past to examine interest in various medical and health-related topics and has shown great potential in forecasting, prediction, and nowcasting. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity to explore the behavior toward asthma—a common respiratory disease—is present. Objective This study aimed at forecasting the online behavior toward asthma and examined the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data. Methods Applying Holt-Winters exponential smoothing to Google Trends time series from 2004 to 2015 for the term "asthma," forecasts for online queries at state and national levels are estimated from 2016 to 2020 and validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases are examined. Results Our analysis shows that search queries exhibit seasonality within each year and the relationships between each 2 years' queries are statistically significant (P<.05). Estimated forecasting models for a 5-year period (2016 through 2020) for Google queries are robust and validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=–.82, P=.001) and current asthma (r=–.77, P=.004) rates from 2004 to 2015 and (2) between online queries and Behavioral Risk Factor Surveillance System lifetime (r=–.78, P=.003) and current asthma (r=–.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available. Conclusions Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs surrounding the current population of patients with asthma. PMID:29530839
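A hedged sketch of Holt-Winters forecasting of a weekly query-volume series with statsmodels, using a synthetic series in place of Google Trends data and illustrative model settings:

```python
# Fit an additive Holt-Winters model with a 52-week seasonal period and
# forecast one year ahead of weekly query volume.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

weeks = pd.date_range("2004-01-04", periods=52 * 5, freq="W")
seasonal = 10 * np.sin(2 * np.pi * np.arange(len(weeks)) / 52)
volume = pd.Series(50 + seasonal + np.random.randn(len(weeks)), index=weeks)

model = ExponentialSmoothing(volume, trend="add", seasonal="add",
                             seasonal_periods=52).fit()
forecast = model.forecast(52)          # 52 weeks ahead
print(forecast.head())
```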
Menopause and big data: Word Adjacency Graph modeling of menopause-related ChaCha data.
Carpenter, Janet S; Groves, Doyle; Chen, Chen X; Otte, Julie L; Miller, Wendy R
2017-07-01
To detect and visualize salient queries about menopause using Big Data from ChaCha. We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.
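A rough sketch of the word-adjacency idea (tokenize, drop stopwords, count adjacent word pairs, keep frequent edges) using networkx; the stopword list, threshold, and sample queries are invented for illustration:

```python
# Build a small word-adjacency-style graph from query text and keep edges
# whose adjacent-pair count meets a minimum threshold.
from collections import Counter
import networkx as nx

STOPWORDS = {"is", "a", "the", "what", "do", "i", "for", "of", "at"}

def build_wag(queries, min_count=2):
    pair_counts = Counter()
    for q in queries:
        tokens = [w for w in q.lower().split() if w not in STOPWORDS]
        pair_counts.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    g = nx.Graph()
    for (w1, w2), n in pair_counts.items():
        if n >= min_count:
            g.add_edge(w1, w2, weight=n)
    return g

sample = ["what is hormone replacement therapy",
          "hormone replacement therapy side effects",
          "hot flashes at night"]
print(build_wag(sample, min_count=1).edges(data=True))
```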
Fast Inbound Top-K Query for Random Walk with Restart.
Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei
2015-09-01
Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than the state-of-the-art method.
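For orientation, the baseline RWR computation and a naive inbound top-k selection can be sketched as below; this is plain power iteration, not the Squeeze or Ripple pruning strategies described in the record.

```python
# Random walk with restart by power iteration, then a naive inbound top-k:
# score of node v with respect to query q is the RWR score *to* q when
# restarting at v.
import numpy as np

def rwr(adj, restart_node, c=0.15, tol=1e-9, max_iter=200):
    """adj: (n, n) adjacency matrix; returns RWR scores w.r.t. restart_node."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = adj / col_sums                       # column-stochastic transition matrix
    e = np.zeros(n)
    e[restart_node] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * P @ r + c * e
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

def inbound_topk(adj, q, k):
    scores = [rwr(adj, v)[q] for v in range(adj.shape[0])]
    return np.argsort(scores)[::-1][:k]

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(inbound_topk(A, q=0, k=2))
```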
Sleep-wake time perception varies by direct or indirect query.
Alameddine, Y; Ellenbogen, J M; Bianchi, M T
2015-01-15
The diagnosis of insomnia rests on self-report of difficulty initiating or maintaining sleep. However, subjective reports may be unreliable, and possibly may vary by the method of inquiry. We investigated this possibility by comparing within-individual response to direct versus indirect time queries after overnight polysomnography. We obtained self-reported sleep-wake times via morning questionnaires in 879 consecutive adult diagnostic polysomnograms. Responses were compared within subjects (direct versus indirect query) and across groups defined by apnea-hypopnea index and by self-reported insomnia symptoms in pre-sleep questionnaires. Direct queries required a time duration response, while indirect queries required clock times from which we calculated time durations. Direct and indirect queries of sleep latency were the same in only 41% of cases, and total sleep time queries matched in only 5.4%. For both latency and total sleep, the most common discrepancy involved the indirect value being larger than the direct response. The discrepancy between direct and indirect queries was not related to objective sleep metrics. The degree of discrepancy was not related to the presence of insomnia symptoms, although patients reporting insomnia symptoms showed underestimation of total sleep duration by direct response. Self-reported sleep latency and total sleep time are often internally inconsistent when comparing direct and indirect survey queries of each measure. These discrepancies represent substantive challenges to effective clinical practice, particularly when diagnosis and management depends on self-reported sleep patterns, as with insomnia. Although self-reported sleep-wake times remains fundamental to clinical practice, objective measures provide clinically relevant adjunctive information. © 2015 American Academy of Sleep Medicine.
Secure and Efficient k-NN Queries
Asif, Hafiz; Vaidya, Jaideep; Shafiq, Basit; Adam, Nabil
2017-01-01
Given the morass of available data, ranking and best match queries are often used to find records of interest. As such, k-NN queries, which give the k closest matches to a query point, are of particular interest, and have many applications. We study this problem in the context of the financial sector, wherein an investment portfolio database is queried for matching portfolios. Given the sensitivity of the information involved, our key contribution is to develop a secure k-NN computation protocol that can enable the computation of k-NN queries in a distributed multi-party environment while taking domain semantics into account. The experimental results show that the proposed protocols are extremely efficient. PMID:29218333
Nearest private query based on quantum oblivious key distribution
NASA Astrophysics Data System (ADS)
Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan
2017-12-01
Nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without revealing their respective private information. In this paper, we first present a quantum protocol for nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared to the classical related protocols, our protocol has the advantages of the higher security and the better feasibility, so it has a better prospect of applications.
Cognitive issues in searching images with visual queries
NASA Astrophysics Data System (ADS)
Yu, ByungGu; Evens, Martha W.
1999-01-01
In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.
Blind Seer: A Scalable Private DBMS
2014-05-01
searchable index terms per DB row, in time comparable to (insecure) MySQL: many practical queries can be privately executed with work 1.2-3 times slower than MySQL, although some queries are costlier. We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges ...
Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval
2014-11-01
the same score, another signal will be used to rank these documents to break the ties, but the relative orders of other documents against these ... documents remain the same. The tie-breaking step above is repeatedly applied to further break ties until all candidate signals are applied and the ranking ... searched it on the Yahoo! search engine, which returned some query suggestions for the query. The original queries as well as their query suggestions
Multi-field query expansion is effective for biomedical dataset retrieval.
Bouadjenek, Mohamed Reda; Verspoor, Karin
2017-01-01
In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.
Multi-field query expansion is effective for biomedical dataset retrieval
2017-01-01
Abstract In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457
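A minimal sketch of Rocchio-style pseudo-relevance expansion, with invented weights and toy documents rather than the bioCADDIE setup described above:

```python
# Rocchio-style expansion: move the query toward the centroid of top-ranked
# (pseudo-relevant) documents and append the highest-weighted new terms.
from collections import Counter

def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_n=3):
    q_vec = Counter(query_terms)
    centroid = Counter()
    for doc in feedback_docs:
        centroid.update(doc)
    expanded = Counter()
    for term in set(q_vec) | set(centroid):
        expanded[term] = alpha * q_vec[term] + beta * centroid[term] / len(feedback_docs)
    new_terms = [t for t, _ in expanded.most_common() if t not in q_vec][:top_n]
    return list(query_terms) + new_terms

docs = [["gene", "expression", "microarray", "cancer"],
        ["rna", "seq", "expression", "dataset"]]
print(rocchio_expand(["gene", "expression"], docs))
```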
Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong
2016-09-01
For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems, and to support decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of the multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used to support monitoring of patients' conditions. On the other hand, experiments with varying input rates of streams can correspond to changes in the number of patients that a system should manage, whereas burst input rates can correspond to sudden rushes of patients to be taken care of. The experimental results show that, in Workload 1, our proposal improves throughput by about 4 and 2 times compared with the related works, respectively; in Workload 2, our proposal improves throughput by about 3 and 2 times compared with the related works, respectively; in Workload 3, our proposal improves throughput by about 6 times compared with the related work. The experimental results demonstrated that our proposal was able to process complex queries efficiently, which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.
78 FR 20473 - National Practitioner Data Bank
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-05
... may self-query. Information under the HCQIA is reported by medical malpractice payers, state medical... Organizations (QIOs). Individual health care practitioners and entities may self-query. Information under... have access to this information. Individual practitioners, providers, and suppliers may self-query the...
Phylodynamics of classical swine fever virus with emphasis on Ecuadorian strains.
Garrido Haro, A D; Barrera Valle, M; Acosta, A; J Flores, F
2018-06-01
Classical swine fever virus (CSFV) is a Pestivirus from the Flaviviridae family that affects pigs worldwide and is endemic in several Latin American countries. However, there are still some countries in the region, including Ecuador, for which CSFV molecular information is lacking. To better understand the epidemiology of CSFV in the Americas, sequences from CSFVs from Ecuador were generated and a phylodynamic analysis of the virus was performed. Sequences for the full-length glycoprotein E2 gene of twenty field isolates were obtained and, along with sequences from strains previously described in the Americas and from the most representative strains worldwide, were used to analyse the phylodynamics of the virus. Bayesian methods were used to test several molecular clock and demographic models. A calibrated ultrametric tree and a Bayesian skyline were constructed, and codons associated with positive selection involving immune escape were detected. The best model according to Bayes factors was the strict molecular clock and Bayesian skyline model, which shows that CSFV has an evolution rate of 3.2 × 10⁻⁴ substitutions per site per year. The model estimates the origin of CSFV in the mid-1500s. There is a strong spatial structure for CSFV in the Americas, indicating that the virus is moving mainly through neighbouring countries. The genetic diversity of CSFV has increased constantly since its appearance, with a slight decrease in the mid-twentieth century, which coincides with eradication campaigns in North America. Even though there is no evidence of strong directional evolution of the E2 gene in CSFV, codons 713, 761, 762 and 975 appear to be selected positively and could be related to virulence or pathogenesis. These results reveal how CSFV has spread and evolved since it first appeared in the Americas and provide important information for attaining the goal of eradication of this virus in Latin America. © 2018 Blackwell Verlag GmbH.
Query by example video based on fuzzy c-means initialized by fixed clustering center
NASA Astrophysics Data System (ADS)
Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar
2012-04-01
Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots; each shot can be represented by a key frame, and video processing techniques are then used to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to initialization, we initialized the cluster centers with the shots of the query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of the query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and the benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and a video in the database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.
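A compact sketch of fuzzy c-means with centers initialized from the query shots, the central idea above; the feature vectors are synthetic, and the fuzzifier, dimensions, and iteration count are assumptions:

```python
# Fuzzy c-means with fixed (query-shot) initial centers: alternate between
# inverse-distance memberships and membership-weighted center updates.
import numpy as np

def fcm_fixed_init(data, init_centers, m=2.0, n_iter=50):
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Membership of each point in each cluster (standard FCM formula).
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        u = 1.0 / (dist ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)
        # Update centers as membership-weighted means.
        um = u ** m
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
    return u, centers

shots_db = np.random.rand(20, 8)        # key-frame features of database shots
query_shots = np.random.rand(3, 8)      # key-frame features of the query video
memberships, centers = fcm_fixed_init(shots_db, query_shots)
print(memberships.argmax(axis=1))       # class label (nearest query shot) per DB shot
```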
NASA Technical Reports Server (NTRS)
Friedman, S. Z.; Walker, R. E.; Aitken, R. B.
1986-01-01
The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, and tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.
NCBI2RDF: enabling full RDF-based access to NCBI databases.
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
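As a rough illustration of the mediation idea (not the NCBI2RDF implementation itself), the sketch below issues one NCBI E-utilities esearch call and exposes the hits as RDF triples that can then be queried locally with SPARQL via rdflib. The namespace and predicate names are hypothetical, and the call requires network access to the NCBI servers.

```python
import urllib.parse, urllib.request, xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical vocabulary for the virtual RDF view; NCBI2RDF's actual terms may differ.
NS = Namespace("http://example.org/ncbi2rdf/")

def esearch_to_rdf(db: str, term: str, retmax: int = 5) -> Graph:
    """Run one E-utilities esearch call and expose the hits as RDF triples,
    mimicking the kind of mapping a SPARQL-to-Entrez mediator performs."""
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
           + urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax}))
    with urllib.request.urlopen(url) as resp:
        ids = [e.text for e in ET.parse(resp).getroot().iter("Id")]
    g = Graph()
    for uid in ids:
        record = URIRef(f"https://www.ncbi.nlm.nih.gov/{db}/{uid}")
        g.add((record, NS.database, Literal(db)))
        g.add((record, NS.matchesQuery, Literal(term)))
    return g

g = esearch_to_rdf("pubmed", "dengue surveillance")
# The resulting graph can now be queried locally with SPARQL:
for row in g.query("SELECT ?r WHERE { ?r ?p ?o } LIMIT 5"):
    print(row.r)
```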
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Wu, Kesheng
The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
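A toy illustration of the bitmap-index idea, assuming dictionary-encoded triples: one bitmap per distinct predicate and per distinct object value, with conjunctive patterns answered by bitwise AND. Plain Python integers stand in here for the compressed bitmaps (e.g. WAH-encoded in FastBit) that a real system would use.

```python
from collections import defaultdict

class BitmapTripleIndex:
    """Toy bitmap index over RDF triples: one bitmap (a Python int used as a
    bitset, one bit per triple row) per distinct predicate and per distinct object."""
    def __init__(self, triples):
        self.triples = list(triples)                  # row id = position in this list
        self.pred_bm = defaultdict(int)
        self.obj_bm = defaultdict(int)
        for row, (s, p, o) in enumerate(self.triples):
            self.pred_bm[p] |= 1 << row
            self.obj_bm[o] |= 1 << row

    def match(self, pred=None, obj=None):
        """Rows matching a (?, pred, obj) pattern via bitwise AND of bitmaps."""
        bm = (1 << len(self.triples)) - 1             # start from "all rows"
        if pred is not None:
            bm &= self.pred_bm[pred]
        if obj is not None:
            bm &= self.obj_bm[obj]
        return [self.triples[r] for r in range(len(self.triples)) if bm >> r & 1]

idx = BitmapTripleIndex([
    ("uniprot:P12345", "rdf:type", "core:Protein"),
    ("uniprot:P12345", "core:organism", "taxon:9606"),
    ("uniprot:Q67890", "rdf:type", "core:Protein"),
])
print(idx.match(pred="rdf:type", obj="core:Protein"))   # both protein triples
```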
Köhler, M J; Springer, S; Kaatz, M
2014-09-01
The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence, as shown by the example of influenza. Other influences include media attention or holidays. The present work investigates whether the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatologically relevant search terms was assessed with the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are suitable for estimating seasonal variability in the prevalence of common dermatoses. This finding may be useful for real-time analysis and the formation of hypotheses concerning pathogenetic or symptom-aggravating mechanisms and may thus contribute to improved diagnostics and prevention of skin diseases.
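A rough sketch of one way to score the degree of seasonality of a weekly query-volume series via frequency analysis (the share of spectral power concentrated near one cycle per year). The paper's exact frequency-analysis and geometric measures are not reproduced, and the data below are synthetic.

```python
import numpy as np

def seasonality_strength(weekly_volume):
    """Crude degree-of-seasonality score: share of the non-DC spectral power that
    sits at the one-cycle-per-year frequency and its immediate neighbours."""
    x = np.asarray(weekly_volume, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)          # cycles per week
    annual = np.argmin(np.abs(freqs - 1.0 / 52.0))  # bin closest to a yearly cycle
    band = power[max(annual - 1, 1):annual + 2].sum()
    return band / power[1:].sum()

# toy example: five years of weekly data with an annual cycle plus noise
weeks = np.arange(5 * 52)
series = 50 + 20 * np.sin(2 * np.pi * weeks / 52) + np.random.default_rng(1).normal(0, 3, weeks.size)
print(round(seasonality_strength(series), 2))       # close to 1 for strongly seasonal series
```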
HodDB: Design and Analysis of a Query Processor for Brick.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fierro, Gabriel; Culler, David
Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response, and model-predictive control, require fast queries, conventionally less than 100 ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet this performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, an RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.
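A small example of the kind of building-agnostic SPARQL lookup such an application would issue against a Brick model, shown here with rdflib over a hand-built two-triple graph; the Brick namespace IRI and the class and relationship names are assumed and may differ from the actual schema.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

BRICK = Namespace("https://brickschema.org/schema/Brick#")   # assumed namespace IRI
BLDG = Namespace("http://example.org/building#")             # hypothetical building model

# tiny hand-built Brick-style model: one temperature sensor that is a point of one VAV
g = Graph()
g.bind("brick", BRICK)
g.add((BLDG.ts1, RDF.type, BRICK.Temperature_Sensor))
g.add((BLDG.ts1, BRICK.isPointOf, BLDG.vav1))

# the kind of lookup a latency-sensitive application would run to discover its resources
query = """
PREFIX brick: <https://brickschema.org/schema/Brick#>
SELECT ?sensor ?equip WHERE {
    ?sensor a brick:Temperature_Sensor .
    ?sensor brick:isPointOf ?equip .
}"""
for sensor, equip in g.query(query):
    print(sensor, equip)
```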
NASA Astrophysics Data System (ADS)
Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng
2016-05-01
Within operational environments, decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spread of graph data management tools for accessing large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query, initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.
Representation and alignment of sung queries for music information retrieval
NASA Astrophysics Data System (ADS)
Adams, Norman H.; Wakefield, Gregory H.
2005-09-01
The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure used to represent melodies have been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with the highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
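A minimal sketch of aligning a sung-query pitch contour to a target melody with dynamic time warping, including an optional band constraint of the kind that could be tightened across iterative-deepening passes. This is a generic DTW, not the paper's specific alignment procedure, and the contours below are made up.

```python
import numpy as np

def dtw_distance(query, target, band=None):
    """Dynamic-time-warping cost between two pitch contours (1-D sequences).
    An optional Sakoe-Chiba band limits how far the alignment path may stray
    from the diagonal, trading robustness for speed."""
    n, m = len(query), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = (1, m) if band is None else (max(1, i - band), min(m, i + band))
        for j in range(lo, hi + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy usage: a sung query contour against a transposition-normalized target melody
query = np.array([60.0, 60.3, 62.1, 64.0, 63.8, 62.0])
target = np.array([60.0, 62.0, 64.0, 62.0])
print(dtw_distance(query, target, band=3))
```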
Concept-based query language approach to enterprise information systems
NASA Astrophysics Data System (ADS)
Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo
2014-01-01
In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.
Relativistic quantum private database queries
NASA Astrophysics Data System (ADS)
Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou
2015-04-01
Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.
Walter User’s Manual (Version 1.0).
1987-09-01
queries and/or commands. 1.2 - How Walter Uses the Screen: As shown in Figure 1-1, Walter divides the screen of your terminal into five separate areas ... our attention to queries and how to submit them to the database. 1.3.1 - Submitting Queries: A query is an expression consisting of words, parentheses ... dates, but also with ranges of dates, such as "oct 15 : nov 15". Walter recognizes three kinds of dates: * Specific dates of the form [date <month> <day
Flexible Decision Support in Device-Saturated Environments
2003-10-01
also output tuples to a remote MySQL or Postgres database. 3.3 GUI: The GUI allows the user to pose queries using SQL and to display query ... DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres). • Debug.java – contains the code for printing out debug messages ... also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival, and the GUI can query those results.
Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar
2016-07-25
Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is concerned with the representation, storage, and organization of information items, as well as with access to them. One of the main problems in IR is determining which documents are relevant to the user's needs and which are not. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique to refine information retrieval searches so that they better represent the user's information need and thereby enhance retrieval performance: different query expansion techniques are applied and combined linearly, two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. Retrieval performance is measured by several variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results enhances the retrieved documents and outperforms our baseline by 21.06%; it even outperforms a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
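A small sketch of the linear combination step, assuming each query-expansion technique has already produced a document-to-score run; scores are min-max normalized and two runs are mixed at a time with a weight alpha. The function and weight are illustrative, not the paper's exact fusion.

```python
def combine_expansion_scores(run_a, run_b, alpha=0.6):
    """Linearly fuse two retrieval runs (document -> score) produced by two
    different query-expansion techniques: score = alpha*a + (1-alpha)*b.
    Runs are min-max normalized first so the weights are comparable."""
    def normalize(run):
        lo, hi = min(run.values()), max(run.values())
        return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in run.items()}
    a, b = normalize(run_a), normalize(run_b)
    docs = set(a) | set(b)
    fused = {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# toy usage: a synonym-based expansion run fused with a term-reweighting run
print(combine_expansion_scores({"doc1": 3.2, "doc2": 1.1}, {"doc2": 0.9, "doc3": 0.4}))
```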
A distributed query execution engine of big attributed graphs.
Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif
2016-01-01
A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching, and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation, and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores, while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets, in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.
Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags
NASA Astrophysics Data System (ADS)
Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji
We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web in that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query has the same form as the search tags and indicates what participants want to feel about the files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine whether the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection differ among the peers. The simulation results show that the Kansei query-forwarding method and a random-walk-based query-forwarding method, used for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query-forwarding method is shown, through simulations, to be superior or equal to the random-walk-based one in terms of search speed.
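A rough sketch of the Kansei query-forwarding decision, assuming each neighbour advertises the tag vectors of its files: a neighbour's match score is the summed dot product with the query vector, and forwarding targets are drawn with probability proportional to that score, falling back to a random choice when nothing matches. All names and vectors are illustrative, not the paper's protocol.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward_query(query_vec, neighbour_tags, k=2, rng=random):
    """Choose up to k neighbours to forward a Kansei query to. A neighbour's score is
    the summed (non-negative) dot product between the query vector and the tag vectors
    of its files; neighbours are drawn with probability proportional to that score."""
    scores = {peer: sum(max(dot(query_vec, t), 0.0) for t in tags)
              for peer, tags in neighbour_tags.items()}
    positive = [p for p, s in scores.items() if s > 0]
    if not positive:                       # nothing matches: fall back to a random walk
        return rng.sample(list(neighbour_tags), min(k, len(neighbour_tags)))
    chosen = set()
    total = sum(scores[p] for p in positive)
    while len(chosen) < min(k, len(positive)):
        r, acc = rng.uniform(0, total), 0.0
        for peer in positive:
            acc += scores[peer]
            if acc >= r:
                chosen.add(peer)
                break
    return list(chosen)

peers = {"p1": [[0.9, 0.1, 0.0]], "p2": [[0.2, 0.8, 0.3], [0.1, 0.9, 0.2]], "p3": []}
print(forward_query([0.0, 1.0, 0.5], peers))
```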
Quantum algorithms on Walsh transform and Hamming distance for Boolean functions
NASA Astrophysics Data System (ADS)
Xie, Zhengwei; Qiu, Daowen; Cai, Guangya
2018-06-01
The Walsh spectrum, or Walsh transform, is an alternative description of Boolean functions. In this paper, we explore quantum algorithms to approximate the absolute value of the Walsh transform W_f at a single point z_0 (i.e., |W_f(z_0)|) for n-variable Boolean functions with probability at least 8/π², using O(1/(|W_f(z_0)| ε)) queries, promised that the accuracy is ε, while the best known classical algorithm requires O(2^n) queries. The Hamming distance between Boolean functions is used to study linearity testing and other important problems. We take advantage of the Walsh transform to calculate the Hamming distance between two n-variable Boolean functions f and g using O(1) queries in some cases. Then, we exploit another quantum algorithm which converts computing the Hamming distance between two Boolean functions into quantum amplitude estimation (i.e., approximate counting). If Ham(f,g) = t ≠ 0, we can approximately compute Ham(f,g) with probability at least 2/3 by combining our algorithm with the Approx-Count(f, ε) algorithm, using an expected number of Θ(√(N/(⌊εt⌋+1)) + √(t(N−t))/(⌊εt⌋+1)) queries, promised that the accuracy is ε. Moreover, our algorithm is optimal, while the exact query complexity for the above problem is Θ(N) and the classical query complexity with accuracy ε is O((1/ε²)·N/(t+1)), where N = 2^n. Finally, we present three exact quantum query algorithms for two promise problems on Hamming distance using O(1) queries, while any classical deterministic algorithm solving the problem uses Ω(2^n) queries.
Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S
2011-05-01
A variety of obstacles, including bureaucracy and lack of resources, have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or for which a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to traditional dengue surveillance.
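A minimal sketch of the univariate linear model described above, fitting official case counts against the fraction of search volume for dengue-related queries with ordinary least squares; the numbers are made up, and the query-selection and spike-removal steps are not reproduced.

```python
import numpy as np

def fit_query_model(query_fraction, case_counts):
    """Ordinary-least-squares fit of official dengue case counts against the
    fraction of search volume for dengue-related queries.
    Returns slope, intercept and the Pearson correlation."""
    x = np.asarray(query_fraction, dtype=float)
    y = np.asarray(case_counts, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# toy usage with made-up numbers (not real surveillance data)
x = np.array([0.10, 0.12, 0.20, 0.35, 0.30, 0.15])
y = np.array([120, 150, 260, 480, 420, 190])
slope, intercept, r = fit_query_model(x, y)
print(f"cases ~= {slope:.0f} * query_fraction + {intercept:.0f}, r = {r:.2f}")
```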
Parasol: An Architecture for Cross-Cloud Federated Graph Querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa
2014-06-22
Large scale data fusion of multiple datasets can often provide insights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arises that hampers querying and data fusion. To address these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol's design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol's lightweight architecture. Experiments on a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.
NASA Astrophysics Data System (ADS)
Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros
SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
Plaisant, Catherine; Lam, Stanley; Shneiderman, Ben; Smith, Mark S.; Roseman, David; Marchand, Greg; Gillam, Michael; Feied, Craig; Handler, Jonathan; Rappaport, Hank
2008-01-01
As electronic health records (EHR) become more widespread, they enable clinicians and researchers to pose complex queries that can benefit immediate patient care and deepen understanding of medical treatment and outcomes. However, current query tools make complex temporal queries difficult to pose, and physicians have to rely on computer professionals to specify the queries for them. This paper describes our efforts to develop a novel query tool implemented in a large operational system at the Washington Hospital Center (Microsoft Amalga, formerly known as Azyxxi). We describe our design of the interface to specify temporal patterns and the visual presentation of results, and report on a pilot user study looking for adverse reactions following radiology studies using contrast. PMID:18999158
Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment
Anwar, Nadia; Hunt, Ela
2009-01-01
Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper is accessible online. The methodology we developed is scalable and can be applied to new data as these become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482
SPARQL Assist language-neutral query composer
2012-01-01
Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327
SPARQL assist language-neutral query composer.
McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark
2012-01-25
SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.
NASA Technical Reports Server (NTRS)
Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh
2014-01-01
Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.
Parallel Index and Query for Large Scale Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chou, Jerry; Wu, Kesheng; Ruebel, Oliver
2011-07-18
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize the underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
Information Network Model Query Processing
NASA Astrophysics Data System (ADS)
Song, Xiaopu
The Information Networking Model (INM) [31] is a novel database model for the management of real-world objects and relationships. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. The INM Query Language (INM-QL) [30] is designed to explore such information networks, retrieve information about schemas, instances, their attributes, relationships, and context-dependent information, and process query results in a user-specified form. The INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that effectively and efficiently processes INM-QL. The subsystem provides a lexical and syntactic analyzer for INM-QL, and it is able to choose appropriate evaluation strategies and index mechanisms to process INM-QL queries without user intervention. It also uses an intermediate result structure to hold intermediate query results, along with other helper structures, to reduce the complexity of query processing.
Using Bitmap Indexing Technology for Combined Numerical and TextQueries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng
2006-10-16
In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism for result records.
Mavragani, Amaryllis; Sampri, Alexia; Sypsa, Karla; Tsagarakis, Konstantinos P
2018-03-12
With the internet's penetration and use constantly expanding, this vast amount of information can be employed to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used in the past to examine interest in various medical and health-related topics and has shown great potential in forecasting, prediction, and nowcasting. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity to explore the behavior toward asthma, a common respiratory disease, is present. This study aimed at forecasting the online behavior toward asthma and examined the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data. By applying Holt-Winters exponential smoothing to Google Trends time series from 2004 to 2015 for the term "asthma," forecasts for online queries at state and national levels are estimated from 2016 to 2020 and validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases are examined. Our analysis shows that search queries exhibit seasonality within each year and that the relationships between the queries of any two years are statistically significant (P<.05). The estimated forecasting models for Google queries over a 5-year period (2016 through 2020) are robust and validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=-.82, P=.001) and current asthma (r=-.77, P=.004) rates from 2004 to 2015 and (2) online queries and Behavioral Risk Factor Surveillance System lifetime (r=-.78, P=.003) and current asthma (r=-.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available. Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs surrounding the current population of patients with asthma. ©Amaryllis Mavragani, Alexia Sampri, Karla Sypsa, Konstantinos P Tsagarakis. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 12.03.2018.
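A minimal sketch of the forecasting step using additive Holt-Winters exponential smoothing from statsmodels on a synthetic monthly query-volume series; the seasonal period of 12 and the synthetic data are assumptions, not the study's actual Google Trends series or model settings.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly "asthma" query-volume series standing in for Google Trends data
# (12 seasonal periods per year is an assumption; weekly data would use 52).
rng = np.random.default_rng(42)
months = np.arange(12 * 12)                                    # 2004-2015
series = 60 + 15 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 4, months.size)

# Additive Holt-Winters model, in the spirit of the study's forecasting setup
model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12)
fit = model.fit()
forecast = fit.forecast(steps=5 * 12)                          # 2016 through 2020
print(forecast[:6].round(1))
```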
BioFed: federated query processing over life sciences linked open data.
Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich
2017-03-15
Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Finally, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER). BioFed is a solution for a single point of access to a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
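A small sketch of the SPARQL 1.1 federation mechanism such engines build on: the SERVICE clause ships part of the graph pattern to a second endpoint. The endpoint URLs and vocabulary below are placeholders, the snippet assumes the SPARQLWrapper package and live endpoints, and it does not reproduce BioFed's own source-selection logic.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint URLs are placeholders; substitute any two live life-science SPARQL endpoints.
LOCAL_ENDPOINT = "http://example.org/sparql"
REMOTE_ENDPOINT = "http://example.org/other/sparql"

# A SPARQL 1.1 federated query: the SERVICE clause delegates part of the pattern
# to a second endpoint, so one query can join data from two sources.
query = f"""
SELECT ?drug ?effect WHERE {{
    ?drug a <http://example.org/vocab#Drug> .
    SERVICE <{REMOTE_ENDPOINT}> {{
        ?drug <http://example.org/vocab#sideEffect> ?effect .
    }}
}} LIMIT 10
"""

sparql = SPARQLWrapper(LOCAL_ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["drug"]["value"], binding["effect"]["value"])
```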
An adaptable architecture for patient cohort identification from diverse data sources.
Bache, Richard; Miles, Simon; Taweel, Adel
2013-12-01
We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
The Profile-Query Relationship.
ERIC Educational Resources Information Center
Shepherd, Michael A.; Phillips, W. J.
1986-01-01
Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…
ERIC Educational Resources Information Center
Bosc, P.; Lietard, L.; Pivert, O.
2003-01-01
Considers flexible querying of relational databases. Highlights include SQL languages and basic aggregate operators; Sugeno's fuzzy integral; evaluation examples; and how and under what conditions other aggregate functions could be applied to fuzzy sets in a flexible query. (Author/LRW)
Massive Query Resolution for Rapid Selective Dissemination of Information.
ERIC Educational Resources Information Center
Cohen, Jonathan D.
1999-01-01
Outlines an efficient approach to performing query resolution which, when matched with a keyword scanner, offers rapid selecting and routing for massive Boolean queries, and which is suitable for implementation on a desktop computer. Demonstrates the system's operation with large examples in a practical setting. (AEF)
Querying Proofs (Work in Progress)
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen; Lueth, Christoph
2011-01-01
We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.
Matching health information seekers' queries to medical terms
2012-01-01
Background The Internet is a major source of health information, but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to poor query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques, or knowledge-based methods. However, it would also be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method to correct misspellings in queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale, we tested different combinations of query normalizations before or after misspelling correction, using the thresholds retained in the first run. Results Relative to the total number of suggestions (around 163, the size of the first query sample), at a comparator score threshold of 0.3 the normalized Levenshtein edit distance gave the highest F-measure (88.15%), and at a comparator score threshold of 0.7 the Stoilos function gave the highest F-measure (84.31%). By combining Levenshtein and Stoilos, the highest F-measure (80.28%) is obtained with thresholds of 0.2 and 0.7, respectively. However, queries are composed of several terms that may be combinations of medical terms, so a process of query normalization and segmentation is required. The highest F-measure (64.18%) is obtained when this process is performed before spelling correction. Conclusions Despite the widely known high performance of the normalized Levenshtein edit distance, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records. PMID:23095521
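A minimal sketch of the combination idea, assuming a vocabulary of reference medical terms: the normalized Levenshtein similarity is computed exactly, while difflib's ratio merely stands in for the Stoilos score, which is not reproduced here; the 0.2/0.7 thresholds mirror the combination reported above.

```python
import difflib

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def normalized_levenshtein(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - edit_distance / max_length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def suggest(term, vocabulary, lev_threshold=0.2, second_threshold=0.7):
    """Keep a candidate medical term only if both similarities clear their thresholds.
    difflib's ratio stands in for the Stoilos score, which is not implemented here."""
    out = []
    for cand in vocabulary:
        lev = normalized_levenshtein(term, cand)
        second = difflib.SequenceMatcher(None, term, cand).ratio()
        if lev >= lev_threshold and second >= second_threshold:
            out.append((cand, lev, second))
    return sorted(out, key=lambda t: t[1] + t[2], reverse=True)

print(suggest("diabetis", ["diabetes", "diarrhoea", "dialysis"]))
```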
Comparison of two matrix data structures for advanced CSM testbed applications
NASA Technical Reports Server (NTRS)
Regelbrugge, M. E.; Brogan, F. A.; Nour-Omid, B.; Rankin, C. C.; Wright, M. A.
1989-01-01
The first section describes data storage schemes presently used by the Computational Structural Mechanics (CSM) testbed sparse matrix facilities and similar skyline (profile) matrix facilities. The second section contains a discussion of certain features required for the implementation of particular advanced CSM algorithms, and how these features might be incorporated into the data storage schemes described previously. The third section presents recommendations, based on the discussions of the prior sections, for directing future CSM testbed development to provide necessary matrix facilities for advanced algorithm implementation and use. The objective is to lend insight into the matrix structures discussed and to help explain the process of evaluating alternative matrix data structures and utilities for subsequent use in the CSM testbed.
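A small illustration of skyline (profile) storage for a symmetric matrix, assuming only the lower triangle is kept: each row stores the entries from its first nonzero column through the diagonal, plus that starting column index. This is a generic sketch, not the CSM testbed's actual data structure.

```python
import numpy as np

def to_skyline(A):
    """Pack the lower triangle of a symmetric matrix in skyline (profile) form:
    per row, the values from the first nonzero column up to the diagonal,
    plus that first column index."""
    A = np.asarray(A, dtype=float)
    values, first_col = [], []
    for i in range(A.shape[0]):
        nz = np.nonzero(A[i, :i + 1])[0]
        start = int(nz[0]) if nz.size else i          # empty profile: keep only the diagonal
        first_col.append(start)
        values.append(A[i, start:i + 1].copy())
    return values, first_col

def skyline_get(values, first_col, i, j):
    """Random access to A[i, j] (with i >= j) from the packed representation."""
    if j < first_col[i]:
        return 0.0                                    # outside the profile: structurally zero
    return values[i][j - first_col[i]]

A = np.array([[4.0, 0.0, 0.0, 1.0],
              [0.0, 5.0, 2.0, 0.0],
              [0.0, 2.0, 6.0, 3.0],
              [1.0, 0.0, 3.0, 7.0]])
vals, starts = to_skyline(A)
print(starts, skyline_get(vals, starts, 3, 0))        # row 3 profile starts at column 0
```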
A second level of the Saint Petersburg skyline
NASA Astrophysics Data System (ADS)
Krasnopolsky, Andrey; Bolotin, Sergey
2018-03-01
The article considers the history of residential development in Saint Petersburg and notes the corresponding landmark dates. Changes in the height range of residential development in recent years are noted, and the influence of this factor on the formation of the city's silhouette is assessed. Reasons for such changes are identified, and the attractiveness of high-rise residential complexes as places to live is evaluated. Conclusions are drawn about trends in further housing construction with respect to building height. It is noted that multi-storey buildings can be located on the periphery of the city, taking into account the specific visual characteristics of the construction site and the silhouette of the erected buildings; for the central districts, strict height regulations are needed.
Interpreting megalithic tomb orientation and siting within broader cultural contexts
NASA Astrophysics Data System (ADS)
Prendergast, Frank
2016-02-01
This paper assesses the measured axial orientations and siting of Irish passage tombs. The distribution of monuments with passages/entrances directed at related tombs/cairns is shown. Where this phenomenon occurs, the targeted structure is invariably located at a higher elevation on the skyline and this could suggest a symbolic and hierarchical relationship in their relative siting in the landscape. Additional analysis of astronomical declinations at a national scale has identified tombs with an axial alignment towards the rising and setting positions of the Sun at the winter and summer solstices. A criteria-based framework is developed which potentially allows for these types of data to be more meaningfully considered and culturally interpreted within broader archaeological and social anthropological contexts.
Query-Time Optimization Techniques for Structured Queries in Information Retrieval
ERIC Educational Resources Information Center
Cartright, Marc-Allen
2013-01-01
The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…
Locality in Search Engine Queries and Its Implications for Caching
2001-05-01
in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have
Flexible Querying of Lifelong Learner Metadata
ERIC Educational Resources Information Center
Poulovassilis, A.; Selmer, P.; Wood, P. T.
2012-01-01
This paper discusses the provision of flexible querying facilities over heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities is to allow learners to identify possible choices for their future learning and professional development by seeing what others have done. We motivate and…
A Comparison of Two Methods for Boolean Query Relevancy Feedback.
ERIC Educational Resources Information Center
Salton, G.; And Others
1984-01-01
Evaluates and compares two recently proposed automatic methods for relevance feedback of Boolean queries (Dillon method, which uses probabilistic approach as basis, and disjunctive normal form method). Conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment. Nineteen references are included. (EJS)
Consistent Query Answering of Conjunctive Queries under Primary Key Constraints
ERIC Educational Resources Information Center
Pema, Enela
2014-01-01
An inconsistent database is a database that violates one or more of its integrity constraints. In reality, violations of integrity constraints arise frequently under several different circumstances. Inconsistent databases have long posed the challenge to develop suitable tools for meaningful query answering. A principled approach for querying…
28 CFR 25.7 - Querying records in the system.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 28 Judicial Administration 1 2010-07-01 2010-07-01 false Querying records in the system. 25.7 Section 25.7 Judicial Administration DEPARTMENT OF JUSTICE DEPARTMENT OF JUSTICE INFORMATION SYSTEMS The National Instant Criminal Background Check System § 25.7 Querying records in the system. (a) The following...
76 FR 9295 - Privacy Act; Exempt Record System
Federal Register 2010, 2011, 2012, 2013, 2014
2011-02-17
... entities can self-query. One of the primary purposes of these data will be use of this information by a... information on all other queries to the data bank, disclosure of law enforcement queries could compromise ongoing investigation activities. The premature disclosure of the existence of a law enforcement activity...
A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transfor...
Experimental quantum private queries with linear optics
NASA Astrophysics Data System (ADS)
de Martini, Francesco; Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo; Nagali, Eleonora; Sansoni, Linda; Sciarrino, Fabio
2009-07-01
The quantum private query is a quantum cryptographic protocol to recover information from a database, preserving both user and data privacy: the user can test whether someone has retained information on which query was asked and the database provider can test the amount of information released. Here we discuss a variant of the quantum private query algorithm that admits a simple linear optical implementation: it employs the photon’s momentum (or time slot) as address qubits and its polarization as bus qubit. A proof-of-principle experimental realization is implemented.
NASA Technical Reports Server (NTRS)
Dominick, Wayne D. (Editor); Liu, I-Hsiung
1985-01-01
The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.
Provenance Storage, Querying, and Visualization in PBase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo
2015-01-01
We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.
A Semantic Basis for Proof Queries and Transformations
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen W.; Luth, Christoph
2013-01-01
We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations.We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that
iSMART: Ontology-based Semantic Query of CDA Documents
Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue
2009-01-01
The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical document. With the rich ontological references in CDA documents, the ontology-based semantic query could be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents will be extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using the real clinical documents collected from a large hospital in southern China. PMID:20351883
Singh, Karmpaul; Brown, Richard J
2016-09-01
The current study aimed to explore the phenomenon of disease-related 'query escalation' in high/low health anxious Internet users (N = 40). During a 15-minute health-related Internet search, participants rated their anxiety and the perceived seriousness of information on each page. Post-search interviews determined the reasons for, and effects of, escalating queries to consider serious diseases. Both groups were found to be significantly more anxious after escalating queries. The high group was significantly more likely to escalate queries. Evaluating personal relevance of material was the main reason for escalations and moderated anxiety post-escalation. We conclude that searching for online disease information can increase anxiety, particularly for people worried about their health. © The Author(s) 2015.
Reactome graph database: Efficient access to complex pathway data
Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
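For readers unfamiliar with Cypher, the following sketch shows how such a graph might be queried from Python with the official Neo4j driver. The connection details, node label, relationship type, and property name are assumptions made for the example rather than a statement of the actual Reactome schema.

```python
# Illustrative sketch of querying a Reactome-style Neo4j instance with Cypher
# via the official Python driver. The URI, credentials, and the Pathway /
# hasEvent / displayName names below are assumptions for this example.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (p:Pathway {displayName: $name})-[:hasEvent*]->(e)
RETURN DISTINCT e.displayName AS event
LIMIT 25
"""

with driver.session() as session:
    # Traversal of arbitrary depth is expressed directly in the query,
    # which is the access pattern the graph database is optimized for.
    for record in session.run(CYPHER, name="Apoptosis"):
        print(record["event"])

driver.close()
```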
An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.
Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy
2017-11-05
This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high-level queries that may vary depending on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels, to the integration of knowledge representation and reasoning methods, to user interfaces for high-level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning are explicitly integrated so that queries can be formulated in a natural way using concepts at an appropriate level of abstraction, including additional constraints.
An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring
Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy
2017-01-01
This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high-level queries that may vary depending on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels, to the integration of knowledge representation and reasoning methods, to user interfaces for high-level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning are explicitly integrated so that queries can be formulated in a natural way using concepts at an appropriate level of abstraction, including additional constraints. PMID:29113073
Jeong, Hyundoo; Yoon, Byung-Jun
2017-03-14
Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. Performance assessment based on real PPI networks and known molecular complexes shows that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA .
Labeling RDF Graphs for Linear Time and Space Querying
NASA Astrophysics Data System (ADS)
Furche, Tim; Weinzierl, Antonius; Bry, François
Indices and data structures for web querying have mostly considered tree-shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) the data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data, with a focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
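The chapter's own labeling scheme is not reproduced here, but the classic interval (pre/post-order) labeling conveys the core idea of constant-time reachability tests that the authors aim to retain for graph-shaped RDF data. The sketch below is a simplified tree-only baseline: each node gets a DFS interval and ancestry is tested by interval containment.

```python
# Sketch of interval labeling for trees: each node gets a [start, end] interval
# from a DFS, and u reaches v iff u's interval contains v's. This is a baseline
# illustration, not the specific RDF graph labeling proposed in the chapter.
def label_tree(children, root):
    labels, counter = {}, 0

    def dfs(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            dfs(child)
        labels[node] = (start, counter)
        counter += 1

    dfs(root)
    return labels

def reaches(labels, u, v):
    # Constant-time reachability (ancestor) test using the intervals.
    su, eu = labels[u]
    sv, ev = labels[v]
    return su <= sv and ev <= eu

children = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(children, "a")
print(reaches(labels, "a", "d"))  # True
print(reaches(labels, "c", "d"))  # False
```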
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting query results to users in the SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort for biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
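A client interacting with such a virtual endpoint would look much like a client of any other SPARQL service. The sketch below uses the SPARQLWrapper library against a placeholder URL; the endpoint address and the predicates in the query are hypothetical, and only the general calling pattern is meant to be illustrative.

```python
# Hypothetical client-side sketch: issuing a SPARQL query to a virtual endpoint
# such as the one NCBI2RDF describes. The endpoint URL and the ex: predicates
# are placeholders; only the SPARQLWrapper usage pattern is real.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/ncbi2rdf/sparql")  # placeholder URL
sparql.setQuery("""
PREFIX ex: <http://example.org/ncbi#>
SELECT ?gene ?pubmedId WHERE {
  ?gene ex:symbol "BRCA1" .
  ?gene ex:citedIn ?pubmedId .
} LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["gene"]["value"], binding["pubmedId"]["value"])
```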
A New Publicly Available Chemical Query Language, CSRML ...
A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.
An approach for heterogeneous and loosely coupled geospatial data distributed computing
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui
2010-07-01
Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic in a distributed computing environment. In order to use these local resources together to solve larger geospatial information processing problems that relate to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a construct termed the Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems in heterogeneous GIS environments. First, we present a geospatial query processing schema for distributed computing, together with a method for equivalently transforming a global geospatial query into distributed local queries at the SQL (Structured Query Language) level, to solve the coordination problem among heterogeneous resources. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
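The "equivalent distributed program" idea, one global query rewritten as identical local SQL queries whose partial results are merged, can be sketched with two in-memory SQLite databases standing in for autonomous peers. The table layout and the aggregation below are illustrative assumptions, not the paper's prototype.

```python
# Toy sketch of the "global query -> equivalent local queries" idea using two
# in-memory SQLite databases standing in for autonomous geospatial peers.
import sqlite3

def make_peer(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE roads (region TEXT, length_km REAL)")
    conn.executemany("INSERT INTO roads VALUES (?, ?)", rows)
    return conn

peers = [
    make_peer([("north", 12.5), ("south", 3.0)]),
    make_peer([("north", 4.5), ("east", 7.25)]),
]

# Global query: total road length per region across all peers. Each peer runs
# the same local SQL; the coordinator merges the partial aggregates.
LOCAL_SQL = "SELECT region, SUM(length_km) FROM roads GROUP BY region"

totals = {}
for peer in peers:
    for region, partial in peer.execute(LOCAL_SQL):
        totals[region] = totals.get(region, 0.0) + partial

print(totals)  # {'north': 17.0, 'south': 3.0, 'east': 7.25}
```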
Reactome graph database: Efficient access to complex pathway data.
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.
2005-01-01
A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247,965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
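The set-oriented filtering step can be pictured with a small, self-contained sketch: build a q-gram index over the database sequences, then join the queries' q-grams against it and keep pairs that share enough q-grams as candidates for full alignment. The q-gram length and threshold here are arbitrary, and this is not miBLAST's actual implementation.

```python
# Simplified illustration of q-gram filtering with an index join, in the style
# of the set-oriented candidate generation miBLAST describes.
from collections import defaultdict

def qgrams(seq, q=3):
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}

def build_index(sequences, q=3):
    index = defaultdict(set)
    for name, seq in sequences.items():
        for gram in qgrams(seq, q):
            index[gram].add(name)
    return index

database = {"db1": "ACGTACGTGG", "db2": "TTTTCCCCAA"}
queries = {"q1": "ACGTACG", "q2": "CCCCAA"}

db_index = build_index(database)

# Index join: count shared q-grams per (query, database sequence) pair and keep
# pairs above a threshold as candidates for full alignment.
candidates = defaultdict(int)
for qname, qseq in queries.items():
    for gram in qgrams(qseq):
        for dbname in db_index.get(gram, ()):
            candidates[(qname, dbname)] += 1

print([pair for pair, shared in candidates.items() if shared >= 3])
```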
SeqWare Query Engine: storing and searching sequence data in the cloud.
O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F
2010-12-21
Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput and the associated drop in costs have resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and an interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make the SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
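A rough picture of querying variant data stored in HBase from Python is given below using the happybase client. The table name, column family, and the chromosome-plus-zero-padded-position row key design are assumptions for illustration, not SeqWare's actual schema; the point is only that a well-chosen row key turns a genomic interval query into a single contiguous scan.

```python
# Hypothetical sketch of range-scanning variants in an HBase table with the
# happybase client. Table name, column family, and the "chrom:zero-padded
# position" row key design are assumptions, not SeqWare's schema.
import happybase

connection = happybase.Connection("localhost")  # placeholder HBase Thrift host
table = connection.table("variants")

# Row keys sorted as chrom + zero-padded position make a genomic interval a
# contiguous key range, so a region query becomes a single scan.
start = b"chr1:000010000"
stop = b"chr1:000020000"

for row_key, columns in table.scan(row_start=start, row_stop=stop):
    print(row_key, columns.get(b"info:consequence"))
```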
Child pornography in peer-to-peer networks.
Steel, Chad M S
2009-08-01
The presence of child pornography in peer-to-peer networks is not disputed, but little effort has been made to date to quantify and analyze the distribution and nature of that content. By performing an analysis of queries and query hits on the largest peer-to-peer network, we are able both to quantify and to describe the nature of querying by child pornographers as well as the content they are sharing. Child pornography related content was identified and analyzed in 235,513 user queries and 194,444 query hits. The research confirmed that a large amount of peer-to-peer traffic is dedicated to child pornography, but supply and demand must be separated for a better understanding. The most prevalent query and the top two most prevalent filenames returned as query hits were child pornography related. However, it would be inaccurate to state that child pornography dominates peer-to-peer traffic, as 1% of all queries were related to child pornography and 1.45% of all query hits (unique filenames) were related to child pornography, consistent with a smaller study (Hughes et al., 2008). In addition to the above, the research indicates that the median age searched for was 13 years old, and the majority of queries were gender-neutral; of those with gender-related terms, 79% were female-oriented. Distribution-wise, the vast majority of content-specific searches are for movies at 99%, though images are still the most prevalent in availability. There is no shortage of child pornography supply and demand on peer-to-peer networks, and by analyzing how consumers seek and distributors advertise content we can better understand their motivations. Understanding the behavior of child pornographers and how they search for content, when contrasted with those sharing content, provides a basis for finding and combating that behavior. For law enforcement, knowing the specific terms used allows more timely and accurate forensics and better identification of those seeking and distributing child pornography. For Internet researchers, better filtering and monitoring is possible. For mental health professionals, understanding the preferences and behaviors of those searching supports more effective treatment.
SeqWare Query Engine: storing and searching sequence data in the cloud
2010-01-01
Background Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make the SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.
Mosa, Abu Saleh Mohammad; Yoo, Illhoi
2013-01-09
The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by PubMed's Automatic Term Mapping. Users need further education and interactive search applications for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
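The co-occurrence analysis the study describes can be approximated with a few lines of Python: count single tags and tag pairs over a toy query log and report support and confidence for each pair. The tags and log below are invented for the example.

```python
# Minimal sketch of tag co-occurrence / association analysis: compute support
# and confidence for pairs of search tags over a toy query log. The tags and
# the log contents are illustrative only.
from itertools import combinations
from collections import Counter

queries_tags = [          # one set of tags per logged query
    {"[ti]", "[au]"},
    {"[ti]"},
    {"[ti]", "[au]", "[dp]"},
    {"[au]", "[dp]"},
]

n = len(queries_tags)
single = Counter()
pair = Counter()
for tags in queries_tags:
    single.update(tags)
    pair.update(frozenset(p) for p in combinations(sorted(tags), 2))

for itemset, count in pair.items():
    a, b = sorted(itemset)
    support = count / n
    confidence = count / single[a]  # confidence of the rule a -> b
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```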
A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log
2013-01-01
Background The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. Methods A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by PubMed’s Automatic Term Mapping. Users need further education and interactive search applications for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604
STARS 2.0: 2nd-generation open-source archiving and query software
NASA Astrophysics Data System (ADS)
Winegar, Tom
2008-07-01
The Subaru Telescope is in the process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only be transferred once, in-house, with little status and error reporting and no error recovery, to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.
Federated queries of clinical data repositories: the sum of the parts does not equal the whole
Weber, Griffin M
2013-01-01
Background and objective In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors. Methods We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run separately on each partition independently, and the combined results are presented to the user. Results Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query. Conclusions Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture. PMID:23349080
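The one-time linkage and partitioning step can be sketched as follows: hash each patient's demographics with a one-way function, let each hospital contribute only its hashes, and group patients by the exact subset of hospitals that hold their data. The demographic fields and the use of a plain SHA-256 digest are simplifying assumptions, not SHRINE's production matching procedure.

```python
# Sketch of one-way hashing of demographics and partitioning patients by the
# subset of hospitals holding their data. Field layout and the plain SHA-256
# digest are simplifying assumptions for illustration.
import hashlib
from collections import defaultdict

def demographic_hash(name, dob, zip_code):
    token = f"{name.lower()}|{dob}|{zip_code}".encode()
    return hashlib.sha256(token).hexdigest()

# Each hospital contributes only the hashes of its own patients.
hospital_hashes = {
    "A": {demographic_hash("alice smith", "1980-01-02", "02115"),
          demographic_hash("bob jones", "1975-07-19", "02139")},
    "B": {demographic_hash("alice smith", "1980-01-02", "02115")},
}

# For each patient hash, record the exact subset of hospitals where it appears.
patient_sites = defaultdict(set)
for hospital, hashes in hospital_hashes.items():
    for h in hashes:
        patient_sites[h].add(hospital)

# Partition sizes: federated counts can then be corrected per partition instead
# of double counting patients seen at multiple hospitals.
by_subset = defaultdict(int)
for hospitals in patient_sites.values():
    by_subset[frozenset(hospitals)] += 1

for subset, count in by_subset.items():
    print(sorted(subset), count)
```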
An adaptable architecture for patient cohort identification from diverse data sources
Bache, Richard; Miles, Simon; Taweel, Adel
2013-01-01
Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442
Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries
Lev-Ran, Shaul
2017-01-01
Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use and Health (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of 195 ICD-10 symptoms. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R2 of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469
A weight based genetic algorithm for selecting views
NASA Astrophysics Data System (ADS)
Talebian, Seyed H.; Kareem, Sameem A.
2013-03-01
A data warehouse is a technology designed to support decision making. A data warehouse is built by extracting large amounts of data from different operational systems, transforming it into a consistent form, and loading it into the central repository. The type of queries in a data warehouse environment differs from those in operational systems. In contrast to operational systems, the analytical queries issued against data warehouses involve summarization of large volumes of data and therefore, under normal circumstances, take a long time to answer. On the other hand, these queries must be answered quickly to enable managers to make decisions as soon as possible. As a result, an essential need in this environment is improving query performance. One of the most popular methods for this task is to utilize pre-computed query results. In this method, whenever a new query is submitted by the user, instead of calculating the query on the fly over a large underlying database, the pre-computed results, or views, are used to answer the query. Although the ideal option would be to pre-compute and save all possible views, in practice this is not a feasible choice due to disk space constraints and the overhead of view updates. Therefore, we need to select a subset of possible views to save on disk. The problem of selecting the right subset of views is considered an important challenge in data warehousing. In this paper we propose a Weight-Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
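A toy genetic algorithm for this selection problem is sketched below: a bitstring marks which candidate views to materialize, and fitness rewards query benefit while rejecting selections that exceed the disk budget. The benefit and size figures, the fitness weighting, and the GA parameters are illustrative assumptions, not the WBGA proposed in the paper.

```python
# Toy GA sketch for view selection: a bitstring says which candidate views to
# materialize; fitness is the total query benefit, zeroed when the selection
# exceeds the storage budget. All numbers are illustrative assumptions.
import random

random.seed(0)
BENEFIT = [40, 25, 60, 10, 35]   # query-time saving per candidate view
SIZE    = [30, 10, 50, 5, 20]    # storage cost per candidate view
BUDGET  = 70

def fitness(bits):
    size = sum(s for s, b in zip(SIZE, bits) if b)
    if size > BUDGET:
        return 0.0                       # infeasible: exceeds disk budget
    return sum(v for v, b in zip(BENEFIT, bits) if b)

def evolve(pop_size=20, generations=50, mutation=0.1):
    pop = [[random.randint(0, 1) for _ in BENEFIT] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(BENEFIT))
            child = a[:cut] + b[cut:]                     # one-point crossover
            child = [1 - g if random.random() < mutation else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```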
How Do Children Reformulate Their Search Queries?
ERIC Educational Resources Information Center
Rutter, Sophie; Ford, Nigel; Clough, Paul
2015-01-01
Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…
Improving Web Search for Difficult Queries
ERIC Educational Resources Information Center
Wang, Xuanhui
2009-01-01
Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines cannot answer very effectively, and these queries often leave users frustrated. Since it is quite often that users encounter such "difficult…
Overview of the TREC 2014 Session Track
2014-11-01
except all of them have length mi = 1 and thus they have no current/final query. Participants were to run the 1,021 current queries against their search engines under each of the following three conditions separately: RL1, ignoring the session prior to this query; RL2, considering all the items (1), (2) and
A "Simple Query Interface" Adapter for the Discovery and Exchange of Learning Resources
ERIC Educational Resources Information Center
Massart, David
2006-01-01
Developed as part of CEN/ISSS Workshop on Learning Technology efforts to improve interoperability between learning resource repositories, the Simple Query Interface (SQI) is an Application Program Interface (API) for querying heterogeneous repositories of learning resource metadata. In the context of the ProLearn Network of Excellence, SQI is used…
An Experimental Investigation of Complexity in Database Query Formulation Tasks
ERIC Educational Resources Information Center
Casterella, Gretchen Irwin; Vijayasarathy, Leo
2013-01-01
Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…
Cognitive search model and a new query paradigm
NASA Astrophysics Data System (ADS)
Xu, Zhonghui
2001-06-01
This paper proposes a cognitive model in which people begin searching for pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the desired semantics. The essential point is that human search is not just a process of matching computation on visual features but rather a process of visualizing known semantic content. For people to search electronic images in the way they manually do in the model, we suggest that querying be a semantics-driven process, like design. A query-by-design paradigm is proposed, in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify semantic content through an iterative and incremental interaction process, so that a retrieval can start with association and identification of the given semantic content and be refined as further visual cues become available. An experimental image retrieval system, Kuafu, is under development using the query-by-design paradigm, and an iconic language has been adopted.
Secure Nearest Neighbor Query on Crowd-Sensing Data
Cheng, Ke; Wang, Liangmin; Zhong, Hong
2016-01-01
Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals, as the data owners, are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to perform many security operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and a secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between security and query performance compared to other schemes. PMID:27669253
Evaluation methodology for query-based scene understanding systems
NASA Astrophysics Data System (ADS)
Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.
2015-05-01
In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.
NASA Astrophysics Data System (ADS)
Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin
2014-12-01
Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, has focused on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message from the database, the user Alice has to query l times to get each ai. In this situation, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each ai contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of the multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary.
SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.
Chiba, Hirokazu; Uchiyama, Ikuo
2017-02-08
Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman
2014-01-01
Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us understand users' “information need” and how they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for regarding CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Signs’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380
Secure Nearest Neighbor Query on Crowd-Sensing Data.
Cheng, Ke; Wang, Liangmin; Zhong, Hong
2016-09-22
Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals, as the data owners, are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to perform many security operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and a secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between security and query performance compared to other schemes.
Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin
2014-01-01
Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, has focused on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message from the database, the user Alice has to query l times to get each ai. In this situation, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each ai contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of the multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary. PMID:25518810
The StarView intelligent query mechanism
NASA Technical Reports Server (NTRS)
Semmel, R. D.; Silberberg, D. P.
1993-01-01
The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.
SP2Bench: A SPARQL Performance Benchmark
NASA Astrophysics Data System (ADS)
Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg
A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.
Object-Oriented Query Language For Events Detection From Images Sequences
NASA Astrophysics Data System (ADS)
Ganea, Ion Eugen
2015-09-01
This paper presents a method for representing events extracted from image sequences and the query language used for event detection. Using an object-oriented model, the spatial and temporal relationships between salient objects, and also between events, are stored and queried. This work aims to unify the storing and querying phases of video event processing. The object-oriented language syntax used for event processing allows the instantiation of the index classes in order to improve the accuracy of the query results. The experiments were performed on image sequences from the sports domain and show the reliability and robustness of the proposed language. To extend the language, a specific syntax will be added for constructing templates for abnormal events and for detecting incidents, the final goal of this research.
Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.
2011-01-01
Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308
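The modeling step amounts to a univariate regression of case counts on search volume with a holdout validation, which can be illustrated with synthetic numbers as below; none of the values correspond to the study's data.

```python
# Toy version of the fitting procedure described: regress official case counts
# on the fraction of search volume for dengue-related queries and report the
# validation correlation. The numbers below are synthetic.
import numpy as np

search_fraction = np.array([0.8, 1.1, 1.6, 2.4, 3.0, 2.1, 1.2, 0.9])
case_counts     = np.array([210, 300, 420, 640, 820, 560, 330, 240])

# Univariate linear model fit on a training subset, validated on the holdout.
train, test = slice(0, 5), slice(5, 8)
slope, intercept = np.polyfit(search_fraction[train], case_counts[train], 1)
predicted = slope * search_fraction[test] + intercept

r = np.corrcoef(predicted, case_counts[test])[0, 1]
print(f"slope={slope:.1f}, intercept={intercept:.1f}, holdout r={r:.2f}")
```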
Novel Surveillance of Psychological Distress during the Great Recession
Ayers, John W.; Althouse, Benjamin M.; Allem, Jon-Patrick; Childers, Matthew A.; Zafar, Waleed; Latkin, Carl; Ribisl, Kurt M.; Brownstein, John S.
2015-01-01
Background Economic stressors have been retrospectively associated with net population increases in nonspecific psychological distress (PD). However, no sentinels exist to evaluate contemporaneous associations. Aggregate Internet search query surveillance was used to monitor population changes in PD around the United States’ Great Recession. Methods Monthly PD query trends were compared with unemployment, underemployment, homes in delinquency and foreclosure, median home value or sale prices, and S&P 500 trends for 2004–2010. Time series analyses, where economic indicators predicted PD one to seven months into the future, were performed in 2011. Results PD queries surpassed 1,000,000 per month, of which 300,000 may be attributable to the Great Recession. A one percentage point increase in mortgage delinquencies and foreclosures was associated with a 16% (95%CI, 9–24) increase in PD queries one-month, and 11% (95%CI, 3–18) four months later, in reference to a pre-Great Recession mean. Unemployment and underemployment had similar associations half and one-quarter the intensity. “Anxiety disorder,” “what is depression,” “signs of depression,” “depression symptoms,” and “symptoms of depression” were the queries exhibiting the strongest associations with mortgage delinquencies and foreclosures, unemployment or underemployment. Housing prices and S&P 500 trends were not associated with PD queries. Limitations A non-traditional measure of PD was used. It is unclear if actual clinically significant depression or anxiety increased during the Great Recession. Alternative explanations for strong associations between the Great Recession and PD queries, such as media, were explored and rejected. Conclusions Because the economy is constantly changing, this work not only provides a snapshot of recent associations between the economy and PD queries but also a framework and toolkit for real-time surveillance going forward. Health resources, clinician screening patterns, and policy debate may potentially be informed by changes in PD query trends. PMID:22835843
Novel surveillance of psychological distress during the great recession.
Ayers, John W; Althouse, Benjamin M; Allem, Jon-Patrick; Childers, Matthew A; Zafar, Waleed; Latkin, Carl; Ribisl, Kurt M; Brownstein, John S
2012-12-15
Economic stressors have been retrospectively associated with net population increases in nonspecific psychological distress (PD). However, no sentinels exist to evaluate contemporaneous associations. Aggregate Internet search query surveillance was used to monitor population changes in PD around the United States' Great Recession. Monthly PD query trends were compared with unemployment, underemployment, homes in delinquency and foreclosure, median home value or sale prices, and S&P 500 trends for 2004-2010. Time series analyses, where economic indicators predicted PD one to seven months into the future, were performed in 2011. PD queries surpassed 1,000,000 per month, of which 300,000 may be attributable to the Great Recession. A one percentage point increase in mortgage delinquencies and foreclosures was associated with a 16% (95%CI, 9-24) increase in PD queries one-month, and 11% (95%CI, 3-18) four months later, in reference to a pre-Great Recession mean. Unemployment and underemployment had similar associations half and one-quarter the intensity. "Anxiety disorder", "what is depression", "signs of depression", "depression symptoms", and "symptoms of depression" were the queries exhibiting the strongest associations with mortgage delinquencies and foreclosures, unemployment or underemployment. Housing prices and S&P 500 trends were not associated with PD queries. A non-traditional measure of PD was used. It is unclear if actual clinically significant depression or anxiety increased during the Great Recession. Alternative explanations for strong associations between the Great Recession and PD queries, such as media, were explored and rejected. Because the economy is constantly changing, this work not only provides a snapshot of recent associations between the economy and PD queries but also a framework and toolkit for real-time surveillance going forward. Health resources, clinician screening patterns, and policy debate may be informed by changes in PD query trends. Copyright © 2012 Elsevier B.V. All rights reserved.
A natural language interface plug-in for cooperative query answering in biological databases.
Jamil, Hasan M
2012-06-11
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
UGent Participation in the Microblog Track 2012
2012-11-01
A counter example is the query “Steve Jobs’ health” in topic 106, for which the user wants to know about the health situation of Steve Jobs. However, the returned title terms of the top 30 tweets for this query are “steve” (7x), “jobs” (7x), “is” (2x), “of” (3x), “health” (1x), “apple” (5x), “destroyer” (2x), “creator” (3x), “composed” (2x), and “products” (2x). These terms can form another query like “Steve Jobs is …”, which drifts away from the original one.
Hybrid Schema Matching for Deep Web
NASA Astrophysics Data System (ADS)
Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng
Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. Schema matching is a first step and critical part of data integration. For schema matching of deep web, most researches only interested in query interface, while rarely pay attention to abundant schema information contained in query result pages. This paper proposed a mixed schema matching technique, which combines attributes that appeared in query structures and query results of different data sources, and mines the matched schemas inside. Experimental results prove the effectiveness of this method for improving the accuracy of schema matching.
TreeQ-VISTA: An Interactive Tree Visualization Tool withFunctional Annotation Query Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gu, Shengyin; Anderson, Iain; Kunin, Victor
2007-05-07
Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
Development of a Carbon Sequestration Visualization Tool using Google Earth Pro
NASA Astrophysics Data System (ADS)
Keating, G. N.; Greene, M. K.
2008-12-01
The Big Sky Carbon Sequestration Partnership seeks to prepare organizations throughout the western United States for a possible carbon-constrained economy. Through the development of CO2 capture and subsurface sequestration technology, the Partnership is working to enable the region to cleanly utilize its abundant fossil energy resources. The intent of the Los Alamos National Laboratory Big Sky Visualization tool is to allow geochemists, geologists, geophysicists, project managers, and other project members to view, identify, and query the data collected from CO2 injection tests using a single data source platform, a mission to which Google Earth Pro is uniquely and ideally suited. The visualization framework enables fusion of data from disparate sources and allows investigators to fully explore spatial and temporal trends in CO2 fate and transport within a reservoir. 3-D subsurface wells are projected above ground in Google Earth as the KML anchor points for the presentation of various surface and subsurface data. This solution is the most integrative and cost-effective possible for the variety of users in the Big Sky community.
Classification of Automated Search Traffic
NASA Astrophysics Data System (ADS)
Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.
As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. A performance analysis is then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.
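The feature-then-classifier pipeline described in this abstract can be illustrated with a very small example. The sketch below is not the paper's system: the three behavioural features (queries per minute, click-through fraction, distinct terms per query), the invented training values, and the choice of a shallow decision tree are all assumptions made for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy behavioural features per session: [queries per minute, fraction of results
# clicked, distinct terms per query].  Values and labels are invented for illustration.
X = [
    [1.2, 0.60, 3.1],   # human-like: slow, clicks results, varied terms
    [0.8, 0.75, 2.4],
    [2.0, 0.50, 2.8],
    [40.0, 0.00, 1.0],  # bot-like: rapid-fire, never clicks, repetitive terms
    [25.0, 0.02, 1.1],
    [60.0, 0.00, 1.0],
]
y = [0, 0, 0, 1, 1, 1]   # 0 = human session, 1 = automated session

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[30.0, 0.01, 1.0], [1.5, 0.40, 2.9]]))  # expect [1 0]
```

A real deployment would of course use many more features, per-session aggregation over query logs, and the multiclass and active-learning stages described in the abstract.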
Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.
Kim, Sun; Yeganova, Lana; Wilbur, W John
2016-10-01
Medical Subject Headings (MeSH(®)) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed(®) for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries. https://www.ncbi.nlm.nih.gov/IRET/MESHABLE/ CONTACT: sun.kim@nih.gov Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
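As a rough illustration of the co-occurrence idea described above, the sketch below builds keyword profiles for MeSH terms from word/term co-occurrence counts and ranks terms against a free-text query. It is not the Meshable implementation; the toy corpus and the simple overlap score are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy "indexed abstracts": each record pairs free-text words with assigned MeSH terms.
records = [
    ({"heart", "attack", "chest", "pain"}, {"Myocardial Infarction"}),
    ({"heart", "failure", "edema"}, {"Heart Failure"}),
    ({"chest", "pain", "angina"}, {"Angina Pectoris", "Myocardial Infarction"}),
]

# Keyword profile of each MeSH term = counts of words co-occurring with it.
profiles = defaultdict(Counter)
for words, mesh_terms in records:
    for term in mesh_terms:
        profiles[term].update(words)

def rank_mesh(query):
    """Rank MeSH terms by overlap between the query words and each term's profile."""
    q = set(query.lower().split())
    scores = {term: sum(cnt for w, cnt in prof.items() if w in q)
              for term, prof in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_mesh("chest pain after heart attack"))
```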
Web queries as a source for syndromic surveillance.
Hulth, Anette; Rydevik, Gustaf; Linde, Annika
2009-01-01
In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.
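The modelling step described above (regressing an influenza indicator on highly correlated query frequencies with partial least squares) can be sketched as follows. The data here are synthetic stand-ins for weekly query counts and laboratory-verified case counts; the real logs and outcome series are not reproduced.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Synthetic weekly query counts (five correlated "queries") and verified case counts.
rng = np.random.default_rng(0)
weeks = 104
season = np.abs(np.sin(np.linspace(0, 4 * np.pi, weeks)))        # two seasonal peaks
X = np.column_stack([season + 0.1 * rng.normal(size=weeks) for _ in range(5)])
y = 200 * season + 5 * rng.normal(size=weeks)

# Partial least squares copes with the strong collinearity between query columns.
pls = PLSRegression(n_components=2)
pls.fit(X[:80], y[:80])
pred = pls.predict(X[80:]).ravel()
print("held-out correlation:", np.corrcoef(pred, y[80:])[0, 1].round(2))
```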
ERIC Educational Resources Information Center
Hancock-Beaulieu, Micheline; And Others
1995-01-01
An online library catalog was used to evaluate an interactive query expansion facility based on relevance feedback for the Okapi probabilistic term-weighting retrieval system. A graphical user interface allowed searchers to select candidate terms extracted from relevant retrieved items to reformulate queries. Results suggested that the…
A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805
ERIC Educational Resources Information Center
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
2011-01-01
Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…
Query Classification and Study of University Students' Search Trends
ERIC Educational Resources Information Center
Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.
2012-01-01
Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…
Searching the Web: The Public and Their Queries.
ERIC Educational Resources Information Center
Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko
2001-01-01
Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…
The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.
ERIC Educational Resources Information Center
Peat, Helen J.; Willett, Peter
1991-01-01
Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…
Computing health quality measures using Informatics for Integrating Biology and the Bedside.
Klann, Jeffrey G; Murphy, Shawn N
2013-04-19
The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.
Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside
Murphy, Shawn N
2013-01-01
Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227
Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.
Yom-Tov, Elad; Lev-Ran, Shaul
2017-10-26
Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use and Health (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of 195 ICD-10-listed symptoms. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R2 of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. ©Elad Yom-Tov, Shaul Lev-Ran. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 26.10.2017.
An SQL query generator for CLIPS
NASA Technical Reports Server (NTRS)
Snyder, James; Chirica, Laurian
1990-01-01
As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access a query generation system was developed as a CLIPS user function. The queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs, and submits for execution, an SQL query to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit in the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
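The translation step described above (a CLIPS-like pattern turned into SQL whose results come back as facts) can be sketched roughly as follows. This is not the ICADS user function: the toy pattern structure, the table, and the use of SQLite in place of ORACLE are assumptions made so the example runs standalone.

```python
import sqlite3

# Stand-in for a parsed CLIPS-like query pattern: columns, table, and one comparison.
pattern = {"select": ["name", "salary"], "from": "employees",
           "where": ("salary", ">", 50000)}

def to_sql(p):
    """Build a parameterized SQL statement from the toy pattern."""
    col, op, val = p["where"]
    return f"SELECT {', '.join(p['select'])} FROM {p['from']} WHERE {col} {op} ?", (val,)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("ada", 70000), ("bob", 42000)])

sql, params = to_sql(pattern)
# In the real system each result row would be asserted back as a CLIPS fact.
for row in conn.execute(sql, params):
    print("(employee", *row, ")")
```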
A Coding Method for Efficient Subgraph Querying on Vertex- and Edge-Labeled Graphs
Zhu, Lei; Song, Qinbao; Guo, Yuchen; Du, Lei; Zhu, Xiaoyan; Wang, Guangtao
2014-01-01
Labeled graphs are widely used to model complex data in many domains, so subgraph querying has been attracting more and more attention from researchers around the world. Unfortunately, subgraph querying is very time consuming since it involves subgraph isomorphism testing that is known to be an NP-complete problem. In this paper, we propose a novel coding method for subgraph querying that is based on Laplacian spectrum and the number of walks. Our method follows the filtering-and-verification framework and works well on graph databases with frequent updates. We also propose novel two-step filtering conditions that can filter out most false positives and prove that the two-step filtering conditions satisfy the no-false-negative requirement (no dismissal in answers). Extensive experiments on both real and synthetic graphs show that, compared with six existing counterpart methods, our method can effectively improve the efficiency of subgraph querying. PMID:24853266
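The filtering idea can be illustrated with a simplified spectral test: because adding vertices and edges never decreases the largest Laplacian eigenvalues, a data graph whose top eigenvalues fall below the query's cannot contain the query as a subgraph. The condition below is a crude stand-in for the paper's two-step filtering conditions, and the toy graphs are assumptions made for illustration.

```python
import numpy as np

def laplacian_spectrum(adj):
    """Eigenvalues of the graph Laplacian L = D - A, sorted in descending order."""
    deg = np.diag(adj.sum(axis=1))
    return np.sort(np.linalg.eigvalsh(deg - adj))[::-1]

def might_contain(query_adj, data_adj):
    """Necessary (not sufficient) spectral condition for subgraph containment."""
    if data_adj.shape[0] < query_adj.shape[0]:
        return False
    q = laplacian_spectrum(query_adj)
    d = laplacian_spectrum(data_adj)[: len(q)]
    return bool(np.all(d >= q - 1e-9))

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
path4 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
k4 = np.ones((4, 4)) - np.eye(4)

print(might_contain(triangle, path4))  # False: a path graph cannot contain a triangle
print(might_contain(triangle, k4))     # True: K4 passes the filter (and does contain it)
```

Graphs that pass such a filter still go through exact subgraph isomorphism testing in the verification phase.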
Graham, Ian D; Tetroe, Jacqueline
2009-01-01
As the recent collection of papers from the Quality Enhancement Research Initiative (QUERI) Series indicates, knowledge is leading to considerable action in the United States (U.S.) Department of Veterans Affairs (VA). The QUERI Series offers clinical researchers, implementation scientists, health systems, and health research funders from around the globe a unique window into both the practice and science of implementation or knowledge translation (KT) in the VA. By describing successes and challenges as well as setbacks and disappointments, the QUERI Series is all the more useful. From the vantage point of Canadian KT researchers and officials at a national health research funding agency, we offer a number of observations and lessons that can be learned from QUERI. "Knowledge, if it does not determine action, is dead to us." Plotinus (Roman philosopher 205AD-270AD) PMID:19267920
QRFXFreeze: Queryable Compressor for RFX.
Senthilkumar, Radha; Nandagopal, Gomathi; Ronald, Daphne
2015-01-01
The verbose nature of XML has been mulled over again and again and many compression techniques for XML data have been excogitated over the years. Some of the techniques incorporate support for querying the XML database in its compressed format while others have to be decompressed before they can be queried. XML compression in which querying is directly supported instantaneously with no compromise over time is forced to compromise over space. In this paper, we propose the compressor, QRFXFreeze, which not only reduces the space of storage but also supports efficient querying. The compressor does this without decompressing the compressed XML file. The compressor supports all kinds of XML documents along with insert, update, and delete operations. The forte of QRFXFreeze is that the textual data are semantically compressed and are indexed to reduce the querying time. Experimental results show that the proposed compressor performs much better than other well-known compressors.
Practical private database queries based on a quantum-key-distribution protocol
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jakobi, Markus; Humboldt-Universitaet zu Berlin, D-10117 Berlin; Simon, Christoph
2011-02-15
Private queries allow a user, Alice, to learn an element of a database held by a provider, Bob, without revealing which element she is interested in, while limiting her information about the other elements. We propose to implement private queries based on a quantum-key-distribution protocol, with changes only in the classical postprocessing of the key. This approach makes our scheme both easy to implement and loss tolerant. While unconditionally secure private queries are known to be impossible, we argue that an interesting degree of security can be achieved by relying on fundamental physical principles instead of unverifiable security assumptions in order to protect both the user and the database. We think that the scope exists for such practical private queries to become another remarkable application of quantum information in the footsteps of quantum key distribution.
Data Processing on Database Management Systems with Fuzzy Query
NASA Astrophysics Data System (ADS)
Şimşek, Irfan; Topuz, Vedat
In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed. In addition, samples of fuzzy queries were made by using real data with the tool developed in this study. Performance of SQLf was tested with the data about the Marmara University students' food grant. The food grant data were collected in a MySQL database by using a form which had been filled in on the web. The students filled in a form on the web to describe their social and economic conditions for the food grant request. This form consists of questions which have fuzzy and crisp answers. The main purpose of this fuzzy query is to determine the students who deserve the grant. The SQLf easily found the eligible students for the grant through predefined fuzzy values. The fuzzy query tool (SQLf) could be used easily with other database systems such as ORACLE and SQL Server.
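In the same spirit, a fuzzy selection over crisp records can be sketched in a few lines. The membership functions, attribute names, and satisfaction threshold below are invented for illustration and are not the SQLf tool or the food-grant data.

```python
def low_income(income):
    """Degree (0..1) to which a monthly income counts as 'low' (arbitrary units)."""
    if income <= 1000:
        return 1.0
    if income >= 3000:
        return 0.0
    return (3000 - income) / 2000

def large_family(size):
    """Degree (0..1) to which a household counts as 'large'."""
    return min(max((size - 2) / 4, 0.0), 1.0)

students = [
    {"name": "A", "income": 900,  "family": 6},
    {"name": "B", "income": 2500, "family": 3},
    {"name": "C", "income": 1800, "family": 5},
]

# "SELECT name WHERE income IS low AND family IS large", with min() as the AND operator.
for s in students:
    degree = min(low_income(s["income"]), large_family(s["family"]))
    if degree >= 0.5:                      # satisfaction threshold
        print(s["name"], round(degree, 2))
```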
An intelligent user interface for browsing satellite data catalogs
NASA Technical Reports Server (NTRS)
Cromp, Robert F.; Crook, Sharon
1989-01-01
A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.
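The two-stage idea (resolve the spatial constraint to primary keys first, then issue a conventional relational query) can be sketched as follows. The toy catalog, the linear-scan "spatial search", and the SQLite table are illustrative assumptions; the system described above uses a dedicated spatial index and a commercial DBMS behind a natural language front end.

```python
import sqlite3

# Toy catalog: primary key -> (right ascension, declination) in degrees.
positions = {1: (10.2, -5.0), 2: (11.0, -4.5), 3: (250.3, 30.1), 4: (10.8, -4.9)}

def keys_in_box(ra_min, ra_max, dec_min, dec_max):
    """Spatial pre-filter: map the user's spatial constraint to primary keys."""
    return [k for k, (ra, dec) in positions.items()
            if ra_min <= ra <= ra_max and dec_min <= dec <= dec_max]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, target TEXT)")
conn.executemany("INSERT INTO observations VALUES (?, ?)",
                 [(1, "NGC 1"), (2, "NGC 2"), (3, "M 13"), (4, "NGC 3")])

keys = keys_in_box(10.0, 11.5, -5.5, -4.0)        # stage 1: spatial search
placeholders = ",".join("?" * len(keys))
rows = conn.execute(f"SELECT id, target FROM observations WHERE id IN ({placeholders})",
                    keys).fetchall()               # stage 2: relational query on those keys
print(rows)
```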
ESTminer: a Web interface for mining EST contig and cluster databases.
Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R
2005-03-01
ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp agingle@uga.edu.
Wang, Amy Y; Lancaster, William J; Wyatt, Matthew C; Rasmussen, Luke V; Fort, Daniel G; Cimino, James J
2017-01-01
A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: "easy" (supporting automated queries), mixed (initial automated querying with manual review), "hard" (fully manual record review), and "impossible" or "point of enrollment" (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals.
Wang, Amy Y.; Lancaster, William J.; Wyatt, Matthew C.; Rasmussen, Luke V.; Fort, Daniel G.; Cimino, James J.
2017-01-01
A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: “easy” (supporting automated queries), mixed (initial automated querying with manual review), “hard” (fully manual record review), and “impossible” or “point of enrollment” (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals. PMID:29854246
Use of controlled vocabularies to improve biomedical information retrieval tasks.
Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian
2013-01-01
The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has a great potential to improve search effectiveness; therefore a fine-tuning of query expansion strategies could help improving the performances of search engines.
Building a Smart Portal for Astronomy
NASA Astrophysics Data System (ADS)
Derriere, S.; Boch, T.
2011-07-01
The development of a portal for accessing astronomical resources is not an easy task. The ever-increasing complexity of the data products can result in very complex user interfaces, requiring a lot of effort and learning from the user in order to perform searches. This is often a design choice, where the user must explicitly set many constraints, while the portal search logic remains simple. We investigated a different approach, where the query interface is kept as simple as possible (ideally, a simple text field, like for Google search), and the search logic is made much more complex to interpret the query in a relevant manner. We will present the implications of this approach in terms of interpretation and categorization of the query parameters (related to astronomical vocabularies), translation (mapping) of these concepts into the portal components metadata, identification of query schemes and use cases matching the input parameters, and delivery of query results to the user.
Producing approximate answers to database queries
NASA Technical Reports Server (NTRS)
Vrbsky, Susan V.; Liu, Jane W. S.
1993-01-01
We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.
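The monotone behaviour described above (the approximation can only improve as more data are retrieved) can be illustrated for a set-valued selection query by bracketing the answer between a certain set and a possible set. The bracketing shown here is a simplified illustration, not the APPROXIMATE processor's relational-algebra semantics.

```python
# (id, value) tuples of a relation retrieved incrementally, and a selection predicate.
relation = [("e1", 55), ("e2", 30), ("e3", 70), ("e4", 61)]

def predicate(row):
    return row[1] > 50

certain = set()
unseen = {r[0] for r in relation}
for row in relation:                       # data become available one block at a time
    unseen.discard(row[0])
    if predicate(row):
        certain.add(row[0])
    possible = certain | unseen            # every id that could still satisfy the query
    print(f"certain={sorted(certain)}  possible={sorted(possible)}")
# After the last step the two sets coincide and give the exact answer.
```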
Content-aware network storage system supporting metadata retrieval
NASA Astrophysics Data System (ADS)
Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun
2008-12-01
Nowadays, content-based network storage has become a hot research topic in academia and industry [1]. In order to solve the problem of hit-rate decline caused by migration and to achieve content-based query, we develop a new content-aware storage system that supports metadata retrieval to improve query performance. Firstly, we extend the SCSI command descriptor block to enable the system to understand self-defined query requests. Secondly, the extracted metadata is encoded in Extensible Markup Language to improve universality. Thirdly, according to the demands of information lifecycle management (ILM), we store data at different storage levels and use corresponding query strategies to retrieve them. Fourthly, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through a friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm have enhanced the retrieval efficiency and precision.
Remembrance of inferences past: Amortization in human hypothesis generation.
Dasgupta, Ishita; Schulz, Eric; Goodman, Noah D; Gershman, Samuel J
2018-05-21
Bayesian models of cognition assume that people compute probability distributions over hypotheses. However, the required computations are frequently intractable or prohibitively expensive. Since people often encounter many closely related distributions, selective reuse of computations (amortized inference) is a computationally efficient use of the brain's limited resources. We present three experiments that provide evidence for amortization in human probabilistic reasoning. When sequentially answering two related queries about natural scenes, participants' responses to the second query systematically depend on the structure of the first query. This influence is sensitive to the content of the queries, only appearing when the queries are related. Using a cognitive load manipulation, we find evidence that people amortize summary statistics of previous inferences, rather than storing the entire distribution. These findings support the view that the brain trades off accuracy and computational cost, to make efficient use of its limited cognitive resources to approximate probabilistic inference. Copyright © 2018 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Lyall-Wilson, Jennifer Rae
2013-01-01
The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases
NASA Technical Reports Server (NTRS)
Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)
2017-01-01
The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
Digitizing Consumption Across the Operational Spectrum
2014-09-01
Figure 13 gives a NoSQL (key, value) dictionary example, Figure 14 illustrates the query submitted in Java and the result that would be shown using the data in Figure 13, and Figure 15 shows the global database architecture.
Pilot Study on the Prevalence of Imposed Queries in a School Library Media Center.
ERIC Educational Resources Information Center
Gross, Melissa
1997-01-01
Discussion of information-seeking behavior focuses on a study of the imposed query, as opposed to self-generated queries, in an elementary school library media center in order to quantify its presence, to record characteristics of the users that carry them, and to identify the persons imposing them. The coding sheet is appended. Contains one table…
Self-enforcing Private Inference Control
NASA Astrophysics Data System (ADS)
Yang, Yanjiang; Li, Yingjiu; Weng, Jian; Zhou, Jianying; Bao, Feng
Private inference control enables simultaneous enforcement of inference control and protection of users' query privacy. Private inference control is a useful tool for database applications, especially when users are increasingly concerned about individual privacy nowadays. However, protection of query privacy on top of inference control is a double-edged sword: without letting the database server know the content of user queries, users can easily launch DoS attacks. To assuage DoS attacks in private inference control, we propose the concept of self-enforcing private inference control, whose intuition is to force users to only make inference-free queries by enforcing inference control themselves; otherwise, penalties will be inflicted upon the violating users.
Usage of the Jess Engine, Rules and Ontology to Query a Relational Database
NASA Astrophysics Data System (ADS)
Bak, Jaroslaw; Jedrzejek, Czeslaw; Falkowski, Maciej
We present a prototypical implementation of a library tool, the Semantic Data Library (SDL), which integrates the Jess (Java Expert System Shell) engine, rules and ontology to query a relational database. The tool extends functionalities of previous OWL2Jess with SWRL implementations and takes full advantage of the Jess engine, by separating forward and backward reasoning. The optimization of integration of all these technologies is an advancement over previous tools. We discuss the complexity of the query algorithm. As a demonstration of capability of the SDL library, we execute queries using crime ontology which is being developed in the Polish PPBW project.
IQARIS : a tool for the intelligent querying, analysis, and retrieval from information systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hummel, J. R.; Silver, R. B.
Information glut is one of the primary characteristics of the electronic age. Managing such large volumes of information (e.g., keeping track of the types, where they are, their relationships, who controls them, etc.) can be done efficiently with an intelligent, user-oriented information management system. The purpose of this paper is to describe a concept for managing information resources based on an intelligent information technology system developed by the Argonne National Laboratory for managing digital libraries. The Argonne system, Intelligent Query (IQ), enables users to query digital libraries and view the holdings that match the query from different perspectives.
Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.
Combi, C.; Missora, L.; Pinciroli, F.
1996-01-01
Due to the ubiquitous and special nature of time, especially in clinical databases, there is a need for dedicated temporal data and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates allowing the definition of complex temporal queries. PMID:8947722
Path querying system on mobile devices
NASA Astrophysics Data System (ADS)
Lin, Xing; Wang, Yifei; Tian, Yuan; Wu, Lun
2006-01-01
Traditional approaches to path querying problems are not efficient and convenient under most circumstances. A more convenient and reliable approach to this problem has to be found. This paper is devoted to a path querying solution on mobile devices. By using an improved Dijkstra's shortest path algorithm and a natural language translating module, this system can help people find the shortest path between two places through their cell phones or other mobile devices. The chosen path is prompted in text of natural language, as well as a map picture. This system would be useful in solving best path querying problems and has the potential to become a profitable business system.
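For reference, the core of such a service is a standard shortest-path computation. The sketch below is plain Dijkstra over a toy road graph; the place names, edge weights, and the phrasing of the natural-language prompt are assumptions for illustration, not the paper's improved algorithm or translation module.

```python
import heapq

graph = {
    "library": {"gate": 2, "lab": 3},
    "gate":    {"library": 2, "cafe": 4},
    "lab":     {"library": 3, "cafe": 1},
    "cafe":    {"gate": 4, "lab": 1},
}

def shortest_path(src, dst):
    """Dijkstra's algorithm with a binary heap; returns (path, total cost)."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    return [src] + path[::-1], dist[dst]

path, cost = shortest_path("library", "cafe")
print(f"Go via {path[1]} ({cost} units)")   # a crude natural-language style prompt
```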
Characterizing Listener Engagement with Popular Songs Using Large-Scale Music Discovery Data
Kaneshiro, Blair; Ruan, Feng; Baker, Casey W.; Berger, Jonathan
2017-01-01
Music discovery in everyday situations has been facilitated in recent years by audio content recognition services such as Shazam. The widespread use of such services has produced a wealth of user data, specifying where and when a global audience takes action to learn more about music playing around them. Here, we analyze a large collection of Shazam queries of popular songs to study the relationship between the timing of queries and corresponding musical content. Our results reveal that the distribution of queries varies over the course of a song, and that salient musical events drive an increase in queries during a song. Furthermore, we find that the distribution of queries at the time of a song's release differs from the distribution following a song's peak and subsequent decline in popularity, possibly reflecting an evolution of user intent over the “life cycle” of a song. Finally, we derive insights into the data size needed to achieve consistent query distributions for individual songs. The combined findings of this study suggest that music discovery behavior, and other facets of the human experience of music, can be studied quantitatively using large-scale industrial data. PMID:28386241
Advances in nowcasting influenza-like illness rates using search query logs
NASA Astrophysics Data System (ADS)
Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian
2015-08-01
User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
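Only the first stage of the pipeline above (sparse query selection with the Elastic Net) is sketched below, on synthetic data; the Gaussian Process and autoregressive layers are omitted, and no real query-log or influenza-like-illness data are reproduced.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
n_weeks, n_queries = 150, 40
X = rng.normal(size=(n_weeks, n_queries))          # weekly query-volume features
true_w = np.zeros(n_queries)
true_w[:5] = [1.5, -0.8, 0.6, 0.9, -1.2]           # only a few queries are informative
y = X @ true_w + 0.1 * rng.normal(size=n_weeks)    # influenza-like-illness proxy

model = ElasticNet(alpha=0.05, l1_ratio=0.5)        # blend of L1 and L2 regularisation
model.fit(X[:120], y[:120])
print("selected query columns:", np.flatnonzero(model.coef_))
print("held-out MAE:", np.mean(np.abs(model.predict(X[120:]) - y[120:])).round(3))
```

In the paper, the queries surviving this regularisation step feed a composite Gaussian Process regressor, which is in turn combined with an autoregressive term.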
Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin
2014-12-18
Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message a1a2···al from the database, the user Alice has to query l times to get each ai. In this condition, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each ai contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of the multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like a high-dimensional version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary.
Semantic-based surveillance video retrieval.
Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve
2007-04-01
Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
Measuring Up: Implementing a Dental Quality Measure in the Electronic Health Record Context
Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F
2015-01-01
Background Quality improvement requires quality measures that are validly implementable. In this work, we assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure (percentage of children who received fluoride varnish). Methods We defined how to implement the automated measure queries in a dental electronic health record (EHR). Within records identified through automated query, we manually reviewed a subsample to assess the performance of the query. Results The automated query found 71.0% of patients to have had fluoride varnish compared to 77.6% found using the manual chart review. The automated quality measure performance was 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. Conclusions Our findings support the feasibility of automated dental quality measure queries in the context of sufficient structured data. Information noted only in the free text rather than in structured data would require natural language processing approaches to effectively query. Practical Implications To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation in order to support near-term automated calculation of quality measures. PMID:26562736
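For reference, the reported performance figures are the usual confusion-matrix statistics of the automated query measured against manual chart review. The counts below are invented to roughly reproduce the reported percentages and are not the study's actual data.

```python
# tp: varnish visit found by both query and chart review; fn: found only by review;
# fp: flagged by the query but not confirmed; tn: negative by both.
tp, fp, fn, tn = 181, 6, 19, 59   # illustrative counts only

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)              # positive predictive value
npv = tn / (tn + fn)              # negative predictive value
for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {100 * value:.1f}%")
```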
Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram
2013-01-01
The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over a reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof-of-concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.
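The binning-plus-inverted-index idea can be sketched on a tiny column of doubles. The byte split, the non-negative toy data, and the range query below are illustrative assumptions, not the paper's co-designed compressor and indexer (which also compresses the residual bytes and metadata).

```python
import struct
from collections import defaultdict

values = [1.5, 1.5000001, 2.75, 3.0, 2.7500004, 1.5]   # non-negative toy column
MSB = 4                                    # bytes kept in the reduced representation

index = defaultdict(list)                  # high-byte bin -> positions in the column
low = []                                   # residual low-order bytes, stored separately
for pos, v in enumerate(values):
    raw = struct.pack(">d", v)             # big-endian: byte order follows value order (v >= 0)
    index[raw[:MSB]].append(pos)
    low.append(raw[MSB:])

def range_query(lo_val, hi_val):
    """Return (position, value) pairs in [lo_val, hi_val], touching only candidate bins."""
    hits = []
    for hi_bytes, positions in index.items():
        bin_lo = struct.unpack(">d", hi_bytes + b"\x00" * (8 - MSB))[0]
        bin_hi = struct.unpack(">d", hi_bytes + b"\xff" * (8 - MSB))[0]
        if bin_hi < lo_val or bin_lo > hi_val:
            continue                       # this bin cannot contain any match
        for p in positions:                # verify candidates by restoring full values
            v = struct.unpack(">d", hi_bytes + low[p])[0]
            if lo_val <= v <= hi_val:
                hits.append((p, v))
    return hits

print(range_query(2.0, 3.0))
```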
Advances in nowcasting influenza-like illness rates using search query logs.
Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian
2015-08-03
User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset
NASA Astrophysics Data System (ADS)
Liu, Haicheng; Xiao, Xiao
2016-11-01
Efficiently extracting information from high-dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly file based, which makes the data easy to edit and access, but their contiguous storage structure limits efficiency. Databases have been proposed as an alternative, offering advantages such as native functionality for manipulating multidimensional (MD) arrays, smart caching strategies and scalability. In this research, NetCDF file-based solutions and the multidimensional array database management system (DBMS) SciDB, which applies a chunked storage structure, are benchmarked to determine the best solution for storing and querying a large 5D hydrologic modelling dataset. The effect of data storage configurations, including chunk size, dimension order and compression, on query performance is explored. Results indicate that the dimension order used to organize storage of the 5D data has a significant influence on query performance if the chunk size is very large, but the effect becomes insignificant when the chunk size is properly set. Compression in SciDB mostly has a negative influence on query performance. Caching is an advantage but may be influenced by the execution of different query processes. On the whole, the NetCDF solution without compression is in general more efficient than the SciDB DBMS.
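For the NetCDF side, the storage configuration under test (chunk shape and dimension order) is fixed when the variable is created. The sketch below writes a tiny 5D variable with explicit chunking using the netCDF4 Python package; the dimension names, sizes, and chunk shape are illustrative assumptions, not the benchmark dataset or configurations of this study.

```python
import numpy as np
from netCDF4 import Dataset   # assumes the netCDF4 package is installed

with Dataset("demo.nc", "w") as nc:
    for name, size in [("run", 4), ("time", 24), ("level", 3), ("y", 50), ("x", 50)]:
        nc.createDimension(name, size)
    var = nc.createVariable("soil_moisture", "f4", ("run", "time", "level", "y", "x"),
                            chunksizes=(1, 24, 1, 50, 50), zlib=False)
    var[:] = np.random.rand(4, 24, 3, 50, 50).astype("f4")

# A full time series for one run, level and cell now maps onto a single chunk per read.
with Dataset("demo.nc") as nc:
    series = nc.variables["soil_moisture"][0, :, 0, 10, 10]
    print(series.shape)
```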
Accessing the public MIMIC-II intensive care relational database for clinical research.
Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G
2013-01-10
The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
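As a runnable stand-in for the kind of simple SQL query one might issue through QueryBuilder or inside the VM, the sketch below uses an in-memory SQLite table. The table layout, column names, and data are invented for illustration; the real MIMIC-II schema and its database server differ, so consult the MIMIC-II documentation for actual table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE icustays
                (subject_id INTEGER, los_hours REAL, hospital_expire_flag INTEGER)""")
conn.executemany("INSERT INTO icustays VALUES (?, ?, ?)",
                 [(1, 36.0, 0), (2, 120.5, 1), (3, 72.0, 0), (4, 200.0, 1)])

# Example cohort-level aggregate: mean length of stay by in-hospital mortality status.
for flag, mean_los, n in conn.execute(
        """SELECT hospital_expire_flag, AVG(los_hours), COUNT(*)
           FROM icustays GROUP BY hospital_expire_flag"""):
    print(f"expired={flag}: mean LOS {mean_los:.1f} h over {n} stays")
```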
Teng, Rui; Zhang, Bing
2011-01-01
On-demand information retrieval enables users to query and collect up-to-date sensing information from sensor nodes. Since high energy efficiency is required in a sensor network, it is desirable to disseminate query messages with small traffic overhead and to collect sensing data with low energy consumption. However, on-demand query messages are generally forwarded to sensor nodes in network-wide broadcasts, which create large traffic overhead. In addition, because on-demand information retrieval may involve intermittent, spatially scoped data collection, constructing and maintaining conventional aggregation structures such as clusters and chains is costly. In this paper, we propose an on-demand information retrieval approach that exploits name resolution of data queries according to the attribute and location of each sensor node. The proposed approach localises query dissemination and enables localised data collection with maximised aggregation. To illustrate the effectiveness of the proposed approach, an analytical model that describes the criteria for sink proxy selection is provided. The evaluation results reveal that the proposed scheme significantly reduces energy consumption and improves the balance of energy consumption among sensor nodes by alleviating heavy traffic near the sink.
Characterizing Listener Engagement with Popular Songs Using Large-Scale Music Discovery Data.
Kaneshiro, Blair; Ruan, Feng; Baker, Casey W; Berger, Jonathan
2017-01-01
Music discovery in everyday situations has been facilitated in recent years by audio content recognition services such as Shazam. The widespread use of such services has produced a wealth of user data, specifying where and when a global audience takes action to learn more about music playing around them. Here, we analyze a large collection of Shazam queries of popular songs to study the relationship between the timing of queries and corresponding musical content. Our results reveal that the distribution of queries varies over the course of a song, and that salient musical events drive an increase in queries during a song. Furthermore, we find that the distribution of queries at the time of a song's release differs from the distribution following a song's peak and subsequent decline in popularity, possibly reflecting an evolution of user intent over the "life cycle" of a song. Finally, we derive insights into the data size needed to achieve consistent query distributions for individual songs. The combined findings of this study suggest that music discovery behavior, and other facets of the human experience of music, can be studied quantitatively using large-scale industrial data.
Generalized query-based active learning to identify differentially methylated regions in DNA.
Haque, Md Muksitul; Holder, Lawrence B; Skinner, Michael K; Cook, Diane J
2013-01-01
Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries to the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features creates a shorter query with only relevant features, which is easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially methylated DNA regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that it outperforms another popular active learning technique.
Shuttle-Data-Tape XML Translator
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2005-01-01
JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.
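The sketch below illustrates the kind of XPath query that could be issued against an XML database of translated SDT records, as described above; the element and attribute names are hypothetical and do not reproduce the actual SDT layout or the JSDTImport API.

```python
# Sketch: an XPath query over a (hypothetical) XML rendering of SDT records, using lxml.
from lxml import etree

xml = """
<sdt>
  <record msid="V74X1234" system="MPS">
    <field name="rate" units="Hz">25</field>
  </record>
  <record msid="V74X5678" system="EPS">
    <field name="rate" units="Hz">1</field>
  </record>
</sdt>
"""
root = etree.fromstring(xml)

# Find all measurement IDs belonging to the MPS system with a sample rate above 10 Hz.
hits = root.xpath("//record[@system='MPS'][field[@name='rate'] > 10]/@msid")
print(hits)   # ['V74X1234']
```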
Kawazoe, Yoshimasa; Imai, Takeshi; Ohe, Kazuhiko
2016-04-05
Health Level Seven version 2.5 (HL7 v2.5) is a widespread messaging standard for information exchange between clinical information systems. By applying Semantic Web technologies to HL7 v2.5 messages, it is possible to integrate large-scale clinical data with life science knowledge resources. Here we show the feasibility of a querying method over large-scale resource description framework (RDF)-ized HL7 v2.5 messages using publicly available drug databases. We developed a method to convert HL7 v2.5 messages into RDF. We also converted five kinds of drug databases into RDF and provided explicit links between the corresponding items among them. With those linked drug data, we then developed a query expansion method for searching the clinical data using semantic information on drug classes along with four types of temporal patterns. For evaluation, medication orders and laboratory test results for a 3-year period at the University of Tokyo Hospital were used, and query execution times were measured. Approximately 650 million RDF triples for medication orders and 790 million RDF triples for laboratory test results were converted. Taking three types of queries from use cases for detecting adverse drug events as examples, we confirmed that these queries can be expressed in the SPARQL Protocol and RDF Query Language (SPARQL) using our methods, and compared them with conventional query expressions. The measurements confirm that the query time is feasible and increases logarithmically or linearly with the amount of data, without diverging. The proposed methods enable query expressions that keep knowledge resources separate from clinical data, suggesting that the usability of clinical data can be improved by enhancing the knowledge resources. We also demonstrate that when HL7 v2.5 messages are automatically converted into RDF, searches remain possible through SPARQL without modifying the message structure. As such, the proposed method benefits not only our hospital, but also the many hospitals that handle HL7 v2.5 messages. Our approach highlights the potential of large-scale data federation techniques for retrieving clinical information, which could support clinical intelligence applications such as adverse drug event monitoring, cohort selection for clinical studies, and the discovery of new knowledge from clinical data.
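To make the idea of drug-class query expansion over RDF concrete, the sketch below runs a class-level SPARQL query with rdflib; the vocabulary, class hierarchy, and triples are hypothetical placeholders, not the authors' actual HL7 v2.5 or drug-database mapping.

```python
# Sketch: expanding a drug-class query over RDF-ized medication orders with rdflib.
# The ex: vocabulary and all triples below are hypothetical illustrations.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.warfarin, EX.memberOfClass, EX.anticoagulant))
g.add((EX.heparin, EX.memberOfClass, EX.anticoagulant))
g.add((EX.order1, EX.hasDrug, EX.warfarin))
g.add((EX.order1, EX.orderedOn, Literal("2015-03-02")))
g.add((EX.order2, EX.hasDrug, EX.heparin))
g.add((EX.order2, EX.orderedOn, Literal("2015-07-19")))

# The class-level query retrieves all orders of any anticoagulant without naming each drug.
q = """
PREFIX ex: <http://example.org/>
SELECT ?order ?drug ?date WHERE {
  ?drug ex:memberOfClass ex:anticoagulant .
  ?order ex:hasDrug ?drug ;
         ex:orderedOn ?date .
}
"""
for row in g.query(q):
    print(row.order, row.drug, row.date)
```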
Constraining soil C cycling with strategic, adaptive action for data and model reporting
NASA Astrophysics Data System (ADS)
Harden, J. W.; Swanston, C.; Hugelius, G.
2015-12-01
Regional to global carbon assessments include a variety of models, data sets, and conceptual structures. This includes strategies for representing the role and capacity of soils to sequester, release, and store carbon. Traditionally, many soil carbon data sets emerged from agricultural missions focused on mapping and classifying soils to enhance and protect production of food and fiber. More recently, soil carbon assessments have allowed for more strategic measurement to address the functional and spatially explicit role that soils play in land-atmosphere carbon exchange. While soil data sets are increasingly inter-comparable and increasingly sampled to accommodate global assessments, soils remain poorly constrained or understood with regard to their role in spatio-temporal variations in carbon exchange. A more deliberate approach to rapid improvement in our understanding involves a community-based activity that embraces both a nimble data repository and a dynamic structure for prioritization. Data input and output can be transparent and retrievable as data-derived products, while also being subjected to rigorous queries for merging and harmonization into a searchable, comprehensive, transparent database. Meanwhile, adaptive action groups can prioritize data and modeling needs that emerge through workshops, meta-data analyses or model testing. Our continual renewal of priorities should address soil processes, mechanisms, and feedbacks that significantly influence global C budgets and/or significantly impact the needs and services of regional soil resources affected by C management. In order to refine the International Soil Carbon Network, we welcome suggestions for such groups on topics including, but not limited to, manipulation experiments, extreme climate events, post-disaster C management, past climate-soil interactions, and water-soil-carbon linkages. We also welcome ideas for a business model that can foster and promote idea and data sharing.
Towards computational improvement of DNA database indexing and short DNA query searching.
Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska
2014-09-03
In order to facilitate and speed up searches of massive DNA databases, the database is first indexed using a mapping function. By searching the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach that identifies all query hits in the database without having to examine all entries in the indexed data structure, while limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported if the database is searched against a query shorter than [Formula: see text] nucleotides, where [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution to this drawback is also presented.
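The sketch below shows a generic hash-based k-mer index over a DNA string and an exact-match search against it, in the spirit of the indexing approach described above; it is a plain illustration, not the paper's specific indexing equation or the Reneker scheme it improves on.

```python
# Sketch of a hash-based k-mer index over a DNA string (generic illustration only).
from collections import defaultdict

def build_index(db: str, k: int) -> dict:
    """Map every k-mer of the database to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index

def search(index: dict, db: str, query: str, k: int) -> list:
    """Report exact occurrences of `query` by extending hits of its first k-mer."""
    hits = []
    for pos in index.get(query[:k], []):
        if db[pos:pos + len(query)] == query:
            hits.append(pos)
    return hits

db = "ACGTTTGACGTACGTTTGA"
idx = build_index(db, k=4)
print(search(idx, db, "ACGTTTGA", k=4))   # [0, 11]
```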
muBLASTP: database-indexed protein sequence search on multicore CPUs.
Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun
2016-11-04
The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences most similar to a query sequence. Currently, the BLAST algorithm uses a query-indexed approach. Although several approaches show that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as query-indexed BLAST, i.e., NCBI BLAST, or they support only nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and characteristics, existing techniques for query-indexed search cannot be applied directly to database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup in the alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup in the alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein databases and associated optimizations to the BLASTP algorithm, we re-factored BLASTP for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
GO2PUB: Querying PubMed with semantic expansion of gene ontology terms
2012-01-01
Background With the development of high-throughput methods of gene analysis, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts' agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed 40% and PubMed 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. To determine whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to that of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were 77% and 40%, respectively, for the first queries, and 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries. Conclusions We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that both tools are complementary. The analysis of the randomly-generated queries suggests that the results obtained for lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org. PMID:22958570
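The sketch below illustrates the general idea of semantic query expansion through the GO graph: collect the descendants of a GO term, gather annotated genes and their synonyms, and build an enriched PubMed query string. The toy ontology, gene annotations, and synonyms are invented for illustration and do not reproduce GO2PUB's actual data or query syntax.

```python
# Sketch: semantically expanding a PubMed query from a GO term (toy data, illustrative only).
def descendants(go_graph: dict, term: str) -> set:
    """All descendants of `term` in a parent -> children adjacency dict."""
    out, stack = set(), [term]
    while stack:
        t = stack.pop()
        for child in go_graph.get(t, []):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

go_graph = {"GO:0006629": ["GO:0016042"], "GO:0016042": []}      # lipid metabolism -> lipid catabolism
annotations = {"GO:0006629": ["LPL"], "GO:0016042": ["PNPLA2"]}   # genes annotated to each term
synonyms = {"LPL": ["lipoprotein lipase"], "PNPLA2": ["ATGL", "adipose triglyceride lipase"]}

terms = {"GO:0006629"} | descendants(go_graph, "GO:0006629")
genes = sorted({g for t in terms for g in annotations.get(t, [])})
keywords = sorted({kw for g in genes for kw in [g] + synonyms.get(g, [])})

pubmed_query = "(lipid metabolism) AND (" + " OR ".join(f'"{kw}"' for kw in keywords) + ")"
print(pubmed_query)
```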
Experimental evaluation of certification trails using abstract data type validation
NASA Technical Reports Server (NTRS)
Wilson, Dwight S.; Sullivan, Gregory F.; Masson, Gerald M.
1993-01-01
Certification trails are a recently introduced and promising approach to fault-detection and fault-tolerance. Recent experimental work reveals many cases in which a certification-trail approach allows for significantly faster program execution time than a basic time-redundancy approach. Algorithms for answer-validation of abstract data types allow a certification trail approach to be used for a wide variety of problems. An attempt to assess the performance of algorithms utilizing certification trails on abstract data types is reported. Specifically, this method was applied to the following problems: heapsort, Huffman tree, shortest path, and skyline. Previous results used certification trails specific to a particular problem and implementation. The approach allows certification trails to be localized to 'data structure modules,' making the use of this technique transparent to the user of such modules.
VizieR Online Data Catalog: Investigating Tully-Fisher relation with KMOS3D (Ubler+,
NASA Astrophysics Data System (ADS)
Ubler, H.; Forster Schreiber, N. M.; Genzel, R.; Wisnioski, E.; Wuyts, S.; Lang, P.; Naab, T.; Burkert, A.; van Dokkum, P. G.; Tacconi, L. J.; Wilman, D. J.; Fossati, M.; Mendel, J. T.; Beifiori, A.; Belli, S.; Bender, R.; Brammer, G. B.; Chan, J.; Davies, R.; Fabricius, M.; Galametz, A.; Lutz, D.; Momcheva, I. G.; Nelson, E. J.; Saglia, R. P.; Seitz, S.; Tadaki, K.
2018-02-01
This work is based on the first 3 yr of observations from the KMOS3D multiyear near-infrared (near-IR) integral field spectroscopy (IFS) survey of more than 600 mass-selected star-forming galaxies (SFGs) at 0.6<~z<~2.6 with the K-band Multi Object Spectrograph (KMOS; Sharples+ 2013Msngr.151...21S) on the Very Large Telescope. The KMOS3D survey and data reduction are described in detail by Wisnioski et al. (2015ApJ...799..209W). The results presented in this paper build on the KMOS3D sample as of 2016 January, with 536 observed galaxies. Of these, 316 are detected in, and have spatially resolved, Hα emission free from skyline contamination, from which two-dimensional velocity and dispersion maps are produced. (1 data file).
Closer Look: Majestic Mountains and Frozen Plains
2015-09-17
Just 15 minutes after its closest approach to Pluto on July 14, 2015, NASA's New Horizons spacecraft looked back toward the sun and captured a near-sunset view of the rugged, icy mountains and flat ice plains extending to Pluto's horizon. The smooth expanse of the informally named Sputnik Planum (right) is flanked to the west (left) by rugged mountains up to 11,000 feet (3,500 meters) high, including the informally named Norgay Montes in the foreground and Hillary Montes on the skyline. The backlighting highlights more than a dozen layers of haze in Pluto's tenuous but distended atmosphere. The image was taken from a distance of 11,000 miles (18,000 kilometers) to Pluto; the scene is 230 miles (380 kilometers) across. http://photojournal.jpl.nasa.gov/catalog/PIA19947
Preparing to Test for Deep Space
2015-07-15
A structural steel section is lifted into place atop the B-2 Test Stand at NASA’s Stennis Space Center as part of modification work to prepare for testing the core stage of NASA’s new Space Launch System. The section is part of the Main Propulsion Test Article (MPTA) framework, which will support the SLS core stage for testing. The existing framework was installed on the stand in the late 1970s to test the shuttle MPTA. However, that framework had to be repositioned and modified to accommodate the larger SLS stage. About 1 million pounds of structural steel has been added, extending the framework about 100 feet higher and providing a new look to the Stennis skyline. Stennis will test the actual flight core stage for the first uncrewed SLS mission, Exploration Mission-1.
Active Wiki Knowledge Repository
2012-10-01
Tools for consuming data include SPARQL queries and RESTful web services; 'gardening' tools examine the semantically tagged content in the wiki; high-level language tools, tagging, and an RDF triple-store support fusion and inference for collaboration. Content in other stores can be queried using AW SPARQL queries and rendering templates, and maps and other content can be shared interactively using annotation tools to post notes.
Horvath, Dragos; Marcou, Gilles; Varnek, Alexandre
2013-07-22
This study is an exhaustive analysis of neighborhood behavior over a large coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with >50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success defined by the ascertained optimality index, a weighted compromise between the purity and the retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighting of query vs. candidate features (represented as integer-valued ISIDA colored fragment/pharmacophore triplet count descriptor vectors). Nearly three-quarters of a million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are, by far, the most successful and the least failure-prone out of a set of nine other dissimilarity scores. These include classical Tanimoto, which failed to defend its privileged status in practical SVS applications. Tversky performance is not significantly conditioned by tuning of its bias parameter α: both initial "guesses" of α = 0.9 and 0.7 were more successful than Tanimoto (in turn, better than the Euclidean score). Tversky was eventually tested in exhaustive similarity searching within the library of 1.6 M commercial + bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html, comparing favorably to Tanimoto in terms of "scaffold hopping" propensity. Therefore, it should be used at least as often as, and perhaps in parallel with, Tanimoto in SVS. Analysis with respect to query subclasses highlighted relationships between query complexity (expressed simply in terms of pharmacophore pattern counts) and/or target nature and the likelihood of SVS success. SVS runs using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric), yet they are best handled by "pro-query" Tversky scores at α > 0.5. Among simpler queries, one may distinguish between "growable" queries (allowing for active analogs with additional features) and a few "conservative" queries that do not allow any growth. These (typically bioactive amine transporter ligands) form the specific application domain of "pro-candidate" biased Tversky scores at α < 0.5.
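The sketch below computes Tversky and Tanimoto similarity between integer count vectors using their standard count-vector generalizations. The vectors a and b are placeholders (the ISIDA descriptors themselves are not reproduced), and the exact parameterization used in the study may differ in detail; here α weights features present in the query but missing from the candidate, so α > 0.5 corresponds to the "pro-query" bias discussed above.

```python
# Sketch: Tversky vs. Tanimoto similarity on integer count fingerprints (placeholder vectors).
import numpy as np

def tversky(query: np.ndarray, cand: np.ndarray, alpha: float) -> float:
    """alpha > 0.5 penalizes missing query features more heavily ('pro-query' bias)."""
    common = np.minimum(query, cand).sum()
    only_q = (query - np.minimum(query, cand)).sum()
    only_c = (cand - np.minimum(query, cand)).sum()
    return float(common / (common + alpha * only_q + (1 - alpha) * only_c))

def tanimoto(query: np.ndarray, cand: np.ndarray) -> float:
    return float(np.minimum(query, cand).sum() / np.maximum(query, cand).sum())

q = np.array([3, 0, 2, 1, 0])
c = np.array([1, 1, 2, 0, 4])
print("Tanimoto:", tanimoto(q, c))
print("Tversky (alpha=0.9, pro-query):", tversky(q, c, 0.9))
print("Tversky (alpha=0.1, pro-candidate):", tversky(q, c, 0.1))
```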
Federated ontology-based queries over cancer data
2012-01-01
Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043
The role of organizational research in implementing evidence-based practice: QUERI Series
Yano, Elizabeth M
2008-01-01
Background Health care organizations exert significant influence on the manner in which clinicians practice and the processes and outcomes of care that patients experience. A greater understanding of the organizational milieu into which innovations will be introduced, as well as the organizational factors that are likely to foster or hinder the adoption and use of new technologies, care arrangements and quality improvement (QI) strategies are central to the effective implementation of research into practice. Unfortunately, much implementation research seems to not recognize or adequately address the influence and importance of organizations. Using examples from the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI), we describe the role of organizational research in advancing the implementation of evidence-based practice into routine care settings. Methods Using the six-step QUERI process as a foundation, we present an organizational research framework designed to improve and accelerate the implementation of evidence-based practice into routine care. Specific QUERI-related organizational research applications are reviewed, with discussion of the measures and methods used to apply them. We describe these applications in the context of a continuum of organizational research activities to be conducted before, during and after implementation. Results Since QUERI's inception, various approaches to organizational research have been employed to foster progress through QUERI's six-step process. We report on how explicit integration of the evaluation of organizational factors into QUERI planning has informed the design of more effective care delivery system interventions and enabled their improved "fit" to individual VA facilities or practices. We examine the value and challenges in conducting organizational research, and briefly describe the contributions of organizational theory and environmental context to the research framework. Conclusion Understanding the organizational context of delivering evidence-based practice is a critical adjunct to efforts to systematically improve quality. Given the size and diversity of VA practices, coupled with unique organizational data sources, QUERI is well-positioned to make valuable contributions to the field of implementation science. More explicit accommodation of organizational inquiry into implementation research agendas has helped QUERI researchers to better frame and extend their work as they move toward regional and national spread activities. PMID:18510749
Developing A Web-based User Interface for Semantic Information Retrieval
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Keller, Richard M.
2003-01-01
While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still unclear. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting semantic query generation for SemanticOrganizer, a tool used by scientists and engineers at NASA to construct networks of knowledge and data. Through this interface users can select node types, node attributes and node links to build ad-hoc semantic queries for searching the SemanticOrganizer network.
NASA Astrophysics Data System (ADS)
Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo
2008-06-01
We propose a cheat sensitive quantum protocol to perform a private search on a classical database which is efficient in terms of communication complexity. It allows a user to retrieve an item from the database provider without revealing which item he or she retrieved: if the provider tries to obtain information on the query, the person querying the database can find it out. The protocol ensures also perfect data privacy of the database: the information that the user can retrieve in a single query is bounded and does not depend on the size of the database. With respect to the known (quantum and classical) strategies for private information retrieval, our protocol displays an exponential reduction in communication complexity and in running-time computational complexity.
Sexual information seeking on web search engines.
Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles
2004-02-01
Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat rooms discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
Querying databases of trajectories of differential equations: Data structures for trajectories
NASA Technical Reports Server (NTRS)
Grossman, Robert
1989-01-01
One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories γ of a dynamical system evolving in R^N are stored in a database. Let η ⊂ R^N denote a parameterized path in Euclidean space, and let ‖·‖ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.
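As a toy illustration of querying a trajectory database under a norm on paths, the sketch below stores sampled trajectories with their parameters and returns the stored trajectory closest to a query path in the sup norm; the data structure and the example dynamics are invented for illustration, not the report's construction.

```python
# Sketch: a toy trajectory store and a norm-based nearest-trajectory query (illustrative only).
import numpy as np

class TrajectoryDB:
    def __init__(self):
        self.trajectories = []          # list of (params, samples) pairs

    def add(self, params: dict, samples: np.ndarray):
        """samples: array of shape (T, N) holding a discretized trajectory in R^N."""
        self.trajectories.append((params, samples))

    def query_nearest(self, path: np.ndarray):
        """Return the stored trajectory minimizing the sup norm of the difference to `path`."""
        dist = lambda s: np.max(np.linalg.norm(s - path, axis=1))
        return min(self.trajectories, key=lambda item: dist(item[1]))

# Example: trajectories of x' = -a x sampled on a common time grid.
t = np.linspace(0, 5, 50)
db = TrajectoryDB()
for a in (0.5, 1.0, 2.0):
    db.add({"a": a}, np.exp(-a * t)[:, None])

params, _ = db.query_nearest(np.exp(-0.9 * t)[:, None])
print(params)   # {'a': 1.0}
```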
System and method for responding to ground and flight system malfunctions
NASA Technical Reports Server (NTRS)
Anderson, Julie J. (Inventor); Fussell, Ronald M. (Inventor)
2010-01-01
A system for on-board anomaly resolution for a vehicle has a data repository. The data repository stores data related to different systems, subsystems, and components of the vehicle. The data stored is encoded in a tree-based structure. A query engine is coupled to the data repository. The query engine provides a user and automated interface and provides contextual query to the data repository. An inference engine is coupled to the query engine. The inference engine compares current anomaly data to contextual data stored in the data repository using inference rules. The inference engine generates a potential solution to the current anomaly by referencing the data stored in the data repository.
Huebner-Bloder, Gudrun; Duftschmid, Georg; Kohler, Michael; Rinner, Christoph; Saboor, Samrend; Ammenwerth, Elske
2012-01-01
Cross-institutional longitudinal Electronic Health Records (EHR), as introduced in Austria at the moment, increase the challenge of information overload of healthcare professionals. We developed an innovative cross-institutional EHR query prototype that offers extended query options, including searching for specific information items or sets of information items. The available query options were derived from a systematic analysis of information needs of diabetes specialists during patient encounters. The prototype operates in an IHE-XDS-based environment where ISO/EN 13606-structured documents are available. We conducted a controlled study with seven diabetes specialists to assess the feasibility and impact of this EHR query prototype on efficient retrieving of patient information to answer typical clinical questions. The controlled study showed that the specialists were quicker and more successful (measured in percentage of expected information items found) in finding patient information compared to the standard full-document search options. The participants also appreciated the extended query options. PMID:23304308
NASA Astrophysics Data System (ADS)
Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge
The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which is a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL, by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.
NASA Astrophysics Data System (ADS)
Tan, Kian Lam; Lim, Chen Kim
2017-10-01
With the explosive growth of online information such as email messages, news articles, and scientific literature, many institutions and museums are converting their cultural collections from physical form to digital format. However, this conversion has resulted in issues of inconsistency and incompleteness. In addition, the use of inaccurate keywords leads to the short-query problem. Most of the time, the inconsistency and incompleteness are caused by aggregation faults in annotating the documents themselves, while the short-query problem is caused by novice users who lack prior knowledge and experience in the cultural heritage domain. In this paper, we present an approach that addresses inconsistency, incompleteness and short queries by incorporating a Term Similarity Matrix into the Language Model. Our approach is tested on the Cultural Heritage in CLEF (CHiC) collection, which consists of short queries and documents. The results show that the proposed approach is effective and improves retrieval accuracy.
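The sketch below is one generic way to fold a term-similarity matrix into simple query-likelihood scoring, intended only to illustrate the idea above; the vocabulary, similarity values, smoothing scheme, and mixing weights are all placeholders and not the paper's formulation.

```python
# Sketch: term-similarity-aware query-likelihood scoring (generic illustration, toy data).
import numpy as np

vocab = ["castle", "fortress", "painting", "tapestry"]
V = {w: i for i, w in enumerate(vocab)}

# Term similarity matrix S[i, j] in [0, 1]; rows normalized to sum to 1.
S = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.1],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.1, 0.6, 1.0]])
S = S / S.sum(axis=1, keepdims=True)

def doc_lm(counts: np.ndarray, mu: float = 0.5) -> np.ndarray:
    """Document language model with uniform smoothing."""
    return (1 - mu) * counts / counts.sum() + mu / len(counts)

def score(query_terms, counts, lam: float = 0.7) -> float:
    """Log-likelihood of the query, mixing each literal term with its similar terms."""
    p = doc_lm(counts)
    total = 0.0
    for term in query_terms:
        i = V[term]
        expanded = S[i] @ p              # probability mass spread over similar terms
        total += np.log(lam * p[i] + (1 - lam) * expanded)
    return total

doc_counts = np.array([0.0, 5.0, 1.0, 0.0])   # a document about a "fortress"
print(score(["castle"], doc_counts))           # benefits from the castle~fortress similarity
```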
Markó, K; Schulz, S; Hahn, U
2005-01-01
We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain. Documents, as well as queries, are mapped to a language-independent conceptual layer on which retrieval operations are performed. We contrast this approach with the direct translation of German queries to English ones which, subsequently, are matched against English documents. We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. A substantial benefit for interlingua-based document retrieval using German queries on English texts is found, which amounts to 93% of the (monolingual) English baseline. Most state-of-the-art cross-language information retrieval systems translate user queries to the language(s) of the target documents. In contra-distinction to this approach, translating both documents and user queries into a language-independent, concept-like representation format is more beneficial to enhance cross-language retrieval performance.
Active Learning by Querying Informative and Representative Examples.
Huang, Sheng-Jun; Jin, Rong; Zhou, Zhi-Hua
2014-10-01
Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted a lot of interest given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which can significantly limit their performance. Although several active learning algorithms have been proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way of measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.
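To make the two selection criteria concrete, the sketch below combines a simple informativeness score (classifier uncertainty) with a representativeness score (kernel density in the unlabeled pool) when choosing the next instance to query. This is a heuristic illustration of the idea, not the QUIRE min-max formulation itself; the data and weights are synthetic placeholders.

```python
# Sketch: combining informativeness (uncertainty) with representativeness (pool density)
# in a query-selection loop. Heuristic illustration only, not QUIRE.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Small initial labeled set containing both classes.
labeled = list(np.flatnonzero(y == 0)[:5]) + list(np.flatnonzero(y == 1)[:5])
unlabeled = [i for i in range(200) if i not in labeled]

for _ in range(5):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[unlabeled])[:, 1]
    informativeness = 1.0 - np.abs(proba - 0.5) * 2           # high near the decision boundary
    representativeness = rbf_kernel(X[unlabeled], X[unlabeled]).mean(axis=1)
    pick = unlabeled[int(np.argmax(informativeness * representativeness))]
    labeled.append(pick)                                      # query the oracle for this instance
    unlabeled.remove(pick)

print("queried instances:", labeled[10:])
```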
A Dimensional Bus model for integrating clinical and research data.
Wade, Ted D; Hum, Richard C; Murphy, James R
2011-12-01
Many clinical research data integration platforms rely on the Entity-Attribute-Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Borrowing concepts from Entity-Attribute-Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. The design provides a workable way to manage and query mixed schemas in a data warehouse.
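The sketch below is a toy rendering of the idea described above: per-source observation tables whose structure suits the source data, each linked to a shared "bus" row carrying provenance and dimension links so that queries and access rules apply uniformly. The table and column names are illustrative inventions, not the authors' actual schema.

```python
# Sketch: a toy Dimensional-Bus-style layout in SQLite (illustrative names only).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bus (
    bus_id      INTEGER PRIMARY KEY,
    source      TEXT,          -- provenance: EMR, sponsored study, biorepository
    patient_dim INTEGER,       -- links to classificatory dimensions
    concept_dim INTEGER,
    obs_date    TEXT
);
CREATE TABLE emr_labs (        -- structure tailored to this source's data
    bus_id INTEGER REFERENCES bus(bus_id),
    test   TEXT, value REAL, units TEXT
);
CREATE TABLE biorepository_samples (
    bus_id INTEGER REFERENCES bus(bus_id),
    sample_type TEXT, freezer TEXT
);
""")
con.execute("INSERT INTO bus VALUES (1, 'EMR', 42, 7, '2011-05-01')")
con.execute("INSERT INTO emr_labs VALUES (1, 'eosinophils', 0.3, '10^9/L')")

# Queries join through the bus, so dimensions and access management apply uniformly.
rows = con.execute("""
    SELECT b.source, l.test, l.value, l.units
    FROM bus b JOIN emr_labs l USING (bus_id)
    WHERE b.patient_dim = 42
""").fetchall()
print(rows)
```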
A Comparison of Query-by-Example Methods for Spoken Term Detection
2009-09-01
consistent "errors" between the index and the query. Few query terms have more than one pronunciation (avg. 1.1 prons. per term); as a result, there is...

Table 1:
  pron lex.
  one dict entry (llr)           73.01   47.66   21.11
  all dict entries (avg+llr)     73.99   48.16   20.92
  all dict entries (max+llr)     74.27   48.26   20.93