document retrieval: Topics by Science.gov

Sample records for document retrieval

The present status and problems in document retrieval system : document input type retrieval system

NASA Astrophysics Data System (ADS)

Inagaki, Hirohito

Office automation (OA) has brought many changes, and many documents are now maintained in electronic filing systems. An efficient document retrieval system is therefore needed to extract useful information. Current document retrieval systems use simple word matching, syntactic matching, and semantic matching to obtain high retrieval efficiency. Document retrieval systems that use special hardware devices, such as ISSP, have also been developed for high-speed retrieval. Because these systems accept only a single sentence or keywords as input, it is difficult for searchers to express their requests. We demonstrate a document-input-type retrieval system, which directly accepts a document as input and searches a document database for similar documents.
Document image retrieval through word shape coding.

PubMed

Lu, Shijian; Li, Linlin; Tan, Chew Lim

2008-11-01

This paper presents a document retrieval technique that is capable of searching document images without OCR (optical character recognition). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
Information Retrieval: A Sequential Learning Process.

ERIC Educational Resources Information Center

Bookstein, Abraham

1983-01-01

Presents decision-theoretic models which intrinsically include retrieval of multiple documents whereby system responds to request by presenting documents to patron in sequence, gathering feedback, and using information to modify future retrievals. Document independence model, set retrieval model, sequential retrieval model, learning model,…
Analyzing Document Retrievability in Patent Retrieval Settings

NASA Astrophysics Data System (ADS)

Bashir, Shariq; Rauber, Andreas

Most information retrieval settings, such as web search, are typically precision-oriented, i.e. they focus on retrieving a small number of highly relevant documents. However, in specific domains, such as patent retrieval or law, recall becomes more relevant than precision: in these cases the goal is to find all relevant documents, requiring algorithms to be tuned more towards recall at the cost of precision. This raises important questions with respect to retrievability and search engine bias: depending on how the similarity between a query and documents is measured, certain documents may be more or less retrievable in certain systems, up to some documents not being retrievable at all within common threshold settings. Biases may be oriented towards popularity of documents (increasing weight of references), towards length of documents, favour the use of rare or common words; rely on structural information such as metadata or headings, etc. Existing accessibility measurement techniques are limited as they measure retrievability with respect to all possible queries. In this paper, we improve accessibility measurement by considering sets of relevant and irrelevant queries for each document. This simulates how recall oriented users create their queries when searching for relevant information. We evaluate retrievability scores using a corpus of patents from US Patent and Trademark Office.
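The retrievability idea in the abstract above can be illustrated with a small sketch: for each document, count how many queries in a query set return it within a rank cutoff. The cumulative counting rule, the toy term-overlap ranker, and the names `rank`, `retrievability`, and `cutoff` are assumptions made for illustration; the paper's own retrieval models and query sets are not reproduced here.

```python
from collections import Counter

def rank(query, docs):
    """Toy ranker (an assumption): order documents by number of shared query terms."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(q_terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

def retrievability(docs, queries, cutoff):
    """Count, per document, the queries that return it within the top `cutoff` ranks."""
    counts = Counter({doc_id: 0 for doc_id in docs})
    for query in queries:
        for doc_id in rank(query, docs)[:cutoff]:
            counts[doc_id] += 1
    return dict(counts)

# Hypothetical mini-corpus and query set.
docs = {
    "p1": "rotor blade cooling turbine",
    "p2": "turbine blade coating method",
    "p3": "antenna array for satellite",
}
queries = ["turbine blade", "blade cooling", "blade coating"]
scores = retrievability(docs, queries, cutoff=1)
print(scores)
```

A document that no query ranks within the cutoff receives a score of zero, which is the "not retrievable at all" case the abstract mentions.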
INFORMATION STORAGE AND RETRIEVAL, REPORTS ON EVALUATION PROCEDURES AND RESULTS 1965-1967.

ERIC Educational Resources Information Center

SALTON, GERALD

A DETAILED ANALYSIS OF THE RETRIEVAL EVALUATION RESULTS OBTAINED WITH THE AUTOMATIC SMART DOCUMENT RETRIEVAL SYSTEM FOR DOCUMENT COLLECTIONS IN THE FIELDS OF AERODYNAMICS, COMPUTER SCIENCE, AND DOCUMENTATION IS GIVEN IN THIS REPORT. THE VARIOUS COMPONENTS OF FULLY AUTOMATIC DOCUMENT RETRIEVAL SYSTEMS ARE DISCUSSED IN DETAIL, INCLUDING THE FORMS OF…
An Intelligent System for Document Retrieval in Distributed Office Environments.

ERIC Educational Resources Information Center

Mukhopadhyay, Uttam; And Others

1986-01-01

MINDS (Multiple Intelligent Node Document Servers) is a distributed system of knowledge-based query engines for efficiently retrieving multimedia documents in an office environment of distributed workstations. By learning document distribution patterns and user interests and preferences during system usage, it customizes document retrievals for…
A Re-Unification of Two Competing Models for Document Retrieval.

ERIC Educational Resources Information Center

Bodoff, David

1999-01-01

Examines query-oriented versus document-oriented information retrieval and feedback learning. Highlights include a reunification of the two approaches for probabilistic document retrieval and for vector space model (VSM) retrieval; learning in VSM and in probabilistic models; multi-dimensional scaling; and ongoing field studies. (LRW)
Document image database indexing with pictorial dictionary

NASA Astrophysics Data System (ADS)

Akbari, Mohammad; Azimi, Reza

2010-02-01

In this paper we introduce a new approach for information retrieval from a Persian document image database without using Optical Character Recognition (OCR). First, an attribute called the subword upper contour label is defined; then a pictorial dictionary is constructed for the subwords based on this attribute. With this approach we address two issues in document image retrieval: keyword spotting and retrieval according to document similarity. The proposed methods have been evaluated on a Persian document image database. The results demonstrate the ability of this approach in document image information retrieval.
MorphoSaurus--design and evaluation of an interlingua-based, cross-language document retrieval engine for the medical domain.

PubMed

Markó, K; Schulz, S; Hahn, U

2005-01-01

We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain. Documents, as well as queries, are mapped to a language-independent conceptual layer on which retrieval operations are performed. We contrast this approach with the direct translation of German queries to English ones which, subsequently, are matched against English documents. We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. A substantial benefit for interlingua-based document retrieval using German queries on English texts is found, which amounts to 93% of the (monolingual) English baseline. Most state-of-the-art cross-language information retrieval systems translate user queries to the language(s) of the target documents. In contra-distinction to this approach, translating both documents and user queries into a language-independent, concept-like representation format is more beneficial to enhance cross-language retrieval performance.
Query Expansion for Noisy Legal Documents

DTIC Science & Technology

2008-11-01

[9] G. Salton (ed). The SMART retrieval system experiments in automatic document processing. 1971. [10] H. Schutze and J. Pedersen. A cooccurrence...Language Modeling and Information Retrieval. http://www.lemurproject.org. [2] J. Baron, D. Lewis, and D. Oard. TREC 2006 legal track overview. In...Retrieval, 1993. [8] J. Rocchio. Relevance feedback in information retrieval. In The SMART retrieval system experiments in automatic document processing, 1971
Cognitive Process as a Basis for Intelligent Retrieval Systems Design.

ERIC Educational Resources Information Center

Chen, Hsinchun; Dhar, Vasant

1991-01-01

Two studies of the cognitive processes involved in online document-based information retrieval were conducted. These studies led to the development of five computational models of online document retrieval which were incorporated into the design of an "intelligent" document-based retrieval system. Both the system and the broader implications of…
Content Recognition and Context Modeling for Document Analysis and Retrieval

ERIC Educational Resources Information Center

Zhu, Guangyu

2009-01-01

The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval.…
Using Induction to Refine Information Retrieval Strategies

NASA Technical Reports Server (NTRS)

Baudin, Catherine; Pell, Barney; Kedar, Smadar

1994-01-01

Conceptual information retrieval systems use structured document indices, domain knowledge and a set of heuristic retrieval strategies to match user queries with a set of indices describing the document's content. Such retrieval strategies increase the set of relevant documents retrieved (increase recall), but at the expense of returning additional irrelevant documents (decrease precision). Usually in conceptual information retrieval systems this tradeoff is managed by hand and with difficulty. This paper discusses ways of managing this tradeoff by the application of standard induction algorithms to refine the retrieval strategies in an engineering design domain. We gathered examples of query/retrieval pairs during the system's operation using feedback from a user on the retrieved information. We then fed these examples to the induction algorithm and generated decision trees that refine the existing set of retrieval strategies. We found that (1) induction improved the precision on a set of queries generated by another user, without a significant loss in recall, and (2) in an interactive mode, the decision trees pointed out flaws in the retrieval and indexing knowledge and suggested ways to refine the retrieval strategies.
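The recall/precision tradeoff described above can be made concrete with a minimal sketch. The document identifiers and the two result lists are hypothetical; only the standard set-based definitions of precision and recall are used.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow strategy returns few documents; a broad one returns more.
narrow = ["d1", "d2"]
broad = ["d1", "d2", "d3", "d5", "d6", "d7"]

print(precision_recall(narrow, relevant))
print(precision_recall(broad, relevant))
```

In this toy case the broader strategy finds one more relevant document (higher recall) but also returns three irrelevant ones (lower precision), which is the tradeoff the induction step is meant to manage.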
VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data

DOE PAGES

Choo, Jaegul; Kim, Hannah; Clarkson, Edward; ...

2018-01-31

In this paper, we present an interactive visual information retrieval and recommendation system, called VisIRR, for large-scale document discovery. VisIRR effectively combines the paradigms of (1) a passive pull through query processes for retrieval and (2) an active push that recommends items of potential interest to users based on their preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, representing the relationships among the topics along with their keyword summary. In addition, based on interactive personalized preference feedback with regard to documents, VisIRR provides document recommendations from the entire corpus, which are beyond the retrieved sets. Such recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods, which make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works by providing detailed usage scenarios. Finally, we present preliminary user study results for evaluating the effectiveness of the system.
An overview of selected information storage and retrieval issues in computerized document processing

NASA Technical Reports Server (NTRS)

Dominick, Wayne D. (Editor); Ihebuzor, Valentine U.

1984-01-01

The rapid development of computerized information storage and retrieval techniques has introduced the possibility of extending the word processing concept to document processing. A major advantage of computerized document processing is the relief of the tedious task of manual editing and composition usually encountered by traditional publishers through the immense speed and storage capacity of computers. Furthermore, computerized document processing provides an author with centralized control, the lack of which is a handicap of the traditional publishing operation. A survey of some computerized document processing techniques is presented with emphasis on related information storage and retrieval issues. String matching algorithms are considered central to document information storage and retrieval and are also discussed.
75 FR 72829 - Los Alamos Historical Document Retrieval and Assessment (LAHDRA) Project

Federal Register 2010, 2011, 2012, 2013, 2014

2010-11-26

... Historical Document Retrieval and Assessment (LAHDRA) Project The Centers for Disease Control and Prevention... release of the Final Report of the Los Alamos Historical Document Retrieval and Assessment (LAHDRA) Project... information about historical chemical or radionuclide releases from facilities at the Los Alamos National...
Predicting Document Retrieval System Performance: An Expected Precision Measure.

ERIC Educational Resources Information Center

Losee, Robert M., Jr.

1987-01-01

Describes an expected precision (EP) measure designed to predict document retrieval performance. Highlights include decision theoretic models; precision and recall as measures of system performance; EP graphs; relevance feedback; and computing the retrieval status value of a document for two models, the Binary Independent Model and the Two Poisson…
On-Line Retrieval System Design; Part V of Scientific Report No. ISR-18, Information Storage and Retrieval...

ERIC Educational Resources Information Center

Cornell Univ., Ithaca, NY. Dept. of Computer Science.

On-line retrieval system design is discussed in the two papers which make up Part Five of this report on Salton's Magical Automatic Retriever of Texts (SMART) project report. The first paper: "A Prototype On-Line Document Retrieval System" by D. Williamson and R. Williamson outlines a design for a SMART on-line document retrieval system…
Scalable ranked retrieval using document images

NASA Astrophysics Data System (ADS)

Jain, Rajiv; Oard, Douglas W.; Doermann, David

2013-12-01

Despite the explosion of text on the Internet, hard copy documents that have been scanned as images still play a significant role for some tasks. The best method to perform ranked retrieval on a large corpus of document images, however, remains an open research question. The most common approach has been to perform text retrieval using terms generated by optical character recognition. This paper, by contrast, examines whether a scalable segmentation-free image retrieval algorithm, which matches sub-images containing text or graphical objects, can provide additional benefit in satisfying a user's information needs on a large, real world dataset. Results on 7 million scanned pages from the CDIP v1.0 test collection show that content based image retrieval finds a substantial number of documents that text retrieval misses, and that when used as a basis for relevance feedback can yield improvements in retrieval effectiveness.

Autocorrelation and Regularization of Query-Based Information Retrieval Scores

DTIC Science & Technology

2008-02-01

of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a...a document about "dogs", then the system will always miss this document when a user queries "dog". Salton recognized that a document's representation
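The idea in the excerpt above, treating a query as a very short document so that queries and documents share one representation, can be sketched with raw term-count vectors and cosine similarity. The plain-count weighting, the whitespace tokenizer, and the toy texts are assumptions for illustration, not details taken from the report.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a text (document or query) as a bag of lowercase term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical two-document collection.
docs = {"d1": "dogs chase cats", "d2": "the dog sleeps"}

# The query goes through exactly the same representation as the documents.
query = vectorize("dog")
scores = {doc_id: cosine(query, vectorize(text)) for doc_id, text in docs.items()}
print(scores)
```

Because the sketch matches exact terms only, the document containing "dogs" scores zero for the query "dog", which is the mismatch problem the excerpt points to.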
Semi-Automated Methods for Refining a Domain-Specific Terminology Base

DTIC Science & Technology

2011-02-01

only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing...Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this...also for Natural Language Processing (NLP) applications, text retrieval (1), document indexing, and other knowledge management tasks. The National
Computer-Assisted Search Of Large Textual Data Bases

NASA Technical Reports Server (NTRS)

Driscoll, James R.

1995-01-01

"QA" denotes high-speed computer system for searching diverse collections of documents including (but not limited to) technical reference manuals, legal documents, medical documents, news releases, and patents. Incorporates previously available and emerging information-retrieval technology to help user intelligently and rapidly locate information found in large textual data bases. Technology includes provision for inquiries in natural language; statistical ranking of retrieved information; artificial-intelligence implementation of semantics, in which "surface level" knowledge found in text used to improve ranking of retrieved information; and relevance feedback, in which user's judgements of relevance of some retrieved documents used automatically to modify search for further information.
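Relevance feedback of the kind described above is often illustrated with the classic Rocchio update, which moves the query vector toward judged-relevant documents and away from judged-nonrelevant ones. The abstract does not say which feedback method "QA" uses, so Rocchio is a stand-in here; the weights `alpha`, `beta`, `gamma`, the dropping of non-positive terms, and the toy vectors are all assumptions.

```python
def rocchio(query, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector (dict of term -> weight) after feedback."""
    terms = set(query)
    for doc in relevant_docs + nonrelevant_docs:
        terms |= set(doc)

    updated = {}
    for term in terms:
        # Mean weight of the term in each judged set (0.0 if the set is empty).
        pos = (sum(d.get(term, 0.0) for d in relevant_docs) / len(relevant_docs)
               if relevant_docs else 0.0)
        neg = (sum(d.get(term, 0.0) for d in nonrelevant_docs) / len(nonrelevant_docs)
               if nonrelevant_docs else 0.0)
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        # Common convention (assumed here): drop terms with non-positive weight.
        if weight > 0:
            updated[term] = weight
    return updated

# Hypothetical query and user judgments.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
nonrelevant = [{"jaguar": 1.0, "car": 1.0}]

new_query = rocchio(query, relevant, nonrelevant)
print(sorted(new_query.items()))
```

The updated query gains terms from the documents the user marked relevant and excludes the term that appears only in the nonrelevant document, so the next search is steered by the judgments.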
Scaling Up High-Value Retrieval to Medium-Volume Data

NASA Astrophysics Data System (ADS)

Cunningham, Hamish; Hanbury, Allan; Rüger, Stefan

We summarise the scientific work presented at the first Information Retrieval Facility Conference [3] and argue that high-value retrieval with medium-volume data, exemplified by patent search, is a thriving topic in a multidisciplinary area that sits between Information Retrieval, Natural Language Processing and Semantic Web Technologies. We analyse the parameters that condition choices of retrieval technology for different sizes and values of document space, and we present the patent document space and some of its characteristics for retrieval work.
Current Research into Chemical and Textual Information Retrieval at the Department of Information Studies, University of Sheffield.

ERIC Educational Resources Information Center

Lynch, Michael F.; Willett, Peter

1987-01-01

Discusses research into chemical information and document retrieval systems at the University of Sheffield. Highlights include the use of cluster analysis methods for document retrieval and drug design, representation and searching of files of generic chemical structures, and the application of parallel computer hardware to information retrieval.…
Information retrieval for a document writing assistance program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corral, M.L.; Simon, A.; Julien, C.

This paper presents an Information Retrieval mechanism to facilitate the writing of technical documents in the space domain. To address the need for document exchange between partners in a given project, documents are standardized. The writing of a new document requires the re-use of existing documents or parts thereof. These parts can be identified by "tagging" the logical structure of documents and restored by means of a purpose-built Information Retrieval System (I.R.S.). The I.R.S. implemented in our writing assistance tool uses natural language queries and is based on a statistical linguistic approach which is enhanced by the use of a document structure module.
Recent Experiments with INQUERY

DTIC Science & Technology

1995-11-01

were conducted with version of the INQUERY information retrieval system. INQUERY is based on the Bayesian inference network retrieval model. It is...corpus based query expansion. For TREC a subset of of the adhoc document set was used to build the InFinder database. None of the...experiments that showed significant improvements in retrieval effectiveness when document rankings based on the entire document text are combined with
Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation

ERIC Educational Resources Information Center

Rajagopal, Prabha; Ravana, Sri Devi

2017-01-01

Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
Embedding Term Similarity and Inverse Document Frequency into a Logical Model of Information Retrieval.

ERIC Educational Resources Information Center

Losada, David E.; Barreiro, Alvaro

2003-01-01

Proposes an approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. Highlights include document representation and matching; incorporating term similarity into the measure of distance; new algorithms for implementation; inverse document frequency; and logical versus classical models of…
Techniques of Document Management: A Review of Text Retrieval and Related Technologies.

ERIC Educational Resources Information Center

Veal, D. C.

2001-01-01

Reviews present and possible future developments in the techniques of electronic document management, the major ones being text retrieval and scanning and OCR (optical character recognition). Also addresses document acquisition, indexing and thesauri, publishing and dissemination standards, impact of the Internet, and the document management…
A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track)

DTIC Science & Technology

2008-11-01

retrieve relevant documents. For the Opinion Retrieval subtask, we propose a hybrid model of lexicon-based approach and machine learning approach for...estimating and ranking the opinionated documents. For the Polarized Opinion Retrieval subtask, we employ machine learning for predicting the polarity...and linear combination technique for ranking polar documents. The hybrid model which utilize both lexicon-based approach and machine learning approach
Exploiting salient semantic analysis for information retrieval

NASA Astrophysics Data System (ADS)

Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui

2016-11-01

Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representations for words or documents. However, its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to efficiently use SSA to improve information retrieval performance, and propose an SSA-based retrieval method under the language model framework. First, the SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations are used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard Text Retrieval Conference (TREC) collections. Experimental results show that the proposed models consistently outperform the existing Wikipedia-based retrieval methods.
Dynamic "inline" images: context-sensitive retrieval and integration of images into Web documents.

PubMed

Kahn, Charles E

2008-09-01

Integrating relevant images into web-based information resources adds value for research and education. This work sought to evaluate the feasibility of using "Web 2.0" technologies to dynamically retrieve and integrate pertinent images into a radiology web site. An online radiology reference of 1,178 textual web documents was selected as the set of target documents. The ARRS GoldMiner image search engine, which incorporated 176,386 images from 228 peer-reviewed journals, retrieved images on demand and integrated them into the documents. At least one image was retrieved in real-time for display as an "inline" image gallery for 87% of the web documents. Each thumbnail image was linked to the full-size image at its original web site. Review of 20 randomly selected Collaborative Hypertext of Radiology documents found that 69 of 72 displayed images (96%) were relevant to the target document. Users could click on the "More" link to search the image collection more comprehensively and, from there, link to the full text of the article. A gallery of relevant radiology images can be inserted easily into web pages on any web server. Indexing by concepts and keywords allows context-aware image retrieval, and searching by document title and subject metadata yields excellent results. These techniques allow web developers to incorporate easily a context-sensitive image gallery into their documents.
Categorizing document by fuzzy C-Means and K-nearest neighbors approach

NASA Astrophysics Data System (ADS)

Priandini, Novita; Zaman, Badrus; Purwanti, Endah

2017-08-01

Advances in technology have made document categorization important, driven by the growing number of documents themselves. Managing documents by categorizing them is an application of Information Retrieval, because the process involves text mining. Categorization can be done with either the Fuzzy C-Means (FCM) or the K-Nearest Neighbors (KNN) method. This experiment combines both methods, with the aim of improving document categorization performance. First, FCM is used to cluster the training documents. Second, KNN is used to categorize the testing documents until the categorization output is shown. In the experiment, 14 of the 20 testing documents were retrieved relevantly to their category, while 6 were retrieved irrelevantly. System evaluation shows that both precision and recall are 0.7.
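The KNN step mentioned above can be sketched as a majority vote among the nearest labeled training vectors. This is a minimal illustration only: the FCM clustering stage is omitted, and the two-dimensional feature vectors, the labels, the Euclidean distance, and the value of `k` are all assumptions rather than details from the paper.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training examples.

    `train` is a list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training documents already reduced to two numeric features.
train = [
    ((1.0, 0.0), "sports"),
    ((0.9, 0.1), "sports"),
    ((0.0, 1.0), "politics"),
    ((0.1, 0.9), "politics"),
    ((0.2, 0.8), "politics"),
]

predicted = knn_predict(train, (0.8, 0.2), k=3)
print(predicted)
```

In a combined pipeline such as the one the abstract describes, the labels attached to the training vectors would come from the clustering stage rather than being written by hand as they are here.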
Topology of Document Retrieval Systems.

ERIC Educational Resources Information Center

Everett, Daniel M.; Cater, Steven C.

1992-01-01

Explains the use of a topological structure to examine the closeness between documents in retrieval systems and analyzes the topological structure of a vector-space model, a fuzzy-set model, an extended Boolean model, a probabilistic model, and a TIRS (Topological Information Retrieval System) model. Proofs for the results are appended. (17…
Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.

PubMed

Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin

2017-05-18

Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.
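The keyword-based section boundary detection and section-sensitive querying described above can be sketched in a few lines. The paper's pipeline uses openEHR archetypes, XML, XQuery and XSLT; this Python sketch only illustrates the idea, and the header list, the "Header:" line format, and the example reports are hypothetical.

```python
import re

# Hypothetical section headers; a real report template would define its own.
SECTION_HEADERS = ["Clinical information", "Macroscopy", "Microscopy", "Diagnosis"]

def split_sections(report, headers=SECTION_HEADERS):
    """Split a free-text report into {section name: text} at known header keywords."""
    pattern = re.compile(
        r"^(%s)\s*:\s*" % "|".join(re.escape(h) for h in headers),
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(report))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).lower()] = report[match.end():end].strip()
    return sections

def search_section(reports, section, keyword):
    """Return ids of reports whose named section contains the keyword."""
    return [
        report_id
        for report_id, text in reports.items()
        if keyword.lower() in split_sections(text).get(section, "").lower()
    ]

reports = {
    "r1": "Clinical information: suspected carcinoma\nDiagnosis: benign nevus",
    "r2": "Clinical information: routine check\nDiagnosis: basal cell carcinoma",
}

hits = search_section(reports, "diagnosis", "carcinoma")
print(hits)
```

A plain keyword search for "carcinoma" over the whole text would return both reports, whereas restricting the query to the diagnosis section returns only the report that is actually diagnosed with it, which is the false-positive reduction the abstract describes.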
Strong Similarity Measures for Ordered Sets of Documents in Information Retrieval.

ERIC Educational Resources Information Center

Egghe, L.; Michel, Christine

2002-01-01

Presents a general method to construct ordered similarity measures in information retrieval based on classical similarity measures for ordinary sets. Describes a test of some of these measures in an information retrieval system that extracted ranked document sets and discusses the practical usability of the ordered similarity measures. (Author/LRW)
Performance Considerations for an Optical Jukebox in Document Archival/Retrieval Applications.

ERIC Educational Resources Information Center

Spenser, Peter

1991-01-01

Discusses the use of an optical jukebox in a retrieval-intensive application--i.e., for a law firm's litigation support--and examines factors affecting the performance of the jukebox. The imaging system's configuration is explained, document access from workstations is described, and expectations of retrieval times are discussed. (LRW)
Information Retrieval and Graph Analysis Approaches for Book Recommendation

PubMed Central

Benkoussas, Chahinez; Bellot, Patrice

2015-01-01

A combination of multiple information retrieval approaches is proposed for the purpose of book recommendation. In this paper, book recommendation is based on complex user's query. We used different theoretical retrieval models: probabilistic as InL2 (Divergence from Randomness model) and language model and tested their interpolated combination. Graph analysis algorithms such as PageRank have been successful in Web environments. We consider the application of this algorithm in a new retrieval approach to related document network comprised of social links. We called Directed Graph of Documents (DGD) a network constructed with documents and social information provided from each one of them. Specifically, this work tackles the problem of book recommendation in the context of INEX (Initiative for the Evaluation of XML retrieval) Social Book Search track. A series of reranking experiments demonstrate that combining retrieval models yields significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments. PMID:26504899

Automated search and retrieval of information from imaged documents using optical correlation techniques

NASA Astrophysics Data System (ADS)

Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.

1999-10-01

Litton PRC and Litton Data Systems Division are developing a system, the Imaged Document Optical Correlation and Conversion System (IDOCCS), to provide a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides the search and retrieval of information from imaged documents. IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited; e.g., imaged documents containing an agency's seal or logo can be singled out. In this paper, we present a description of IDOCCS as well as preliminary performance results and theoretical projections.
Robust keyword retrieval method for OCRed text

NASA Astrophysics Data System (ADS)

Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu

2011-01-01

Document management systems have become important because of the growing popularity of electronic filing of documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, the insertion of noise characters and dropping of characters in the keyword retrieval enables robustness against character segmentation errors, and character substitution in the keyword of the recognition candidate for each character in OCR or any other character enables robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
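The abstract describes tolerance to inserted, dropped, and substituted characters. A classic way to get the same three tolerances is approximate substring matching under edit distance (Sellers' dynamic programming). The sketch below uses that standard technique as a stand-in. It is not the authors' method, which works with OCR recognition candidates, and the function name and error budget are assumptions.

```python
def approx_find(keyword, text, max_errors=1):
    """Return 1-based end positions in `text` where `keyword` matches
    with at most `max_errors` insertions, deletions, or substitutions."""
    m = len(keyword)
    # Column for the empty text prefix: matching i keyword chars costs i.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        cur = [0]  # a match may start at any position in the text
        for i in range(1, m + 1):
            cost = 0 if keyword[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # keyword character missing
        if cur[m] <= max_errors:
            hits.append(j)
        prev = cur
    return hits


# "l" misrecognized as "1" (substitution) is found with one allowed error.
print(approx_find("retrieval", "document retrieva1 system", max_errors=1))
```

A larger error budget raises recall at the expense of precision, which matches the kind of trade-off the abstract reports.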
Information Storage and Retrieval. Reports on Analysis, Search, and Iterative Retrieval.

ERIC Educational Resources Information Center

Salton, Gerard

As the fourteenth report in a series describing research in automatic information storage and retrieval, this document covers work carried out on the SMART project for approximately one year (summer 1967 to summer 1968). The document is divided into four main parts: (1) SMART systems design, (2) analysis and search experiments, (3) user feedback…
Computer program and user documentation medical data tape retrieval system

NASA Technical Reports Server (NTRS)

Anderson, J.

1971-01-01

This volume provides several levels of documentation for the program module of the NASA medical directorate mini-computer storage and retrieval system. A biomedical information system overview describes some of the reasons for the development of the mini-computer storage and retrieval system. It briefly outlines all of the program modules which constitute the system.
Document retrieval on repetitive string collections.

PubMed

Gagie, Travis; Hartikainen, Aleksi; Karhu, Kalle; Kärkkäinen, Juha; Navarro, Gonzalo; Puglisi, Simon J; Sirén, Jouni

2017-01-01

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
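The three queries defined in this abstract (document listing, top-k document retrieval, document counting) can be pinned down with a brute-force reference sketch. It scans every document and uses no compressed index, so it shows only what the queries return, not the paper's data structures. All function names are illustrative.

```python
def occurrences(pattern, doc):
    """Count occurrences of `pattern` in `doc`, allowing overlaps."""
    count, start = 0, doc.find(pattern)
    while start != -1:
        count += 1
        start = doc.find(pattern, start + 1)
    return count


def document_listing(pattern, docs):
    """Indices of all documents in which the pattern appears."""
    return [i for i, d in enumerate(docs) if pattern in d]


def document_counting(pattern, docs):
    """Number of documents in which the pattern appears."""
    return len(document_listing(pattern, docs))


def top_k(pattern, docs, k):
    """The k documents where the pattern appears most often."""
    freq = {i: occurrences(pattern, d) for i, d in enumerate(docs)}
    ranked = sorted((i for i in freq if freq[i] > 0),
                    key=lambda i: (-freq[i], i))
    return ranked[:k]


docs = ["ACGTACGT", "ACGTTTTT", "GGGG", "ACACAC"]
print(document_listing("AC", docs), document_counting("AC", docs),
      top_k("AC", docs, 2))
```

A reference implementation of this kind is mainly useful for checking the answers of an indexed solution on small inputs.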
Signature detection and matching for document image retrieval.

PubMed

Zhu, Guangyu; Zheng, Yefeng; Doermann, David; Jaeger, Stefan

2009-11-01

As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from clustered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.
Mining knowledge from corpora: an application to retrieval and indexing.

PubMed

Soualmia, Lina F; Dahamna, Badisse; Darmoni, Stéfan

2008-01-01

The present work aims at discovering new associations between medical concepts to be exploited as input in retrieval and indexing. Association rules method is applied to documents. The process is carried out on three major document categories referring to e-health information consumers: health professionals, students and lay people. Association rules evaluation is founded on statistical measures combined with domain knowledge. Association rules represent existing relations between medical concepts (60.62%) and new knowledge (54.21%). Based on observations, 463 expert rules are defined by medical librarians for retrieval and indexing. Association rules bear out existing relations, produce new knowledge and support users and indexers in document retrieval and indexing.
Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.

ERIC Educational Resources Information Center

Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald

2001-01-01

Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…
75 FR 1793 - Study Team for the Los Alamos Historical Document Retrieval and Assessment (LAHDRA) Project

Federal Register 2010, 2011, 2012, 2013, 2014

2010-01-13

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention Study Team for the Los Alamos Historical Document Retrieval and Assessment (LAHDRA) Project The Centers for Disease... the following meeting. Name: Public Meeting of the Study Team for the Los Alamos Historical Document...
A semantic medical multimedia retrieval approach using ontology information hiding.

PubMed

Guo, Kehua; Zhang, Shigeng

2013-01-01

Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches.
AP-102/104 Retrieval control system qualification test procedure

DOE Office of Scientific and Technical Information (OSTI.GOV)

RIECK, C.A.

1999-05-18

This Qualification Test Procedure documents the results of the qualification testing that was performed on the Project W-211, "Initial Tank Retrieval Systems," retrieval control system (RCS) for tanks 241-AP-102 and 241-AP-104. The results confirm that the RCS has been programmed correctly and that the two related hardware enclosures have been assembled in accordance with the design documents.
Content-based retrieval of historical Ottoman documents stored as textual images.

PubMed

Saykol, Ediz; Sinop, Ali Kemal; Güdükbay, Ugur; Ulusoy, Ozgür; Cetin, A Enis

2004-03-01

There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
Topological Aspects of Information Retrieval.

ERIC Educational Resources Information Center

Egghe, Leo; Rousseau, Ronald

1998-01-01

Discusses topological aspects of theoretical information retrieval, including retrieval topology; similarity topology; pseudo-metric topology; document spaces as topological spaces; Boolean information retrieval as a subsystem of any topological system; and proofs of theorems. (LRW)
Support Vector Machines: Relevance Feedback and Information Retrieval.

ERIC Educational Resources Information Center

Drucker, Harris; Shahrary, Behzad; Gibbon, David C.

2002-01-01

Compares support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred. Includes nine tables. (Contains 24…
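For readers unfamiliar with the Rocchio baseline named in this abstract, the textbook form of the update moves the query vector toward the centroid of documents judged relevant and away from the centroid of those judged non-relevant. The sketch below follows that textbook form. The weights `alpha`, `beta`, and `gamma` are commonly quoted defaults chosen here for illustration, and the clipping of negative weights is a frequent convention rather than something stated in the abstract.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector after one round of relevance feedback.

    `query` is a list of term weights; `relevant` and `non_relevant` are
    lists of document vectors with the same dimensionality.
    """
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel_c, non_c)]
    # Negative term weights are clipped to zero (an assumed convention).
    return [max(0.0, w) for w in updated]


# Three-term vocabulary; the user marked two documents relevant, one not.
print(rocchio([1, 0, 0], [[0, 1, 0], [0, 1, 0]], [[0, 0, 1]],
              alpha=1.0, beta=0.5, gamma=0.25))
```

The SVM alternative compared in the abstract instead trains a classifier on the judged documents and ranks the remaining ones by its decision value.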
A Semantic Medical Multimedia Retrieval Approach Using Ontology Information Hiding

PubMed Central

Guo, Kehua; Zhang, Shigeng

2013-01-01

Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches. PMID:24082915
A tutorial on information retrieval: basic terms and concepts

PubMed Central

Zhou, Wei; Smalheiser, Neil R; Yu, Clement

2006-01-01

This informal tutorial is intended for investigators and students who would like to understand the workings of information retrieval systems, including the most frequently used search engines: PubMed and Google. Having a basic knowledge of the terms and concepts of information retrieval should improve the efficiency and productivity of searches. As well, this knowledge is needed in order to follow current research efforts in biomedical information retrieval and text mining that are developing new systems not only for finding documents on a given topic, but extracting and integrating knowledge across documents. PMID:16722601
NoSQL: collection document and cloud by using a dynamic web query form

NASA Astrophysics Data System (ADS)

Abdalla, Hemn B.; Lin, Jinzhao; Li, Guoquan

2015-07-01

MongoDB (from "humongous") is an open-source document database and the leading NoSQL database. NoSQL ("Not Only SQL") refers to next-generation databases that are non-relational, distributed, open-source, and horizontally scalable, and that provide a mechanism for the storage and retrieval of documents. Previously, we stored and retrieved data using SQL queries. Here, we use MongoDB, which means we do not rely on MySQL or SQL queries. Documents are imported directly into our drives and retrieved from them without SQL queries, using the IO BufferReader and Writer: BufferReader imports our document files into a folder (drive), and BufferWriter retrieves the document files from that folder or drive. Security matters here because documents stored in a local folder can be viewed and modified by anyone, so we provide protection for the stored files. The original document files are converted to another format; in this paper, a binary format is used. After conversion, the documents are stored directly in one of our folders, at which point the storage space provides a private key for accessing the file. Any user who discovers the document files sees only data in binary format, whereas the file's owner can view the original format using the personal key, after receiving the secret key from the cloud.
Combining approaches to on-line handwriting information retrieval

NASA Astrophysics Data System (ADS)

Peña Saldarriaga, Sebastián; Viard-Gaudin, Christian; Morin, Emmanuel

2010-01-01

In this work, we propose to combine two quite different approaches for retrieving handwritten documents. Our hypothesis is that different retrieval algorithms should retrieve different sets of documents for the same query. Therefore, significant improvements in retrieval performance can be expected. The first approach is based on information retrieval techniques carried out on the noisy texts obtained through handwriting recognition, while the second approach is recognition-free, using a word spotting algorithm. Results show that for texts having a word error rate (WER) lower than 23%, the performance obtained with the combined system is close to that obtained on clean digital texts. In addition, for poorly recognized texts (WER > 52%), an improvement of nearly 17% can be observed with respect to the best available baseline method.
The "Generality" Effect and the Retrieval Evaluation for Large Collections

ERIC Educational Resources Information Center

Salton, Gerard

1972-01-01

The role of the generality effect in retrieval system evaluation is assessed, and evaluation results are given for the comparison of several document collections of distinct size and generality in the areas of documentation and aerodynamics. (14 references) (Author)
Web Mining for Web Image Retrieval.

ERIC Educational Resources Information Center

Chen, Zheng; Wenyin, Liu; Zhang, Feng; Li, Mingjing; Zhang, Hongjiang

2001-01-01

Presents a prototype system for image retrieval from the Internet using Web mining. Discusses the architecture of the Web image retrieval prototype; document space modeling; user log mining; and image retrieval experiments to evaluate the proposed system. (AEF)

Substance use disorders in Arab countries: research activity and bibliometric analysis

PubMed Central

2014-01-01

Background: Substance use disorders, which include substance abuse and substance dependence, are present in all regions of the world including Middle Eastern Arab countries. Bibliometric analysis is an increasingly used tool for research assessment. The main objective of this study was to assess research productivity in the field of substance use disorders in Arab countries using bibliometric indicators. Methodology: Original or review research articles authored or co-authored by investigators from Arab countries about substance use disorders during the period 1900–2013 were retrieved using the ISI Web of Science database. Research activity was assessed by analyzing the annual research productivity, contribution of each Arab country, names of journals, citations, and types of abused substances. Results: Four hundred and thirteen documents in substance use disorders were retrieved. Annual research productivity was low but showed a significant increase in the last few years. In terms of quantity, Kingdom of Saudi Arabia (83 documents) ranked first in research about substance use disorders while Lebanon (17.4 documents per million) ranked first in terms of number of documents published per million inhabitants. Retrieved documents were found in different journal titles and categories, mostly in Drug and Alcohol Dependence Journal. Authors from USA appeared in 117 documents published by investigators from Arab countries. Citation analysis of retrieved documents showed that the average citation per document was 10.76 and the h-index was 35. The majority of retrieved documents were about tobacco and smoking (175 documents), while alcohol consumption and abuse research was the least with 69 documents. Conclusion: The results obtained suggest that research in this field was largely neglected in the past. However, recent research interest was observed. Research output on tobacco and smoking was relatively high compared to other substances of abuse like illicit drugs and medicinal agents. Governmental funding for academics and mental health graduate programs to do research in the field of substance use disorders is highly recommended. PMID:25148888
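Two of the bibliometric indicators reported in this abstract, the h-index and documents per million inhabitants, are simple to compute once citation counts and population figures are available. The sketch below uses the standard h-index definition. The input numbers are made-up placeholders, not data from the study.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def per_million(documents, population):
    """Number of documents per one million inhabitants."""
    return documents / (population / 1_000_000)


# Placeholder inputs for illustration only.
print(h_index([10, 8, 5, 4, 3]))
print(per_million(50, 5_000_000))
```

Normalizing by population is what allows a small country to rank first per capita while another ranks first in absolute output, as the abstract reports.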
Development of the Defense Documentation Center Remote On-Line Retrieval System - Past, Present and Future.

ERIC Educational Resources Information Center

Bennertz, Richard K.

The document highlights in nontechnical language the development of the Defense Documentation Center (DDC) Remote On-Line Retrieval System from its inception in 1967 to what is planned. It describes in detail the current operating system, equipment configuration and associated costs, user training and system evaluation and may be of value to other…
DOE Office of Scientific and Technical Information (OSTI.GOV)

Guyer, H.B.; McChesney, C.A.

The overall primary objective of HDAR is to create a repository of historical personnel security documents and provide the functionality needed for archival and retrieval use by other software modules and application users of the DISS/ET system. The software product to be produced from this specification is the Historical Document Archival and Retrieval Subsystem. The product will provide the functionality to capture, retrieve and manage documents currently contained in the personnel security folders in DOE Operations Offices vaults at various locations across the United States. The long-term plan for DISS/ET includes the requirement to allow for capture and storage of arbitrary, currently undefined, clearance-related documents that fall outside the scope of the "cradle-to-grave" electronic processing provided by DISS/ET. However, this requirement is not within the scope of the requirements specified in this document.
Information Retrieval in Biomedical Research: From Articles to Datasets

ERIC Educational Resources Information Center

Wei, Wei

2017-01-01

Information retrieval techniques have been applied to biomedical research for a variety of purposes, such as textual document retrieval and molecular data retrieval. As biomedical research evolves over time, information retrieval is also constantly facing new challenges, including the growing number of available data, the emerging new data types,…
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval

DTIC Science & Technology

2006-01-01

The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval Dina Demner-Fushman Department of Computer Science... dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that...in which the documents are written. In dictionary-based CLIR techniques, the principal source of translation knowledge is a translation lexicon
Development of a full-text information retrieval system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keizo Oyama; Akira Miyazawa; Atsuhiro Takasu; Kouji Shibano

The authors have executed a project to realize a full-text information retrieval system. The system is designed to deal with a document database comprising full text of a large number of documents such as academic papers. The document structures are utilized in searching and extracting appropriate information. The concept of structure handling and the configuration of the system are described in this paper.
Third Annual Symposium on Document Analysis and Information Retrieval

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

This document presents papers of the Third Annual Symposium on Document Analysis and Information Retrieval at the Information Science Research Institute at the University of Nevada, Las Vegas (UNLV/ISRI). Of the 60 papers submitted, 25 were accepted for oral presentation and 9 as poster papers. Both oral presentations and poster papers are included in these Proceedings. The individual papers have been cataloged separately.
Challenges and methodology for indexing the computerized patient record.

PubMed

Ehrler, Frédéric; Ruch, Patrick; Geissbuhler, Antoine; Lovis, Christian

2007-01-01

Patient records contain the most crucial documents for managing the treatments and healthcare of patients in the hospital. Retrieving information from these records in an easy, quick and safe way helps care providers to save time and find important facts about their patient's health. This paper presents the scalability issues induced by the indexing and the retrieval of the information contained in the patient records. For this study, EasyIR, an information retrieval tool performing full-text queries and retrieving the related documents, has been used. An evaluation of the performance reveals that the indexing process suffers from overhead that is a consequence of the particular structure of the patient records. Most IR tools are designed to manage very large numbers of documents in a single index whereas in our hypothesis, one index per record, which usually implies few documents, has been imposed. As the number of modifications and creations of patient records is significant in a day, using a specialized and efficient indexing tool is required.
A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering.

PubMed

Sarrouti, Mourad; Ouatik El Alaoui, Said

2017-04-01

Passage retrieval, the identification of top-ranked passages that may contain the answer for a given biomedical question, is a crucial component for any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge widely studied over the last decades. However, it still requires further efforts in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on Stanford CoreNLP sentence/passage length, probabilistic information retrieval (IR) model and UMLS concepts. In the proposed method, we first use our document retrieval system based on PubMed search engine and UMLS similarity to retrieve relevant documents to a given biomedical question. We then take the abstracts from the retrieved documents and use the Stanford CoreNLP sentence splitter to produce a set of sentences, i.e., candidate passages. Using stemmed words and UMLS concepts as features for the BM25 model, we finally compute the similarity scores between the biomedical question and each of the candidate passages and keep the N top-ranked ones. Experimental evaluations performed on large standard datasets, provided by the BioASQ challenge, show that the proposed method achieves good performance compared with the current state-of-the-art methods. The proposed method significantly outperforms the current state-of-the-art methods by an average of 6.84% in terms of mean average precision (MAP). We have proposed an efficient passage retrieval method which can be used to retrieve relevant passages in biomedical QA systems with high mean average precision.
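The final ranking step described above (score each candidate passage against the question with BM25 and keep the N best) can be sketched as follows. This is a simplified illustration under stated assumptions: lowercased whitespace tokens stand in for the stemmed words and UMLS concepts used by the authors, the IDF uses the common smoothed variant, and `k1`, `b`, and the function name are conventional choices rather than values taken from the paper.

```python
import math
from collections import Counter


def bm25_rank(question, passages, k1=1.2, b=0.75, top_n=3):
    """Return indices of the `top_n` passages ranked by BM25 score."""
    if not passages:
        return []
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many passages does each term occur?
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(question.lower().split())

    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((score, idx))

    # Highest score first; ties broken by original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_n]]


passages = [
    "aspirin reduces fever",
    "the heart pumps blood",
    "aspirin aspirin inhibits platelet aggregation",
]
print(bm25_rank("aspirin fever", passages, top_n=2))
```

Replacing the whitespace tokens with stems and concept identifiers changes only the feature extraction; the scoring loop stays the same.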
An Approach to a Digital Library of Newspapers.

ERIC Educational Resources Information Center

Arambura Cabo, Maria Jose; Berlanga Llavori, Rafael

1997-01-01

Presents a new application for retrieving news from a large electronic bank of newspapers that is intended to manage past issues of newspapers. Highlights include a data model for newspapers, including metadata and metaclasses; document definition language; document retrieval language; and memory organization and indexes. (Author/LRW)
The JPL Library information retrieval system

NASA Technical Reports Server (NTRS)

Walsh, J.

1975-01-01

The development, capabilities, and products of the computer-based retrieval system of the Jet Propulsion Laboratory Library are described. The system handles books and documents, produces a book catalog, and provides a machine search capability. Programs and documentation are available to the public through NASA's computer software dissemination program.
Querying and Ranking XML Documents.

ERIC Educational Resources Information Center

Schlieder, Torsten; Meuss, Holger

2002-01-01

Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
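The similarity measure of the vector space model referred to here is usually taken to be the cosine of the angle between term-weight vectors. A minimal sketch, assuming raw term frequencies as weights and whitespace tokenization, is given below. It ignores document structure and tree matching, which are the article's actual contributions.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


print(cosine("ranking xml documents", "querying xml documents"))
```

Texts sharing no terms score 0, and identical texts score 1 up to floating-point rounding.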
Document Indexing for Image-Based Optical Information Systems.

ERIC Educational Resources Information Center

Thiel, Thomas J.; And Others

1991-01-01

Discussion of image-based information retrieval systems focuses on indexing. Highlights include computerized information retrieval; multimedia optical systems; optical mass storage and personal computers; and a case study that describes an optical disk system which was developed to preserve, access, and disseminate military documents. (19…
Documents Similarity Measurement Using Field Association Terms.

ERIC Educational Resources Information Center

Atlam, El-Sayed; Fuketa, M.; Morita, K.; Aoe, Jun-ichi

2003-01-01

Discussion of text analysis and information retrieval and measurement of document similarity focuses on a new text manipulation system called FA (field association)-Sim that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. Discusses recall and precision, automatic indexing…
Automation of the CAS Document Delivery Service.

ERIC Educational Resources Information Center

Steensland, M. C.; Soukup, K. M.

1986-01-01

The automation of online order retrieval for Chemical Abstracts Service Document Delivery Service was accomplished by shifting to an order retrieval/dispatch process linked to a Unix network. The Unix-based environment, its terminal emulation, page-break, and user-friendly interface software, and later enhancements are reviewed. Resultant increase…
A Vector Space Model for Automatic Indexing.

ERIC Educational Resources Information Center

Salton, G.; And Others

In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other, or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; that is, retrieval performance correlates inversely…
Document Storage and Retrieval in the Electronic Office.

ERIC Educational Resources Information Center

Ashford, John

1985-01-01

Proposals are made for practical approaches to the design of electronic office systems to provide for the effective storage and retrieval of the documents that they generate. Problems of records management and requirements to be met by the designer of an electronic office system are highlighted. Nineteen references are cited. (EJS)
Information Retrieval Using UMLS-based Structured Queries

PubMed Central

Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

2001-01-01

During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
ASSOCIATIVE ADJUSTMENTS TO REDUCE ERRORS IN DOCUMENT SEARCHING.

ERIC Educational Resources Information Center

BRYANT, EDWARD C.; AND OTHERS

ASSOCIATIVE ADJUSTMENTS TO A DOCUMENT FILE ARE CONSIDERED AS A MEANS FOR IMPROVING RETRIEVAL. A THEORETICAL INVESTIGATION OF THE STATISTICAL PROPERTIES OF A GENERALIZED MISMATCH MEASURE WAS CARRIED OUT AND IMPROVEMENTS IN RETRIEVAL RESULTING FROM PERFORMING ASSOCIATIVE REGRESSION ADJUSTMENTS ON DATA FILE WERE EXAMINED BOTH FROM THE THEORETICAL AND…
Case retrieval in medical databases by fusing heterogeneous information.

PubMed

Quellec, Gwénolé; Lamard, Mathieu; Cazuguel, Guy; Roux, Christian; Cochener, Béatrice

2011-01-01

A novel content-based heterogeneous information retrieval framework, particularly well suited to browse medical databases and support new generation computer aided diagnosis (CADx) systems, is presented in this paper. It was designed to retrieve possibly incomplete documents, consisting of several images and semantic information, from a database; more complex data types such as videos can also be included in the framework. The proposed retrieval method relies on image processing, in order to characterize each individual image in a document by their digital content, and information fusion. Once the available images in a query document are characterized, a degree of match, between the query document and each reference document stored in the database, is defined for each attribute (an image feature or a metadata). A Bayesian network is used to recover missing information if need be. Finally, two novel information fusion methods are proposed to combine these degrees of match, in order to rank the reference documents by decreasing relevance for the query. In the first method, the degrees of match are fused by the Bayesian network itself. In the second method, they are fused by the Dezert-Smarandache theory: the second approach lets us model our confidence in each source of information (i.e., each attribute) and take it into account in the fusion process for a better retrieval performance. The proposed methods were applied to two heterogeneous medical databases, a diabetic retinopathy database and a mammography screening database, for computer aided diagnosis. Precisions at five of 0.809 ± 0.158 and 0.821 ± 0.177, respectively, were obtained for these two databases, which is very promising.

Sampling criteria in multicollection searching.

NASA Astrophysics Data System (ADS)

Gilio, A.; Scozzafava, R.; Marchetti, P. G.

In the first stage of the document retrieval process, no information concerning relevance of a particular document is available. On the other hand, computer implementation requires that the analysis be made only for a sample of retrieved documents. This paper addresses the significance and suitability of two different sampling criteria for a multicollection online search facility. The inevitability of resorting to a logarithmic criterion in order to achieve a "spread of representativeness" from the multicollection is demonstrated.
36 CFR 1238.12 - What documentation is required for microfilmed records?

Code of Federal Regulations, 2011 CFR

2011-07-01

... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR § 1238.12 - What documentation is required for microfilmed records?

Code of Federal Regulations, 2013 CFR

2013-07-01

... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR 1238.12 - What documentation is required for microfilmed records?

Code of Federal Regulations, 2014 CFR

2014-07-01

... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR 1238.12 - What documentation is required for microfilmed records?

Code of Federal Regulations, 2012 CFR

2012-07-01

... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR 1238.12 - What documentation is required for microfilmed records?

Code of Federal Regulations, 2010 CFR

2010-07-01

... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
Information Storage and Retrieval, Scientific Report No. ISR-15.

ERIC Educational Resources Information Center

Salton, Gerard

Several algorithms were investigated which would allow a user to interact with an automatic document retrieval system by requesting relevance judgments on selected sets of documents. Two viewpoints were taken in evaluation. One measured the movement of queries toward the optimum query as defined by Rocchio; the other measured the retrieval…
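For readers unfamiliar with the kind of feedback update this record refers to, the following is a minimal sketch of the textbook Rocchio relevance-feedback formula, which moves a query vector toward judged-relevant documents and away from judged-nonrelevant ones. The sparse dict representation, the function name, and the default weights `alpha`, `beta`, and `gamma` are illustrative assumptions; the report's own algorithms are not reproduced here.

```python
def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query (term -> weight).

    query:        dict mapping term -> weight
    relevant:     list of document vectors (dicts) judged relevant
    nonrelevant:  list of document vectors (dicts) judged nonrelevant
    alpha, beta, gamma: mixing weights (assumed defaults, not from the report)
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)

    def centroid_weight(docs, term):
        # Mean weight of `term` across `docs`; 0.0 for an empty set.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid_weight(relevant, term)
                  - gamma * centroid_weight(nonrelevant, term))
        if weight > 0:  # negative weights are commonly dropped
            updated[term] = weight
    return updated
```

Dropping non-positive weights is a common convention rather than a requirement of the formula.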
The Future of Amphibious Operations: Shaping the Expeditionary Strike Group to Fight in the Joint Task Force

DTIC Science & Technology

2010-02-01

1 Charles E. Wilhelm, Expeditionary Warfare.marine corps gazette, 79(6), 28-30. Retrieved October 15, 2009, from Career and Technical Education . (Document...Expeditionary warfare.marine corps gazette, 79(6), 28- 30. Retrieved October 15, 2009, from Career and Technical Education . (Document ID: 4455650
On the Delusiveness of Adopting a Common Space for Modeling IR Objects: Are Queries Documents?

ERIC Educational Resources Information Center

Bollmann-Sdorra, Peter; Raghavan, Vjay V.

1993-01-01

Proposes that document space and query space have different structures in information retrieval and discusses similarity measures, term independence, and linear structure. Examples are given using the retrieval functions of dot-product, the cosine measure, the coefficient of Jaccard, and the overlap function. (Contains 28 references.) (LRW)
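The four retrieval functions named in this record can be sketched as follows. This is one common formulation over sparse term-weight dicts, not necessarily the exact definitions used by the authors; in particular, the Jaccard coefficient is given in its generalized (Tanimoto) form and the overlap function in its min-based form, both of which are assumptions.

```python
import math

def dot_product(q, d):
    """Inner product over the terms shared by two term -> weight dicts."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def cosine(q, d):
    """Dot product normalized by the two vector lengths."""
    norm = math.sqrt(dot_product(q, q)) * math.sqrt(dot_product(d, d))
    return dot_product(q, d) / norm if norm else 0.0

def jaccard(q, d):
    """Generalized Jaccard (Tanimoto) coefficient."""
    shared = dot_product(q, d)
    denom = dot_product(q, q) + dot_product(d, d) - shared
    return shared / denom if denom else 0.0

def overlap(q, d):
    """Shared weight divided by the smaller of the two total weights."""
    shared = sum(min(w, d[t]) for t, w in q.items() if t in d)
    denom = min(sum(q.values()), sum(d.values()))
    return shared / denom if denom else 0.0
```

All four functions take the query and the document in the same representation, which is exactly the common-space assumption the article questions.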
An Optical Disk-Based Information Retrieval System.

ERIC Educational Resources Information Center

Bender, Avi

1988-01-01

Discusses a pilot project by the Nuclear Regulatory Commission to apply optical disk technology to the storage and retrieval of documents related to its high level waste management program. Components and features of the microcomputer-based system which provides full-text and image access to documents are described. A sample search is included.…
An automatic indexing method for medical documents.

PubMed Central

Wagner, M. M.

1991-01-01

This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported. PMID:1807564
iSMART: Ontology-based Semantic Query of CDA Documents

PubMed Central

Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue

2009-01-01

The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical document. With the rich ontological references in CDA documents, the ontology-based semantic query could be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents will be extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using the real clinical documents collected from a large hospital in southern China. PMID:20351883
Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval

DTIC Science & Technology

2014-11-01

the same score, another signal will be used to rank these documents to break the ties, but the relative orders of other documents against these...documents remain the same. The tie-breaking step above is repeatedly applied to further break ties until all candidate signals are applied and the ranking...searched it on the Yahoo! search engine, which returned some query suggestions for the query. The original queries as well as their query suggestions
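The repeated tie-breaking step described in this excerpt behaves like a lexicographic sort: a later signal only reorders documents that are still tied on every earlier signal. A minimal sketch follows, assuming each signal is a dict from document id to score and that a missing score counts as 0.0; the function and parameter names are hypothetical, not taken from the report.

```python
def rank_with_tie_breaking(doc_ids, signals):
    """Rank documents by the first signal, breaking ties with later ones.

    doc_ids: iterable of document identifiers
    signals: list of dicts (doc id -> score) in priority order; higher is better
    """
    def sort_key(doc_id):
        # Negate scores so that an ascending sort puts higher scores first.
        return tuple(-signal.get(doc_id, 0.0) for signal in signals)

    return sorted(doc_ids, key=sort_key)
```

Because tuples compare element by element, documents that differ on the primary score keep their relative order no matter what the secondary signals say.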
Improving biomedical information retrieval by linear combinations of different query expansion techniques.

PubMed

Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

2016-07-25

Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant to the user's needs and which are not. At present, users cannot construct queries precise enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique that refines IR searches to better represent the user's information need: it applies different query expansion techniques and linearly combines their results, two expansion results at a time. Query expansion expands the search query, for example, by finding synonyms and reweighting original terms, and provides significantly more focused search results than basic search queries do. Retrieval performance is measured by several variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results improves the retrieved documents and outperforms our baseline by 21.06%; it also outperforms a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
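Since this record reports its results in terms of MAP, a minimal sketch of the basic (non-interpolated) Mean Average Precision may help. The abstract mentions "variants" of MAP without specifying them, so the plain definition below is an assumption, as are the function names and the input layout.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant doc appears.

    ranked:   list of document ids in ranked order
    relevant: set of ids judged relevant for the query
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```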
The Effect of Indexing Exhaustivity on Retrieval Performance.

ERIC Educational Resources Information Center

Burgin, Robert

1991-01-01

Describes results of a study that investigated the effect of variations in indexing exhaustivity on retrieval performance in a vector space retrieval system. The test collection of documents in the National Library of Medicine's Medline file indexed under cystic fibrosis is described, and use of the SMART information retrieval system is discussed.…
Finding Information on the World Wide Web: The Retrieval Effectiveness of Search Engines.

ERIC Educational Resources Information Center

Pathak, Praveen; Gordon, Michael

1999-01-01

Describes a study that examined the effectiveness of eight search engines for the World Wide Web. Calculated traditional information-retrieval measures of recall and precision at varying numbers of retrieved documents to use as the bases for statistical comparisons of retrieval effectiveness. Also examined the overlap between search engines.…
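Recall and precision at varying numbers of retrieved documents, as used in this study, can be computed along the following lines. This is a minimal sketch; the function name, the input layout, and the convention of dividing by the cutoff `k` even when fewer than `k` documents were returned are assumptions.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Return (precision, recall) over the top-k retrieved documents.

    ranked:   list of document ids in ranked order
    relevant: set of ids judged relevant for the query
    k:        cutoff (number of retrieved documents considered)
    """
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Evaluating the same ranked list at several cutoffs (for example 1, 3, and 5) gives the points needed to compare engines at matching retrieval depths.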
STATISTICAL DATA ON CHEMICAL COMPOUNDS.

DTIC Science & Technology

DATA STORAGE SYSTEMS, FEASIBILITY STUDIES, COMPUTERS, STATISTICAL DATA, DOCUMENTS, ARMY...CHEMICAL COMPOUNDS, INFORMATION RETRIEVAL), (*INFORMATION RETRIEVAL, CHEMICAL COMPOUNDS), MOLECULAR STRUCTURE, BIBLIOGRAPHIES, DATA PROCESSING
The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data.

ERIC Educational Resources Information Center

Popovic, Mirko; Willett, Peter

1992-01-01

Reports on the use of stemming for Slovene language documents and queries in free-text retrieval systems and demonstrates that an appropriate stemming algorithm results in an increase in retrieval effectiveness when compared with nonstemming processing. A comparison is made with stemming of English versions of the same documents and queries. (24…
The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.

ERIC Educational Resources Information Center

Peat, Helen J.; Willett, Peter

1991-01-01

Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…
Imaged Document Optical Correlation and Conversion System (IDOCCS)

NASA Astrophysics Data System (ADS)

Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.

1999-03-01

Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, patents, etc.). In addition, many organizations are converting their paper archives to electronic images, which are stored in a computer database. Because of this, there is a need to efficiently organize this data into comprehensive and accessible information resources. The Imaged Document Optical Correlation and Conversion System (IDOCCS) provides a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides the search and retrieval capability of document images. The IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives and can even determine the types of languages contained within a document. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited, e.g., imaged documents containing an agency's seal or logo, or documents with a particular individual's signature block, can be singled out. With this dual capability, IDOCCS outperforms systems that rely on optical character recognition as a basis for indexing and storing only the textual content of documents for later retrieval.

A comparison of Boolean-based retrieval to the WAIS system for retrieval of aeronautical information

NASA Technical Reports Server (NTRS)

Marchionini, Gary; Barlow, Diane

1994-01-01

An evaluation of an information retrieval system using a Boolean-based retrieval engine and inverted file architecture and WAIS, which uses a vector-based engine, was conducted. Four research questions in aeronautical engineering were used to retrieve sets of citations from the NASA Aerospace Database which was mounted on a WAIS server and available through Dialog File 108 which served as the Boolean-based system (BBS). High recall and high precision searches were done in the BBS and terse and verbose queries were used in the WAIS condition. Precision values for the WAIS searches were consistently above the precision values for high recall BBS searches and consistently below the precision values for high precision BBS searches. Terse WAIS queries gave somewhat better precision performance than verbose WAIS queries. In every case, a small number of relevant documents retrieved by one system were not retrieved by the other, indicating the incomplete nature of the results from either retrieval system. Relevant documents in the WAIS searches were found to be randomly distributed in the retrieved sets rather than distributed by ranks. Advantages and limitations of both types of systems are discussed.
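To make the Boolean side of this comparison concrete, the following is a minimal sketch of an inverted file with Boolean AND and OR lookups. It is a toy illustration under simple assumptions (whitespace tokenization, lowercasing, in-memory sets) and does not reflect how Dialog or the NASA Aerospace Database were implemented; all names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict of doc id -> text. Returns term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, terms):
    """Documents containing every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, terms):
    """Documents containing at least one query term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))
```

A Boolean engine of this kind returns an unranked set, which is why the study contrasts high-recall (broad OR) and high-precision (narrow AND) searches with the ranked output of the vector-based WAIS engine.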
Improve Biomedical Information Retrieval using Modified Learning to Rank Methods.

PubMed

Xu, Bo; Lin, Hongfei; Lin, Yuan; Ma, Yunlong; Yang, Liang; Wang, Jian; Yang, Zhihao

2016-06-14

In recent years, the number of biomedical articles has increased exponentially, making it difficult for biologists to capture all the needed information manually. Information retrieval technologies, as the core of search engines, can deal with the problem automatically, providing users with the needed information. However, it is a great challenge to apply these technologies directly for biomedical retrieval, because of the abundance of domain specific terminologies. To enhance biomedical retrieval, we propose a novel framework based on learning to rank. Learning to rank is a family of state-of-the-art information retrieval techniques, and has proved effective in many information retrieval tasks. In the proposed framework, we attempt to tackle the problem of the abundance of terminologies by constructing ranking models, which focus on not only retrieving the most relevant documents, but also diversifying the search results to increase the completeness of the resulting list for a given query. In the model training, we propose two novel document labeling strategies, and combine several traditional retrieval models as learning features. We also investigate the usefulness of different learning to rank approaches in our framework. Experimental results on TREC Genomics datasets demonstrate the effectiveness of our framework for biomedical information retrieval.
Natural language information retrieval in digital libraries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.

In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process user's natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Crain, Steven P.; Yang, Shuang-Hong; Zha, Hongyuan

Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.
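The nDCG@5 figure quoted in this record is a rank-sensitive relevance measure. A minimal sketch of one common formulation follows, using a linear gain and a log2 rank discount; the abstract does not say which nDCG variant was used, so the gain and discount choices, the function names, and the optional `ideal_gains` argument are assumptions.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_gains, k, ideal_gains=None):
    """DCG@k normalized by the DCG@k of the best possible ordering.

    ranked_gains: relevance grades of the returned documents, in rank order
    ideal_gains:  grades of all judged documents for the query; when omitted,
                  the returned list itself is reordered to form the ideal
    """
    pool = ranked_gains if ideal_gains is None else ideal_gains
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal else 0.0
```

Passing the full set of judged grades as `ideal_gains` gives a stricter score, because highly relevant documents that were never retrieved then lower the result.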
Indexing and Retrieval for the Web.

ERIC Educational Resources Information Center

Rasmussen, Edie M.

2003-01-01

Explores current research on indexing and ranking as retrieval functions of search engines on the Web. Highlights include measuring search engine stability; evaluation of Web indexing and retrieval; Web crawlers; hyperlinks for indexing and ranking; ranking for metasearch; document structure; citation indexing; relevance; query evaluation;…
Theoretical and Philosophical Aspects of Knowledge Management (SIG KM).

ERIC Educational Resources Information Center

Day, Ronald E.

2000-01-01

This session abstract discusses the history, philosophy, and theories of knowledge management to better understand its social and organizational potentials and limitations. Topics include determinacy of sense, information retrieval, and the Data Retrieval Model versus the Document Retrieval Model; discussions about knowledge; and surplus…
Advanced Feedback Methods in Information Retrieval.

ERIC Educational Resources Information Center

Salton, G.; And Others

1985-01-01

In this study, automatic feedback techniques are applied to Boolean query statements in online information retrieval to generate improved query statements based on information contained in previously retrieved documents. Feedback operations are carried out using conventional Boolean logic and extended logic. Experimental output is included to…
Initial retrieval sequence and blending strategy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pemwell, D.L.; Grenard, C.E.

1996-09-01

This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.
Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

DOEpatents

Chew, Peter A; Bader, Brett W

2012-10-16

A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
Words, concepts, or both: optimal indexing units for automated information retrieval.

PubMed Central

Hersh, W. R.; Hickam, D. H.; Leone, T. J.

1992-01-01

What is the best way to represent the content of documents in an information retrieval system? This study compares the retrieval effectiveness of five different methods for automated (machine-assigned) indexing using three test collections. The consistently best methods are those that use indexing based on the words that occur in the available text of each document. Methods used to map text into concepts from a controlled vocabulary showed no advantage over the word-based methods. This study also looked at an approach to relevance feedback which showed benefit for both word-based and concept-based methods. PMID:1482951
Clustering document fragments using background color and texture information

NASA Astrophysics Data System (ADS)

Chanda, Sukalpa; Franke, Katrin; Pal, Umapada

2012-01-01

Forensic analysis of questioned documents can sometimes be highly data intensive. A forensic expert might need to analyze a heap of document fragments, and in such cases, to ensure reliability, he/she should focus only on the relevant evidence hidden in those document fragments. Relevant document retrieval requires finding similar document fragments. One way of obtaining such similar documents is to use the document fragments' physical characteristics, such as color and texture. In this article we propose an automatic scheme to retrieve similar document fragments based on the visual appearance of the document paper and its texture. Multispectral color characteristics using biologically inspired color differentiation techniques are implemented here. This is done by projecting document color characteristics to Lab color space. Gabor filter-based texture analysis is used to identify document texture. Document fragments from the same source are expected to have similar color and texture. For clustering similar document fragments of our test dataset we use a Self Organizing Map (SOM) of dimension 5×5, where the document color and texture information are used as features. We obtained an encouraging accuracy of 97.17% from 1063 test images.
Learned Vector-Space Models for Document Retrieval.

ERIC Educational Resources Information Center

Caid, William R.; And Others

1995-01-01

The Latent Semantic Indexing and MatchPlus systems examine similar contexts in which words appear and create representational models that capture the similarity of meaning of terms and then use the representation for retrieval. Text Retrieval Conference experiments using these systems demonstrate the computational feasibility of using…
QCS : a system for querying, clustering, and summarizing documents.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dunlavy, Daniel M.

2006-08-01

Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence "trimming", and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
QCS: a system for querying, clustering and summarizing documents.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dunlavy, Daniel M.; Schlesinger, Judith D.; O'Leary, Dianne P.

2006-10-01

Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence 'trimming', and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
Seminal nanotechnology literature: a review.

PubMed

Kostoff, Ronald N; Koytcheff, Raymond G; Lau, Clifford G Y

2009-11-01

This paper uses complementary text mining techniques to identify and retrieve the high impact (seminal) nanotechnology literature over a span of time. Following a brief scientometric analysis of the seminal articles retrieved, these seminal articles are then used as a basis for a comprehensive literature survey of nanoscience and nanotechnology. The paper ends with a global analysis of the relation of seminal nanotechnology document production to total nanotechnology document production.
Automated storage and retrieval of data obtained in the Interkosmos project

NASA Technical Reports Server (NTRS)

Ziolkovski, K.; Pakholski, V.

1975-01-01

The formation of a data bank and information retrieval system for scientific data is described. The stored data can be digital or documentation data. Data classification methods are discussed along with definition and compilation of the dictionary utilized, definition of the indexing scheme, and definition of the principles used in constructing a file for documents, data blocks, and tapes. Operating principles are also presented.
Cross-language information retrieval using PARAFAC2.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bader, Brett William; Chew, Peter; Abdelali, Ahmed

A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages - or more precisely, retrieving the most similar document in one language to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language-independently', so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar to other documents in the same language than they are to documents in a different language, but on the same topic. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (which is a variant of PARAFAC, a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the 'concepts' in all documents in the parallel corpus are the same regardless of language. Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem. From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in cross-language information retrieval.
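The contrast drawn in this abstract between multilingual LSA and the PARAFAC2 formulation can be written compactly. This is only a sketch of the description above, not the authors' full model: the symbols X_k, U_k, S_k and the number of languages K are notation assumed here, and the stacked form of the LSA matrix is an assumption about how the single multilingual term-by-document matrix is built from a parallel corpus.

```latex
% Multilingual LSA (assumed stacked form): one SVD over all languages at once
X = \begin{bmatrix} X_1 \\ \vdots \\ X_K \end{bmatrix} \approx U S V^{\top}

% PARAFAC2 as described in the abstract: one factorization per language k,
% with the right singular vectors V shared by every language
X_k \approx U_k S_k V^{\top}, \qquad k = 1, \dots, K
```

Here X_k is the term-by-document matrix for language k. The rows of V index the documents of the parallel corpus, so sharing V expresses the constraint that each document has the same concept coordinates whatever its language.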
Font adaptive word indexing of modern printed documents.

PubMed

Marinai, Simone; Marino, Emanuele; Soda, Giovanni

2006-08-01

We propose an approach for the word-level indexing of modern printed documents which are difficult to recognize using current OCR engines. By means of word-level indexing, it is possible to retrieve the position of words in a document, enabling queries involving proximity of terms. Web search engines implement this kind of indexing, allowing users to retrieve Web pages on the basis of their textual content. Nowadays, digital libraries hold collections of digitized documents that can be retrieved either by browsing the document images or relying on appropriate metadata assembled by domain experts. Word indexing tools would therefore increase the access to these collections. The proposed system is designed to index homogeneous document collections by automatically adapting to different languages and font styles without relying on OCR engines for character recognition. The approach is based on three main ideas: the use of Self Organizing Maps (SOM) to perform unsupervised character clustering, the definition of one suitable vector-based word representation whose size depends on the word aspect-ratio, and the run-time alignment of the query word with indexed words to deal with broken and touching characters. The most appropriate applications are for processing modern printed documents (17th to 19th centuries) where current OCR engines are less accurate. Our experimental analysis addresses six data sets containing documents ranging from books of the 17th century to contemporary journals.
The Effects of Noisy Data on Text Retrieval.

ERIC Educational Resources Information Center

Taghva, Kazem; And Others

1994-01-01

Discusses the use of optical character recognition (OCR) for inputting documents in an information retrieval system and describes a study that used an OCR-generated database and its corresponding corrected version to examine query evaluation in the presence of noisy data. Scanning technology, recognition technology, and retrieval technology are…
Experiments in Multi-Lingual Information Retrieval.

ERIC Educational Resources Information Center

Salton, Gerard

A comparison was made of the performance in an automatic information retrieval environment of user queries and document abstracts available in natural language form in both English and French. The results obtained indicate that the automatic indexing and retrieval techniques actually used appear equally effective in handling the query and document…

40 CFR 792.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2012 CFR

2012-07-01

....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 33 2012-07-01 2012-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
40 CFR 792.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2011 CFR

2011-07-01

....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 32 2011-07-01 2011-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
40 CFR 792.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2013 CFR

2013-07-01

....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 33 2013-07-01 2013-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
40 CFR 792.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2014 CFR

2014-07-01

....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 32 2014-07-01 2014-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
GUI-Based Document Access via SATCOMMS: Online Electronic Document Retrieval at the European Telecommunications Satellite Organization EUTELSAT.

ERIC Educational Resources Information Center

Burton, Adrian P.

1995-01-01

Discusses accessing online electronic documents at the European Telecommunications Satellite Organization (EUTELSAT). Highlights include off-site paper document storage, the document management system, benefits, the EUTELSAT Standard IBM Access software, implementation, the development process, and future enhancements. (AEF)
Document Ranking Based upon Markov Chains.

ERIC Educational Resources Information Center

Danilowicz, Czeslaw; Balinski, Jaroslaw

2001-01-01

Considers how the order of documents in information retrieval responses is determined and introduces a method that uses a probabilistic model of a document set where documents are regarded as states of a Markov chain and where transition probabilities are directly proportional to similarities between documents. (Author/LRW)
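The model described in this record can be illustrated with a small sketch. It builds transition probabilities proportional to document similarities, as the abstract states. Ranking the documents by the chain's stationary distribution is an assumption made here for illustration; the abstract does not say how the final order is derived. The function names and the similarity values are hypothetical.

```python
def transition_matrix(sim):
    """Row-normalize similarities so P[i][j] is proportional to sim[i][j].

    Self-transitions are excluded (an assumption of this sketch).
    """
    n = len(sim)
    matrix = []
    for i in range(n):
        total = sum(sim[i][j] for j in range(n) if j != i)
        if total <= 0:
            raise ValueError(f"document {i} is not similar to any other document")
        matrix.append([0.0 if j == i else sim[i][j] / total for j in range(n)])
    return matrix


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeatedly apply pi <- pi * P until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical symmetric similarities between three documents.
similarities = [
    [0.0, 0.8, 0.2],
    [0.8, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]

P = transition_matrix(similarities)
pi = stationary_distribution(P)
ranking = sorted(range(len(pi)), key=lambda i: -pi[i])
print(ranking)  # document indices, most "central" first
```

With symmetric similarities the stationary probability of a document is proportional to its total similarity to the others, so the most connected document ranks first.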
A model for enhancing Internet medical document retrieval with "medical core metadata".

PubMed

Malet, G; Munoz, F; Appleyard, R; Hersh, W

1999-01-01

Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and MEDLINE-type content descriptions. The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines.
A Model for Enhancing Internet Medical Document Retrieval with “Medical Core Metadata”

PubMed Central

Malet, Gary; Munoz, Felix; Appleyard, Richard; Hersh, William

1999-01-01

Objective: Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. Design: The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and Medline-type content descriptions. Results: The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. Conclusions: The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines. PMID:10094069
Using Crowdsourced Geospatial Data to Aid in Nuclear Proliferation Monitoring

DTIC Science & Technology

2016-12-01

M. Stephens, and Ronald D. Bonnell, “DAI for Document Retrieval: The MINDS Project,” in Distributed Artificial Intelligence , ed. Michael N. Huhns...Ronald D. Bonnell. “DAI for Document Retrieval: The MINDS Project,” In Distributed Artificial Intelligence , edited by Michael N. Huhns, 249–283...was for the director of National Intelligence to explore ways that crowdsourced geospatial imagery technologies could aid existing governmental
Intelligent search and retrieval of a large multimedia knowledgebase for the Hubble Space Telescope

NASA Technical Reports Server (NTRS)

Clapis, Paul J.; Byers, William S.

1990-01-01

A document-retrieval assistant (DRA) in a microcomputer format is described which incorporates hypertext and natural language capabilities. Hypertext is used to introduce an intelligent search capability, and the natural-language interface permits access to specific data without the use of keywords. The DRA can be used to access and 'browse' the large multimedia database that is composed of project documentation from the HST.
The Electronic Documentation Project in the NASA mission control center environment

NASA Technical Reports Server (NTRS)

Wang, Lui; Leigh, Albert

1994-01-01

NASA's space program, like many other technical programs of its magnitude, is supported by a large volume of technical documents. These documents are not only diverse but also abundant. Management, maintenance, and retrieval of these documents is a challenging problem by itself; but relating and cross-referencing this wealth of information when it is all on paper is an even greater challenge. The Electronic Documentation Project (EDP) aims to provide an electronic system capable of developing, distributing and controlling changes for crew/ground controller procedures and related documents. There are two primary motives for the solution. The first is to reduce the cost of maintaining the current paper-based method of operations by replacing paper documents with electronic information storage and retrieval. The other is to improve efficiency and provide enhanced flexibility in document usage. Initially, the current paper-based system will be faithfully reproduced in an electronic format to be used in the document viewing system. In addition, this metaphor will have hypertext extensions. Hypertext features support basic functions such as full text searches, key word searches, data retrieval, and traversal between nodes of information, as well as speeding up the data access rate. They enable related but separate documents to have relationships, and allow the user to explore information naturally through non-linear link traversals. The basic operational requirements of the document viewing system are to: provide an electronic corollary to the current method of paper-based document usage; supplement and ultimately replace paper-based documents; remain focused on control center operations such as Flight Data File, Flight Rules and Console Handbook viewing; and be available NASA-wide.
Facilitating access to information in large documents with an intelligent hypertext system

NASA Technical Reports Server (NTRS)

Mathe, Nathalie

1993-01-01

Retrieving specific information from large amounts of documentation is not an easy task. It could be facilitated if information relevant in the current problem solving context could be automatically supplied to the user. As a first step towards this goal, we have developed an intelligent hypertext system called CID (Computer Integrated Documentation) and tested it on the Space Station Freedom requirement documents. The CID system enables integration of various technical documents in a hypertext framework and includes an intelligent context-sensitive indexing and retrieval mechanism. This mechanism utilizes on-line user information requirements and relevance feedback either to reinforce current indexing in case of success or to generate new knowledge in case of failure. This allows the CID system to provide helpful responses, based on previous usage of the documentation, and to improve its performance over time.
The Document Management Alliance.

ERIC Educational Resources Information Center

Fay, Chuck

1998-01-01

Describes the Document Management Alliance, a standards effort for document management systems that manages and tracks changes to electronic documents created and used by collaborative teams, provides secure access, and facilitates online information retrieval via the Internet and World Wide Web. Future directions are also discussed. (LRW)
Jaccard Similarity Leads to the Marczewski-Steinhaus Topology for Information Retrieval.

ERIC Educational Resources Information Center

Rousseau, Ronald

1998-01-01

Demonstrates that if the similarity function of a retrieval system leads to a (pseudo-) metric, the retrieval, similarity and Everett-Cater metric topology coincide and are different from the discrete topology; this is the case if documents are represented by lists, using the Jaccard similarity measure. The corresponding metric is the…
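As a small illustration of the measure named in this record, the sketch below computes the Jaccard similarity of two term lists and the associated distance 1 - J, which is known to satisfy the triangle inequality. Treating each document's term list as a set, and the example documents themselves, are assumptions made here.

```python
from itertools import permutations


def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| over the sets of terms in the two lists."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty documents are identical
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Distance induced by the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical documents represented as lists of index terms.
docs = {
    "d1": ["retrieval", "document", "index"],
    "d2": ["retrieval", "document", "query"],
    "d3": ["query", "ranking"],
}

# Check the triangle inequality d(x, z) <= d(x, y) + d(y, z) on every triple.
triangle_ok = all(
    jaccard_distance(docs[x], docs[z])
    <= jaccard_distance(docs[x], docs[y]) + jaccard_distance(docs[y], docs[z])
    for x, y, z in permutations(docs, 3)
)
print(triangle_ok)
```

A check on three documents does not prove the metric property; it only shows what the property means for concrete term lists.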
Historical Note: The Past Thirty Years in Information Retrieval.

ERIC Educational Resources Information Center

Salton, Gerard

1987-01-01

Briefly reviews early work in documentation and text processing, and predictions that were made about the creative role of computers in information retrieval. An attempt is made to explain why these predictions were not fulfilled and conclusions are drawn regarding the limits of computer power in text retrieval applications. (Author/CLB)
Engineering a Multi-Purpose Test Collection for Web Retrieval Experiments.

ERIC Educational Resources Information Center

Bailey, Peter; Craswell, Nick; Hawking, David

2003-01-01

Describes a test collection that was developed as a multi-purpose testbed for experiments on the Web in distributed information retrieval, hyperlink algorithms, and conventional ad hoc retrieval. Discusses inter-server connectivity, integrity of server holdings, inclusion of documents related to a wide spread of likely queries, and distribution of…
21 CFR 58.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 21 Food and Drugs 1 2013-04-01 2013-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 21 Food and Drugs 1 2014-04-01 2014-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 21 Food and Drugs 1 2012-04-01 2012-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 21 Food and Drugs 1 2010-04-01 2010-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...

21 CFR 58.190 - Storage and retrieval of records and data.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 21 Food and Drugs 1 2011-04-01 2011-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
Logic-Based Retrieval: Technology for Content-Oriented and Analytical Querying of Patent Data

NASA Astrophysics Data System (ADS)

Klampanos, Iraklis Angelos; Wu, Hengzhi; Roelleke, Thomas; Azzam, Hany

Patent searching is a complex retrieval task. An initial document search is only the starting point of a chain of searches and decisions that need to be made by patent searchers. Keyword-based retrieval is adequate for document searching, but it is not suitable for modelling comprehensive retrieval strategies. DB-like and logical approaches are the state-of-the-art techniques to model strategies, reasoning and decision making. In this paper we present the application of logical retrieval to patent searching. The two grand challenges are expressiveness and scalability, where a high degree of expressiveness usually means a loss in scalability. In this paper we report how to maintain scalability while offering the expressiveness of logical retrieval required for solving patent search tasks. We present logical retrieval background, and how to model data-source selection and results' fusion. Moreover, we demonstrate the modelling of a retrieval strategy, a technique by which patent professionals are able to express, store and exchange their strategies and rationales when searching patents or when making decisions. An overview of the architecture and technical details complement the paper, while the evaluation reports preliminary results on how query processing times can be guaranteed, and how quality is affected by trading off responsiveness.
KISTI at TREC 2014 Clinical Decision Support Track: Concept-based Document Re-ranking to Biomedical Information Retrieval

DTIC Science & Technology

2014-11-01

sematic type. Injury or Poisoning inpo T037 Anatomical Abnormality anab T190 Given a document D, a concept vector = {1, 2, … , ...integrating biomedical terminology . Nucleic acids research 32, Database issue (2004), 267–270. 5. Chapman, W.W., Hillert, D., Velupillai, S., et...Conference (TREC), (2011). 9. Koopman, B. and Zuccon, G. Understanding negation and family history to improve clinical information retrieval. Proceedings
An integrated information retrieval and document management system

NASA Technical Reports Server (NTRS)

Coles, L. Stephen; Alvarez, J. Fernando; Chen, James; Chen, William; Cheung, Lai-Mei; Clancy, Susan; Wong, Alexis

1993-01-01

This paper describes the requirements and prototype development for an intelligent document management and information retrieval system that will be capable of handling millions of pages of text or other data. Technologies for scanning, Optical Character Recognition (OCR), magneto-optical storage, and multiplatform retrieval using a Standard Query Language (SQL) will be discussed. The semantic ambiguity inherent in the English language is somewhat compensated-for through the use of coefficients or weighting factors for partial synonyms. Such coefficients are used both for defining structured query trees for routine queries and for establishing long-term interest profiles that can be used on a regular basis to alert individual users to the presence of relevant documents that may have just arrived from an external source, such as a news wire service. Although this attempt at evidential reasoning is limited in comparison with the latest developments in AI Expert Systems technology, it has the advantage of being commercially available.
High-speed data search

NASA Technical Reports Server (NTRS)

Driscoll, James N.

1994-01-01

The high-speed data search system developed for KSC incorporates existing and emerging information retrieval technology to help a user intelligently and rapidly locate information found in large textual databases. This technology includes: natural language input; statistical ranking of retrieved information; an artificial intelligence concept called semantics, where 'surface level' knowledge found in text is used to improve the ranking of retrieved information; and relevance feedback, where user judgements about viewed information are used to automatically modify the search for further information. Semantics and relevance feedback are features of the system which are not available commercially. The system further demonstrates focus on paragraphs of information to decide relevance; and it can be used (without modification) to intelligently search all kinds of document collections, such as collections of legal documents, medical documents, news stories, patents, and so forth. The purpose of this paper is to demonstrate the usefulness of statistical ranking, our semantic improvement, and relevance feedback.
Effects of Information Access Cost and Accountability on Medical Residents' Information Retrieval Strategy and Performance During Prehandover Preparation: Evidence From Interview and Simulation Study.

PubMed

Yang, X Jessie; Wickens, Christopher D; Park, Taezoon; Fong, Liesel; Siah, Kewin T H

2015-12-01

We aimed to examine the effects of information access cost and accountability on medical residents' information retrieval strategy and performance during prehandover preparation. Prior studies observing doctors' prehandover practices witnessed the use of memory-intensive strategies when retrieving patient information. These strategies impose potential threats to patient safety as human memory is prone to errors. Of interest in this work are the underlying determinants of information retrieval strategy and the potential impacts on medical residents' information preparation performance. A two-step research approach was adopted, consisting of semistructured interviews with 21 medical residents and a simulation-based experiment with 32 medical residents. The semistructured interviews revealed that a substantial portion of medical residents (38%) relied largely on memory for preparing handover information. The simulation-based experiment showed that higher information access cost reduced information access attempts and access duration on patient documents and harmed information preparation performance. Higher accountability led to marginally longer access to patient documents. It is important to understand the underlying determinants of medical residents' information retrieval strategy and performance during prehandover preparation. We noted the criticality of easy access to patient documents in prehandover preparation. In addition, accountability marginally influenced medical residents' information retrieval strategy. Findings from this research suggested that the cost of accessing information sources should be minimized in developing handover preparation tools. © 2015, Human Factors and Ergonomics Society.
Data collection and preparation of authoritative reviews on space food and nutrition research

NASA Technical Reports Server (NTRS)

1972-01-01

The collection and classification of information for a manually operated information retrieval system on the subject of space food and nutrition research are described. The system as it currently exists is designed for retrieval of documents, either in hard copy or on microfiche, from the technical files of the MSC Food and Nutrition Section by accession number, author, and/or subject. The system could readily be extended to include retrieval by affiliation, report and contract number, and sponsoring agency should the need arise. It can also be easily converted to computerized retrieval. At present the information retrieval system contains nearly 3000 documents which consist of technical papers, contractors' reports, and reprints obtained from the food and nutrition files at MSC, Technical Library, the library at the Texas Medical Center in Houston, the BMI Technical Libraries, Dr. E. B. Truitt at MBI, and the OSU Medical Libraries. Additional work was done to compile 18 selected bibliographies on subjects of immediate interest on the MSC Food and Nutrition Section.
Radiology-led Follow-up System for IVC Filters: Effects on Retrieval Rates and Times

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, L.; Taylor, J.; Munneke, G.

Purpose: Successful IVC filter retrieval rates fall with time. Serious complications have been reported following attempts to remove filters after 3-18 months. Failed retrieval may be associated with adverse clinical sequelae. This study explored whether retrieval rates are improved if interventional radiologists organize patient follow-up, rather than relying on the referring clinicians. Methods: Proactive follow-up of patients who undergo filter placement was implemented in May 2008. At the time of filter placement, a report was issued to the referring consultant notifying them of the advised timeframe for filter retrieval. Clinicians were contacted to arrange retrieval within 30 days. We compared this with our practice for the preceding year. Results: The numbers of filters inserted during the two time periods were similar, as were the numbers of retrieval attempts and the time scale at which they occurred. The rate of successful retrievals increased but not significantly. The major changes were better documentation of filter types and better clinical follow-up. After the change in practice, only one patient was lost to follow-up compared with six the preceding year. Conclusions: Although there was no significant improvement in retrieval rates, the proactive, radiology-led approach improved follow-up and documentation, ensuring that a clinical decision was made about how long the filter was required and whether retrieval should be attempted and ensuring that patients were not lost to follow-up.
Word Spotting for Indic Documents to Facilitate Retrieval

NASA Astrophysics Data System (ADS)

Bhardwaj, Anurag; Setlur, Srirangaraj; Govindaraju, Venu

With advances in the field of digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most of the current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts has hampered information extraction from a large body of documents of cultural and historical importance. This chapter presents two relevant topics in this area. First, we describe the use of a script-specific keyword spotting for Devanagari documents that makes use of domain knowledge of the script. Second, we address the needs of a digital library to provide access to a collection of documents from multiple scripts. This requires intelligent solutions which scale across different scripts. We present a script-independent keyword spotting approach for this purpose. Experimental results illustrate the efficacy of our methods.
Extended Subject Access to Hypertext Online Documentation. Part III: The Document-Boundaries Problem.

ERIC Educational Resources Information Center

Girill, T. R.

1991-01-01

This article continues the description of DFT (Document, Find, Theseus), an online documentation system that provides computer-managed on-demand printing of software manuals as well as the interactive retrieval of reference passages. Document boundaries in the hypertext database are discussed, search vocabulary complexities are described, and text…
Automatic Dictionary Construction; Part II of Scientific Report No. ISR-18, Information Storage and Retrieval...

ERIC Educational Resources Information Center

Cornell Univ., Ithaca, NY. Dept. of Computer Science.

Part Two of the eighteenth report on Salton's Magical Automatic Retriever of Texts (SMART) project is composed of three papers: The first: "The Effect of Common Words and Synonyms on Retrieval Performance" by D. Bergmark discloses that removal of common words from the query and document vectors significantly increases precision and that…
The JPL Library Information Retrieval System

ERIC Educational Resources Information Center

Walsh, Josephine

1975-01-01

The development, capabilities, and products of the computer-based retrieval system of the Jet Propulsion Laboratory Library are described. The system handles books and documents, produces a book catalog, and provides a machine search capability. (Author)
An evaluation of information retrieval accuracy with simulated OCR output

DOE Office of Scientific and Technical Information (OSTI.GOV)

Croft, W.B.; Harding, S.M.; Taghva, K.

Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to obtain this data, we have carried out evaluation using simulated OCR output on a variety of databases. The results show that high quality OCR devices have little effect on the accuracy of retrieval, but low quality devices used with databases of short documents can result in significant degradation.
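To make the idea of simulated OCR output concrete, here is a minimal sketch. The abstract does not describe the error model the authors used, so the uniform character-substitution model, the function names, and the sample text below are all assumptions for illustration only.

```python
import random
import string


def simulate_ocr(text, error_rate, rng):
    """Replace each letter, with probability error_rate, by a different random letter."""
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            choices = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(choices))
        else:
            out.append(ch)
    return "".join(out)


def surviving_terms(original, noisy):
    """Fraction of the original word tokens that still appear verbatim in the noisy text."""
    orig_terms = original.lower().split()
    noisy_terms = set(noisy.lower().split())
    return sum(t in noisy_terms for t in orig_terms) / len(orig_terms)


rng = random.Random(0)  # seeded so that runs are repeatable
text = "optical character recognition"
for rate in (0.0, 0.05, 0.3):
    noisy = simulate_ocr(text, rate, rng)
    print(rate, surviving_terms(text, noisy))
```

In a short document each term tends to occur only once, so a single corrupted character can remove the only indexable occurrence of a query term. That is one plausible reading of why the abstract singles out short documents.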
Basic firefly algorithm for document clustering

NASA Astrophysics Data System (ADS)

Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza

2015-12-01

Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization (PSO). Even though these algorithms have been widely applied in many disciplines due to their simplicity, they tend to become trapped in a local minimum during the search for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates more robust and compact clusters than those produced by K-means and PSO.
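The abstract names ADDC as the objective function but does not give its formula. The sketch below assumes a commonly used definition: for each cluster, average the distance from its documents to the cluster centroid, then average over clusters. Euclidean distance and the helper names are assumptions for illustration, not the authors' implementation.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def addc(clusters):
    """Average Distance of Documents to the Cluster centroid (assumed form).

    clusters: list of clusters, each a list of document vectors.
    Empty clusters are ignored. Lower values mean more compact clusters.
    """
    non_empty = [docs for docs in clusters if docs]
    if not non_empty:
        raise ValueError("at least one non-empty cluster is required")
    total = 0.0
    for docs in non_empty:
        c = centroid(docs)
        total += sum(euclidean(c, d) for d in docs) / len(docs)
    return total / len(non_empty)


example = [[[0, 0], [2, 0]], [[1, 1], [1, 3]]]
print(addc(example))
```

A search procedure such as a firefly algorithm would evaluate candidate cluster assignments with a function like this and prefer the assignment with the smaller value.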
An Abstraction-Based Data Model for Information Retrieval

NASA Astrophysics Data System (ADS)

McAllister, Richard A.; Angryk, Rafal A.

Language ontologies provide an avenue for automated lexical analysis that may be used to supplement existing information retrieval methods. This paper presents a method of information retrieval that takes advantage of WordNet, a lexical database, to generate paths of abstraction, and uses them as the basis for an inverted index structure to be used in the retrieval of documents from an indexed corpus. We present this method as an entrée to a line of research on using ontologies to perform word-sense disambiguation and improve the precision of existing information retrieval techniques.
A LDA-based approach to promoting ranking diversity for genomics information retrieval.

PubMed

Chen, Yan; Yin, Xiaoshi; Li, Zhoujun; Hu, Xiaohua; Huang, Jimmy Xiangji

2012-06-11

In the biomedical domain, there is an immense amount of data and a tremendous increase in genomics and biomedical publications. The wealth of information has led to an increasing amount of interest in and need for applying information retrieval techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the desired information of a query asked by biologists is a list of a certain type of entities covering different aspects that are related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important for a biomedical IR system to be able to provide relevant and diverse answers to fulfill biologists' information needs. However, traditional IR models are concerned only with the relevance between retrieved documents and the user query, and do not take redundancy between retrieved documents into account. This leads to high redundancy and low diversity in the ranked retrieval lists. In this paper, we propose an approach that employs a topic generative model called Latent Dirichlet Allocation (LDA) to promote ranking diversity for biomedical information retrieval. Unlike other approaches or models, which consider aspects at the word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We use an LDA model to discover the topic distribution of retrieved passages and the word distribution of each topic dimension, and then re-rank retrieval results by topic distribution similarity between passages based on an N-size sliding window. We apply our approach to the TREC 2007 Genomics collection and two distinct IR baseline runs, achieving an 8% improvement over the highest Aspect MAP reported in the TREC 2007 Genomics track. The proposed method is the first study to adopt a topic model for genomics information retrieval, and demonstrates its effectiveness in promoting ranking diversity as well as in improving relevance of ranked lists of genomics search.
Moreover, we propose a distance measure to quantify how much a passage can increase topical diversity by considering both topical importance and topical coefficient by LDA; the distance measure is a modified Euclidean distance.
FAPA: Faculty Appointment Policy Archive, 1998. [CD-ROM].

ERIC Educational Resources Information Center

Trower, C. Ann

This CD-ROM presents 220 documents collected in Harvard University's Faculty Appointment Policy Archive (FAPA), the ZyFIND search and retrieval system, and instructions for their use. The FAPA system and ZyFIND allow browsing through documents, inserting bookmarks in documents, attaching notes to documents without modifying them, and selecting…
A framework for biomedical figure segmentation towards image-based document retrieval

PubMed Central

2013-01-01

The figures included in many of the biomedical publications play an important role in understanding the biological experiments and facts described within. Recent studies have shown that it is possible to integrate the information that is extracted from figures in classical document classification and retrieval tasks in order to improve their accuracy. One important observation about the figures included in biomedical publications is that they are often composed of multiple subfigures or panels, each describing different methodologies or results. The use of these multimodal figures is a common practice in bioscience, as experimental results are graphically validated via multiple methodologies or procedures. Thus, for a better use of multimodal figures in document classification or retrieval tasks, as well as for providing the evidence source for derived assertions, it is important to automatically segment multimodal figures into subfigures and panels. This is a challenging task, however, as different panels can contain similar objects (i.e., barcharts and linecharts) with multiple layouts. Also, certain types of biomedical figures are text-heavy (e.g., DNA sequences and protein sequences images) and they differ from traditional images. As a result, classical image segmentation techniques based on low-level image features, such as edges or color, are not directly applicable to robustly partition multimodal figures into single modal panels. In this paper, we describe a robust solution for automatically identifying and segmenting unimodal panels from a multimodal figure. Our framework starts by robustly harvesting figure-caption pairs from biomedical articles. We base our approach on the observation that the document layout can be used to identify encoded figures and figure boundaries within PDF files. Taking into consideration the document layout allows us to correctly extract figures from the PDF document and associate their corresponding caption. 
We combine pixel-level representations of the extracted images with information gathered from their corresponding captions to estimate the number of panels in the figure. Thus, our approach simultaneously identifies the number of panels and the layout of figures. In order to evaluate the approach described here, we applied our system on documents containing protein-protein interactions (PPIs) and compared the results against a gold standard that was annotated by biologists. Experimental results showed that our automatic figure segmentation approach surpasses pure caption-based and image-based approaches, achieving a 96.64% accuracy. To allow for efficient retrieval of information, as well as to provide the basis for integration into document classification and retrieval systems among other, we further developed a web-based interface that lets users easily retrieve panels containing the terms specified in the user queries. PMID:24565394
Abstracts of SIG Sessions.

ERIC Educational Resources Information Center

Proceedings of the ASIS Annual Meeting, 1993

1993-01-01

Presents abstracts of 34 special interest group (SIG) sessions. Highlights include humanities scholars and electronic texts; information retrieval and indexing systems design; automated indexing; domain analysis; query expansion in document retrieval systems; thesauri; business intelligence; Americans with Disabilities Act; management;…
Automatic indexing and retrieval of encounter-specific evidence for point-of-care support.

PubMed

O'Sullivan, Dympna M; Wilk, Szymon A; Michalowski, Wojtek J; Farion, Ken J

2010-08-01

Evidence-based medicine relies on repositories of empirical research evidence that can be used to support clinical decision making for improved patient care. However, retrieving evidence from such repositories at local sites presents many challenges. This paper describes a methodological framework for automatically indexing and retrieving empirical research evidence in the form of the systematic reviews and associated studies from The Cochrane Library, where retrieved documents are specific to a patient-physician encounter and thus can be used to support evidence-based decision making at the point of care. Such an encounter is defined by three pertinent groups of concepts - diagnosis, treatment, and patient, and the framework relies on these three groups to steer indexing and retrieval of reviews and associated studies. An evaluation of the indexing and retrieval components of the proposed framework was performed using documents relevant for the pediatric asthma domain. Precision and recall values for automatic indexing of systematic reviews and associated studies were 0.93 and 0.87, and 0.81 and 0.56, respectively. Moreover, precision and recall for the retrieval of relevant systematic reviews and associated studies were 0.89 and 0.81, and 0.92 and 0.89, respectively. With minor modifications, the proposed methodological framework can be customized for other evidence repositories. Copyright 2010 Elsevier Inc. All rights reserved.
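The precision and recall values reported above can be read against the standard set-based definitions, sketched below. The document identifiers are made up for illustration and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    An empty denominator yields 0.0 for that measure.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review identifiers, for illustration only.
retrieved_reviews = {"r1", "r2", "r3", "r4"}
relevant_reviews = {"r1", "r2", "r5"}
print(precision_recall(retrieved_reviews, relevant_reviews))
```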

Bibliometric analysis of global migration health research in peer-reviewed literature (2000-2016).

PubMed

Sweileh, Waleed M; Wickramage, Kolitha; Pottie, Kevin; Hui, Charles; Roberts, Bayard; Sawalha, Ansam F; Zyoud, Saed H

2018-06-20

The health of migrants has become an important issue in global health and foreign policy. Assessing the current status of research activity and identifying gaps in global migration health (GMH) is an important step in mapping the evidence-base and on advocating health needs of migrants and mobile populations. The aim of this study was to analyze globally published peer-reviewed literature in GMH. A bibliometric analysis methodology was used. The Scopus database was used to retrieve documents in peer-reviewed journals in GMH for the study period from 2000 to 2016. A group of experts in GMH developed the needed keywords and validated the final search strategy. The number of retrieved documents was 21,457. Approximately one third (6878; 32.1%) of the retrieved documents were published in the last three years of the study period. In total, 5451 (25.4%) documents were about refugees and asylum seekers, while 1328 (6.2%) were about migrant workers, 440 (2.1%) were about international students, 679 (3.2%) were about victims of human trafficking/smuggling, 26 (0.1%) were about patients' mobility across international borders, and the remaining documents were about unspecified categories of migrants. The majority of the retrieved documents (10,086; 47.0%) were in psychosocial and mental health domain, while 2945 (13.7%) documents were in infectious diseases, 6819 (31.8%) documents were in health policy and systems, 2759 (12.8%) documents were in maternal and reproductive health, and 1918 (8.9%) were in non-communicable diseases. The contribution of authors and institutions in Asian countries, Latin America, Africa, Middle East, and Eastern European countries was low. Literature in GMH represents the perspectives of high-income migrant destination countries. 
Our heat map of research output shows that despite the ever-growing prominence of human mobility across the globe, and Sustainable Development Goals of leaving no one behind, research output on migrants' health is not consistent with the global migration pattern. A stronger evidence base is needed to enable authorities to make evidence-informed decisions on migration health policy and practice. Research collaboration and networks should be encouraged to prioritize research in GMH.
Retrieval feedback in MEDLINE.

PubMed Central

Srinivasan, P

1996-01-01

OBJECTIVE: To investigate a new approach for query expansion based on retrieval feedback. The first objective in this study was to examine alternative query-expansion methods within the same retrieval-feedback framework. The three alternatives proposed are: expansion on the MeSH query field alone, expansion on the free-text field alone, and expansion on both the MeSH and the free-text fields. The second objective was to gain further understanding of retrieval feedback by examining possible dependencies on relevant documents during the feedback cycle. DESIGN: Comparative study of retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a MEDLINE test collection of 75 queries and 2,334 MEDLINE citations. MEASUREMENTS: Retrieval effectivenesses of the original unexpanded and the alternative expanded queries were compared using 11-point-average precision scores (11-AvgP). These are averages of precision scores obtained at 11 standard recall points. RESULTS: All three expansion strategies significantly improved the original queries in terms of retrieval effectiveness. Expansion on MeSH alone was equivalent to expansion on both MeSH and the free-text fields. Expansion on the free-text field alone improved the queries significantly less than did the other two strategies. The second part of the study indicated that retrieval-feedback-based expansion yields significant performance improvements independent of the availability of relevant documents for feedback information. CONCLUSIONS: Retrieval feedback offers a robust procedure for query expansion that is most effective for MEDLINE when applied to the MeSH field. PMID:8653452
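The 11-point average precision measure mentioned under MEASUREMENTS can be sketched as follows. This assumes the usual interpolation rule (precision at a recall level is the maximum precision observed at any recall greater than or equal to that level); the abstract does not state which interpolation the study used, and the function name is hypothetical.

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranked_relevance: list of booleans in rank order (True = relevant).
    total_relevant: number of relevant documents in the whole collection.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # Record (recall, precision) after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolate: best precision at any recall >= the standard level.
    interpolated = []
    for i in range(11):
        level = i / 10
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return sum(interpolated) / 11


score = eleven_point_average_precision([True, False, True, False], total_relevant=2)
print(round(score, 4))
```

Averaging this score over all queries in a test collection gives a single figure for comparing the unexpanded and expanded query variants.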
DARE: Unesco Computerized Data Retrieval System for Documentation in the Social and Human Sciences (Including an Analysis of the Present System).

ERIC Educational Resources Information Center

Vasarhelyi, Paul

The new data retrieval system for the social sciences which has recently been installed in the UNESCO Secretariat in Paris is described in this comprehensive report. The computerized system is designed to facilitate the existing storage systems in the circulation of information, data retrieval, and indexing services. Basically, this report…
Personalizing Information Retrieval Using Interaction Behaviors in Search Sessions in Different Types of Tasks

ERIC Educational Resources Information Center

Liu, Chang

2012-01-01

When using information retrieval (IR) systems, users often pose short and ambiguous query terms. It is critical for IR systems to obtain more accurate representation of users' information need, their document preferences, and the context they are working in, and then incorporate them into the design of the systems to tailor retrieval to…
SPIRES (Stanford Physics Information REtrieval System) 1969-70 Annual Report to the National Science Foundation (Office of Science Information Service).

ERIC Educational Resources Information Center

Parker, Edwin B.

The third annual report (covering the 18-month period from January 1969 to June 1970) of the Stanford Physics Information REtrieval System (SPIRES) project, which is developing an augmented bibliographic retrieval capability, is presented in this document. A first section describes the background of the project and its association with Project…
User centered and ontology based information retrieval system for life sciences.

PubMed

Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent

2012-01-25

Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help.
User centered and ontology based information retrieval system for life sciences

PubMed Central

2012-01-01

Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. Conclusions The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. 
This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help. PMID:22373375
Aquarius Salinity Retrieval Algorithm: Final Pre-Launch Version

NASA Technical Reports Server (NTRS)

Wentz, Frank J.; Le Vine, David M.

2011-01-01

This document provides the theoretical basis for the Aquarius salinity retrieval algorithm. The inputs to the algorithm are the Aquarius antenna temperature (T(sub A)) measurements along with a number of NCEP operational products and pre-computed tables of space radiation coming from the galaxy and sun. The output is sea-surface salinity and many intermediate variables required for the salinity calculation. This revision of the Algorithm Theoretical Basis Document (ATBD) is intended to be the final pre-launch version.
Management of technical data in Nihon Doro Kodan

NASA Astrophysics Data System (ADS)

Hanada, Jun'ichi

The Nihon Doro Kodan Laboratory has collected and provided technical data (microfiches, aerial photographs, books and literature) on the planning, design, construction and maintenance of the national expressways and ordinary toll roads since 1968. This work has been computerized so that data can be retrieved and provided faster. The Laboratory now operates a Technical Data Management System, which manages all technical data, and a Technical Document Management System, which manages technical documents. These systems rely on users' on-line retrieval and on data accumulation on microfiches and optical disks.
Comment on "An Evaluation of Query Expansion by the Addition of Clustered Terms for a Document Retrieval System"

ERIC Educational Resources Information Center

Salton, G.

1972-01-01

The author emphasized that one cannot conclude from the experiments reported upon that term clusters (or equivalently, keyword classifications or thesauruses) are not useful in retrieval. (2 references) (Author)
CDAPubMed: a browser extension to retrieve EHR-based biomedical literature.

PubMed

Perez-Rey, David; Jimenez-Castellanos, Ana; Garcia-Remesal, Miguel; Crespo, Jose; Maojo, Victor

2012-04-05

Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. 
It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems.
CDAPubMed: a browser extension to retrieve EHR-based biomedical literature

PubMed Central

2012-01-01

Background Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. 
It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems. PMID:22480327
Let Documents Talk to Each Other: A Computer Model for Connection of Short Documents.

ERIC Educational Resources Information Center

Chen, Z.

1993-01-01

Discusses the integration of scientific texts through the connection of documents and describes a computer model that can connect short documents. Information retrieval and artificial intelligence are discussed; a prototype system of the model is explained; and the model is compared to other computer models. (17 references) (LRW)
Automated Management Of Documents

NASA Technical Reports Server (NTRS)

Boy, Guy

1995-01-01

Report presents main technical issues involved in computer-integrated documentation. Problems associated with automation of management and maintenance of documents analyzed from perspectives of artificial intelligence and human factors. Technologies that may prove useful in computer-integrated documentation reviewed: these include conventional approaches to indexing and retrieval of information, use of hypertext, and knowledge-based artificial-intelligence systems.
An introduction to information retrieval: applications in genomics

PubMed Central

Nadkarni, P M

2011-01-01

Information retrieval (IR) is the field of computer science that deals with the processing of documents containing free text, so that they can be rapidly retrieved based on keywords specified in a user’s query. IR technology is the basis of Web-based search engines, and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed by both the words they contain, as well as the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics. PMID:12049181
Deep Question Answering for protein annotation

PubMed Central

Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

2015-01-01

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/ PMID:26384372
Deep Question Answering for protein annotation.

PubMed

Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

2015-01-01

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.
On-Line Retrieval II.

ERIC Educational Resources Information Center

Kurtz, Peter; And Others

This report is concerned with the implementation of two interrelated computer systems: an automatic document analysis and classification package, and an on-line interactive information retrieval system which utilizes the information gathered during the automatic classification phase. Well-known techniques developed by Salton and Dennis have been…
An Experiment in Index Term Frequency

ERIC Educational Resources Information Center

Svenonius, Elaine

1972-01-01

The question is asked: Of index terms assigned to documents, which function most effectively in retrieval, the most used or popular terms, or those which are used relatively infrequently? The experiment is a retrieval experiment and uses the Cranfield-Salton data. (14 references) (Author)
WEBCAP: Web Scheduler for Distance Learning Multimedia Documents with Web Workload Considerations

ERIC Educational Resources Information Center

Habib, Sami; Safar, Maytham

2008-01-01

In many web applications, such as distance learning, the frequency of refreshing multimedia web documents places a heavy burden on the WWW resources. Moreover, the updated web documents may encounter inordinate delays, which make it difficult to retrieve web documents in time. Here, we present an Internet tool called WEBCAP that can schedule…

Electronic Document Delivery: OCLC's Prototype System.

ERIC Educational Resources Information Center

Hickey, Thomas B.; Calabrese, Andrew M.

1986-01-01

Describes development of system for retrieval of documents from magnetic storage that uses stored font definition codes to control an inexpensive laser printer in the production of copies that closely resemble original document. Trends in information equipment and printing industries that will govern future application of this technology are…
Inverted File Compression through Document Identifier Reassignment.

ERIC Educational Resources Information Center

Shieh, Wann-Yun; Chen, Tien-Fu; Shann, Jean Jyh-Jiun; Chung, Chung-Ping

2003-01-01

Discusses the use of inverted files in information retrieval systems and proposes a document identifier reassignment method to reduce the average gap values in an inverted file. Highlights include the d-gap technique; document similarity; heuristic algorithms; file compression; and performance evaluation from a simulation environment. (LRW)
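As a rough illustration of the d-gap technique named in this record, the sketch below stores a sorted posting list as differences between consecutive document identifiers, and shows why reassigning identifiers so that similar documents receive nearby numbers lowers the average gap. The function names and the toy posting lists are hypothetical and are not taken from the paper; the paper's reassignment heuristics are not reproduced here.

```python
def to_dgaps(postings):
    """Convert a sorted list of document IDs into d-gaps (first gap is the first ID)."""
    gaps = []
    previous = 0
    for doc_id in postings:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps):
    """Rebuild the original document IDs from a list of d-gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def average_gap(postings):
    """Mean d-gap of a posting list; smaller values compress better."""
    gaps = to_dgaps(postings)
    return sum(gaps) / len(gaps)


# Hypothetical posting lists for one term, before and after ID reassignment.
scattered = [3, 40, 77, 120]
clustered = [3, 4, 5, 7]

print(to_dgaps(scattered), average_gap(scattered))
print(to_dgaps(clustered), average_gap(clustered))
```

Small gaps matter because variable-length integer codes spend fewer bits on small numbers, so a lower average gap generally means a smaller inverted file.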
A Novel Navigation Paradigm for XML Repositories.

ERIC Educational Resources Information Center

Azagury, Alain; Factor, Michael E.; Maarek, Yoelle S.; Mandler, Benny

2002-01-01

Discusses data exchange over the Internet and describes the architecture and implementation of an XML document repository that promotes a navigation paradigm for XML documents based on content and context. Topics include information retrieval and semistructured documents; and file systems as information storage infrastructure, particularly XMLFS.…
Organ donation in the ICU: A document analysis of institutional policies, protocols, and order sets.

PubMed

Oczkowski, Simon J W; Centofanti, John E; Durepos, Pamela; Arseneau, Erika; Kelecevic, Julija; Cook, Deborah J; Meade, Maureen O

2018-04-01

To better understand how local policies influence organ donation rates. We conducted a document analysis of our ICU organ donation policies, protocols and order sets. We used a systematic search of our institution's policy library to identify documents related to organ donation. We used Mindnode software to create a publication timeline, basic statistics to describe document characteristics, and qualitative content analysis to extract document themes. Documents were retrieved from Hamilton Health Sciences, an academic hospital system with a high volume of organ donation, from database inception to October 2015. We retrieved 12 active organ donation documents, including six protocols, two policies, two order sets, and two unclassified documents, a majority (75%) after the introduction of donation after circulatory death in 2006. Four major themes emerged: organ donation process, quality of care, patient and family-centred care, and the role of the institution. These themes indicate areas where documented institutional standards may be beneficial. Further research is necessary to determine the relationship of local policies, protocols, and order sets to actual organ donation practices, and to identify barriers and facilitators to improving donation rates. Copyright © 2017 Elsevier Ltd. All rights reserved.
26 CFR 1.1471-1 - Scope of chapter 4 and definitions.

Code of Federal Regulations, 2013 CFR

2013-04-01

... an image retrieval system (such as portable document format (.pdf) or scanned documents). (35) Entity..., custodial institution, or specified insurance company. (124) TIN. The term TIN means the tax identifying...
26 CFR 1.1471-1 - Scope of chapter 4 and definitions.

Code of Federal Regulations, 2014 CFR

2014-04-01

... an image retrieval system (such as portable document format (.pdf) or scanned documents). (39) Entity..., custodial institution, or specified insurance company. (133) TIN. The term TIN means the tax identifying...
Computer Program and User Documentation Medical Data Input System

NASA Technical Reports Server (NTRS)

Anderson, J.

1971-01-01

Several levels of documentation are presented for the program module of the NASA medical directorate minicomputer storage and retrieval system. The biomedical information system overview gives reasons for the development of the minicomputer storage and retrieval system. It briefly describes all of the program modules which constitute the system. A technical discussion oriented to the programmer is given. Each subroutine is described in enough detail to permit in-depth understanding of the routines and to facilitate program modifications. The program utilization section may be used as a users guide.
Referenced-site environmental document for a Monitored Retrievable Storage facility: backup waste management option for handling 1800 MTU per year

DOE Office of Scientific and Technical Information (OSTI.GOV)

Silviera, D.J.; Aaberg, R.L.; Cushing, C.E.

This environmental document includes a discussion of the purpose of a monitored retrievable storage facility, a description of two facility design concepts (sealed storage cask and field drywell), a description of three reference sites (arid, warm-wet, and cold-wet), and a discussion and comparison of the impacts associated with each of the six site/concept combinations. This analysis is based on a 15,000-MTU storage capacity and a throughput rate of up to 1800 MTU per year.
New Term Weighting Formulas for the Vector Space Method in Information Retrieval

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chisholm, E.; Kolda, T.G.

The goal in information retrieval is to enable users to automatically and accurately find data relevant to their queries. One possible approach to this problem is to use the vector space model, which models documents and queries as vectors in the term space. The components of the vectors are determined by the term weighting scheme, a function of the frequencies of the terms in the document or query as well as throughout the collection. We discuss popular term weighting schemes and present several new schemes that offer improved performance.
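To make the vector space model in this record concrete, here is a minimal sketch that weights terms with the textbook tf × log(N/df) scheme and compares documents by cosine similarity. This is a stand-in for illustration only: it is not one of the new weighting formulas the report proposes, and the function names and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical three-document collection.
docs = [
    ["vector", "space", "model"],
    ["term", "weighting", "scheme"],
    ["vector", "term", "weighting"],
]
vectors = tf_idf_vectors(docs)
print(cosine(vectors[0], vectors[2]))  # share "vector"
print(cosine(vectors[0], vectors[1]))  # no shared terms
```

A query can be handled the same way by treating it as one more token list and ranking documents by their cosine score against it.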
Knowledge-Sparse and Knowledge-Rich Learning in Information Retrieval.

ERIC Educational Resources Information Center

Rada, Roy

1987-01-01

Reviews aspects of the relationship between machine learning and information retrieval. Highlights include learning programs that extend from knowledge-sparse learning to knowledge-rich learning; the role of the thesaurus; knowledge bases; artificial intelligence; weighting documents; word frequency; and merging classification structures. (78…
Common Problems of Documentary Information Transfer, Storage and Retrieval in Industrial Organizations.

ERIC Educational Resources Information Center

Vickers, P. H.

1983-01-01

Examination of management information systems of three manufacturing firms highlights principal characteristics, document types and functions, main information flows, storage and retrieval systems, and common problems (corporate memory failure, records management, management information systems, general management). A literature review and…
Out of sight, out of mind: racial retrieval cues increase the accessibility of social justice concepts.

PubMed

Salter, Phia S; Kelley, Nicholas J; Molina, Ludwin E; Thai, Luyen T

2017-09-01

Photographs provide critical retrieval cues for personal remembering, but few studies have considered this phenomenon at the collective level. In this research, we examined the psychological consequences of visual attention to the presence (or absence) of racially charged retrieval cues within American racial segregation photographs. We hypothesised that attention to racial retrieval cues embedded in historical photographs would increase social justice concept accessibility. In Study 1, we recorded gaze patterns with an eye-tracker among participants viewing images that contained racial retrieval cues or were digitally manipulated to remove them. In Study 2, we manipulated participants' gaze behaviour by either directing visual attention toward racial retrieval cues, away from racial retrieval cues, or directing attention within photographs where racial retrieval cues were missing. Across Studies 1 and 2, visual attention to racial retrieval cues in photographs documenting historical segregation predicted social justice concept accessibility.
Documents, Dialogue and the Emergence of Tertiary Orality

ERIC Educational Resources Information Center

Turner, Deborah; Allen, Warren

2013-01-01

Introduction: This investigation opens with a description of why studying non-traditional, oral documents can inform efforts to extend traditional library and information science practices, of description, storage, and retrieval, to artefacts made available through emerging media. Method: This study extends the method used to identify a document,…
Design Package for Fuel Retrieval System Fuel Handling Tool Modification

DOE Office of Scientific and Technical Information (OSTI.GOV)

TEDESCHI, D.J.

This design package documents design, fabrication, and testing of new stinger tool design. Future revisions will document further development of the stinger tool and incorporate various developmental stages, and final test results.
Search and retrieval of office files using dBASE 3

NASA Technical Reports Server (NTRS)

Breazeale, W. L.; Talley, C. R.

1986-01-01

Described is a method of automating the office files retrieval process using a commercially available software package (dBASE III). The resulting product is a menu-driven computer program which requires no computer skills to operate. One part of the document is written for the potential user who has minimal computer experience and uses sample menu screens to explain the program; while a second part is oriented towards the computer literate individual and includes rather detailed descriptions of the methodology and search routines. Although many of the programming techniques are explained, this document is not intended to be a tutorial on dBASE III. It is hoped that the document will serve as a stimulus for other applications of dBASE III.
The Profile-Query Relationship.

ERIC Educational Resources Information Center

Shepherd, Michael A.; Phillips, W. J.

1986-01-01

Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…
A Bayesian Approach to Interactive Retrieval

ERIC Educational Resources Information Center

Tague, Jean M.

1973-01-01

A probabilistic model for interactive retrieval is presented. Bayesian statistical decision theory principles are applied: use of prior and sample information about the relationship of document descriptions to query relevance; maximization of expected value of a utility function, to the problem of optimally restructuring search strategies in an…
Development of the CODER System: A Testbed for Artificial Intelligence Methods in Information Retrieval.

ERIC Educational Resources Information Center

Fox, Edward A.

1987-01-01

Discusses the CODER system, which was developed to investigate the application of artificial intelligence methods to increase the effectiveness of information retrieval systems, particularly those involving heterogeneous documents. Highlights include the use of PROLOG programing, blackboard-based designs, knowledge engineering, lexicological…
Topics in Semantic Representation

ERIC Educational Resources Information Center

Griffiths, Thomas L.; Steyvers, Mark; Tenenbaum, Joshua B.

2007-01-01

Processing language requires the retrieval of concepts from memory in response to an ongoing stream of information. This retrieval is facilitated if one can infer the gist of a sentence, conversation, or document and use that gist to predict related concepts and disambiguate words. This article analyzes the abstract computational problem…
Context-sensitive medical information retrieval.

PubMed

Auerbuch, Mordechai; Karson, Tom H; Ben-Ami, Benjamin; Maimon, Oded; Rokach, Lior

2004-01-01

Substantial medical data such as pathology reports, operative reports, discharge summaries, and radiology reports are stored in textual form. Databases containing free-text medical narratives often need to be searched to find relevant information for clinical and research purposes. Terms that appear in these documents tend to appear in different contexts. The context of negation, a negative finding, is of special importance, since many of the most frequently described findings are those denied by the patient or subsequently "ruled out." Hence, when searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the retrieved documents will be irrelevant. The purpose of this work is to develop a methodology for automated learning of negative context patterns in medical narratives and test the effect of context identification on the performance of medical information retrieval. The algorithm presented significantly improves the performance of information retrieval done on medical narratives. The precision improves from about 60%, when using context-insensitive retrieval, to nearly 100%. The impact on recall is only minor. In addition, context-sensitive queries enable the user to search for terms in ways not otherwise available.
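The effect of negation on retrieval can be sketched with a much simpler stand-in than the learned patterns this record describes: a hand-written list of negation cues and a fixed token window. The cue list, window size, and function name below are all hypothetical, and the paper's actual pattern-learning method is not reproduced.

```python
import re

# Hypothetical, hand-picked negation cues; the paper learns such patterns automatically.
NEGATION_CUES = {"no", "not", "denies", "denied", "without"}


def mentions_positively(sentence, term, window=4):
    """True if `term` occurs with no negation cue in the `window` tokens before it."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i, token in enumerate(tokens):
        if token == term:
            preceding = tokens[max(0, i - window):i]
            if not any(cue in preceding for cue in NEGATION_CUES):
                return True
    return False


# Hypothetical narrative snippets.
reports = [
    "Patient reports fever and cough.",
    "Patient denies chest pain or fever.",
    "No fever.",
]
# Context-insensitive search would return all three; this filter keeps only the first.
relevant = [r for r in reports if mentions_positively(r, "fever")]
print(relevant)
```

Filtering out negated mentions is what raises precision, while recall changes little because genuinely positive mentions are rarely preceded by a cue.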

On the use of the singular value decomposition for text retrieval

DOE Office of Scientific and Technical Information (OSTI.GOV)

Husbands, P.; Simon, H.D.; Ding, C.

2000-12-04

The use of the Singular Value Decomposition (SVD) has been proposed for text retrieval in several recent works. This technique uses the SVD to project very high dimensional document and query vectors into a low dimensional space. In this new space it is hoped that the underlying structure of the collection is revealed thus enhancing retrieval performance. Theoretical results have provided some evidence for this claim and to some extent experiments have confirmed this. However, these studies have mostly used small test collections and simplified document models. In this work we investigate the use of the SVD on large document collections. We show that, if interpreted as a mechanism for representing the terms of the collection, this technique alone is insufficient for dealing with the variability in term occurrence. Section 2 introduces the text retrieval concepts necessary for our work. A short description of our experimental architecture is presented in Section 3. Section 4 describes how term occurrence variability affects the SVD and then shows how the decomposition influences retrieval performance. A possible way of improving SVD-based techniques is presented in Section 5 and concluded in Section 6.
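The projection described in this record is usually written as follows. The notation is an assumption made here for illustration and is not taken from the report: A is the t × d term-by-document matrix, k is the chosen rank, U_k is t × k, Σ_k is k × k, V_k is d × k, q is a query vector in term space, and \hat{d}_j is the j-th row of V_k written as a column vector.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q, \qquad
\operatorname{sim}(\hat{q}, \hat{d}_j) =
  \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Applying the same projection to the j-th column of A_k returns \hat{d}_j, so queries and documents end up in the same k-dimensional space and can be ranked by cosine similarity.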
Microsoft Research at TREC 2009. Web and Relevance Feedback Tracks

DTIC Science & Technology

2009-11-01

Information Processing Systems, pages 193–200, 2006. [2] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. of the 9th...Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. of the 3rd Text REtrieval Conference, 1994. [8] J. J. Rocchio. Relevance...feedback in information retrieval. In Gerard Salton, editor, The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice Hall
Besides Precision & Recall: Exploring Alternative Approaches to Evaluating an Automatic Indexing Tool for MEDLINE

PubMed Central

Névéol, Aurélie; Zeng, Kelly; Bodenreider, Olivier

2006-01-01

Objective: This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. Materials and methods: The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Results: Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. Conclusions: The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis. PMID:17238409
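For readers unfamiliar with the baseline measures this record refers to, the sketch below computes precision, recall, and the Dice coefficient between a set of recommended index terms and a reference set. The function names and the example term sets are hypothetical; the semantic similarity measures studied in the paper are not reproduced here.

```python
def precision_recall(recommended, reference):
    """Precision and recall of a recommended term set against a reference set."""
    recommended, reference = set(recommended), set(reference)
    hits = len(recommended & reference)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def dice(a, b):
    """Dice coefficient: twice the overlap divided by the sum of the set sizes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical index terms for one citation.
recommended = {"Humans", "MEDLINE", "Abstracting and Indexing"}
reference = {"Humans", "MEDLINE", "Medical Subject Headings",
             "Natural Language Processing"}

print(precision_recall(recommended, reference))
print(dice(recommended, reference))
```

All three measures count only exact term matches, so a recommended term that is closely related to a reference term scores zero; this is the gap that term-level semantic similarity is meant to address.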
Besides precision & recall: exploring alternative approaches to evaluating an automatic indexing tool for MEDLINE.

PubMed

Neveol, Aurélie; Zeng, Kelly; Bodenreider, Olivier

2006-01-01

This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.
Quarantine document system indexing procedure

NASA Technical Reports Server (NTRS)

1972-01-01

The Quarantine Document System (QDS) is described including the indexing procedures and thesaurus of indexing terms. The QDS consists of these functional elements: acquisition, cataloging, indexing, storage, and retrieval. A complete listing of the collection, and the thesaurus are included.
Features and Feedback: Enhancing Metamnemonic Knowledge at Retrieval Reduces Source-Monitoring Errors

ERIC Educational Resources Information Center

Lane, Sean M.; Roussel, Cristine C.; Villa, Diane; Morita, Shelby K.

2007-01-01

Three experiments explored the issue of whether enhanced metamnemonic knowledge at retrieval can improve participants' ability to make difficult source discriminations in the context of the eyewitness suggestibility paradigm. The 1st experiment documented differences in phenomenal experience between veridical and false memories. Experiment 2…
INFORMATION RETRIEVAL EXPERIMENT. FINAL REPORT.

ERIC Educational Resources Information Center

SELYE, HANS

THIS REPORT IS A BRIEF REVIEW OF RESULTS OF AN EXPERIMENT TO DETERMINE THE INFORMATION RETRIEVAL EFFICIENCY OF A MANUAL SPECIALIZED INFORMATION SYSTEM BASED ON 700,000 DOCUMENTS IN THE FIELDS OF ENDOCRINOLOGY, STRESS, MAST CELLS, AND ANAPHYLACTOID REACTIONS. THE SYSTEM RECEIVES 30,000 PUBLICATIONS ANNUALLY. DETAILED INFORMATION IS REPRESENTED BY…
NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.

ERIC Educational Resources Information Center

Zhou, Lina; Zhang, Dongsong

2003-01-01

Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…
The Negative Testing and Negative Generation Effects Are Eliminated by Delay

ERIC Educational Resources Information Center

Mulligan, Neil W.; Peterson, Daniel J.

2015-01-01

Although retrieval often enhances subsequent memory (the testing effect), a negative testing effect has recently been documented in which prior retrieval harms later recall compared with restudying. The negative testing effect was predicated on the negative generation effect and the item-specific-relational framework. The present experiments…
Subject Retrieval from Full-Text Databases in the Humanities

ERIC Educational Resources Information Center

East, John W.

2007-01-01

This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…
Statistical Techniques for Efficient Indexing and Retrieval of Document Images

ERIC Educational Resources Information Center

Bhardwaj, Anurag

2010-01-01

We have developed statistical techniques to improve the performance of document image search systems where the intermediate step of OCR based transcription is not used. Previous research in this area has largely focused on challenges pertaining to generation of small lexicons for processing handwritten documents and enhancement of poor quality…
Information retrieval system utilizing wavelet transform

DOEpatents

Brewster, Mary E.; Miller, Nancy E.

2000-01-01

A method for automatically partitioning an unstructured electronically formatted natural language document into its sub-topic structure. Specifically, the document is converted to an electronic signal and a wavelet transform is then performed on the signal. The resultant signal may then be used to graphically display and interact with the sub-topic structure of the document.
A hypertext system that learns from user feedback

NASA Technical Reports Server (NTRS)

Mathe, Nathalie

1994-01-01

Retrieving specific information from large amounts of documentation is not an easy task. It could be facilitated if information relevant in the current problem solving context could be automatically supplied to the user. As a first step towards this goal, we have developed an intelligent hypertext system called CID (Computer Integrated Documentation). Besides providing a hypertext interface for browsing large documents, the CID system automatically acquires and reuses the context in which previous searches were appropriate. This mechanism utilizes on-line user information requirements and relevance feedback either to reinforce current indexing in case of success or to generate new knowledge in case of failure. Thus, the user continually augments and refines the intelligence of the retrieval system. This allows the CID system to provide helpful responses, based on previous usage of the documentation, and to improve its performance over time. We successfully tested the CID system with users of the Space Station Freedom requirements documents. We are currently extending CID to other application domains (Space Shuttle operations documents, airplane maintenance manuals, and on-line training). We are also exploring the potential commercialization of this technique.
A Strategy for Reusing the Data of Electronic Medical Record Systems for Clinical Research.

PubMed

Matsumura, Yasushi; Hattori, Atsushi; Manabe, Shiro; Tsuda, Tsutomu; Takeda, Toshihiro; Okada, Katsuki; Murata, Taizo; Mihara, Naoki

2016-01-01

There is a great need to reuse data stored in electronic medical records (EMR) databases for clinical research. We previously reported the development of a system in which progress notes and case report forms (CRFs) were simultaneously recorded using a template in the EMR in order to exclude redundant data entry. To make the data collection process more efficient, we are developing a system in which the data originally stored in the EMR database can be populated within a frame in a template. We developed interface plugin modules that retrieve data from the databases of other EMR applications. A universal keyword written in a template master is converted to a local code using a data conversion table, then the objective data is retrieved from the corresponding database. The template element data, which are entered by a template, are stored in the template element database. To retrieve the data entered by other templates, the objective data is designated by the template element code with the template code, or by the concept code if it is written for the element. When the application systems in the EMR generate documents, they also generate a PDF file and a corresponding document profile XML, which includes important data, and send them to the document archive server and the data sharing server, respectively. In the data sharing server, the data are represented by an item with an item code with a document class code and its value. By linking a concept code to an item identifier, objective data can be retrieved by designating a concept code. We employed a flexible strategy in which a unique identifier for a hospital is initially attached to all of the data that the hospital generates. The identifier is secondarily linked with concept codes. The data that are not linked with a concept code can also be retrieved using the unique identifier of the hospital. This strategy makes it possible to reuse any of a hospital's data.
National Space Science Data Center data archive and distribution service (NDADS) automated retrieval mail system user's guide

NASA Technical Reports Server (NTRS)

Perry, Charleen M.; Vansteenberg, Michael E.

1992-01-01

The National Space Science Data Center (NSSDC) has developed an automated data retrieval request service utilizing our Data Archive and Distribution Service (NDADS) computer system. NDADS currently has selected project data written to optical disk platters with the disks residing in a robotic 'jukebox' near-line environment. This allows for rapid and automated access to the data with no staff intervention required. There are also automated help information and user services available that can be accessed. The request system permits an average-size data request to be completed within minutes of the request being sent to NSSDC. A mail message, in the format described in this document, retrieves the data and can send it to a remote site. Also listed in this document are the data currently available.
Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis.

PubMed

Segura Bedmar, Isabel; Martínez, Paloma; Carruana Martín, Adrián

2017-12-01

Biomedical semantic indexing is a very useful support tool for human curators in their efforts to index and catalog the biomedical literature. The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited the document similarity by using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term in the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that says if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by this ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. Our experiments show promising results with an F1 of 69% on the test dataset. To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch becomes a real solution to index large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time. ©Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.12.2017.
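One plausible reading of the scoring function described in this record is sketched below: each candidate MeSH term receives the sum of the similarities of the retrieved documents that carry it, so both how often the term occurs among the neighbours and how close those neighbours are contribute to its score. The exact formula used by the authors is not given in the abstract, so this form, the function name, and the example data are assumptions.

```python
from collections import defaultdict


def rank_mesh_terms(neighbours, top_n=3):
    """neighbours: list of (similarity, [MeSH terms]) for the retrieved documents.

    Returns the top_n terms ranked by summed neighbour similarity.
    """
    scores = defaultdict(float)
    for similarity, terms in neighbours:
        for term in set(terms):  # count a term once per retrieved document
            scores[term] += similarity
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical nearest neighbours of an input article with cosine similarities.
neighbours = [
    (0.9, ["Humans", "Neoplasms"]),
    (0.8, ["Humans", "Algorithms"]),
    (0.3, ["Algorithms", "Mice"]),
]
print(rank_mesh_terms(neighbours, top_n=2))
```

The ancestor-replacement heuristic mentioned in the abstract would run as a separate step over the ranked terms and is left out of this sketch.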
How to Handle the Avalanche of Online Documentation.

ERIC Educational Resources Information Center

Nolan, Maureen P.

1981-01-01

The method of handling the printed documentation associated with online information retrieval, which is described, involves the use of a series of separate but related files: database files, system files, network files, index sheets, and equipment files. (FM)
EM-31 RETRIEVAL KNOWLEDGE CENTER MEETING REPORT: MOBILIZE AND DISLODGE TANK WASTE HEELS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fellinger, A.

2010-02-16

The Retrieval Knowledge Center sponsored a meeting in June 2009 to review challenges and gaps to retrieval of tank waste heels. The facilitated meeting was held at the Savannah River Research Campus with personnel broadly representing tank waste retrieval knowledge at Hanford, Savannah River, Idaho, and Oak Ridge. This document captures the results of this meeting. In summary, it was agreed that the challenges to retrieval of tank waste heels fell into two broad categories: (1) mechanical heel waste retrieval methodologies and equipment and (2) understanding and manipulating the heel waste (physical, radiological, and chemical characteristics) to support retrieval options and subsequent processing. Recent successes and lessons from deployments of the Sand and Salt Mantis vehicles as well as retrieval of C-Area tanks at Hanford were reviewed. Suggestions to address existing retrieval approaches that utilize a limited set of tools and techniques are included in this report. The meeting found that there had been very little effort to improve or integrate the multiple proven or new techniques and tools available into a menu of available methods for rapid insertion into baselines. It is recommended that focused developmental efforts continue in the two areas underway (low-level mixing evaluation and pumping slurries with large solid materials) and that projects to demonstrate new/improved tools be launched to outfit tank farm operators with the needed tools to complete tank heel retrievals effectively and efficiently. This document describes the results of a meeting held on June 3, 2009 at the Savannah River Site in South Carolina to identify technology gaps and potential technology solutions to retrieving high-level waste (HLW) heels from waste tanks within the complex of sites run by the U.S. Department of Energy (DOE). The meeting brought together personnel with extensive tank waste retrieval knowledge from DOE's four major waste sites - Hanford, Savannah River, Idaho, and Oak Ridge. The meeting was arranged by the Retrieval Knowledge Center (RKC), which is a technology development project sponsored by the Office of Technology Innovation & Development - formerly the Office of Engineering and Technology - within the DOE Office of Environmental Management (EM).
Information retrieval system utilizing wavelet transform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brewster, M.E.; Miller, N.E.

A method is disclosed for automatically partitioning an unstructured electronically formatted natural language document into its sub-topic structure. Specifically, the document is converted to an electronic signal and a wavelet transform is then performed on the signal. The resultant signal may then be used to graphically display and interact with the sub-topic structure of the document.
A Study and Model of Machine-Like Indexing Behavior by Human Indexers.

ERIC Educational Resources Information Center

McAllister, Caryl

Although a large part of a document retrieval system's resources are devoted to indexing, the question of how people do subject indexing has been the subject of much conjecture and only a little experimentation. This dissertation examines the relationships between a document being indexed and the index terms assigned to that document in an attempt…

Applying Hypertext Structures to Software Documentation.

ERIC Educational Resources Information Center

French, James C.; And Others

1997-01-01

Describes a prototype system for software documentation management called SLEUTH (Software Literacy Enhancing Usefulness to Humans) being developed at the University of Virginia. Highlights include information retrieval techniques, hypertext links that are installed automatically, a WAIS (Wide Area Information Server) search engine, user…
Evaluating Documents Reference Service and the Implications for Improvement.

ERIC Educational Resources Information Center

Parker, June D.

1996-01-01

Presents an evaluation of reference services and government document use at East Carolina University (North Carolina) library. Factors that most affect retrieval success include cataloging and technical problems, the amount of time spent in searching, and staff knowledge. (Author/AEF)
Medical Language Processing for Knowledge Representation and Retrievals

PubMed Central

Lyman, Margaret; Sager, Naomi; Chi, Emile C.; Tick, Leo J.; Nhan, Ngo Thanh; Su, Yun; Borst, Francois; Scherrer, Jean-Raoul

1989-01-01

The Linguistic String Project-Medical Language Processor, a system for computer analysis of narrative patient documents in English, is being adapted for French Lettres de Sortie. The system converts the free-text input to a semantic representation which is then mapped into a relational database. Retrievals of clinical data from the database are described.
Topic Models in Information Retrieval

DTIC Science & Technology

2007-08-01

Information Processing Systems, Cambridge, MA, MIT Press, 2004. Brown, P.F., Della Pietra, V.J., deSouza, P.V., Lai, J.C. and Mercer, R.L., Class-based...2003. http://www.wkap.nl/prod/b/1-4020-1216-0. Croft, W.B., Lucia, T.J., Cringean, J., and Willett, P., Retrieving Documents By Plausible Inference
Technological Imperatives: Using Computers in Academic Debate.

ERIC Educational Resources Information Center

Ticku, Ravinder; Phelps, Greg

Intended for forensic educators and debate teams, this document details how one university debate team, at the University of Iowa, makes use of computer resources on campus to facilitate storage and retrieval of information useful to debaters. The introduction notes the problem of storing and retrieving the amount of information required by debate…
Using Taxonomic Indexing Trees to Efficiently Retrieve SCORM-Compliant Documents in e-Learning Grids

ERIC Educational Resources Information Center

Shih, Wen-Chung; Tseng, Shian-Shyong; Yang, Chao-Tung

2008-01-01

With the flourishing development of e-Learning, more and more SCORM-compliant teaching materials are developed by institutes and individuals in different sites. In addition, the e-Learning grid is emerging as an infrastructure to enhance traditional e-Learning systems. Therefore, information retrieval schemes supporting SCORM-compliant documents…
Implementing and Evaluating a Bibliographic Retrieval System for Print and Non-Print Media Materials.

ERIC Educational Resources Information Center

Buchholz, James L.

This document summarizes the selection, configuration, implementation, and evaluation of BiblioFile, a CD-ROM based bibliographic retrieval system used to catalog and process library materials for 103 school centers in the Palm Beach County Schools (Florida). Technical processing included the production of spine labels, check-out cards and…
A Study of Adaptive Relevance Feedback - UIUC TREC-2008 Relevance Feedback Experiments

DTIC Science & Technology

2008-11-01

terms. Journal of the American Society for Information Science, 27(3):129–146, 1976. [7] J. J. Rocchio. Relevance feedback in information retrieval. In...In The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313–323. Prentice-Hall Inc., 1971. [8] Gerard Salton and Chris
Comparing the Document Representations of Two IR-Systems: CLARIT and TOPIC.

ERIC Educational Resources Information Center

Paijmans, Hans

1993-01-01

Compares two information retrieval systems, CLARIT and TOPIC, in terms of assigned versus derived and precoordinate versus postcoordinate indexing. Models of information retrieval systems are discussed, and a test of the systems using a demonstration database of full-text articles from the "Wall Street Journal" is described. (Contains 21…
Enhancing Retrieval with Hyperlinks: A General Model Based on Propositional Argumentation Systems.

ERIC Educational Resources Information Center

Picard, Justin; Savoy, Jacques

2003-01-01

Discusses the use of hyperlinks for improving information retrieval on the World Wide Web and proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems. Topics include propositional logic, knowledge, and uncertainty; assumptions; using hyperlinks to modify document score and rank; and estimating the popularity of a…
1999 Leak Detection and Monitoring and Mitigation Strategy Update

DOE Office of Scientific and Technical Information (OSTI.GOV)

OHL, P.C.

This document is a complete revision of WHC-SD-WM-ES-378, Rev 1. This update includes recent developments in Leak Detection, Leak Monitoring, and Leak Mitigation technologies, as well as recent developments in single-shell tank retrieval technologies. In addition, a single-shell tank retrieval release protection strategy is presented.
Image/text automatic indexing and retrieval system using context vector approach

NASA Astrophysics Data System (ADS)

Qing, Kent P.; Caid, William R.; Ren, Clara Z.; McCabe, Patrick

1995-11-01

Thousands of documents and images are generated daily both on and off line on the information superhighway and other media. Storage technology has improved rapidly to handle these data but indexing this information is becoming very costly. HNC Software Inc. has developed a technology for automatic indexing and retrieval of free text and images. This technique is demonstrated and is based on the concept of `context vectors' which encode a succinct representation of the associated text and features of sub-image. In this paper, we will describe the Automated Librarian System which was designed for free text indexing and the Image Content Addressable Retrieval System (ICARS) which extends the technique from the text domain into the image domain. Both systems have the ability to automatically assign indices for a new document and/or image based on the content similarities in the database. ICARS also has the capability to retrieve images based on similarity of content using index terms, text description, and user-generated images as a query without performing segmentation or object recognition.
Indexing and retrieving DICOM data in disperse and unstructured archives.

PubMed

Costa, Carlos; Freitas, Filipe; Pereira, Marco; Silva, Augusto; Oliveira, José L

2009-01-01

This paper proposes an indexing and retrieval solution to gather information from distributed DICOM documents by allowing searches and access to the virtual data repository using a Google-like process. Medical imaging modalities are becoming more powerful and less expensive. The result is the proliferation of equipment acquisition by imaging centers, including the small ones. With this dispersion of data, it is not easy to take advantage of all the information that can be retrieved from these studies. Furthermore, many of these small centers do not have large enough requirements to justify the acquisition of a traditional PACS. The proposed solution is a peer-to-peer PACS platform that indexes and queries DICOM files over a set of distributed repositories that are logically viewed as a single federated unit. The solution is based on a public domain document-indexing engine and extends traditional PACS query and retrieval mechanisms. This proposal deals well with complex searching requirements, from a single desktop environment to distributed scenarios. The solution's performance and robustness were demonstrated in trials. The characteristics of the presented PACS platform make it particularly important for small institutions, including educational and research groups.
Evaluation of a simple method for the automatic assignment of MeSH descriptors to health resources in a French online catalogue.

PubMed

Névéol, Aurélie; Pereira, Suzanne; Kerdelhué, Gaetan; Dahamna, Badisse; Joubert, Michel; Darmoni, Stéfan J

2007-01-01

The growing number of resources to be indexed in the catalogue of online health resources in French (CISMeF) calls for curating strategies involving automatic indexing tools while maintaining the catalogue's high indexing quality standards. The objective was to develop a simple automatic tool that retrieves MeSH descriptors from document titles. In parallel to research on advanced indexing methods, a bag-of-words tool was developed for timely inclusion in CISMeF's maintenance system. An evaluation was carried out on a corpus of 99 documents. The indexing sets retrieved by the automatic tool were compared to manual indexing based on the title and on the full text of resources. 58% of the major main headings were retrieved by the bag-of-words algorithm and the precision on main heading retrieval was 69%. Bag-of-words indexing has effectively been used on selected resources to be included in CISMeF since August 2006. Meanwhile, ongoing work aims at improving the current version of the tool.
Design of a graphical user interface for an intelligent multimedia information system for radiology research

NASA Astrophysics Data System (ADS)

Taira, Ricky K.; Wong, Clement; Johnson, David; Bhushan, Vikas; Rivera, Monica; Huang, Lu J.; Aberle, Denise R.; Cardenas, Alfonso F.; Chu, Wesley W.

1995-05-01

With the increase in the volume and distribution of images and text available in PACS and medical electronic health-care environments it becomes increasingly important to maintain indexes that summarize the content of these multi-media documents. Such indices are necessary to quickly locate relevant patient cases for research, patient management, and teaching. The goal of this project is to develop an intelligent document retrieval system that allows researchers to request patient cases based on document content. Thus we wish to retrieve patient cases from electronic information archives that could include a combined specification of patient demographics, low-level radiologic findings (size, shape, number), intermediate-level radiologic findings (e.g., atelectasis, infiltrates, etc.) and/or high-level pathology constraints (e.g., well-differentiated small cell carcinoma). The cases could be distributed among multiple heterogeneous databases such as PACS, RIS, and HIS. Content-based retrieval systems go beyond the capabilities of simple key-word or string-based retrieval matching systems. These systems require a knowledge base to comprehend the generality/specificity of a concept (thus knowing the subclasses or related concepts to a given concept) and knowledge of the various string representations for each concept (i.e., synonyms, lexical variants, etc.). We have previously reported on a data integration mediation layer that allows transparent access to multiple heterogeneous distributed medical databases (HIS, RIS, and PACS). The data access layer of our architecture currently has limited query processing capabilities. Given a patient hospital identification number, the access mediation layer collects all documents in RIS and HIS and returns this information to a specified workstation location. 
In this paper we report on our efforts to extend the query processing capabilities of the system by creation of custom query interfaces, an intelligent query processing engine, and a document-content index that can be generated automatically (i.e., no manual authoring or changes to the normal clinical protocols).
'SON-GO-KU' : a dream of automated library

NASA Astrophysics Data System (ADS)

Sato, Mamoru; Kishimoto, Juji

In the process of automating libraries, the retrieval of books through the browsing of shelves is being overlooked. The telematic library is a document-based DBMS which can deliver the content of books by simulating the browsing process. The retrieval simulates the process a person would use in selecting a book in a real library, with a visual presentation on a graphic display substituted for the shelves. The characteristics of the prototype system "Son-Go-Ku", implemented in 1988 for such retrieval, are described.
FUB at TREC 2008 Relevance Feedback Track: Extending Rocchio with Distributional Term Analysis

DTIC Science & Technology

2008-11-01

starting point is the improved version [Salton and Buckley 1990] of the original Rocchio's formula [Rocchio 1971]: newQ = α ⋅ origQ + (β/R) ∑_{r∈R} r − γR...earlier studies about the low effect of the main relevance feedback parameters on retrieval performance (e.g., Salton and Buckley 1990), while they seem...Relevance feedback in information retrieval. In The SMART retrieval system - experiments in automatic document processing, Salton, G., Ed., Prentice Hall
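For orientation, the excerpt above is truncated, so the following is only a minimal sketch of the commonly taught form of Rocchio feedback: the new query is the original query plus a weighted centroid of the relevant documents minus a weighted centroid of the non-relevant ones. The function name `rocchio`, the default values of `alpha`, `beta`, and `gamma`, the toy vectors, and the dropping of non-positive weights are all illustrative assumptions, not the settings used in the FUB runs.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector as a dict mapping term -> weight.

    query, and each element of relevant / nonrelevant, is a dict term -> weight.
    The parameter defaults are illustrative assumptions only.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)

    new_query = {}
    for t in terms:
        # Centroid weight of the term in each judged set (0 if the set is empty).
        pos = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        w = alpha * query.get(t, 0.0) + beta * pos - gamma * neg
        if w > 0:  # assumption: non-positive weights are dropped
            new_query[t] = w
    return new_query


# Hypothetical toy data, not taken from any TREC collection.
orig_q = {"relevance": 1.0, "feedback": 1.0}
rel = [{"relevance": 0.5, "rocchio": 1.0}, {"feedback": 0.5, "rocchio": 1.0}]
nonrel = [{"feedback": 1.0, "football": 2.0}]
new_q = rocchio(orig_q, rel, nonrel)
```

In this toy run the term "rocchio", absent from the original query, enters the new query through the relevant documents, while "football" appears only in the non-relevant document and is dropped.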
A simple procedure for retrieval of a cement-retained implant-supported crown: a case report.

PubMed

Buzayan, Muaiyed Mahmoud; Mahmood, Wan Adida; Yunus, Norsiah Binti

2014-02-01

Retrieval of cement-retained implant prostheses can be more demanding than retrieval of screw-retained prostheses. This case report describes a simple and predictable procedure to locate the abutment screw access openings of cement-retained implant-supported crowns in cases of fractured ceramic veneer. A conventional periapical radiography image was captured using a digital camera, transferred to a computer, and manipulated using Microsoft Word document software to estimate the location of the abutment screw access.
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval

DTIC Science & Technology

2003-02-01

...are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written
Machine Translation-Supported Cross-Language Information Retrieval for a Consumer Health Resource

PubMed Central

Rosemblat, Graciela; Gemoets, Darren; Browne, Allen C.; Tse, Tony

2003-01-01

The U.S. National Institutes of Health, through its National Library of Medicine, developed ClinicalTrials.gov to provide the public with easy access to information on clinical trials on a wide range of conditions or diseases. Only English language information retrieval is currently supported. Given the growing number of Spanish speakers in the U.S. and their increasing use of the Web, we anticipate a significant increase in Spanish-speaking users. This study compares the effectiveness of two common cross-language information retrieval methods using machine translation, query translation versus document translation, using a subset of genuine user queries from ClinicalTrials.gov. Preliminary results conducted with the ClinicalTrials.gov search engine show that in our environment, query translation is statistically significantly better than document translation. We discuss possible reasons for this result and we conclude with suggestions for future work. PMID:14728236

An XML-based system for the flexible classification and retrieval of clinical practice guidelines.

PubMed Central

Ganslandt, T.; Mueller, M. L.; Krieglstein, C. F.; Senninger, N.; Prokosch, H. U.

2002-01-01

Beneficial effects of clinical practice guidelines (CPGs) have not yet reached expectations due to limited routine adoption. Electronic distribution and reminder systems have the potential to overcome implementation barriers. Existing electronic CPG repositories like the National Guideline Clearinghouse (NGC) provide individual access but lack standardized computer-readable interfaces necessary for automated guideline retrieval. The aim of this paper was to facilitate automated context-based selection and presentation of CPGs. Using attributes from the NGC classification scheme, an XML-based metadata repository was successfully implemented, providing document storage, classification and retrieval functionality. Semi-automated extraction of attributes was implemented for the import of XML guideline documents using XPath. A hospital information system interface was exemplarily implemented for diagnosis-based guideline invocation. Limitations of the implemented system are discussed and possible future work is outlined. Integration of standardized computer-readable search interfaces into existing CPG repositories is proposed. PMID:12463831
Analysis of PubMed User Sessions Using a Full-Day PubMed Query Log: A Comparison of Experienced and Nonexperienced PubMed Users

PubMed Central

2015-01-01

Background: PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users’ knowledge of information retrieval (IR) functions and their usage in query formulation. Objective: The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. Methods: To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. Results: To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Conclusions: Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length. PMID:26139516
Hierarchic Agglomerative Clustering Methods for Automatic Document Classification.

ERIC Educational Resources Information Center

Griffiths, Alan; And Others

1984-01-01

Considers classifications produced by application of single linkage, complete linkage, group average, and word clustering methods to Keen and Cranfield document test collections, and studies structure of hierarchies produced, extent to which methods distort input similarity matrices during classification generation, and retrieval effectiveness…
Get It Right First Time: A Beginner's Guide to Document Management.

ERIC Educational Resources Information Center

Hayes, Mike

1997-01-01

Document management (DM) systems capture, store, index, retrieve, route, distribute, and archive information in organizations. Discusses "passive" electronic libraries and "active" systems; characteristics of effective systems; implementing a system; fitting a new system to an existing infrastructure; budgets; system…
Term frequency - function of document frequency: a new term weighting scheme for enterprise information retrieval

NASA Astrophysics Data System (ADS)

Zhang, Hui; Wang, Deqing; Wu, Wenjun; Hu, Hongping

2012-11-01

In today's business environment, enterprises are increasingly under pressure to process the vast amount of data produced every day within enterprises. One method is to focus on business intelligence (BI) applications and to increase commercial added value through such business analytics activities. Term weighting, which is used to convert documents into vectors in the term space, is a vital task in enterprise Information Retrieval (IR), text categorisation, text analytics, etc. When determining the weight of a term in a document, the traditional TF-IDF scheme considers only the term's occurrence frequency within the document and across the entire set of documents, so some meaningful terms do not receive an appropriate weight. In this article, we propose a new term weighting scheme called Term Frequency - Function of Document Frequency (TF-FDF) to address this issue. Instead of using a monotonically decreasing function such as Inverse Document Frequency, FDF presents a convex function that dynamically adjusts weights according to the significance of the words in a document set. This function can be manually tuned based on the distribution of the most meaningful words which semantically represent the document set. Our experiments show that TF-FDF achieves a higher Normalised Discounted Cumulative Gain in IR than TF-IDF and its variants, and improves the accuracy of relevance ranking of the IR results.
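The TF-IDF baseline that this abstract compares against can be sketched as follows. The abstract does not give the FDF function itself, so only the baseline is shown, in one common variant (raw term frequency multiplied by log(N/df)); the function name `tf_idf` and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one dict term -> weight per document.

    Uses raw term frequency times log(N / df), one of several common variants.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus of tokenised enterprise documents.
corpus = [
    ["invoice", "payment", "invoice"],
    ["payment", "schedule"],
    ["payment", "invoice", "audit"],
]
w = tf_idf(corpus)
```

In this toy corpus "payment" occurs in every document and therefore receives a weight of zero under this variant, however meaningful it may be. That is the kind of effect of a monotonically decreasing document-frequency function that the abstract sets out to address.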
Construction of Weak and Strong Similarity Measures for Ordered Sets of Documents Using Fuzzy Set Techniques.

ERIC Educational Resources Information Center

Egghe, L.; Michel, C.

2003-01-01

Ordered sets (OS) of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents need to be extended to these ordered sets. This is done in this article using fuzzy set techniques. The practical usability of the OS-measures is…
SDC DOCUMENTS APPLICABLE TO STATE AND LOCAL GOVERNMENT PROBLEMS.

DTIC Science & Technology

Public administration, Urban and regional planning, The administration of justice, Bio-medical systems, Educational systems, Computer program systems, The development and management of computer-based systems, Information retrieval, Simulation. AD numbers are provided for those documents which can be obtained from the Defense Documentation Center or the Department of Commerce’s Clearinghouse for Federal Scientific and Technical Information.
Automatic generation of stop word lists for information retrieval and analysis

DOEpatents

Rose, Stuart J

2013-01-08

Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
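The procedure in this abstract can be sketched roughly as below. The abstract does not define its terms precisely, so the sketch assumes that the keyword frequency of a term is the number of times it occurs inside a matched keyword, that the keyword adjacency frequency is the number of times it occurs immediately before or after a matched keyword, and that the final truncation keeps the most frequent surviving terms. The function name, `min_ratio`, `max_size`, and the toy data are hypothetical.

```python
from collections import Counter


def generate_stop_words(docs, keywords, min_ratio=1.0, max_size=10):
    """docs: list of token lists; keywords: list of token tuples.

    Returns a stop word list under the assumptions stated in the lead-in.
    """
    term_freq = Counter()
    keyword_freq = Counter()    # occurrences of a term inside a keyword
    adjacency_freq = Counter()  # occurrences of a term next to a keyword

    for tokens in docs:
        term_freq.update(tokens)
        for kw in keywords:
            k = len(kw)
            for i in range(len(tokens) - k + 1):
                if tuple(tokens[i:i + k]) != tuple(kw):
                    continue
                keyword_freq.update(kw)
                if i > 0:
                    adjacency_freq[tokens[i - 1]] += 1
                if i + k < len(tokens):
                    adjacency_freq[tokens[i + k]] += 1

    # Exclude a term when adjacency / keyword frequency < min_ratio.
    # Written as a product so that a keyword frequency of zero needs no division.
    kept = [t for t in term_freq
            if adjacency_freq[t] >= min_ratio * keyword_freq[t]]

    # Assumed truncation criterion: keep the most frequent remaining terms.
    kept.sort(key=lambda t: (-term_freq[t], t))
    return kept[:max_size]


# Hypothetical toy corpus and keyword list.
docs = [
    "the retrieval of documents from the archive".split(),
    "indexing of documents and retrieval of images".split(),
]
keywords = [("retrieval",), ("documents",), ("archive",), ("indexing",), ("images",)]
stop_words = generate_stop_words(docs, keywords, min_ratio=1.0, max_size=3)
```

Here the function words that sit next to keywords survive the ratio test, while the keyword terms themselves are excluded because they occur inside keywords more often than beside them.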
Health consumer-oriented information retrieval.

PubMed

Claveau, Vincent; Hamon, Thierry; Le Maguer, Sébastien; Grabar, Natalia

2015-01-01

While patients can freely access their Electronic Health Records or online health information, they may not be able to correctly understand the content of these documents. One of the challenges is related to the difference between expert and non-expert languages. We propose to investigate this issue within the Information Retrieval field. The patient queries have to be associated with the corresponding expert documents, that provide trustworthy information. Our approach relies on a state-of-the-art IR system called Indri and on semantic resources. Different query expansion strategies are explored. Our system shows up to 0.6740 P@10, up to 0.7610 R@10, and up to 0.6793 NDCG@10.
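The three figures reported here are standard ranked-retrieval measures at a cutoff of 10. The sketch below shows one common way to compute them; the abstract does not state which exact variants were used (for example, how gains are graded for NDCG), so the formulas, function names, and toy ranking are assumptions.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def ndcg_at_k(ranked, gains, k=10):
    """DCG of the ranking divided by the DCG of an ideal ranking.

    gains maps document id -> graded relevance; unjudged documents count as 0.
    """
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking and relevance judgments for one query.
ranked = ["d3", "d1", "d7", "d2", "d9"]
gains = {"d1": 1, "d2": 1, "d4": 1}
relevant = set(gains)

p5 = precision_at_k(ranked, relevant, k=5)
r5 = recall_at_k(ranked, relevant, k=5)
n5 = ndcg_at_k(ranked, gains, k=5)
```

Precision and recall ignore the order of documents within the top k, whereas NDCG rewards placing relevant documents nearer the top, which is why the three figures are usually reported together.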
Information Retrieval and Text Mining Technologies for Chemistry.

PubMed

Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

2017-06-28

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Videofile for Law Enforcement

NASA Technical Reports Server (NTRS)

1977-01-01

Components of a videotape storage and retrieval system originally developed for NASA have been adapted as a tool for law enforcement agencies. Ampex Corp., Redwood City, Cal., built a unique system for NASA-Marshall. The first application of professional broadcast technology to computerized record-keeping, it incorporates new equipment for transporting tapes within the system. After completing the NASA system, Ampex continued development, primarily to improve image resolution. The resulting advanced system, known as the Ampex Videofile, offers advantages over microfilm for filing, storing, retrieving, and distributing large volumes of information. The system's computer stores information in digital code rather than in pictorial form. While microfilm allows visual storage of whole documents, it requires a step before usage--developing the film. With Videofile, the actual document is recorded, complete with photos and graphic material, and a picture of the document is available instantly.
Interpolation of the Extended Boolean Retrieval Model.

ERIC Educational Resources Information Center

Zanger, Daniel Z.

2002-01-01

Presents an interpolation theorem for an extended Boolean information retrieval model. Results show that whenever two or more documents are similarly ranked at any two points for a query containing exactly two terms, then they are similarly ranked at all points in between; and that results can fail for queries with more than two terms. (Author/LRW)
Evaluation of atmospheric density models and preliminary functional specifications for the Langley Atmospheric Information Retrieval System (LAIRS)

NASA Technical Reports Server (NTRS)

Lee, T.; Boland, D. F., Jr.

1980-01-01

This document presents the results of an extensive survey and comparative evaluation of current atmosphere and wind models for inclusion in the Langley Atmospheric Information Retrieval System (LAIRS). It includes recommended models for use in LAIRS, estimated accuracies for the recommended models, and functional specifications for the development of LAIRS.
Kellogg Library and Archive Retrieval System (KLARS) Document Capture Manual. Draft Version.

ERIC Educational Resources Information Center

Hugo, Jane

This manual is designed to supply background information for Kellogg Library and Archive Retrieval System (KLARS) processors and others who might work with the system, outline detailed policies and procedures for processors who prepare and enter data into the adult education database on KLARS, and inform general readers about the system. KLARS is…
What Friends Are For: Collaborative Intelligence Analysis and Search

DTIC Science & Technology

2014-06-01

14. SUBJECT TERMS Intelligence Community, information retrieval, recommender systems, search engines, social networks, user profiling, Lucene...improvements over existing search systems. The improvements are shown to be robust to high levels of human error and low similarity between users...precision NOLH nearly orthogonal Latin hypercubes P@ precision at documents RS recommender systems TREC Text REtrieval Conference USM user
A STORAGE AND RETRIEVAL SYSTEM FOR DOCUMENTS IN INSTRUCTIONAL RESOURCES. REPORT NO. 13.

ERIC Educational Resources Information Center

DIAMOND, ROBERT M.; LEE, BERTA GRATTAN

IN ORDER TO IMPROVE INSTRUCTION WITHIN TWO-YEAR LOWER DIVISION COURSES, A COMPREHENSIVE RESOURCE LIBRARY WAS DEVELOPED AND A SIMPLIFIED CATALOGING AND INFORMATION RETRIEVAL SYSTEM WAS APPLIED TO IT. THE ROYAL MCBEE "KEYDEX" SYSTEM, CONTAINING THREE MAJOR COMPONENTS--A PUNCH MACHINE, FILE CARDS, AND A LIGHT BOX--WAS USED. CARDS WERE HEADED WITH KEY…
Mixed-Handedness Advantages in Episodic Memory Obtained under Conditions of Intentional Learning Extend to Incidental Learning

ERIC Educational Resources Information Center

Christman, Stephen D.; Butler, Michael

2011-01-01

The existence of handedness differences in the retrieval of episodic memories is well-documented, but virtually all have been obtained under conditions of intentional learning. Two experiments are reported that extend the presence of such handedness differences to memory retrieval under conditions of incidental learning. Experiment 1 used Craik…
A Survey in Indexing and Searching XML Documents.

ERIC Educational Resources Information Center

Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James

2002-01-01

Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…
Photogrammetry for Archaeology: Collecting Pieces Together

NASA Astrophysics Data System (ADS)

Chibunichev, A. G.; Knyaz, V. A.; Zhuravlev, D. V.; Kurkov, V. M.

2018-05-01

The complexity of retrieving and understanding archaeological data requires applying different techniques, tools and sensors for information gathering, processing and documenting. Archaeological research is now interdisciplinary in nature, involving technologies based on different physical principles for retrieving information about archaeological findings. An important part of archaeological data is visual and spatial information, which allows reconstructing the appearance of the findings and the relations between them. Photogrammetry has great potential for accurately acquiring spatial and visual data at different scales and resolutions, allowing the creation of archaeological documents of a new type and quality. The aim of the presented study is to develop an approach for creating new forms of archaeological documents and a pipeline for producing them and collecting them in one holistic model describing an archaeological site. A set of techniques is developed for acquiring and integrating spatial and visual data at different levels of detail. The application of the developed techniques is demonstrated in documenting the Bosporus archaeological expedition of the Russian State Historical Museum.
Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies.

PubMed

Dinh, Duy; Tamine, Lynda; Boubekeur, Fatiha

2013-02-01

The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. We propose a multi-terminology based concept extraction approach for selecting the best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We particularly focus on the effect of integrating terminologies into a biomedical IR process, and the utility of using voting techniques for combining the extracted concepts from each document in order to provide a list of unique concepts. Experimental studies conducted on the TREC Genomics collections show that the improvements of our multi-terminology IR approach based on voting techniques are statistically significant compared to the baseline. For example, tested on the 2005 TREC Genomics collection, our multi-terminology based IR approach provides an improvement rate of +6.98% in terms of MAP (mean average precision) (p<0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms in combination with query expansion using terms from top ranked expanded documents improves biomedical IR effectiveness. We have evaluated several voting models for combining concepts issued from multiple terminologies. Through this study, we presented many factors affecting the effectiveness of a biomedical IR system, including term weighting, query expansion, and document expansion models. The appropriate combination of those factors could be useful to improve the IR performance. Copyright © 2012 Elsevier B.V. All rights reserved.
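As a reading aid for the MAP figure quoted in this record, the following minimal sketch shows how mean average precision and a relative improvement rate are commonly computed. It is not the authors' evaluation code; the function names and the document ids in the usage note are hypothetical, and the abstract does not state which evaluation tool was used.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average precision of one ranked list against the relevant ids of a query."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision measured at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


def improvement_rate(baseline: float, system: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (system - baseline) / baseline
```

For instance, with the made-up ranking `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, the average precision is (1/1 + 2/3) / 2. An improvement rate such as the +6.98% reported above is the relative difference between the system MAP and the baseline MAP.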

An Investigation of Document Partitions.

ERIC Educational Resources Information Center

Shaw, W. M., Jr.

1986-01-01

Empirical significance of document partitions is investigated as a function of index term-weight and similarity thresholds. Results show the same empirically preferred partitions can be detected by two independent strategies: an analysis of cluster-based retrieval analysis and an analysis of regularities in the underlying structure of the document…
Storing and Viewing Electronic Documents.

ERIC Educational Resources Information Center

Falk, Howard

1999-01-01

Discusses the conversion of fragile library materials to computer storage and retrieval to extend the life of the items and to improve accessibility through the World Wide Web. Highlights include entering the images, including scanning; optical character recognition; full text and manual indexing; and available document- and image-management…
Imaged document information location and extraction using an optical correlator

NASA Astrophysics Data System (ADS)

Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.

1999-12-01

Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, patents, etc.). Many of these organizations are converting their paper archives to electronic images, which are then stored in a computer database. Because of this, there is a need to efficiently organize this data into comprehensive and accessible information resources and provide for rapid access to the information contained within these imaged documents. To meet this need, Litton PRC and Litton Data Systems Division are developing a system, the Imaged Document Optical Correlation and Conversion System (IDOCCS), to provide a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides a means for the search and retrieval of information from imaged documents. IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives and has the potential to determine the types of languages contained within a document. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited, e.g., imaged documents containing an agency's seal or logo can be singled out. In this paper, we present a description of IDOCCS as well as preliminary performance results and theoretical projections.
Repository of not readily available documents for project W-320

DOE Office of Scientific and Technical Information (OSTI.GOV)

Conner, J.C.

1997-04-18

The purpose of this document is to provide a readily available source of the technical reports needed for the development of the safety documentation provided for the waste retrieval sluicing system (WRSS), designed to remove the radioactive and chemical sludge from tank 241-C-106, and transport that material to double-shell tank 241-AY-102 via a new, temporary, shielded, encased transfer line.
Batching alternatives for Phase I retrieval wastes to be processed in WRAP Module 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mayancsik, B.A.

1994-10-13

During the next two decades, the transuranic (TRU) waste now stored in the 200 Area burial trenches and storage buildings is to be retrieved, processed in the Waste Receiving and Processing (WRAP) Module 1 facility, and shipped to a final disposal facility. The purpose of this document is to identify the criteria that can be used to batch suspect TRU waste, currently in retrievable storage, for processing through the WRAP Module 1 facility. These criteria are then used to generate a batch plan for Phase 1 Retrieval operations, which will retrieve the waste located in Trench 4C-04 of the 200 West Area burial ground. The reasons for batching wastes for processing in WRAP Module 1 include reducing the exposure of workers and the environment to hazardous material and ionizing radiation; maximizing the efficiency of the retrieval, processing, and disposal processes by reducing costs, time, and space throughout the process; reducing analytical sampling and analysis; and reducing the amount of cleanup and decontamination between process runs. The criteria selected for batching the drums of retrieved waste entering WRAP Module 1 are based on the available records for the wastes sent to storage as well as knowledge of the processes that generated these wastes. The batching criteria identified in this document include the following: waste generator; type of process used to generate or package the waste; physical waste form; content of hazardous/dangerous chemicals in the waste; radiochemical type and quantity of waste; drum weight; and special waste types. These criteria were applied to the waste drums currently stored in Trench 4C-04. At least one batching scheme is shown for each of the criteria listed above.
Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches

PubMed Central

Wang, Yanshan; Shen, Feichen; Liu, Sijia; Rastegar-Mojarad, Majid; Wang, Liwei

2017-01-01

Background: Self-management is crucial to diabetes care, and providing expert-vetted content for answering patients’ questions is crucial in facilitating patient self-management. Objective: The aim is to investigate the use of information retrieval techniques in recommending patient education materials for diabetic questions of patients. Methods: We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic group (semantic group-based model), with a baseline retrieval model, the vector space model (VSM), in recommending diabetic patient education materials to diabetic questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset consisting of 50 randomly selected diabetic questions where the relevancy of diabetic education materials to the questions was manually assigned by two experts. The performance was assessed using precision of top-ranked documents. Results: We retrieved 7510 diabetic questions on the forum and 144 diabetic patient educational materials from the patient education database at Mayo Clinic. The mapping rate of words in each corpus mapped to the Unified Medical Language System (UMLS) was significantly different (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. Conclusions: This study demonstrated that topic modeling can mitigate the vocabulary difference and it achieved the best performance in recommending education materials for answering patients’ questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and online forums. PMID:29038097
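To make the baseline in this record concrete, here is a minimal sketch of a vector space model with TF-IDF weights and cosine similarity, plus precision at rank k. It is an illustration under stated assumptions, not the study's implementation: the whitespace tokenizer, the smoothed IDF formula, and all function names are choices made for this sketch, and the abstract does not specify the weighting scheme that was used.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_idf(docs: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(tokenize(d)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the collection are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokenize(text)).items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: List[str]) -> List[int]:
    """Document indices sorted by decreasing cosine similarity to the query."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scores = [(cosine(q, tfidf(d, idf)), i) for i, d in enumerate(docs)]
    # Ties are broken by original document order.
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


def precision_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    return sum(1 for i in ranked[:k] if i in relevant) / k
```

With k = 1, `precision_at_k` corresponds to the "top-retrieved document" precision that the abstract reports for each model, averaged there over the 50 gold standard questions.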
Automated MeSH indexing of the World-Wide Web.

PubMed Central

Fowler, J.; Kouramajian, V.; Maram, S.; Devadhar, V.

1995-01-01

To facilitate networked discovery and information retrieval in the biomedical domain, we have designed a system for automatic assignment of Medical Subject Headings to documents retrieved from the World-Wide Web. Our prototype implementations show significant promise. We describe our methods and discuss the further development of a completely automated indexing tool called the "Web-MeSH Medibot." PMID:8563421
An Index to All "Query" Computer Searches Completed from July 1973 to June 1974. Search Number 0403-0619. Information Series No. 24.

ERIC Educational Resources Information Center

Wilder, Dolores J., Comp.; Hines, Rella, Comp.

The Tennessee Research Coordinating Unit (RCU) has implemented a computerized information retrieval system known as "Query," which allows for the retrieval of documents indexed in Research in Education (RIE), Current Index to Journals in Education (CIJE), and Abstracts of Instructional and Research Materials (AIM/ARM). The document…
Automatic Processing of Metallurgical Abstracts for the Purpose of Information Retrieval. Final Report.

ERIC Educational Resources Information Center

Melton, Jessica S.

Objectives of this project were to develop and test a method for automatically processing the text of abstracts for a document retrieval system. The test corpus consisted of 768 abstracts from the metallurgical section of Chemical Abstracts (CA). The system, based on a subject indexing rationale, had two components: (1) a stored dictionary of words…
LDEF: 69 Months in Space. Second Post-Retrieval Symposium, part 2

NASA Technical Reports Server (NTRS)

Levine, Arlene S. (Editor)

1993-01-01

This document is a compilation of papers presented at the Second Long Duration Exposure Facility (LDEF) Post-Retrieval Symposium. The papers represent the data analysis of the 57 experiments flown on the LDEF. The experiments include materials, coatings, thermal systems, power and propulsion, science (cosmic ray, interstellar gas, heavy ions, micrometeoroid, etc.), electronics, optics, and life science.
HARV ANSER Flight Test Data Retrieval and Processing Procedures

NASA Technical Reports Server (NTRS)

Yeager, Jessie C.

1997-01-01

Under the NASA High-Alpha Technology Program the High Alpha Research Vehicle (HARV) was used to conduct flight tests of advanced control effectors, advanced control laws, and high-alpha design guidelines for future super-maneuverable fighters. The High-Alpha Research Vehicle is a pre-production F/A-18 airplane modified with a multi-axis thrust-vectoring system for augmented pitch and yaw control power and Actuated Nose Strakes for Enhanced Rolling (ANSER) to augment body-axis yaw control power. Flight testing at the Dryden Flight Research Center (DFRC) began in July 1995 and continued until May 1996. Flight data will be utilized to evaluate control law performance and aircraft dynamics, determine aircraft control and stability derivatives using parameter identification techniques, and validate design guidelines. To accomplish these purposes, essential flight data parameters were retrieved from the DFRC data system and stored on the Dynamics and Control Branch (DCB) computer complex at Langley. This report describes the multi-step task used to retrieve and process this data and documents the results of these tasks. Documentation includes software listings, flight information, maneuver information, time intervals for which data were retrieved, lists of data parameters and definitions, and example data plots.
MPEG-7 audio-visual indexing test-bed for video retrieval

NASA Astrophysics Data System (ADS)

Gagnon, Langis; Foucher, Samuel; Gouaillier, Valerie; Brun, Christelle; Brousseau, Julie; Boulianne, Gilles; Osterrath, Frederic; Chapdelaine, Claude; Dutrisac, Julie; St-Onge, Francis; Champagne, Benoit; Lu, Xiaojian

2003-12-01

This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content like face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description decomposition is based on a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a web site that allows video retrieval using a proprietary XQuery-based search engine and accessible to members at the Canadian National Film Board (NFB) Cineroute site. For example, end-user will be able to ask to point on movie shots in the database that have been produced in a specific year, that contain the face of a specific actor who tells a specific word and in which there is no motion activity. Video streaming is performed over the high bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
Operations and maintenance philosophy

DOE Office of Scientific and Technical Information (OSTI.GOV)

DUNCAN, G.P.

1999-10-28

This Operations and Maintenance (O&M) Philosophy document is intended to establish a future O&M vision, with an increased focus on minimizing worker exposure, ensuring uninterrupted retrieval operations, and minimizing operation life-cycle cost. It is intended that this document would incorporate O&M lessons learned into on-going and future project upgrades.
Can Visualizing Document Space Improve Users' Information Foraging?

ERIC Educational Resources Information Center

Song, Min

1998-01-01

This study shows how users access relevant information in a visualized document space and determines whether BiblioMapper, a visualization tool, strengthens an information retrieval (IR) system and makes it more usable. BiblioMapper, developed for a CISI collection, was evaluated by accuracy, time, and user satisfaction. Users' navigation…
A Synchronous Search for Documents

DTIC Science & Technology

An algorithm is described of a synchronous search in a complex system of selective retrieval of documents, with an allowance for exclusion of...stored on a magnetic tape. The number of topics served by the synchronous search goes into thousands; a search within 500-600 topics is performed without additional access to the tape.
Automatic Term Class Construction Using Relevance--A Summary of Work in Automatic Pseudoclassification.

ERIC Educational Resources Information Center

Salton, G.

1980-01-01

Summarizes studies of pseudoclassification, a process of utilizing user relevance assessments of certain documents with respect to certain queries to build term classes designed to retrieve relevant documents. Conclusions are reached concerning the effectiveness and feasibility of constructing term classifications based on human relevance…
Electronic Document Management Systems: Where Are They Today?

ERIC Educational Resources Information Center

Koulopoulos, Thomas M.; Frappaolo, Carl

1993-01-01

Discusses developments in document management systems based on a survey of over 400 corporations and government agencies. Text retrieval and imaging markets, architecture and integration, purchasing plans, and vendor market leaders are covered. Five graphs present data on user preferences for improvements. A sidebar article reviews the development…
Closeup of LDEF experiment trays documented during STS-32 photo survey

NASA Image and Video Library

1990-01-20

Closeup of Long Duration Exposure Facility (LDEF) experiment trays is documented during STS-32 retrieval activity and photo survey conducted by crewmembers onboard Columbia, Orbiter Vehicle (OV) 102. Partially visible is the Polymer Matrix Composite Materials Experiment. In the background is the surface of the Earth.
IFLA General Conference, 1986. Pre-Session Seminar, Kanasawa. Papers.

ERIC Educational Resources Information Center

International Federation of Library Associations and Institutions, The Hague (Netherlands).

The two papers in this document were presented at a pre-session held before the IFLA general conference in 1986. In "Problems of Document Delivery in the Science and Technology Information Environment--An African View," Lucilda Hunter (Sierra Leone) discusses typical difficulties encountered in the process of information retrieval in…
Leveraging Terminologies for Retrieval of Radiology Reports with Critical Imaging Findings

PubMed Central

Warden, Graham I.; Lacson, Ronilda; Khorasani, Ramin

2011-01-01

Introduction: Communication of critical imaging findings is an important component of medical quality and safety. A fundamental challenge includes retrieval of radiology reports that contain these findings. This study describes the expressiveness and coverage of existing medical terminologies for critical imaging findings and evaluates radiology report retrieval using each terminology. Methods: Four terminologies were evaluated: National Cancer Institute Thesaurus (NCIT), Radiology Lexicon (RadLex), Systematized Nomenclature of Medicine (SNOMED-CT), and International Classification of Diseases (ICD-9-CM). Concepts in each terminology were identified for 10 critical imaging findings. Three findings were subsequently selected to evaluate document retrieval. Results: SNOMED-CT consistently demonstrated the highest number of overall terms (mean=22) for each of ten critical findings. However, retrieval rate and precision varied between terminologies for the three findings evaluated. Conclusion: No single terminology is optimal for retrieving radiology reports with critical findings. The expressiveness of a terminology does not consistently correlate with radiology report retrieval. PMID:22195212

Natural brain-information interfaces: Recommending information by relevance inferred from human brain signals

PubMed Central

Eugster, Manuel J. A.; Ruotsalo, Tuukka; Spapé, Michiel M.; Barral, Oswald; Ravaja, Niklas; Jacucci, Giulio; Kaski, Samuel

2016-01-01

Finding relevant information from large document collections such as the World Wide Web is a common task in our daily lives. Estimation of a user’s interest or search intention is necessary to recommend and retrieve relevant information from these collections. We introduce a brain-information interface used for recommending information by relevance inferred directly from brain signals. In experiments, participants were asked to read Wikipedia documents about a selection of topics while their EEG was recorded. Based on the prediction of word relevance, the individual’s search intent was modeled and successfully used for retrieving new relevant documents from the whole English Wikipedia corpus. The results show that the users’ interests toward digital content can be modeled from the brain signals evoked by reading. The introduced brain-relevance paradigm enables the recommendation of information without any explicit user interaction and may be applied across diverse information-intensive applications. PMID:27929077
Essie: A Concept-based Search Engine for Structured Biomedical Text

PubMed Central

Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina

2007-01-01

This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729
Natural brain-information interfaces: Recommending information by relevance inferred from human brain signals

NASA Astrophysics Data System (ADS)

Eugster, Manuel J. A.; Ruotsalo, Tuukka; Spapé, Michiel M.; Barral, Oswald; Ravaja, Niklas; Jacucci, Giulio; Kaski, Samuel

2016-12-01

Finding relevant information from large document collections such as the World Wide Web is a common task in our daily lives. Estimation of a user’s interest or search intention is necessary to recommend and retrieve relevant information from these collections. We introduce a brain-information interface used for recommending information by relevance inferred directly from brain signals. In experiments, participants were asked to read Wikipedia documents about a selection of topics while their EEG was recorded. Based on the prediction of word relevance, the individual’s search intent was modeled and successfully used for retrieving new relevant documents from the whole English Wikipedia corpus. The results show that the users’ interests toward digital content can be modeled from the brain signals evoked by reading. The introduced brain-relevance paradigm enables the recommendation of information without any explicit user interaction and may be applied across diverse information-intensive applications.
A Semantic Approach for Geospatial Information Extraction from Unstructured Documents

NASA Astrophysics Data System (ADS)

Sallaberry, Christian; Gaio, Mauro; Lesbegueries, Julien; Loustau, Pierre

Local cultural heritage document collections are characterized by their content, which is strongly attached to a territory and its land history (i.e., geographical references). Our contribution aims at making the content retrieval process more efficient whenever a query includes geographic criteria. We propose a core model for a formal representation of geographic information. It takes into account characteristics of different modes of expression, such as written language, captures of drawings, maps, photographs, etc. We have developed a prototype that fully implements geographic information extraction (IE) and geographic information retrieval (IR) processes. All PIV prototype processing resources are designed as Web Services. We propose a geographic IE process based on semantic treatment as a supplement to classical IE approaches. We implement geographic IR by using intersection computing algorithms that seek out any intersection between formal geocoded representations of geographic information in a user query and similar representations in document collection indexes.
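The intersection-based geographic IR step described in this record can be sketched as follows. This is only an illustration, assuming that each geocoded representation is reduced to an axis-aligned bounding box in longitude and latitude; the abstract does not say which geometry the PIV prototype uses, and the `BBox` class, `geo_search` function, and index layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (an assumed stand-in for a geocoded footprint)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes intersect when they overlap on both axes (touching counts).
        return (
            self.min_lon <= other.max_lon
            and other.min_lon <= self.max_lon
            and self.min_lat <= other.max_lat
            and other.min_lat <= self.max_lat
        )


def geo_search(query_box: BBox, index: Dict[str, List[BBox]]) -> List[str]:
    """Return ids of documents with at least one footprint intersecting the query."""
    return [
        doc_id
        for doc_id, boxes in index.items()
        if any(query_box.intersects(box) for box in boxes)
    ]
```

A real system would likely work with finer geometries (polygons, lines) and a spatial index rather than a linear scan; the bounding-box test is the simplest instance of the same idea.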
Personal Information Management for Nurses Returning to School.

PubMed

Bowman, Katherine

2015-12-01

Registered nurses with a diploma or an associate's degree are encouraged to return to school to earn a Bachelor of Science in Nursing degree. Until they return to school, many RNs have little need to regularly write, store, and retrieve work-related papers, but they are expected to complete the majority of assignments using a computer when in the student role. Personal information management (PIM) is a system of organizing and managing electronic information that will reduce computer clutter, while enhancing time use, task management, and productivity. This article introduces three PIM strategies for managing school work. Nesting is the creation of a system of folders to form a hierarchy for storing and retrieving electronic documents. Each folder, subfolder, and document must be given a meaningful unique name. Numbering is used to create different versions of the same paper, while preserving the original document. Copyright 2015, SLACK Incorporated.
Web document ranking via active learning and kernel principal component analysis

NASA Astrophysics Data System (ADS)

Cai, Fei; Chen, Honghui; Shu, Zhen

2015-09-01

Web document ranking arises in many information retrieval (IR) applications, such as the search engine, recommendation system and online advertising. A challenging issue is how to select the representative query-document pairs and informative features as well for better learning and exploring new ranking models to produce an acceptable ranking list of candidate documents of each query. In this study, we propose an active sampling (AS) plus kernel principal component analysis (KPCA) based ranking model, viz. AS-KPCA Regression, to study the document ranking for a retrieval system, i.e. how to choose the representative query-document pairs and features for learning. More precisely, we fill those documents gradually into the training set by AS such that each of which will incur the highest expected DCG loss if unselected. Then, the KPCA is performed via projecting the selected query-document pairs onto p-principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and depress the impact incurred by noise simultaneously. To the best of our knowledge, we are the first to perform the document ranking via dimension reductions in two dimensions, namely, the number of documents and features simultaneously. Our experiments demonstrate that the performance of our approach is better than that of the baseline methods on the public LETOR 4.0 datasets. Our approach brings an improvement against RankBoost as well as other baselines near 20% in terms of MAP metric and less improvements using P@K and NDCG@K, respectively. Moreover, our approach is particularly suitable for document ranking on the noisy dataset in practice.
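The DCG and NDCG@K measures named in this record can be sketched as below. The sketch assumes the common exponential-gain form, (2^rel - 1) / log2(rank + 1), with graded relevance labels; the abstract does not state which DCG variant the authors used, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain of the first k results (exponential gain assumed)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal
```

Here `relevances` holds the graded labels of the returned documents in ranked order for one query; a perfectly ordered list yields an NDCG of 1.0.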
Automated documentation generator for advanced protein crystal growth

NASA Technical Reports Server (NTRS)

Maddux, Gary A.; Provancha, Anna; Chattam, David

1994-01-01

To achieve an environment less dependent on the flow of paper, automated techniques of data storage and retrieval must be utilized. This software system, 'Automated Payload Experiment Tool,' seeks to provide a knowledge-based, hypertext environment for the development of NASA documentation. Once developed, the final system should be able to guide a Principal Investigator through the documentation process in a more timely and efficient manner, while supplying more accurate information to the NASA payload developer. The current system is designed for the development of the Science Requirements Document (SRD), the Experiment Requirements Document (ERD), the Project Plan, and the Safety Requirements Document.
Semantic concept-enriched dependence model for medical information retrieval.

PubMed

Choi, Sungbin; Choi, Jinwook; Yoo, Sooyoung; Kim, Heechun; Lee, Youngho

2014-02-01

In medical information retrieval research, semantic resources have been mostly used by expanding the original query terms or estimating the concept importance weight. However, implicit term-dependency information contained in semantic concept terms has been overlooked or at least underused in most previous studies. In this study, we incorporate a semantic concept-based term-dependence feature into a formal retrieval model to improve its ranking performance. Standardized medical concept terms used by medical professionals were assumed to have implicit dependency within the same concept. We hypothesized that, by elaborately revising the ranking algorithms to favor documents that preserve those implicit dependencies, the ranking performance could be improved. The implicit dependence features are harvested from the original query using MetaMap. These semantic concept-based dependence features were incorporated into a semantic concept-enriched dependence model (SCDM). We designed four different variants of the model, with each variant having distinct characteristics in the feature formulation method. We performed leave-one-out cross validations on both a clinical document corpus (TREC Medical records track) and a medical literature corpus (OHSUMED), which are representative test collections in medical information retrieval research. Our semantic concept-enriched dependence model consistently outperformed other state-of-the-art retrieval methods. Analysis shows that the performance gain has occurred independently of the concept's explicit importance in the query. By capturing implicit knowledge with regard to the query term relationships and incorporating them into a ranking model, we could build a more robust and effective retrieval model, independent of the concept importance. Copyright © 2013 Elsevier Inc. All rights reserved.
AdaNET prototype library administration manual

NASA Technical Reports Server (NTRS)

Hanley, Lionel

1989-01-01

The functions of the AdaNET Prototype Library of Reusable Software Parts are described. Adopted from the Navy Research Laboratory's Reusability Guidebook (V.5.0), this is a working document, customized for use by the AdaNET Project. Within this document, the term part is used to denote the smallest unit controlled by a library and retrievable from it. A part may have several constituents, which may not be individually tracked. Presented are the types of parts which may be stored in the library and the relationships among those parts; a concept of trust indicators which provide measures of confidence that a user of a previously developed part may reasonably apply to a part for a new application; search and retrieval, configuration management, and communications among those who interact with the AdaNET Prototype Library; and the AdaNET Prototype, described from the perspective of its three major users: the part reuser and retriever, the part submitter, and the librarian and/or administrator.
Biomedical information retrieval across languages.

PubMed

Daumke, Philipp; Markó, Kornél; Poprat, Michael; Schulz, Stefan; Klar, Rüdiger

2007-06-01

This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.
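As a rough illustration of the subword idea described in this abstract, the sketch below segments words by greedy longest match against a tiny lexicon and maps each fragment to a language-independent identifier. The lexicon entries, the identifiers, and the function names are hypothetical; they are not taken from the authors' system, and the matching strategy is an assumption.

```python
# Hypothetical toy lexicon: subword -> language-independent concept ID.
# None marks a fragment that carries no meaning of its own.
SUBWORD_LEXICON = {
    "gastr": "#stomach",
    "magen": "#stomach",
    "itis": "#inflammation",
    "entzuend": "#inflammation",
    "ung": None,
}


def segment(word, lexicon=SUBWORD_LEXICON):
    """Greedy longest-match segmentation of a word into known subwords."""
    word = word.lower()
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                pieces.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no subword starts here; skip one character
    return pieces


def to_concepts(word, lexicon=SUBWORD_LEXICON):
    """Map a word to the concept IDs of its meaningful subwords."""
    return [lexicon[p] for p in segment(word, lexicon) if lexicon[p] is not None]


if __name__ == "__main__":
    print(to_concepts("gastritis"))
    print(to_concepts("Magenentzuendung"))
```

Under these assumptions an English and a German term for the same condition reduce to the same identifier sequence, which is what lets a query in one language match documents in another.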
New public dataset for spotting patterns in medieval document images

NASA Astrophysics Data System (ADS)

En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

2017-01-01

With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.
Worldwide research productivity on tramadol: a bibliometric analysis.

PubMed

Sweileh, Waleed M; Shraim, Naser Y; Zyoud, Sa'ed H; Al-Jabi, Samah W

2016-01-01

Pain management and safe use of analgesics is an important medical issue. Tramadol is an old analgesic with controversial properties. Worldwide scientific output on tramadol has not been evaluated. Therefore, the main objective of this study was to give a bibliometric overview of global research productivity on tramadol. SciVerse Scopus was used to retrieve and quantitatively and qualitatively analyze worldwide publications on tramadol. A total of 2059 original and review research articles on tramadol were retrieved from Scopus. Forty-six documents (2.23 %) were published in Anesthesia and Analgesia Journal whereas 30 (1.46 %) were published in Arzneimittel Forschung Drug Research Journal. Retrieved tramadol documents were published from 71 countries and appeared in 160 peer reviewed journals. Although the United States of America (259; 12.86 %) had the largest contribution to tramadol publications, the contributions of other countries such as Turkey (232; 11.27 %), India (189; 8.09 %), and Germany (176; 8.56 %) were not far behind that of the USA. The most productive institution was Grunenthal, Germany (47; 2.28 %), followed by Tehran University of Medical Sciences, Iran (29; 1.41 %), and Ortho-McNeil Pharmaceutical Incorporated, USA (25; 1.21 %). Of the 2059 documents, 370 were about dependence. The leading institution in documents pertaining to tramadol dependence was Grunenthal GmbH (18; 4.86 %), followed by Ortho-McNeil Pharmaceutical Incorporated (17; 4.59 %). The current study showed that there is an obvious interest in tramadol research. More efforts are needed to clarify the abuse potential and safety profile of tramadol to help in determining the legal status of tramadol. Collaboration among the pharmaceutical industry, clinical researchers and academic institutions can improve research quantity and quality on tramadol.
AVIRIS Reflectance Retrievals: UCSB Users Manual. Appendix 1

NASA Technical Reports Server (NTRS)

Roberts, Dar A.; Prentiss, Dylan

2001-01-01

The following write-up is designed to help students and researchers take Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) radiance data and retrieve surface reflectance. In the event that the software is not available, but a user has access to a reflectance product, this document is designed to provide a better understanding of how AVIRIS reflectance was retrieved. This guide assumes that the reader has both a basic understanding of the UNIX computing environment, and that of spectroscopy. Knowledge of the Interactive Data Language (IDL) and the Environment for Visualizing Images (ENVI) is helpful. This is a working document, and many of the fine details described in the following pages have been previously undocumented. After having read this document the reader should be able to process AVIRIS to reflectance, provided access to all of the code is possible. The AVIRIS radiance data itself is pre-processed at the Jet Propulsion Laboratory (JPL) in Pasadena, California. The first section of this paper describes how to read data from tape and byte-swap the data. Section 2 describes the procedure in preparing support files before running the 'h2o' suite of programs. Section 3 describes the four programs used in the process, h2olut9.f, h2ospl9.f, vlsfit9.f and rfl9.f.
Using the web to validate document recognition results: experiments with business cards

NASA Astrophysics Data System (ADS)

Oertel, Clemens; O'Shea, Shauna; Bodnar, Adam; Blostein, Dorothea

2004-12-01

The World Wide Web is a vast information resource which can be useful for validating the results produced by document recognizers. Three computational steps are involved, all of them challenging: (1) use the recognition results in a Web search to retrieve Web pages that contain information similar to that in the document, (2) identify the relevant portions of the retrieved Web pages, and (3) analyze these relevant portions to determine what corrections (if any) should be made to the recognition result. We have conducted exploratory implementations of steps (1) and (2) in the business-card domain: we use fields of the business card to retrieve Web pages and identify the most relevant portions of those Web pages. In some cases, this information appears suitable for correcting OCR errors in the business card fields. In other cases, the approach fails due to stale information: when business cards are several years old and the business-card holder has changed jobs, then websites (such as the home page or company website) no longer contain information matching that on the business card. Our exploratory results indicate that in some domains it may be possible to develop effective means of querying the Web with recognition results, and to use this information to correct the recognition results and/or detect that the information is stale.
Enabling search over encrypted multimedia databases

NASA Astrophysics Data System (ADS)

Lu, Wenjun; Swaminathan, Ashwin; Varna, Avinash L.; Wu, Min

2009-02-01

Performing information retrieval tasks while preserving data confidentiality is a desirable capability when a database is stored on a server maintained by a third-party service provider. This paper addresses the problem of enabling content-based retrieval over encrypted multimedia databases. Search indexes, along with multimedia documents, are first encrypted by the content owner and then stored onto the server. Through jointly applying cryptographic techniques, such as order preserving encryption and randomized hash functions, with image processing and information retrieval techniques, secure indexing schemes are designed to provide both privacy protection and rank-ordered search capability. Retrieval results on an encrypted color image database and security analysis of the secure indexing schemes under different attack models show that data confidentiality can be preserved while retaining very good retrieval performance. This work has promising applications in secure multimedia management.
Gridded Model Information Support System (GMISS) user's guide. Volume 3. Model-concentration data-retrieval subsystem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

The Gridded Model Information Support System (GMISS) is a data base management system for selected Regional Oxidant Model (ROM) input data and species concentrations produced by gridded photochemical air pollution models. The Model Concentration Data Retrieval Subsystem allows State and local air pollution control agencies to retrieve these hourly data for use in support of their regulatory programs. These hourly data may be used to calculate initial and boundary conditions for the Empirical Kinetics Modeling Approach (EKMA). They may be used for other modeling application needs as well as to support evaluation of regional emission control strategies. Both temporal and spatial subsets of the data may be retrieved. The document describes how to invoke and execute the Model Concentration Data Retrieval Subsystem using the full screen menus.
Dynamic reduction of dimensions of a document vector in a document search and retrieval system

DOEpatents

Jiao, Yu; Potok, Thomas E.

2011-05-03

The method and system of the invention involve processing each new document (20) coming into the system into a document vector (16), and creating a document vector with reduced dimensionality (17) for comparison with the data model (15) without recomputing the data model (15). These operations are carried out by a first computer (11) while a second computer (12) updates the data model (18), which can comprise an initial large group of documents (19) and is premised on computing an initial data model (13, 14, 15) to provide a reference point for determining document vectors from documents processed from the data stream (20).
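The general idea of comparing an incoming document against a fixed reference without recomputing that reference can be sketched as follows. This is only an illustrative sketch under stated assumptions: the term-selection rule (keep the k terms that occur in the most initial documents), the raw term-frequency weighting, and all function names are hypothetical and are not drawn from the patent.

```python
import math
from collections import Counter


def build_reference_terms(initial_docs, k):
    """Choose k reference terms once, from the initial document group.

    Assumed rule: keep the terms that occur in the most documents,
    breaking ties alphabetically so the result is deterministic.
    """
    doc_freq = Counter()
    for doc in initial_docs:
        doc_freq.update(set(doc.lower().split()))
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return ranked[:k]


def reduced_vector(doc, reference_terms):
    """Project a new document onto the fixed reference terms only."""
    tf = Counter(doc.lower().split())
    return [tf[term] for term in reference_terms]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    initial = ["solar panel energy", "solar wind energy", "wind turbine"]
    terms = build_reference_terms(initial, k=3)
    incoming = reduced_vector("solar energy storage solar", terms)
    print(terms, incoming)
```

Because the reference terms are fixed when the initial model is built, each incoming document costs only one pass over its own words; terms outside the reference set (here "storage") are simply dropped.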
Mining for Evidence in Enterprise Corpora

ERIC Educational Resources Information Center

Almquist, Brian Alan

2011-01-01

The primary research aim of this dissertation is to identify the strategies that best meet the information retrieval needs as expressed in the "e-discovery" scenario. This task calls for a high-recall system that, in response to a request for all available relevant documents to a legal complaint, effectively prioritizes documents from an…
Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine

ERIC Educational Resources Information Center

Alghoson, Abdullah

2017-01-01

Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing that performs matching based on…

World-Wide Web: The Information Universe.

ERIC Educational Resources Information Center

Berners-Lee, Tim; And Others

1992-01-01

Describes the World-Wide Web (W3) project, which is designed to create a global information universe using techniques of hypertext, information retrieval, and wide area networking. Discussion covers the W3 data model, W3 architecture, the document naming scheme, protocols, document formats, comparison with other systems, experience with the W3…
Aquaculture Thesaurus: Descriptors Used in the National Aquaculture Information System.

ERIC Educational Resources Information Center

Lanier, James A.; And Others

This document provides a listing of descriptors used in the National Aquaculture Information System (NAIS), a computer information storage and retrieval system on marine, brackish, and freshwater organisms. Included are an explanation of how to use the document, subject index terms, and a brief bibliography of the literature used in developing the…
Creating and indexing teaching files from free-text patient reports.

PubMed Central

Johnson, D. B.; Chu, W. W.; Dionisio, J. D.; Taira, R. K.; Kangarloo, H.

1999-01-01

Teaching files based on real patient data can enhance the education of students, staff and other colleagues. Although information retrieval systems can index free-text documents using keywords, these systems do not work well where content-bearing terms (e.g., anatomy descriptions) frequently appear. This paper describes a system that uses multi-word indexing terms to provide access to free-text patient reports. The utilization of multi-word indexing allows better modeling of the content of medical reports, thus improving retrieval performance. The method used to select indexing terms as well as early evaluation of retrieval performance is discussed. PMID:10566473
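To illustrate why multi-word indexing terms can help where single keywords recur in almost every report, here is a minimal sketch of an inverted index keyed on adjacent word pairs. Using plain word bigrams is an assumption made for brevity; the abstract does not say how the authors select their indexing terms, and the report texts and function names below are hypothetical.

```python
from collections import defaultdict


def bigrams(text):
    """Return adjacent word pairs as multi-word terms (assumed term rule)."""
    words = text.lower().split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]


def build_index(reports):
    """Map each multi-word term to the set of report IDs containing it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for term in bigrams(text):
            index[term].add(report_id)
    return index


def search(index, phrase):
    """Return reports that contain every multi-word term of the phrase."""
    terms = bigrams(phrase)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    reports = {
        "r1": "mass in right upper lobe",
        "r2": "right lower lobe clear",
        "r3": "upper right quadrant pain",
    }
    index = build_index(reports)
    print(search(index, "right upper lobe"))
```

All three toy reports contain the single words "right" and two of them "upper", yet only r1 matches the multi-word query, which is the kind of precision gain the abstract attributes to multi-word terms.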
Coordinating Council. Tenth Meeting: Information retrieval: The role of controlled vocabularies

NASA Technical Reports Server (NTRS)

1993-01-01

The theme of this NASA Scientific and Technical Information Program Coordinating Council meeting was the role of controlled vocabularies (thesauri) in information retrieval. Included are summaries of the presentations and the accompanying visuals. Dr. Raya Fidel addressed 'Retrieval: Free Text, Full Text, and Controlled Vocabularies.' Dr. Bella Hass Weinberg spoke on 'Controlled Vocabularies and Thesaurus Standards.' The presentations were followed by a panel discussion with participation from NASA, the National Library of Medicine, the Defense Technical Information Center, and the Department of Energy; this discussion, however, is not summarized in any detail in this document.
Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols.

PubMed

Campagne, Fabien

2008-02-29

The evaluation of information retrieval techniques has traditionally relied on human judges to determine which documents are relevant to a query and which are not. This protocol is used in the Text Retrieval Evaluation Conference (TREC), organized annually for the past 15 years, to support the unbiased evaluation of novel information retrieval approaches. The TREC Genomics Track has recently been introduced to measure the performance of information retrieval for biomedical applications. We describe two protocols for evaluating biomedical information retrieval techniques without human relevance judgments. We call these protocols No Title Evaluation (NT Evaluation). The first protocol measures performance for focused searches, where only one relevant document exists for each query. The second protocol measures performance for queries expected to have potentially many relevant documents per query (high-recall searches). Both protocols take advantage of the clear separation of titles and abstracts found in Medline. We compare the performance obtained with these evaluation protocols to results obtained by reusing the relevance judgments produced in the 2004 and 2005 TREC Genomics Track and observe significant correlations between performance rankings generated by our approach and TREC. Spearman's correlation coefficients in the range of 0.79-0.92 are observed comparing bpref measured with NT Evaluation or with TREC evaluations. For comparison, coefficients in the range 0.86-0.94 can be observed when evaluating the same set of methods with data from two independent TREC Genomics Track evaluations. We discuss the advantages of NT Evaluation over the TRels and the data fusion evaluation protocols introduced recently. Our results suggest that the NT Evaluation protocols described here could be used to optimize some search engine parameters before human evaluation. Further research is needed to determine if NT Evaluation or variants of these protocols can fully substitute for human evaluations.
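The comparison reported in this abstract rests on Spearman's rank correlation between two orderings of the same retrieval methods. A minimal sketch of that computation follows, using the textbook formula for rankings without ties, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The method names and scores are made up for illustration and are not data from the paper.

```python
def ranks(scores):
    """Rank methods from best (1) to worst by score; assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {method: position for position, method in enumerate(ordered, start=1)}


def spearman_rho(scores_a, scores_b):
    """Spearman's rank correlation for two score tables over the same methods."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both evaluations must cover the same methods")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two methods")
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    d_squared = sum((rank_a[m] - rank_b[m]) ** 2 for m in scores_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical scores for four methods under two evaluation protocols.
    automatic = {"A": 0.40, "B": 0.35, "C": 0.30, "D": 0.20}
    judged = {"A": 0.38, "B": 0.30, "C": 0.33, "D": 0.10}
    print(spearman_rho(automatic, judged))
```

In this toy case the two protocols disagree only on the order of methods B and C, so the coefficient stays high. With tied scores the simple formula no longer applies and a tie-aware implementation would be needed.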
Semantic Clustering of Search Engine Results

PubMed Central

Soliman, Sara Saad; El-Sayed, Maged F.; Hassan, Yasser F.

2015-01-01

This paper presents a novel approach for search engine results clustering that relies on the semantics of the retrieved documents rather than the terms in those documents. The proposed approach takes into consideration both lexical and semantic similarities among documents and applies an activation spreading technique in order to generate semantically meaningful clusters. This approach allows documents that are semantically similar to be clustered together rather than clustering documents based on similar terms. A prototype is implemented and several experiments are conducted to test the proposed solution. The results of the experiments confirmed that the proposed solution achieves remarkable results in terms of precision. PMID:26933673
IPAT: a freely accessible software tool for analyzing multiple patent documents with inbuilt landscape visualizer.

PubMed

Ajay, Dara; Gangwal, Rahul P; Sangamwar, Abhay T

2015-01-01

Intelligent Patent Analysis Tool (IPAT) is an online data retrieval tool that uses a text mining algorithm to extract specific patent information in a predetermined pattern into an Excel sheet. The software is designed and developed to retrieve and analyze technology information from multiple patent documents and generate various patent landscape graphs and charts. The software is coded in C# in Visual Studio 2010; it extracts publicly available patent information from web pages such as Google Patent and simultaneously studies various technology trends based on user-defined parameters. In other words, IPAT combined with manual categorization will act as an excellent technology assessment tool in competitive intelligence and due diligence for predicting the future R&D forecast.
Millimeter-wave Imaging Radiometer (MIR) data processing and development of water vapor retrieval algorithms

NASA Technical Reports Server (NTRS)

Chang, L. Aron

1995-01-01

This document describes the progress of the task of Millimeter-wave Imaging Radiometer (MIR) data processing and the development of water vapor retrieval algorithms for the second six-month performing period. Aircraft MIR data from two 1995 field experiments were collected and processed with revised data processing software. Two revised versions of the water vapor retrieval algorithm were developed, one for the execution of retrieval on a supercomputer platform, and one for using pressure as the vertical coordinate. Two implementations incorporate products from other sensors into the water vapor retrieval system, one from the Special Sensor Microwave Imager (SSM/I), the other from the High-resolution Interferometer Sounder (HIS). Water vapor retrievals were performed for both airborne MIR data and spaceborne SSM/T-2 data, during field experiments of TOGA/COARE, CAMEX-1, and CAMEX-2. The climatology of water vapor during TOGA/COARE was examined by SSM/T-2 soundings and conventional rawinsonde.
Web information retrieval based on ontology

NASA Astrophysics Data System (ADS)

Zhang, Jian

2013-03-01

The purpose of information retrieval (IR) is to find a set of documents that are relevant to a user's specific information need. The traditional information retrieval model commonly used in commercial search engines is based on keyword indexing and Boolean logic queries. One major drawback of traditional information retrieval systems is that they typically retrieve information without an explicitly defined domain of interest to the user, so that much irrelevant information is returned, which burdens the user with picking useful answers out of the irrelevant results. To tackle this issue, many Semantic Web information retrieval models have been proposed recently. The main advantage of the Semantic Web is that it enhances search mechanisms through the use of ontologies. In this paper, we present our approach to personalizing a web search engine based on ontology. Key techniques are also discussed. Compared to previous research, our work concentrates on semantic similarity and the whole process, including query submission and information annotation.
Langley Atmospheric Information Retrieval System (LAIRS): System description and user's guide

NASA Technical Reports Server (NTRS)

Boland, D. E., Jr.; Lee, T.

1982-01-01

This document presents the user's guide, system description, and mathematical specifications for the Langley Atmospheric Information Retrieval System (LAIRS). It also includes a description of an optimal procedure for operational use of LAIRS. The primary objective of the LAIRS Program is to make it possible to obtain accurate estimates of atmospheric pressure, density, temperature, and winds along Shuttle reentry trajectories for use in postflight data reduction.
Cognitive Memory; A Computer Oriented Epistemological Approach to Information Storage and Retrieval. Interim Report, Phase I, 1 September 1967-28 February 1969.

ERIC Educational Resources Information Center

Illinois Univ., Urbana. Coordinated Science Lab.

In contrast to conventional information storage and retrieval systems in which a body of knowledge is thought of as an indexed codex of documents to which access is obtained by an appropriately indexed query, this interdisciplinary study aims at an understanding of what is "knowledge" as distinct from a "data file," how this knowledge is acquired,…
Context as the Building Blocks of Meaning: A Retrieval Model for the Semantic Representation of Words

DTIC Science & Technology

2003-04-01

Deconstructing the model’s output ... Implications of the ideas...identified characters of a word are used as a probe to retrieve a word’s identity (its spelling and phonology) from memory. In addition to the...document matrix has been reduced by the SVD. Deconstructing the model’s output: Why do semantic relationships between words emerge from the model? Is the
Indexing the medical open access literature for textual and content-based visual retrieval.

PubMed

Eggel, Ivan; Müller, Henning

2010-01-01

Over the past few years an increasing number of scientific journals have been created in an open access format. Particularly in the medical field, the number of openly accessible journals is enormous, making a wide body of knowledge available for analysis and retrieval. Part of the trend towards open access publications can be linked to funding bodies such as the NIH (National Institutes of Health) and the Swiss National Science Foundation (SNF) requiring funded projects to make all articles of funded research available publicly. This article describes an approach to make part of the knowledge of open access journals available for retrieval, including the textual information but also the images contained in the articles. For this goal all articles of 24 journals related to medical informatics and medical imaging were crawled from the web pages of BioMed Central. Text and images of the PDF (Portable Document Format) files were indexed separately, and a web-based retrieval interface allows for searching via keyword queries or by visual similarity queries. The starting point for a visual similarity query can be an image on the local hard disk that is uploaded or any image found via the textual search. Search for similar documents is also possible.
Establishment of an inferior vena cava filter database and interventional radiology led follow-up - retrieval rates and patients lost to follow-up.

PubMed

Klinken, Sven; Humphries, Charlotte; Ferguson, John

2017-10-01

To evaluate the rates of inferior vena cava (IVC) filter retrieval and the number of patients lost to follow-up, before and after the establishment of an IVC filter database and interventional radiology (inserting physician) led follow-up. On the 1st of June 2012, an electronic interventional radiology database was established at our institution. In addition, the interventional radiology team took responsibility for follow-up of IVC filters. Data were prospectively collected from the database for all patients who had an IVC filter inserted between the 1st of June 2012 and the 31st of May 2014. Data on patients who had an IVC filter inserted between the 1st of June 2009 and the 31st of May 2012 were retrospectively reviewed. Patient demographics, insertion indications, filter types, retrieval status, documented retrieval decisions, time in situ, trackable events and complications were obtained in the pre-database (n = 136) and post-database (n = 118) cohorts. Attempted IVC filter retrieval rates improved from 52.9% to 72.9% (P = 0.001) following the establishment of the database. The number of patients with no documented decision (lost to follow-up) regarding their IVC filter fell from 31 of 136 (23%) to 0 of 118 patients (P < 0.001). There was a non-significant reduction in IVC filter dwell time in the post-database group (113 as compared to 137 days, P = 0.129). Following the establishment of an IVC filter database and interventional radiology led follow-up, we demonstrate a significant improvement in the attempted retrieval rates of IVC filters and the number of patients lost to follow-up. © 2017 The Royal Australian and New Zealand College of Radiologists.
Understanding human quality judgment in assessing online forum contents for thread retrieval purpose

NASA Astrophysics Data System (ADS)

Ismail, Zuriati; Salim, Naomie; Huspi, Sharin Hazlin

2017-10-01

Unlike traditional materials or journals, user-generated content is not peer-reviewed. The lack of quality control and the explosive growth of web content make the task of finding quality information on the web especially critical. The emergence of new facilities for producing web content, such as forums, makes this issue more significant. This study focuses on online forum threads, or discussions, since forums contain valuable human-generated information in the form of discussions. Due to the unique structure of online forum pages, special techniques are required to organize and search for information in these forums. Quality-biased retrieval is a retrieval approach that searches for relevant documents and prioritizes higher-quality documents. Despite major concern about content quality and the recent development of quality-biased retrieval, there is an urgent need to understand how content quality is judged, for retrieval and performance evaluation purposes. Furthermore, even though there are various studies on the quality of information, no standard framework has been established. The primary aim of this paper is to contribute to the understanding of human quality judgment in assessing online forum contents. The foundation of this study is a comparison and evaluation of different frameworks (for quality-biased retrieval and information quality). This led to the finding that many quality dimensions are redundant and that some dimensions are understood differently in different studies. We conducted a survey of a crowdsourcing community to measure the importance of each quality dimension found in the various frameworks. Accuracy and ease of understanding are among the most important dimensions, while thread popularity and content manipulability are among the least important. This finding is beneficial in evaluating the contents of online forums.
Automated payload experiment tool feasibility study

NASA Technical Reports Server (NTRS)

Maddux, Gary A.; Clark, James; Delugach, Harry; Hammons, Charles; Logan, Julie; Provancha, Anna

1991-01-01

To achieve an environment less dependent on the flow of paper, automated techniques of data storage and retrieval must be utilized. The prototype under development seeks to demonstrate the ability of a knowledge-based, hypertext computer system. This prototype is concerned with the logical links between two primary NASA support documents, the Science Requirements Document (SRD) and the Engineering Requirements Document (ERD). Once developed, the final system should have the ability to guide a principal investigator through the documentation process in a more timely and efficient manner, while supplying more accurate information to the NASA payload developer.
Extended Subject Access to Hypertext Online Documentation. Parts I and II: The Search-Support and Maintenance Problems.

ERIC Educational Resources Information Center

Girill, T. R.; And Others

1991-01-01

Describes enhancements made to a hypertext information retrieval system at the National Energy Research Supercomputer Center (NERSC) called DFT (Document, Find, and Theseus). The enrichment of DFT's entry vocabulary is described, DFT and other hypertext systems are compared, and problems that occur due to the need for frequent updates are…
Repetition and Diversification in Multi-Session Task Oriented Search

ERIC Educational Resources Information Center

Tyler, Sarah K.

2013-01-01

As the number of documents and the availability of information online grow, so too can the difficulty in sifting through documents to find what we're searching for. Traditional Information Retrieval (IR) systems consider the query as the representation of the user's needs, and as such are limited to the user's ability to describe the information…
Facsimile Transmission of Microforms.

DTIC Science & Technology

1983-12-30

display terminals, high speed printers, conventional facsimile receivers, and/or graphics COM recorders. Microforms designed for storage, retrieval...author and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation.
User Evaluation of Automatically Generated Semantic Hypertext Links in a Heavily Used Procedural Manual.

ERIC Educational Resources Information Center

Tebbutt, John

1999-01-01

Discusses efforts at National Institute of Standards and Technology (NIST) to construct an information discovery tool through the fusion of hypertext and information retrieval that works by parsing a contiguous document base into smaller documents and inserting semantic links between them. Also presents a case study that evaluated user reactions.…

Preparing a collection of radiology examinations for distribution and retrieval.

PubMed

Demner-Fushman, Dina; Kohli, Marc D; Rosenman, Marc B; Shooshan, Sonya E; Rodriguez, Laritza; Antani, Sameer; Thoma, George R; McDonald, Clement J

2016-03-01

Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database. The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically and then the automatic de-identification was manually verified. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval. The automatic de-identification of the narrative was aggressive and achieved 100% precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was not quite as perfect. Images for two of 3996 patients (0.05%) showed protected health information. Manual encoding of findings improved retrieval precision. Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and needs special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/). Published by Oxford University Press on behalf of the American Medical Informatics Association 2015. This work is written by US Government employees and is in the public domain in the US.
Bibliometric analysis of worldwide scientific literature in mobile - health: 2006-2016.

PubMed

Sweileh, Waleed M; Al-Jabi, Samah W; AbuTaha, Adham S; Zyoud, Sa'ed H; Anayah, Fathi M A; Sawalha, Ansam F

2017-05-30

The advancement of mobile technology has positively influenced healthcare services. An emerging subfield of mobile technology is mobile health (m-Health) in which mobile applications are used for health purposes. The aim of this study was to analyze and assess literature published in the field of m-Health. SciVerse Scopus was used to retrieve literature in m-Health. The study period was set from 2006 to 2016. ArcGIS 10.1 was used to present geographical distribution of publications while VOSviewer was used for data visualization. Growth of publications, citation analysis, and research productivity were presented using standard bibliometric indicators. During the study period, a total of 5465 documents were published, giving an average of 496.8 documents per year. The h-index of retrieved documents was 81. Core keywords used in literature pertaining to m-Health included diabetes mellitus, adherence, and obesity among others. Relative growth rate and doubling time of retrieved literature were stable from 2009 to 2015 indicating exponential growth of literature in this field. A total of 4638 (84.9%) documents were multi-authored with a mean collaboration index of 4.1 authors per article. The United States of America ranked first in productivity with 1926 (35.2%) published documents. India ranked sixth with 183 (3.3%) documents while China ranked seventh with 155 (2.8%) documents. VA Medical Center was the most prolific organization/institution while Journal of Medical Internet Research was the preferred journal for publications in the field of m-Health. Top cited articles in the field of m-Health included the use of mobile technology in improving adherence in HIV patients, weight loss, and improving glycemic control in diabetic patients. The size of literature in m-Health showed a noticeable increase in the past decade. Given the large volume of citations received in this field, it is expected that applications of m-Health will be seen in various health aspects and health services. Research in m-Health needs to be encouraged, particularly in the fight against AIDS, poor medication adherence, glycemic control in Africa and other low income world regions where technology can improve health services and decrease disease burden.
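The h-index and the per-year average reported in this abstract are simple to reproduce. The sketch below is illustrative only: the function name h_index and the sample citation counts are assumptions made for the example, not data from the study. Only the totals 5465 and 2006 to 2016 come from the abstract.

```python
def h_index(citations):
    """Return the largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts, invented for illustration.
sample_citations = [25, 8, 5, 3, 3, 1, 0]
print(h_index(sample_citations))

# Average documents per year over the inclusive 2006-2016 study period.
years = 2016 - 2006 + 1
print(round(5465 / years, 1))
```

Counting the period inclusively gives 11 years, which is how 5465 documents yield the stated average of 496.8 per year.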
Medication order communication using fax and document-imaging technologies.

PubMed

Simonian, Armen I

2008-03-15

The implementation of fax and document-imaging technology to electronically communicate medication orders from nursing stations to the pharmacy is described. The evaluation of a commercially available pharmacy order imaging system to improve order communication and to make document retrieval more efficient led to the selection and customization of a system already licensed and used in seven affiliated hospitals. The system consisted of existing fax machines and document-imaging software that would capture images of written orders and send them from nursing stations to a central database server. Pharmacists would then retrieve the images and enter the orders in an electronic medical record system. The pharmacy representatives from all seven hospitals agreed on the configuration and functionality of the custom application. A 30-day trial of the order imaging system was successfully conducted at one of the larger institutions. The new system was then implemented at the remaining six hospitals over a period of 60 days. The transition from a paper-order system to electronic communication via a standardized pharmacy document management application tailored to the specific needs of this health system was accomplished. A health system with seven affiliated hospitals successfully implemented electronic communication and the management of inpatient paper-chart orders by using faxes and document-imaging technology. This standardized application eliminated the problems associated with the hand delivery of paper orders, the use of the pneumatic tube system, and the printing of traditional faxes.
Text grouping in patent analysis using adaptive K-means clustering algorithm

NASA Astrophysics Data System (ADS)

Shanie, Tiara; Suprijadi, Jadi; Zulhanif

2017-03-01

Patents are a form of intellectual property. Analyzing patents is one way to understand how technology is developing in each country and worldwide. This study uses patent documents about green tea retrieved from the Espacenet server. Patent documents on tea-related technology are widely scattered, which makes information retrieval (IR) difficult for users. It is therefore necessary to group the documents by the related terms they contain. The study applies statistical text mining to the titles of green tea patents in two phases: data preparation and data analysis. The data preparation phase uses text mining methods, and the data analysis phase uses statistics, specifically cluster analysis with the Adaptive K-Means Clustering Algorithm. Based on the maximum silhouette value, the analysis produced 87 clusters associated with fifteen terms that can be used in information retrieval.
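Choosing a clustering by its maximum silhouette value can be sketched as follows. This is not the authors' Adaptive K-Means implementation: it only computes the mean silhouette coefficient for candidate labelings that are supplied by hand and picks the best one. The function name silhouette_score, the toy points, and the candidate labelings are assumptions made for the example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the other members of a cluster.
        others = [j for j in members if j != i]
        if not others:
            return 0.0
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2-D points and two hand-made candidate labelings (illustrative only).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
candidates = {"two groups": [0, 0, 1, 1], "mixed": [0, 1, 0, 1]}
best = max(candidates, key=lambda name: silhouette_score(points, candidates[name]))
print(best)
```

In a full pipeline the candidate labelings would come from running a clustering algorithm for several cluster counts, and the points would be term vectors derived from the patent titles.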
Final Inventory Work-Off Plan for ORNL transuranic wastes (1986 version)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dickerson, L.S.

1988-05-01

The Final Inventory Work-Off Plan (IWOP) for ORNL Transuranic Wastes addresses ORNL's strategy for retrieval, certification, and shipment of its stored and newly generated contact-handled (CH) and remote-handled (RH) transuranic (TRU) wastes to the Waste Isolation Pilot Plant (WIPP), the proposed geologic repository near Carlsbad, New Mexico. This document considers certification compliance with the WIPP waste acceptance criteria (WAC) and is consistent with the US Department of Energy's Long-Range Master Plan for Defense Transuranic Waste Management. This document characterizes Oak Ridge National Laboratory's (ORNL's) TRU waste by type and estimates the number of shipments required to dispose of it; describes the methods, facilities, and systems required for its certification and shipment; presents work-off strategies and schedules for retrieval, certification, and transportation; discusses the resource needs and additions that will be required for the effort and forecasts costs for the long-term TRU waste management program; and lists public documentation required to support certification facilities and strategies. 22 refs., 6 figs., 10 tabs.
Beyond Information Retrieval—Medical Question Answering

PubMed Central

Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong

2006-01-01

Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large and makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians’ information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385
Large Scale Document Inversion using a Multi-threaded Computing System

PubMed Central

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2018-01-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
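Document inversion, as described above, maps each term to the documents that contain it. The sketch below is a sequential, single-threaded illustration using a hash map; it is not the authors' GPU/CUDA implementation, and the function name invert, the whitespace tokenizer, and the sample documents are assumptions made for the example.

```python
from collections import defaultdict


def invert(documents):
    """Build a term -> sorted list of document ids mapping in one pass."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenizer: lowercase and split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical three-document collection.
docs = {
    1: "GPU document inversion",
    2: "document retrieval",
    3: "GPU computing",
}
index = invert(docs)
print(index["document"])  # [1, 2]
print(index["gpu"])       # [1, 3]
```

Each token costs one hash insertion, so the work grows linearly with the total number of tokens. A parallel version would have to partition the documents across threads and merge or synchronize the per-term postings.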
Large Scale Document Inversion using a Multi-threaded Computing System.

PubMed

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2017-06-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches.

PubMed

Zeng, Yuqun; Liu, Xusheng; Wang, Yanshan; Shen, Feichen; Liu, Sijia; Rastegar-Mojarad, Majid; Wang, Liwei; Liu, Hongfang

2017-10-16

Self-management is crucial to diabetes care and providing expert-vetted content for answering patients' questions is crucial in facilitating patient self-management. The aim is to investigate the use of information retrieval techniques in recommending patient education materials for diabetic questions of patients. We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic group (semantic group-based model), with the baseline retrieval models, vector space model (VSM), in recommending diabetic patient education materials to diabetic questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset consisting of 50 randomly selected diabetic questions where the relevancy of diabetic education materials to the questions was manually assigned by two experts. The performance was assessed using precision of top-ranked documents. We retrieved 7510 diabetic questions on the forum and 144 diabetic patient educational materials from the patient education database at Mayo Clinic. The mapping rate of words in each corpus mapped to the Unified Medical Language System (UMLS) was significantly different (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. This study demonstrated that topic modeling can mitigate the vocabulary difference and it achieved the best performance in recommending education materials for answering patients' questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and online forums. ©Yuqun Zeng, Xusheng Liu, Yanshan Wang, Feichen Shen, Sijia Liu, Majid Rastegar Mojarad, Liwei Wang, Hongfang Liu. 
Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 16.10.2017.
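The evaluation above relies on precision of top-ranked documents. A minimal sketch of precision at rank k follows. The function name precision_at_k, the material identifiers, and the relevance judgments are all hypothetical, and dividing by k even when fewer than k results are returned is one common convention rather than something the abstract specifies.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    hits = sum(1 for item in top if item in relevant_ids)
    return hits / k


# Hypothetical ranking of education materials for one patient question.
ranked = ["m3", "m1", "m7", "m2"]
relevant = {"m1", "m2"}
print(precision_at_k(ranked, relevant, 1))  # 0.0
print(precision_at_k(ranked, relevant, 2))  # 0.5
```

Averaging precision at rank 1 over all evaluated questions gives a figure comparable in kind to the per-model percentages reported for the top-retrieved document.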
Nurses using futuristic technology in today's healthcare setting.

PubMed

Wolf, Debra M; Kapadia, Amar; Kintzel, Jessie; Anton, Bonnie B

2009-01-01

Human computer interaction (HCI) equates nurses using voice assisted technology within a clinical setting to document patient care real time, retrieve patient information from care plans, and complete routine tasks. This is a reality currently utilized by clinicians today in acute and long term care settings. Voice assisted documentation provides hands & eyes free accurate documentation while enabling effective communication and task management. The speech technology increases the accuracy of documentation, while interfacing directly into the electronic health record (EHR). Using technology consisting of a lightweight headset and small fist size wireless computer, verbal responses to easy to follow cues are converted into a database system allowing staff to obtain individualized care status reports on demand. To further assist staff in their daily process, this innovative technology allows staff to send and receive pages as needed. This paper will discuss how leading edge and award winning technology is being integrated within the United States. Collaborative efforts between clinicians and analysts will be discussed reflecting the interactive design and build functionality. Features such as the system's voice responses and directed cues will be shared and how easily data can be documented, viewed and retrieved. Outcome data will be presented on how the technology impacted organization's quality outcomes, financial reimbursement, and employee's level of satisfaction.
Data Archival and Retrieval Enhancement (DARE) Metadata Modeling and Its User Interface

NASA Technical Reports Server (NTRS)

Hyon, Jason J.; Borgen, Rosana B.

1996-01-01

The Defense Nuclear Agency (DNA) has acquired terabytes of valuable data which need to be archived and effectively distributed to the entire nuclear weapons effects community and others...This paper describes the DARE (Data Archival and Retrieval Enhancement) metadata model and explains how it is used as a source for generating HyperText Markup Language (HTML) or Standard Generalized Markup Language (SGML) documents for access through web browsers such as Netscape.
Using Stream Features for Instant Document Filtering

DTIC Science & Technology

2012-11-01

expansion and quality indicators in searching microblog posts. Advances in Information Retrieval, pages 362–367, 2011. [12] N. Naveed, T. Gottron, J ...16] G Salton and C Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988. [17...Overview of the TREC-2012 Microblog Track. In trec.nist.gov. NIST. [19] Michael J Welch, Uri Schonfeld, Dan He, and Junghoo Cho. Topical semantics of
PubMed Interact: an Interactive Search Application for MEDLINE/PubMed

PubMed Central

Muin, Michael; Fontelo, Paul; Ackerman, Michael

2006-01-01

Online search and retrieval systems are important resources for medical literature research. Progressive Web 2.0 technologies provide opportunities to improve search strategies and user experience. Using PHP, Document Object Model (DOM) manipulation and Asynchronous JavaScript and XML (Ajax), PubMed Interact allows greater functionality so users can refine search parameters with ease and interact with the search results to retrieve and display relevant information and related articles. PMID:17238658
The Smoothed Dirichlet Distribution: Understanding Cross-Entropy Ranking in Information Retrieval

DTIC Science & Technology

2006-07-01

reflect those of the sponsor. Unigram Language modeling is a successful probabilistic framework for Information Retrieval (IR) that uses...the Relevance model (RM), a state-of-the-art model for IR in the language modeling framework that uses the same cross-entropy as its ranking function...In addition, the SD based classifier provides more flexibility than RM in modeling documents owing to a consistent generative framework. We
Getting What You Want: Accurate Document Filtering in a Terabyte World

DTIC Science & Technology

2002-11-01

models are used widely in speech recognition and have shown promise for ad-hoc information retrieval (Ponte and Croft, 1998; Lafferty and Zhai, 2001...tasks is focused on developing techniques similar to those used in speech recognition. However the differing requirements of speech recognition and...Conference on Research and Development in Information Retrieval. ACM. 6. T.Ault, and Y. Yang. (2001.) kNN at TREC-9: A failure analysis. In
Project W-211 initial tank retrieval systems year 2000 compliance assessment project plan

DOE Office of Scientific and Technical Information (OSTI.GOV)

BUSSELL, J.H.

1999-08-24

This assessment describes the potential Year 2000 (Y2K) problems and describes the methods for achieving Y2K Compliance for Project W-211, Initial Tank Retrieval Systems (ITRS). The purpose of this assessment is to give an overview of the project. This document will not be updated and any dates contained in this document are estimates and may change. The scope of project W-211 is to provide systems for retrieval of radioactive wastes from ten double-shell tanks (DST). Systems will be installed in tanks 102-AP, 104-AP, 105-AN, 104-AN, 102-AZ, 101-AW, 103-AN, 107-AN, 102-AY, and 102-SY. The current tank selection and sequence supports phase I feed delivery to privatized processing plants. A detailed description of system dates, functions, interfaces, potential Y2K problems, and date resolutions cannot be described since the project is in the definitive design phase. This assessment will describe the methods, protocols, and practices to assure that equipment and systems do not have Y2K problems.
Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease

PubMed Central

Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.

1998-01-01

The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this mark-up process is time consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.
Pilot production system cost/benefit analysis: Digital document storage project

NASA Technical Reports Server (NTRS)

1989-01-01

The Digital Document Storage (DDS)/Pilot Production System (PPS) will provide cost effective electronic document storage, retrieval, hard copy reproduction, and remote access for users of NASA Technical Reports. The DDS/PPS will result in major benefits, such as improved document reproduction quality within a shorter time frame than is currently possible. In addition, the DDS/PPS will provide an important strategic value through the construction of a digital document archive. It is highly recommended that NASA proceed with the DDS Prototype System and a rapid prototyping development methodology in order to validate recent working assumptions upon which the success of the DDS/PPS is dependent.
G-Bean: an ontology-graph based web tool for biomedical literature retrieval

PubMed Central

2014-01-01

Background Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. Methods G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. 
PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. PMID:25474588
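The query-expansion step above scores concepts with a Personalized PageRank over an ontology graph. The sketch below runs a plain power iteration on a tiny hand-made graph. It is not G-Bean's implementation and omits the TF-IDF re-ranking; the function name personalized_pagerank, the damping factor 0.85, the iteration count, and the toy concept graph are assumptions made for the example.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration where the random walk restarts at the seed nodes."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                share = damping * rank[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seed nodes.
                for s in nodes:
                    nxt[s] += damping * rank[n] * restart[s]
        rank = nxt
    return rank


# Hypothetical concept graph: each key links to related concepts.
graph = {
    "diabetes": ["insulin", "obesity"],
    "insulin": ["diabetes"],
    "obesity": ["diabetes"],
    "fracture": ["obesity"],
}
scores = personalized_pagerank(graph, seeds={"diabetes"})
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(concept, round(score, 3))
```

Concepts that cannot be reached from the seed, such as the hypothetical "fracture" node here, keep a score of zero, which is what makes the ranking specific to the query concepts.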
G-Bean: an ontology-graph based web tool for biomedical literature retrieval.

PubMed

Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S

2014-01-01

Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. 
PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.

Paperless Contract Folder’s (PCF) DoD 5015.2 Certification

DTIC Science & Technology

2010-06-01

Draft Version Controls...i. Electronic Routing of Purchase Request (Funding) Documents...of electronic records, version control, robust search and retrieval, and automated disposition that is compliant with legal requirements. As shown...h. Draft Version Controls: Draft and versioning controls track the changes to the documents once they are saved. Draft numbers (0.1, 0.2, 0.3
Evaluation of the Retrieval of Metallurgical Document References using the Universal Decimal Classification in a Computer-Based System.

ERIC Educational Resources Information Center

Freeman, Robert R.

A set of twenty five questions was processed against a computer-stored file of 9159 document references in the field of ferrous metallurgy, representing the 1965 coverage of the Iron and Steel Institute (London) information service. A basis for evaluation of system performance characteristics and analysis of system failures was provided by using…
Improving program documentation quality through the application of continuous improvement processes.

PubMed

Lovlien, Cheryl A; Johansen, Martha; Timm, Sandra; Eversman, Shari; Gusa, Dorothy; Twedell, Diane

2007-01-01

Maintaining the integrity of record keeping and retrievable information related to the provision of continuing education credit creates challenges for a large organization. Accurate educational program documentation is vital to support the knowledge and professional development of nursing staff. Quality review and accurate documentation of programs for nursing staff development occurred at one institution through the use of continuous improvement principles. Integration of the new process into the current system maintains the process of providing quality record keeping.
Relevance similarity: an alternative means to monitor information retrieval systems

PubMed Central

Dong, Peng; Loh, Marie; Mondry, Adrian

2005-01-01

Background Relevance assessment is a major problem in the evaluation of information retrieval systems. The work presented here introduces a new parameter, "Relevance Similarity", for the measurement of the variation of relevance assessment. In a situation where individual assessment can be compared with a gold standard, this parameter is used to study the effect of such variation on the performance of a medical information retrieval system. In such a setting, Relevance Similarity is the ratio of assessors who rank a given document the same as the gold standard over the total number of assessors in the group. Methods The study was carried out on a collection of Critically Appraised Topics (CATs). Twelve volunteers were divided into two groups according to their domain knowledge. They assessed the relevance of retrieved topics obtained by querying a meta-search engine with ten keywords related to medical science. Their assessments were compared to the gold standard assessment, and Relevance Similarities were calculated as the ratio of positive concordance with the gold standard for each topic. Results The similarity comparison among groups showed that a higher degree of agreement exists among evaluators with more subject knowledge. The performance of the retrieval system was not significantly different as a result of the variations in relevance assessment in this particular query set. Conclusion In assessment situations where evaluators can be compared to a gold standard, Relevance Similarity provides an alternative evaluation technique to the commonly used kappa scores, which may give paradoxically low scores in highly biased situations such as document repositories containing large quantities of relevant data. PMID:16029513
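The ratio defined in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code; the function name `relevance_similarity` and the label values are assumptions made for the example.

```python
def relevance_similarity(assessor_labels, gold_label):
    """Fraction of assessors whose judgment matches the gold standard."""
    if not assessor_labels:
        raise ValueError("at least one assessor judgment is required")
    matches = sum(1 for label in assessor_labels if label == gold_label)
    return matches / len(assessor_labels)


# Hypothetical judgments from six assessors for one retrieved topic.
labels = ["relevant", "relevant", "not relevant",
          "relevant", "relevant", "not relevant"]
score = relevance_similarity(labels, "relevant")  # 4 of 6 assessors agree
```

Computing the value per topic and then averaging within each assessor group would give one plausible way to compare groups, although the abstract does not spell out that aggregation step.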
Computer Program and User Documentation Medical Data Update System

NASA Technical Reports Server (NTRS)

Anderson, J.

1971-01-01

The update system for the NASA medical data minicomputer storage and retrieval system is described. The discussion includes general and technical specifications, a subroutine list, and programming instructions.
NSSDC and WDC-A-R/S document availability and distribution services

NASA Technical Reports Server (NTRS)

1980-01-01

Documents available from the National Space Science Data Center (NSSDC) and the World Data Center A for Rockets and Satellites are described. The availability, costs, ordering procedures for documents presently available, and the procedures for obtaining future documents are given. NSSDC, established by NASA to further the widest practicable use of reduced data obtained from space science investigations and to provide investigators with an active repository for such data, is responsible for the active collection, organization, storage, announcement, retrieval, dissemination, and exchange of data received from satellite experiments. Information on sounding rocket investigations is also collected.
Spatial Paradigm for Information Retrieval and Exploration

DOE Office of Scientific and Technical Information (OSTI.GOV)

The SPIRE system consists of software for visual analysis of primarily text based information sources. This technology enables the content analysis of text documents without reading all the documents. It employs several algorithms for text and word proximity analysis. It identifies the key themes within the text documents. From this analysis, it projects the results onto a visual spatial proximity display (Galaxies or Themescape) where items (documents and/or themes) visually close to each other are known to have content which is close to each other. Innovative interaction techniques then allow for dynamic visual analysis of large text based information spaces.
SPIRE1.03. Spatial Paradigm for Information Retrieval and Exploration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adams, K.J.; Bohn, S.; Crow, V.

The SPIRE system consists of software for visual analysis of primarily text based information sources. This technology enables the content analysis of text documents without reading all the documents. It employs several algorithms for text and word proximity analysis. It identifies the key themes within the text documents. From this analysis, it projects the results onto a visual spatial proximity display (Galaxies or Themescape) where items (documents and/or themes) visually close to each other are known to have content which is close to each other. Innovative interaction techniques then allow for dynamic visual analysis of large text based information spaces.
PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

PubMed Central

Lin, Jimmy

2008-01-01

Background Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web. Results We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics. Conclusion The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain. PMID:18538027
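The reranking idea can be illustrated with a small Python sketch. The abstract does not say how link-analysis scores were combined with retrieval scores, so the linear interpolation, the weight `lam`, the damping factor of 0.85, and the four-node graph below are all assumptions for illustration, not the paper's configuration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes.

    Every linked node is assumed to appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


def rerank(retrieval_scores, link_scores, lam=0.8):
    """Linearly interpolate max-normalized retrieval and link-analysis scores."""
    max_r = max(retrieval_scores.values())
    max_l = max(link_scores.values())
    combined = {
        doc: lam * retrieval_scores[doc] / max_r
        + (1.0 - lam) * link_scores[doc] / max_l
        for doc in retrieval_scores
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical "related citation" links and retrieval-engine scores.
related = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = {"A": 2.1, "B": 2.0, "C": 1.9, "D": 1.5}
ranks = pagerank(related)
order = rerank(scores, ranks)
```

In this toy graph, citation C is linked from three other records, so its link score lifts it above B even though the retrieval engine alone scored B slightly higher.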
A knowledgebase system to enhance scientific discovery: Telemakus

PubMed Central

Fuller, Sherrilynne S; Revere, Debra; Bugni, Paul F; Martin, George M

2004-01-01

Background With the rapid expansion of scientific research, the ability to effectively find or integrate new domain knowledge in the sciences is proving increasingly difficult. Efforts to improve and speed up scientific discovery are being explored on a number of fronts. However, much of this work is based on traditional search and retrieval approaches and the bibliographic citation presentation format remains unchanged. Methods Case study. Results The Telemakus KnowledgeBase System provides flexible new tools for creating knowledgebases to facilitate retrieval and review of scientific research reports. In formalizing the representation of the research methods and results of scientific reports, Telemakus offers a potential strategy to enhance the scientific discovery process. While other research has demonstrated that aggregating and analyzing research findings across domains augments knowledge discovery, the Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports. Conclusion Based on how scientists conduct research and read the literature, the Telemakus KnowledgeBase System brings together three innovations in analyzing, displaying and summarizing research reports across a domain: (1) research report schema, a document surrogate of extracted research methods and findings presented in a consistent and structured schema format which mimics the research process itself and provides a high-level surrogate to facilitate searching and rapid review of retrieved documents; (2) research findings, used to index the documents, allowing searchers to request, for example, research studies which have studied the relationship between neoplasms and vitamin E; and (3) visual exploration interface of linked relationships for interactive querying of research findings across the knowledgebase and graphical displays of what is known as well as, through gaps in the map, what is yet to be tested. 
The rationale and system architecture are described and plans for the future are discussed. PMID:15507158
HONselect: multilingual assistant search engine operated by a concept-based interface system to decentralized heterogeneous sources.

PubMed

Boyer, C; Baujard, V; Scherrer, J R

2001-01-01

Any new user of the Internet will think that retrieving the relevant document is an easy task, especially with the wealth of sources available on this medium, but this is not the case. Even experienced users have difficulty formulating the right query to make the most of a search tool and efficiently obtain an accurate result. The goal of this work is to reduce the time and the energy necessary in searching and locating medical and health information. To reach this goal we have developed HONselect [1]. The aim of HONselect is not only to improve efficiency in retrieving documents but also to respond to an increased need for obtaining a selection of relevant and accurate documents from a breadth of various knowledge databases including scientific bibliographical references, clinical trials, daily news, multimedia illustrations, conferences, forums, Web sites, clinical cases, and others. The authors based their approach on knowledge representation using the National Library of Medicine's Medical Subject Headings (NLM, MeSH) vocabulary and classification [2,3]. The innovation is to propose a multilingual "one-stop searching" (one Web interface to databases currently in English, French and German) with full navigational and connectivity capabilities. The user may choose from a given selection of related terms the one that best suits the search, navigate in the term's hierarchical tree, and directly access a selection of documents from high quality knowledge suppliers such as the MEDLINE database, the NLM's ClinicalTrials.gov server, the NewsPage's daily news, the HON's media gallery, conference listings and MedHunt's Web sites [4, 5, 6, 7, 8, 9]. HONselect, developed by HON, a non-profit organisation [10], is a free, multilingual online tool based on the MeSH thesaurus to index, select, retrieve and display accurate, up-to-date, high-level and quality documents.
A Scopus-based examination of tobacco use publications in Middle Eastern Arab countries during the period 2003–2012

PubMed Central

2014-01-01

Background Tobacco smoking is the main health-care problem in the world. Evaluation of scientific output in the field of tobacco use has been poorly explored in Middle Eastern Arab (MEA) countries to date, and there are few internationally published reports on research activity in tobacco use. The main objectives of this study were to analyse the research output originating from 13 MEA countries in the tobacco field and to examine the authorship pattern and the citations retrieved from the Scopus database. Methods Data from 1 January 2003 through 31 December 2012 were searched for documents with specific words regarding the tobacco field as 'keywords' in the title in any 1 of the 13 MEA countries. Research productivity was evaluated based on a methodology developed and used in other bibliometric studies. Results Five hundred and sixty documents were retrieved from 320 peer-reviewed journals. The greatest amount of research activity was from Egypt (25.4%), followed by the Kingdom of Saudi Arabia (KSA) (23.2%), Lebanon (16.3%), and Jordan (14.8%). The total number of citations for the 560 documents, at the time of data analysis (27 August 2013), was 5,585, with a mean ± SD of 9.95 ± 22.64 and a median (interquartile range) of 3 (1–10). The h-index of the retrieved documents was 34. This study identified 232 (41.4%) documents from 53 countries in MEA-foreign country collaborations. By region, MEA collaborated most often with countries in the Americas (29.6%), followed by countries in the same MEA region (13.4%), especially KSA and Egypt. Conclusions The present data reveal a promising rise and a good start for research productivity in the tobacco field in the Arab world. Research output is low in some countries, which can be improved by investing in more international and national collaborative research projects in the field of tobacco. PMID:24885706
A Scopus-based examination of tobacco use publications in Middle Eastern Arab countries during the period 2003-2012.

PubMed

Zyoud, Sa'ed H; Al-Jabi, Samah W; Sweileh, Waleed M; Awang, Rahmat

2014-05-01

Tobacco smoking is the main health-care problem in the world. Evaluation of scientific output in the field of tobacco use has been poorly explored in Middle Eastern Arab (MEA) countries to date, and there are few internationally published reports on research activity in tobacco use. The main objectives of this study were to analyse the research output originating from 13 MEA countries in the tobacco field and to examine the authorship pattern and the citations retrieved from the Scopus database. Data from 1 January 2003 through 31 December 2012 were searched for documents with specific words regarding the tobacco field as 'keywords' in the title in any 1 of the 13 MEA countries. Research productivity was evaluated based on a methodology developed and used in other bibliometric studies. Five hundred and sixty documents were retrieved from 320 peer-reviewed journals. The greatest amount of research activity was from Egypt (25.4%), followed by the Kingdom of Saudi Arabia (KSA) (23.2%), Lebanon (16.3%), and Jordan (14.8%). The total number of citations for the 560 documents, at the time of data analysis (27 August 2013), was 5,585, with a mean ± SD of 9.95 ± 22.64 and a median (interquartile range) of 3 (1-10). The h-index of the retrieved documents was 34. This study identified 232 (41.4%) documents from 53 countries in MEA-foreign country collaborations. By region, MEA collaborated most often with countries in the Americas (29.6%), followed by countries in the same MEA region (13.4%), especially KSA and Egypt. The present data reveal a promising rise and a good start for research productivity in the tobacco field in the Arab world. Research output is low in some countries, which can be improved by investing in more international and national collaborative research projects in the field of tobacco.
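The h-index reported above is the largest h such that at least h documents have h or more citations each. A minimal Python sketch of that definition follows; the citation counts are made up for illustration and are not the study's data.

```python
def h_index(citations):
    """Largest h such that at least h documents have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Hypothetical citation counts for seven documents.
counts = [25, 8, 5, 3, 3, 1, 0]
result = h_index(counts)  # three documents have at least 3 citations each
```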
Framing Electronic Medical Records as Polylingual Documents in Query Expansion

PubMed Central

Huang, Edward W; Wang, Sheng; Lee, Doris Jung-Lin; Zhang, Runshun; Liu, Baoyan; Zhou, Xuezhong; Zhai, ChengXiang

2017-01-01

We present a study of electronic medical record (EMR) retrieval that emulates situations in which a doctor treats a new patient. Given a query consisting of a new patient’s symptoms, the retrieval system returns the set of most relevant records of previously treated patients. However, due to semantic, functional, and treatment synonyms in medical terminology, queries are often incomplete and thus require enhancement. In this paper, we present a topic model that frames symptoms and treatments as separate languages. Our experimental results show that this method improves retrieval performance over several baselines with statistical significance. These baselines include methods used in prior studies as well as state-of-the-art embedding techniques. Finally, we show that our proposed topic model discovers all three types of synonyms to improve medical record retrieval. PMID:29854161
Concept-Based Retrieval from Critical Incident Reports.

PubMed

Denecke, Kerstin

2017-01-01

Critical incident reporting systems (CIRS) are used to collect anonymously entered information about incidents that occurred, for example, in a hospital. Analyzing this information helps to identify, among other things, problems in the workflow, in the infrastructure, or in processes. The full potential of these sources of experiential knowledge often remains untapped, since retrieving and analyzing relevant reports is difficult and time-consuming, and the reporting systems often do not provide support for these tasks. The objective of this work is to develop a method for retrieving reports from the CIRS related to a specific user query. Natural language processing (NLP) and information retrieval (IR) methods are exploited for realizing the retrieval. We compare standard retrieval methods that rely upon frequency of words with an approach that includes a semantic mapping of natural language to concepts of a medical ontology. Through an evaluation, we demonstrate the feasibility of semantic document enrichment to improve recall in incident reporting retrieval. It is shown that a combination of standard keyword-based retrieval with semantic search results in highly satisfactory recall values. In future work, the evaluation should be repeated on a larger data set, and a real-time user evaluation needs to be performed to assess user satisfaction with the system and results.
Issues associated with manipulator-based waste retrieval from Hanford underground storage tanks with a preliminary review of commercial concepts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berglin, E.J.

1996-09-17

Westinghouse Hanford Company (WHC) is exploring commercial methods for retrieving waste from the underground storage tanks at the Hanford site in south central Washington state. WHC needs data on commercial retrieval systems equipment in order to make programmatic decisions for waste retrieval. Full system testing of retrieval processes is to be demonstrated in phases through September 1997 in support of programs aimed to Acquire Commercial Technology for Retrieval (ACTR) and at the Hanford Tanks Initiative (HTI). One of the important parts of the integrated testing will be the deployment of retrieval tools using manipulator-based systems. WHC requires an assessment of a number of commercial deployment systems that have been identified by the ACTR program as good candidates to be included in an integrated testing effort. Included in this assessment should be an independent evaluation of manipulator tests performed to date, so that WHC can construct an integrated test based on these systems. The objectives of this document are to provide a description of the need, requirements, and constraints for a manipulator-based retrieval system; to evaluate manipulator-based concepts and testing performed to date by a number of commercial organizations; and to identify issues to be resolved through testing and/or analysis for each concept.
Divided attention improves delayed, but not immediate retrieval of a consolidated memory.

PubMed

Kessler, Yoav; Vandermorris, Susan; Gopie, Nigel; Daros, Alexander; Winocur, Gordon; Moscovitch, Morris

2014-01-01

A well-documented dissociation between memory encoding and retrieval concerns the role of attention in the two processes. The typical finding is that divided attention (DA) during encoding impairs future memory, but retrieval is relatively robust to attentional manipulations. However, memory research in the past 20 years has demonstrated that retrieval is a memory-changing process, in which the strength and availability of information are modified by various characteristics of the retrieval process. Based on this logic, several studies examined the effects of DA during retrieval (Test 1) on a future memory test (Test 2). These studies yielded inconsistent results. The present study examined the role of memory consolidation in accounting for the after-effect of DA during retrieval. Initial learning required a classification of visual stimuli, and hence involved incidental learning. Test 1 was administered 24 hours after initial learning, and therefore required retrieval of consolidated information. Test 2 was administered either immediately following Test 1 or after a 24-hour delay. Our results show that the effect of DA on Test 2 depended on this delay. DA during Test 1 did not affect performance on Test 2 when it was administered immediately, but improved performance when Test 2 was given 24 hours later. The results are consistent with other findings showing long-term benefits of retrieval difficulty. Implications for theories of reconsolidation in human episodic memory are discussed.
Fusion of Deep Learning and Compressed Domain features for Content Based Image Retrieval.

PubMed

Liu, Peizhong; Guo, Jing-Ming; Wu, Chi-Yi; Cai, Danlin

2017-08-29

This paper presents an effective image retrieval method that combines high-level features from a Convolutional Neural Network (CNN) model with low-level features from Dot-Diffused Block Truncation Coding (DDBTC). The low-level features, e.g., texture and color, are constructed by VQ-indexed histogram from DDBTC bitmap, maximum, and minimum quantizers. Conversely, high-level features from CNN can effectively capture human perception. With the fusion of the DDBTC and CNN features, the extended deep learning two-layer codebook features (DL-TLCF) are generated using the proposed two-layer codebook, dimension reduction, and similarity reweighting to improve the overall retrieval rate. Two metrics, average precision rate (APR) and average recall rate (ARR), are employed to examine various datasets. As documented in the experimental results, the proposed schemes can achieve superior performance compared to the state-of-the-art methods with either low- or high-level features in terms of the retrieval rate. Thus, it can be a strong candidate for various image retrieval related applications.
First European Congress on Documentation Systems and Networks. Luxembourg, 16th, 17th and 18th May 1973.

ERIC Educational Resources Information Center

Commission des Communautes Europeennes (Luxembourg).

The conference proceedings contained in this document include invited papers, transcripts of discussions following those papers, and the reports of topical committees that met during the three day conference held in Luxembourg, May 1973. The focus of the conference was on the design and use of information retrieval and data base systems in various…
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.

PubMed

Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel

2012-01-01

Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. 
Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
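The "view" idea described in this abstract (a map function run over schema-free documents to build a key index, optionally followed by a reduce step) can be sketched in Python. The study itself defined its views in JavaScript inside CouchDB; the code below only emulates the concept in memory and does not call any CouchDB API. The function names, tag names, and documents are hypothetical.

```python
from collections import defaultdict


def build_view(documents, map_fn):
    """Apply map_fn to every document and index the emitted (key, value) pairs."""
    index = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            index[key].append(value)
    return index


def by_modality(doc):
    """Map function: emit the document id under its Modality tag, if present."""
    if "Modality" in doc:
        yield doc["Modality"], doc["_id"]


# Hypothetical schema-free metadata documents; tags differ between documents.
docs = [
    {"_id": "img-1", "Modality": "MR", "EchoTime": "30"},
    {"_id": "img-2", "Modality": "MR", "Manufacturer": "VendorA"},
    {"_id": "img-3", "Modality": "CT"},
    {"_id": "img-4", "SeriesDescription": "localizer"},
]

view = build_view(docs, by_modality)
# Reduce step: count the documents indexed under each key.
counts = {key: len(ids) for key, ids in view.items()}
```

Because the map function simply skips documents that lack the tag, heterogeneous tag-value collections can be indexed without a predefined schema, which is the property the abstract highlights.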

Implementation of the common phrase index method on the phrase query for information retrieval

NASA Astrophysics Data System (ADS)

Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah

2017-08-01

With the development of technology, finding information in news text has become easy, because news text is distributed not only in print media, such as newspapers, but also in electronic media that can be accessed using a search engine. When searching for relevant documents with a search engine, a phrase is often used as a query. The number of words that make up the phrase query and their positions affect the relevance of the documents returned, and therefore the accuracy of the information obtained. Based on this problem, the purpose of this research was to analyze the implementation of the common phrase index method for information retrieval. The research was conducted on English news text and implemented in a prototype to determine the relevance level of the documents returned. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. The system then displays the document search results in sequence, ordered by cosine similarity. System testing was conducted using 100 documents and 20 queries, and the results were used for the evaluation stage. First, the relevant documents were determined using the kappa statistic. Second, the system success rate was determined using precision, recall, and F-measure. In this research, the kappa statistic was 0.71, so the relevance judgments are eligible for the system evaluation. The calculation then yields a precision of 0.37, a recall of 0.50, and an F-measure of 0.43. From this result, it can be said that the success rate of the system in producing relevant documents is low.
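Two of the calculations named in this abstract, cosine similarity and the F-measure, can be sketched in Python. The abstract does not state the exact term-weighting scheme, so raw term counts are used here as an assumption; the example query and document are also made up. The F-measure call uses the precision and recall reported above.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine of the angle between two term-count vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-word phrase query against a four-word document.
sim = cosine_similarity("stock market".split(),
                        "stock market news today".split())

# Precision and recall as reported in the abstract.
f1 = f_measure(0.37, 0.50)
```

With a precision of 0.37 and a recall of 0.50, the harmonic mean is 0.37 / 0.87, which rounds to the reported 0.43.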
Nosql for Storage and Retrieval of Large LIDAR Data Collections

NASA Astrophysics Data System (ADS)

Boehm, J.; Liu, K.

2015-08-01

Developments in LiDAR technology over the past decades have made LiDAR a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea for a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline. It therefore makes sense to preserve this split. A document-oriented NoSQL database can easily emulate this data partitioning by representing each tile (file) in a separate document. The document stores the metadata of the tile. The actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB, a highly scalable document-oriented NoSQL database, for storing large LiDAR files. MongoDB, like any NoSQL database, allows for queries on the attributes of the document. As a specialty, MongoDB also allows spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to native file system and cloud storage transfer speed.
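The tile-per-document layout and the bounding-box query can be illustrated with a small in-memory Python sketch. It deliberately does not use a MongoDB driver; the field names (`file`, `bbox`, `points`), the tile values, and the helper functions are assumptions made for the example, not the authors' schema.

```python
def boxes_intersect(a, b):
    """True if two axis-aligned boxes (min_x, min_y, max_x, max_y) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_tiles(tiles, query_box):
    """Return file names of tiles whose bounding box intersects query_box."""
    return [t["file"] for t in tiles if boxes_intersect(t["bbox"], query_box)]


# One metadata document per tile (file); the values are hypothetical.
tiles = [
    {"file": "tile_00.las", "bbox": (0, 0, 100, 100), "points": 1200000},
    {"file": "tile_01.las", "bbox": (100, 0, 200, 100), "points": 950000},
    {"file": "tile_10.las", "bbox": (0, 100, 100, 200), "points": 1010000},
]

# Query a region that lies entirely inside the second tile.
hits = find_tiles(tiles, (150, 10, 180, 50))
```

Only the small metadata documents are scanned to answer the query; the large point-cloud files themselves would be fetched afterwards by name, which matches the separation of metadata and file storage described in the abstract.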
Automatic generation of Web mining environments

NASA Astrophysics Data System (ADS)

Cibelli, Maurizio; Costagliola, Gennaro

1999-02-01

The main problem related to the retrieval of information from the World Wide Web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes HTML layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
The SAPHIRE server: a new algorithm and implementation.

PubMed Central

Hersh, W.; Leone, T. J.

1995-01-01

SAPHIRE is an experimental information retrieval system implemented to test new approaches to automated indexing and retrieval of medical documents. Due to limitations in its original concept-matching algorithm, a modified algorithm has been implemented which allows greater flexibility in partial matching and different word order within concepts. With the concomitant growth in client-server applications and the Internet in general, the new algorithm has been implemented as a server that can be accessed via other applications on the Internet. PMID:8563413
LDEF grappled by remote manipulator system (RMS) during STS-32 retrieval

NASA Image and Video Library

1990-01-20

This view taken through overhead window W7 on Columbia's, Orbiter Vehicle (OV) 102's, aft flight deck shows the Long Duration Exposure Facility (LDEF) in the grasp of the remote manipulator system (RMS) during STS-32 retrieval activities. Other cameras at eye level were documenting the bus-sized spacecraft at various angles as the RMS manipulated LDEF for a lengthy photo survey. The glaring celestial body in the upper left is the sun with the Earth's surface visible below.
Modeling and mining term association for improving biomedical information retrieval performance.

PubMed

Hu, Qinmin; Huang, Jimmy Xiangji; Hu, Xiaohua

2012-06-11

The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text is structured in a way that makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. Therefore, we are motivated to consider term associations within different contexts to help the models understand semantic information and use it for improving biomedical information retrieval performance. We propose a term association approach to discover term associations among the keywords from a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. Different parameter settings and indices are investigated: the sentence-based index produces the best results at the document level, the word-based index at the passage level, and the paragraph-based index at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. First, modelling term association for improving biomedical information retrieval using factor analysis is one of the major contributions in our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than the baselines treating the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. 
These latent factors are decided by the proposed model and their term appearances in the first round retrieved passages.
Modeling and mining term association for improving biomedical information retrieval performance

PubMed Central

2012-01-01

Background The growth of the biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text that is structured in a way makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. Therefore, we are motivated to consider term associations within different contexts to help the models understand semantic information and use it for improving biomedical information retrieval performance. Results We propose a term association approach to discover term associations among the keywords from a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. The parameter settings and different indices are investigated that the sentence-based index produces the best results in terms of the document-level, the word-based index for the best results in terms of the passage-level and the paragraph-based index for the best results in terms of the passage2-level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. Conclusions First, modelling term association for improving biomedical information retrieval using factor analysis, is one of the major contributions in our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than the baselines treating the keywords independently. 
Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are decided by the proposed model and their term appearances in the first round retrieved passages. PMID:22901087
Full-scale system impact analysis: Digital document storage project

NASA Technical Reports Server (NTRS)

1989-01-01

The Digital Document Storage Full Scale System can provide cost effective electronic document storage, retrieval, hard copy reproduction, and remote access for users of NASA Technical Reports. The desired functionality of the DDS system is highly dependent on the assumed requirements for remote access used in this Impact Analysis. It is highly recommended that NASA proceed with a phased, communications requirement analysis to ensure that adequate communications service can be supplied at a reasonable cost in order to validate recent working assumptions upon which the success of the DDS Full Scale System is dependent.
Revised description of index of Florida water data collection active stations and a user's guide for station or site information retrieval computer program FINDEX H578

USGS Publications Warehouse

Geiger, Linda H.

1983-01-01

The report is an update of U.S. Geological Survey Open-File Report 77-703, which described a retrieval program for administrative index of active data-collection sites in Florida. Extensive changes to the Findex system have been made since 1977, making the previous report obsolete. A description of the data base and computer programs that are available in the Findex system are documented in this report. This system serves a vital need in the administration of the many and diverse water-data collection activities. District offices with extensive data-collection activities will benefit from the documentation of the system. Largely descriptive, the report tells how a file of computer card images has been established which contains entries for all sites in Florida at which there is currently a water-data collection activity. Entries include information such as identification number, station name, location, type of site, county, frequency of data collection, funding, and other pertinent details. The computer program FINDEX selectively retrieves entries and lists them in a format suitable for publication. The index is updated routinely. (USGS)
Energy and Environmental Issues in Eastern Europe and Central Asia: An Annotated Guide to Information Resources

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gant, K.S.

2000-10-09

Energy and environmental problems undermine the potential for sustained economic development and contribute to political and economic instability in the strategically important region surrounding the Caspian and Black Seas. Many organizations supporting efforts to resolve problems in this region have found that consensus building--a prerequisite for action--is a difficult process. Reaching agreement on priorities for investment, technical collaboration, and policy incentives depends upon informed decision-making by governments and local stakeholders. And while vast quantities of data and numerous analyses and reports are more accessible than ever, wading through the many potential sources in search of timely and relevant data is a formidable task. To facilitate more successful data searches and retrieval, this document provides annotated references to over 200 specific information sources, and over twenty primary search engines and data retrieval services, that provide relevant and timely information related to the environment, energy, and economic development around the Caspian and Black Seas. This document is an advance copy of the content that Oak Ridge National Laboratory (ORNL) plans to transfer to the web in HTML format to facilitate interactive search and retrieval of information using standard web-browser software.
Autobiographical memory in semantic dementia: implication for theories of limbic-neocortical interaction in remote memory.

PubMed

McKinnon, Margaret C; Black, Sandra E; Miller, Bruce; Moscovitch, Morris; Levine, Brian

2006-01-01

We examined autobiographical memory performance in two patients with semantic dementia using a novel measure, the Autobiographical Interview [Levine, Svoboda, Hay, Winocur, & Moscovitch (2002). Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17, 677-689], that is capable of dissociating episodic and personal semantic recall under varying levels of retrieval support. Earlier reports indicated that patients with semantic dementia demonstrate autobiographical episodic memory loss following a "reverse gradient" by which recent memories are preserved relative to remote memories. We found limited evidence for this pattern at conditions of low retrieval support. When structured probing was provided, patients' autobiographical memory performance was similar to that of controls. Retesting of one patient after 1 year indicated that retrieval support was insufficient to bolster performance following progressive prefrontal volume loss, as documented with quantified structural neuroimaging. These findings are discussed in relation to theories of limbic-neocortical interaction in autobiographical memory.
Automated documentation generator for advanced protein crystal growth

NASA Technical Reports Server (NTRS)

Maddux, Gary A.; Provancha, Anna; Chattam, David; Ford, Ronald

1993-01-01

The System Management and Production Laboratory at the Research Institute, the University of Alabama in Huntsville (UAH), was tasked by the Microgravity Experiment Projects (MEP) Office of the Payload Projects Office (PPO) at Marshall Space Flight Center (MSFC) to conduct research in the current methods of written documentation control and retrieval. The goals of this research were to determine the logical interrelationships within selected NASA documentation, and to expand on a previously developed prototype system to deliver a distributable, electronic knowledge-based system. This computer application would then be used to provide a paperless interface between the appropriate parties for the required NASA document.
GeoGIS : phase III.

DOT National Transportation Integrated Search

2011-08-01

GeoGIS is a web-based geotechnical database management system that is being developed for the Alabama Department of Transportation (ALDOT). The purpose of GeoGIS is to facilitate the efficient storage and retrieval of geotechnical documents for A...
Accident Analyses in Support of the Sludge Water System Safety Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

FINFROCK, S.H.

This document quantifies the potential health effects of the unmitigated hazards identified in Hey (2002) for retrieval of sludge from the KE basin. It also identifies potential controls and any supporting mitigative analyses.
19 CFR 163.1 - Definitions.

Code of Federal Regulations, 2011 CFR

2011-04-01

... following: Statements; declarations; documents; electronically generated or machine readable data; electronically stored or transmitted information or data; books; papers; correspondence; accounts; financial accounting data; technical data; computer programs necessary to retrieve information in a usable form; and...
Usability Assessment of Secure Messaging for Clinical Document Sharing between Health Care Providers and Patients.

PubMed

Jahn, Michelle A; Porter, Brian W; Patel, Himalaya; Zillich, Alan J; Simon, Steven R; Russ, Alissa L

2018-04-01

Web-based patient portals feature secure messaging systems that enable health care providers and patients to communicate information. However, little is known about the usability of these systems for clinical document sharing. This article evaluates the usability of a secure messaging system for providers and patients in terms of its ability to support sharing of electronic clinical documents. We conducted usability testing with providers and patients in a human-computer interaction laboratory at a Midwestern U.S. hospital. Providers sent a medication list document to a fictitious patient via secure messaging. Separately, patients retrieved the clinical document from a secure message and returned it to a fictitious provider. We collected use errors, task completion, task time, and satisfaction. Twenty-nine individuals participated: 19 providers (6 physicians, 6 registered nurses, and 7 pharmacists) and 10 patients. Among providers, 11 (58%) attached and sent the clinical document via secure messaging without requiring assistance, in a median (range) of 4.5 (1.8-12.7) minutes. No patients completed tasks without moderator assistance. Patients accessed the secure messaging system within 3.6 (1.2-15.0) minutes; retrieved the clinical document within 0.8 (0.5-5.7) minutes; and sent the attached clinical document in 6.3 (1.5-18.1) minutes. Although median satisfaction ratings were high, with 5.8 for providers and 6.0 for patients (scale, 0-7), we identified 36 different use errors. Physicians and pharmacists requested additional features to support care coordination via health information technology, while nurses requested features to support efficiency for their tasks. This study examined the usability of clinical document sharing, a key feature of many secure messaging systems. Our results highlight similarities and differences between provider and patient end-user groups, which can inform secure messaging design to improve learnability and efficiency. 
The observations suggest recommendations for improving the technical aspects of secure messaging for clinical document sharing. Schattauer GmbH Stuttgart.
Subject and Citation Indexing. Part I: The Clustering Structure of Composite Representations in the Cystic Fibrosis Document Collection. Part II: The Optimal, Cluster-Based Retrieval Performance of Composite Representations.

ERIC Educational Resources Information Center

Shaw, W. M., Jr.

1991-01-01

Two articles discuss the clustering of composite representations in the Cystic Fibrosis Document Collection from the National Library of Medicine's MEDLINE file. Clustering is evaluated as a function of the exhaustivity of composite representations based on Medical Subject Headings (MeSH) and citation indexes, and evaluation of retrieval…
An Inquiry into Testing of Information Retrieval Systems. Comparative Systems Laboratory Final Technical Report, Part III: CSL Related Studies.

ERIC Educational Resources Information Center

Zull, Carolyn Gifford, Ed.; And Others

This third volume of the Comparative Systems Laboratory (CSL) Final Technical Report is a collection of relatively independent studies performed on CSL materials. Covered in this document are studies on: (1) properties of files, including a study of the growth rate of a dictionary of index terms as influenced by number of documents in the file and…
Portable exhauster POR-007/Skid E and POR-008/Skid F storage plan

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, O.D.

1998-07-25

This document provides storage requirements for 1,000 CFM portable exhausters POR-007/Skid E and POR-008/Skid F. These requirements are presented in three parts: preparation for storage, storage maintenance and testing, and retrieval from storage. The exhauster component identification numbers listed in this document contain the prefix POR-007 or POR-008 depending on which exhauster is being used.
Documentation--INFO: A Small Computer Data Base Management System for School Applications. The Illinois Series on Educational Application of Computers, No. 24e.

ERIC Educational Resources Information Center

Cox, John

This paper documents the program used in the application of the INFO system for data storage and retrieval in schools, from the viewpoints of both the unsophisticated user and the experienced programmer interested in using the INFO system or modifying it for use within an existing school's computer system. The opening user's guide presents simple…

Pipelining Architecture of Indexing Using Agglomerative Clustering

NASA Astrophysics Data System (ADS)

Goyal, Deepika; Goyal, Deepti; Gupta, Parul

2010-11-01

The World Wide Web is an interlinked collection of billions of documents. Ironically, the huge size of this collection has become an obstacle to information retrieval. Search engines are used to access information on the Internet, and a search engine retrieves pages from its indexer. This paper introduces a novel pipelining technique for structuring the core index-building system that substantially reduces index construction time, together with a clustering algorithm that partitions the set of documents into ordered clusters so that documents within the same cluster are similar and are assigned close document identifiers. After documents are assigned to clusters, a hierarchy of indexes is created so that searching is efficient: clusters are merged into super clusters and then mega clusters automatically. The pipeline architecture builds the index in a manner that is efficient in both space and time. A search is directed from the higher levels of the index (higher-level clusters) to the lower levels, so that the user gets possible matches quickly. Because each cluster is formed by merging only two clusters, the search at each lower level of the index is limited to two clusters, which keeps it time-efficient.
Issues and solutions for storage, retrieval, and searching of MPEG-7 documents

NASA Astrophysics Data System (ADS)

Chang, Yuan-Chi; Lo, Ming-Ling; Smith, John R.

2000-10-01

The ongoing MPEG-7 standardization activity aims at creating a standard for describing multimedia content in order to facilitate the interpretation of the associated information content. Attempting to address a broad range of applications, MPEG-7 has defined a flexible framework consisting of Descriptors, Description Schemes, and Description Definition Language. Descriptors and Description Schemes describe features, structure and semantics of multimedia objects. They are written in the Description Definition Language (DDL). In the most recent revision, DDL applies XML (Extensible Markup Language) Schema with MPEG-7 extensions. DDL has constructs that support inclusion, inheritance, reference, enumeration, choice, sequence, and abstract type of Description Schemes and Descriptors. In order to enable multimedia systems to use MPEG-7, a number of important problems in storing, retrieving and searching MPEG-7 documents need to be solved. This paper reports initial findings on issues and solutions for storing and accessing MPEG-7 documents. In particular, we discuss the benefits of using a virtual document management framework based on XML Access Server (XAS) in order to bridge the MPEG-7 multimedia applications and database systems. The need arises partly because MPEG-7 descriptions need customized storage schema, indexing and search engines. We also discuss issues arising in managing dependence and cross-description scheme search.
JANE, A new information retrieval system for the Radiation Shielding Information Center

DOE Office of Scientific and Technical Information (OSTI.GOV)

Trubey, D.K.

A new information storage and retrieval system has been developed for the Radiation Shielding Information Center (RSIC) at Oak Ridge National Laboratory to replace mainframe systems that have become obsolete. The database contains citations and abstracts of literature which were selected by RSIC analysts and indexed with terms from a controlled vocabulary. The database, begun in 1963, has been maintained continuously since that time. The new system, called JANE, incorporates automatic indexing techniques and on-line retrieval using the RSIC Data General Eclipse MV/4000 minicomputer. Automatic indexing and retrieval techniques based on fuzzy-set theory allow the presentation of results in order of Retrieval Status Value. The fuzzy-set membership function depends on term frequency in the titles and abstracts and on Term Discrimination Values which indicate the resolving power of the individual terms. These values are determined by the Cover Coefficient method. The use of a commercial database base to store and retrieve the indexing information permits rapid retrieval of the stored documents. Comparisons of the new and presently-used systems for actual searches of the literature indicate that it is practical to replace the mainframe systems with a minicomputer system similar to the present version of JANE. 18 refs., 10 figs.
System Description for Tank 241-AZ-101 Waste Retrieval Data Acquisition System

DOE Office of Scientific and Technical Information (OSTI.GOV)

ROMERO, S.G.

2000-02-14

The proposed activity provides the description of the Data Acquisition System for Tank 241-AZ-101. This description is documented in HNF-5572, Tank 241-AZ-101 Waste Retrieval Data Acquisition System (DAS). This activity supports the planned mixer pump tests for Tank 241-AZ-101. Tank 241-AZ-101 has been selected for the first full-scale demonstration of a mixer pump system. The tank currently holds over 960,000 gallons of neutralized current acid waste, including approximately 12.7 inches of settling solids (sludge) at the bottom of the tank. As described in Addendum 4 of the FSAR (LMHC 2000a), two 300 HP mixer pumps with associated measurement and monitoring equipment have been installed in Tank 241-AZ-101. The purpose of the Tank 241-AZ-101 retrieval system Data Acquisition System (DAS) is to provide monitoring and data acquisition of key parameters in order to confirm the effectiveness of the mixer pumps utilized for suspending solids in the tank. The suspension of solids in Tank 241-AZ-101 is necessary for pretreatment of the neutralized current acid waste and eventual disposal as glass via the Hanford Waste Vitrification Plant. HNF-5572 provides a basic description of the Tank 241-AZ-101 retrieval system DAS, including the field instrumentation and application software. The DAS is provided to fulfill requirements for data collection and monitoring. This document is not an operations procedure, nor is it intended to describe the mixing operation. This USQ screening provides evaluation of HNF-5572 (Revision 1) including the changes as documented on ECN 654001. The changes include (1) add information on historical trending and data backup, (2) modify DAS I/O list in Appendix E to reflect actual conditions in the field, and (3) delete IP address in Appendix F per Lockheed Martin Services, Inc. request.
Agent-based method for distributed clustering of textual information

DOEpatents

Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN

2010-09-28

A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.
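The routing step described in the abstract above (score a new document vector against existing clusters, then either hand the document to the best match or create a new cluster) can be sketched in a single process as follows. This is a minimal illustration and not the patented agent system: the `Cluster` class, the `route_document` function, the centroid update, and the 0.8 similarity threshold are all assumptions introduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """Hypothetical stand-in for a cluster agent: a centroid plus member ids."""

    def __init__(self, vector, doc_id):
        self.centroid = list(vector)
        self.doc_ids = [doc_id]

    def similarity(self, vector):
        return cosine(self.centroid, vector)

    def add(self, vector, doc_id):
        # Update the centroid as the running mean of the member vectors.
        n = len(self.doc_ids)
        self.centroid = [(c * n + v) / (n + 1) for c, v in zip(self.centroid, vector)]
        self.doc_ids.append(doc_id)


def route_document(clusters, vector, doc_id, threshold=0.8):
    """Send the document to the most similar cluster, or start a new one.

    The threshold is an assumed parameter; the abstract does not state how
    the decision to create a new cluster is made.
    """
    best = max(clusters, key=lambda c: c.similarity(vector), default=None)
    if best is not None and best.similarity(vector) >= threshold:
        best.add(vector, doc_id)
        return best
    new_cluster = Cluster(vector, doc_id)
    clusters.append(new_cluster)
    return new_cluster


clusters = []
route_document(clusters, [1.0, 0.0], "a")  # no clusters yet, so one is created
route_document(clusters, [0.9, 0.1], "b")  # close to "a", joins its cluster
route_document(clusters, [0.0, 1.0], "c")  # dissimilar, starts a second cluster
```

In the described system the similarity scores would be computed by separate agents and returned to the multiplexing agent; here they are plain method calls.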
An overview of the National Space Science data Center Standard Information Retrieval System (SIRS)

NASA Technical Reports Server (NTRS)

Shapiro, A.; Blecher, S.; Verson, E. E.; King, M. L. (Editor)

1974-01-01

A general overview is given of the National Space Science Data Center (NSSDC) Standard Information Retrieval System. A description is given, in general terms, of the information system that contains the data files and of the software system that processes and manipulates the files maintained at the Data Center. Emphasis is placed on providing users with an overview of the capabilities and uses of the NSSDC Standard Information Retrieval System (SIRS). Examples given are taken from the files at the Data Center. Detailed information about NSSDC data files is documented in a set of File Users Guides, with one user's guide prepared for each file processed by SIRS. Detailed information about SIRS is presented in the SIRS Users Guide.
Patent Family Databases.

ERIC Educational Resources Information Center

Simmons, Edlyn S.

1985-01-01

Reports on retrieval of patent information online and includes definition of patent family, basic and equivalent patents, "parents and children" applications, designated states, patent family databases--International Patent Documentation Center, World Patents Index, APIPAT (American Petroleum Institute), CLAIMS (IFI/Plenum). A table…
Dynamic storage in resource-scarce browsing multimedia applications

NASA Astrophysics Data System (ADS)

Elenbaas, Herman; Dimitrova, Nevenka

1998-10-01

In the convergence of information and entertainment there is a conflict between the consumer's expectation of fast access to high quality multimedia content through narrow bandwidth channels versus the size of this content. During the retrieval and information presentation of a multimedia application there are two problems that have to be solved: the limited bandwidth during transmission of the retrieved multimedia content and the limited memory for temporary caching. In this paper we propose an approach for latency optimization in information browsing applications. We proposed a method for flattening hierarchically linked documents in a manner convenient for network transport over slow channels to minimize browsing latency. Flattening of the hierarchy involves linearization, compression and bundling of the document nodes. After the transfer, the compressed hierarchy is stored on a local device where it can be partly unbundled to fit the caching limits at the local site while giving the user availability to the content.
Using Replicates in Information Retrieval Evaluation.

PubMed

Voorhees, Ellen M; Samarov, Daniel; Soboroff, Ian

2017-09-01

This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions (something not possible without replicates), yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect as well as increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and the measure of effectiveness used to quantify system effectiveness.
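The replicate idea in the abstract above, random partitioning of the document collection so that one system and topic can be scored several times, can be sketched as follows. This is an illustration under stated assumptions only: the function names are hypothetical, precision at k stands in for whatever effectiveness measure is used, and the bootstrap ANOVA step is not shown.

```python
import random


def partition_collection(doc_ids, n_parts, seed=0):
    """Randomly split doc_ids into n_parts disjoint, near-equal partitions."""
    shuffled = list(doc_ids)
    random.Random(seed).shuffle(shuffled)
    return [set(shuffled[i::n_parts]) for i in range(n_parts)]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranking[:k]
    return sum(1 for d in top if d in relevant) / k if k else 0.0


def replicate_scores(ranking, relevant, partitions, k):
    """Score one system/topic ranking once per partition (one replicate each)."""
    scores = []
    for part in partitions:
        # Restrict the ranking to this partition while keeping the original order.
        sub_ranking = [d for d in ranking if d in part]
        scores.append(precision_at_k(sub_ranking, relevant & part, k))
    return scores


docs = [f"d{i}" for i in range(12)]      # toy collection
parts = partition_collection(docs, 3)    # three replicates
ranking = docs                           # a hypothetical system ranking
relevant = {"d0", "d3", "d7"}            # hypothetical relevance judgments
scores = replicate_scores(ranking, relevant, parts, k=2)
```

Each entry of `scores` is one replicate measurement for the same system and topic; an analysis of variance over such measurements is what allows a system-topic interaction term to be estimated.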
User Evaluation of the NASA Technical Report Server Recommendation Service

NASA Technical Reports Server (NTRS)

Nelson, Michael L.; Bollen, Johan; Calhoun, JoAnne R.; Mackey, Calvin E.

2004-01-01

We present the user evaluation of two recommendation server methodologies implemented for the NASA Technical Report Server (NTRS). One methodology for generating recommendations uses log analysis to identify co-retrieval events on full-text documents. For comparison, we used the Vector Space Model (VSM) as the second methodology. We calculated cosine similarities and used the top 10 most similar documents (based on metadata) as recommendations. We then ran an experiment with NASA Langley Research Center (LaRC) staff members to gather their feedback on which method produced the most quality recommendations. We found that in most cases VSM outperformed log analysis of co-retrievals. However, analyzing the data revealed the evaluations may have been structurally biased in favor of the VSM generated recommendations. We explore some possible methods for combining log analysis and VSM generated recommendations and suggest areas of future work.
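The Vector Space Model step mentioned in the abstract above (cosine similarity over metadata, keeping the most similar documents as recommendations) can be sketched as follows. This is a minimal illustration, not the NTRS implementation: raw term counts over whitespace-split text are an assumed weighting, and the record texts and the `recommend` function are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(doc_id: str, metadata: dict, k: int = 10) -> list:
    """Return the ids of the k documents most similar to doc_id."""
    vectors = {d: Counter(text.lower().split()) for d, text in metadata.items()}
    target = vectors[doc_id]
    scored = [(cosine(target, v), d) for d, v in vectors.items() if d != doc_id]
    # Highest similarity first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in scored[:k]]


# Hypothetical metadata records, invented for illustration.
records = {
    "r1": "wind tunnel test of wing flutter",
    "r2": "wing flutter analysis in wind tunnel",
    "r3": "document retrieval with vector space model",
}
top = recommend("r1", records, k=2)
```

With `k=10`, as in the abstract, the function returns up to ten ids; a production system would more likely use tf-idf weights and an inverted index rather than comparing every pair of records.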
User Evaluation of the NASA Technical Report Server Recommendation Service

NASA Technical Reports Server (NTRS)

Nelson, Michael L.; Bollen, Johan; Calhoun, JoAnne R.; Mackey, Calvin E.

2004-01-01

We present the user evaluation of two recommendation server methodologies implemented for the NASA Technical Report Server (NTRS). One methodology for generating recommendations uses log analysis to identify co-retrieval events on full-text documents. For comparison, we used the Vector Space Model (VSM) as the second methodology. We calculated cosine similarities and used the top 10 most similar documents (based on metadata) as 'recommendations'. We then ran an experiment with NASA Langley Research Center (LaRC) staff members to gather their feedback on which method produced the most 'quality' recommendations. We found that in most cases VSM outperformed log analysis of co-retrievals. However, analyzing the data revealed the evaluations may have been structurally biased in favor of the VSM generated recommendations. We explore some possible methods for combining log analysis and VSM generated recommendations and suggest areas of future work.
Using Replicates in Information Retrieval Evaluation

PubMed Central

VOORHEES, ELLEN M.; SAMAROV, DANIEL; SOBOROFF, IAN

2018-01-01

This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions—something not possible without replicates—yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect as well as increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and the measure of effectiveness used to quantify system effectiveness. PMID:29905334
The integration of system specifications and program coding

NASA Technical Reports Server (NTRS)

Luebke, W. R.

1970-01-01

Experience in maintaining up-to-date documentation for one module of the large-scale Medical Literature Analysis and Retrieval System 2 (MEDLARS 2) is described. Several innovative techniques were explored in the development of this system's data management environment, particularly those that use PL/I as an automatic documenter. The PL/I data description section can provide automatic documentation by means of a master description of data elements that has long and highly meaningful mnemonic names and a formalized technique for the production of descriptive commentary. The techniques discussed are practical methods that employ the computer during system development in a manner that assists system implementation, provides interim documentation for customer review, and satisfies some of the deliverable documentation requirements.
Retrieval Performance Prediction and Document Quality

DTIC Science & Technology

2007-09-01

can help. However, there are hard-to-expand queries that the method fails to detect. One is “Legionnaires disease” [delta average precision (REL-QL...0.248, model comparison score: -0.256] where documents can contain the terms “legionnaire (meaning soldier)” and “disease” (and 99 related words...yet not be about Legionnaires’ disease, leading to a low comparison score despite its hard-to-expand status. 4.5 Summary Several prediction
Development and evaluation of a biomedical search engine using a predicate-based vector space model.

PubMed

Kwak, Myungjae; Leroy, Gondy; Martinez, Jesse D; Harwell, Jeffrey

2013-10-01

Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search. Copyright © 2013 Elsevier Inc. All rights reserved.
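The two evaluation measures named in the abstract above, precision (a rating of 0 versus anything higher) and relevance (the 0-5 score itself), can be illustrated with a short calculation. The ratings below are invented for illustration, and the rank order adjustment mentioned in the abstract is not reproduced because its formula is not given there.

```python
def precision(ratings):
    """Share of retrieved abstracts rated above 0 (0 versus higher)."""
    return sum(1 for r in ratings if r > 0) / len(ratings)


def mean_relevance(ratings):
    """Average of the 0-5 relevance ratings, with no rank adjustment."""
    return sum(ratings) / len(ratings)


# Hypothetical ratings for the abstracts returned for one query.
ratings = [0, 3, 5, 1, 0]
p = precision(ratings)        # 3 of the 5 abstracts are rated above 0
m = mean_relevance(ratings)   # (0 + 3 + 5 + 1 + 0) / 5
```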
Translation lexicon acquisition from bilingual dictionaries

NASA Astrophysics Data System (ADS)

Doermann, David S.; Ma, Huanfeng; Karagol-Ayan, Burcu; Oard, Douglas W.

2001-12-01

Bilingual dictionaries hold great potential as a source of lexical resources for training automated systems for optical character recognition, machine translation and cross-language information retrieval. In this work we describe a system for extracting term lexicons from printed copies of bilingual dictionaries. We describe our approach to page and definition segmentation and entry parsing. We have used the approach to parse a number of dictionaries and demonstrate the results for retrieval using a French-English Dictionary to generate a translation lexicon and a corpus of English queries applied to French documents to evaluate cross-language IR.
CAESAR : an expert system for evaluation of scour and stream stability

DOT National Transportation Integrated Search

1999-01-01

This report documents the development and testing of a field-deployable, knowledge-based decision support system that assists bridge inspectors by acquiring, cataloging, storing, and retrieving information necessary for the evaluation of a bridge for...
34 CFR 5.2 - Definitions.

Code of Federal Regulations, 2012 CFR

2012-07-01

..., and retrievable in electronic format; (ii) Records maintained for the Department by a private entity under a records management contract with the Federal Government; and (iii) Documentary materials... documents preserved only for convenience of reference; stocks of publications; and personal records created...
34 CFR 5.2 - Definitions.

Code of Federal Regulations, 2011 CFR

2011-07-01

..., and retrievable in electronic format; (ii) Records maintained for the Department by a private entity under a records management contract with the Federal Government; and (iii) Documentary materials... documents preserved only for convenience of reference; stocks of publications; and personal records created...
34 CFR 5.2 - Definitions.

Code of Federal Regulations, 2013 CFR

2013-07-01

..., and retrievable in electronic format; (ii) Records maintained for the Department by a private entity under a records management contract with the Federal Government; and (iii) Documentary materials... documents preserved only for convenience of reference; stocks of publications; and personal records created...

Using Concept Relations to Improve Ranking in Information Retrieval

PubMed Central

Price, Susan L.; Delcambre, Lois M.

2005-01-01

Despite improved search engine technology, most searches return numerous documents not directly related to the query. This problem is mitigated if relevant documents appear high on a ranked list of search results. We propose that some queries and the underlying information needs can be modeled as relationships between concepts (relations), and we match relations in queries to relations in documents to try to improve ranking of search results. We investigate four techniques to identify two relationships important in medicine, causes and treats, to improve the ranking of medical text documents relevant to clinical questions about causation and treatment. Preliminary results suggest that identifying relation instances can improve the ranking of search results. PMID:16779114
EM-21 Retrieval Knowledge Center: Waste Retrieval Challenges

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fellinger, Andrew P.; Rinker, Michael W.; Berglin, Eric J.

EM-21 is the Waste Processing Division of the Office of Engineering and Technology, within the U.S. Department of Energy’s (DOE) Office of Environmental Management (EM). In August of 2008, EM-21 began an initiative to develop a Retrieval Knowledge Center (RKC) to provide the DOE, high level waste retrieval operators, and technology developers with a centralized and focused location to share knowledge and expertise that will be used to address retrieval challenges across the DOE complex. The RKC is also designed to facilitate information sharing across the DOE Waste Site Complex through workshops, and a searchable database of waste retrieval technology information. The database may be used to research effective technology approaches for specific retrieval tasks and to take advantage of the lessons learned from previous operations. It is also expected to be effective for remaining current with the state of the art of retrieval technologies and ongoing development within the DOE Complex. To encourage collaboration of DOE sites with waste retrieval issues, the RKC team is co-led by the Savannah River National Laboratory (SRNL) and the Pacific Northwest National Laboratory (PNNL). Two RKC workshops were held in the Fall of 2008. The purpose of these workshops was to define top level waste retrieval functional areas, exchange lessons learned, and develop a path forward to support a strategic business plan focused on technology needs for retrieval. The primary participants involved in these workshops included retrieval personnel and laboratory staff that are associated with Hanford and Savannah River Sites since the majority of remaining DOE waste tanks are located at these sites. This report summarizes and documents the results of the initial RKC workshops. Technology challenges identified from these workshops and presented here are expected to be a key component to defining future RKC-directed tasks designed to facilitate tank waste retrieval solutions.
Tobacco documents research methodology

PubMed Central

McCandless, Phyra M; Klausner, Kim; Taketa, Rachel; Yerger, Valerie B

2011-01-01

Tobacco documents research has developed into a thriving academic enterprise since its inception in 1995. The technology supporting tobacco documents archiving, searching and retrieval has improved greatly since that time, and consequently tobacco documents researchers have considerably more access to resources than was the case when researchers had to travel to physical archives and/or electronically search poorly and incompletely indexed documents. The authors of the papers presented in this supplement all followed the same basic research methodology. Rather than leave the reader of the supplement to read the same discussion of methods in each individual paper, presented here is an overview of the methods all authors followed. In the individual articles that follow in this supplement, the authors present the additional methodological information specific to their topics. This brief discussion also highlights technological capabilities in the Legacy Tobacco Documents Library and updates methods for organising internal tobacco documents data and findings. PMID:21504933
Retrieval characteristics of the Bard Denali and Argon Option inferior vena cava filters.

PubMed

Dowell, Joshua D; Semaan, Dominic; Makary, Mina S; Ryu, John; Khayat, Mamdouh; Pan, Xueliang

2017-11-01

The purpose of this study was to compare the retrieval characteristics of the Option Elite (Argon Medical, Plano, Tex) and Denali (Bard, Tempe, Ariz) retrievable inferior vena cava filters (IVCFs), two filters that share a similar conical design. A single-center, retrospective study reviewed all Option and Denali IVCF removals during a 36-month period. Attempted retrievals were classified as advanced if the routine "snare and sheath" technique was initially unsuccessful despite multiple attempts or an alternative endovascular maneuver or access site was used. Patient and filter characteristics were documented. In our study, 63 Option and 45 Denali IVCFs were retrieved, with an average dwell time of 128.73 and 99.3 days, respectively. Significantly higher median fluoroscopy times were experienced in retrieving the Option filter compared with the Denali filter (12.18 vs 6.85 minutes; P = .046). Use of adjunctive techniques was also higher in comparing the Option filter with the Denali filter (19.0% vs 8.7%; P = .079). No significant difference was noted between these groups in regard to gender, age, or history of malignant disease. Option IVCF retrieval procedures required significantly longer retrieval fluoroscopy time compared with Denali IVCFs. Although procedure time was not analyzed in this study, as a surrogate, the increased fluoroscopy time may also have an impact on procedural direct costs and throughput. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
Increasing alcohol restrictions and rates of serious injury in four remote Australian Indigenous communities.

PubMed

Margolis, Stephen A; Ypinazar, Valmae A; Muller, Reinhold; Clough, Alan

2011-05-16

To document rates of serious injuries in relation to government alcohol restrictions in remote Australian Indigenous communities. An ecological study using Royal Flying Doctor Service injury retrieval data, before and after changes in legal access to alcohol in four remote Australian Indigenous communities, Queensland, 1 January 1996-31 July 2010. Changes in rates of aeromedical retrievals for serious injury, and proportion of retrievals for serious injury, before and after alcohol restrictions. After alcohol restrictions were introduced in 2002-2003, retrieval rates for serious injury dropped initially, and then increased in the 2 years before further restrictions in 2008 (average increase, 2.34 per 1000 per year). This trend reversed in the 2 years after the 2008 restrictions (average decrease, 7.97 per 1000 per year). There was a statistically significant decreasing time trend in serious-injury retrieval rates in each of the four communities for the period 2 years before the 2002-2003 restrictions, 2 years before the 2008 restrictions, and the final 2 years of observations (2009-2010) (P < 0.001 for all four communities combined). Overall, serious-injury retrieval rates dropped from 30 per 1000 in 2008 to 14 per 1000 in 2010, and the proportions of serious-injury retrievals decreased significantly for all four communities. The absolute and the proportional rates of serious-injury retrievals fell significantly as government restrictions on legal access to alcohol increased; they are now at their lowest recorded level in 15 years.
Worldwide research productivity of paracetamol (acetaminophen) poisoning: a bibliometric analysis (2003-2012).

PubMed

Zyoud, S H; Al-Jabi, S W; Sweileh, W M

2015-01-01

There is a lack of data evaluating worldwide scientific research productivity in paracetamol poisoning. The purposes of this study were to analyse the worldwide research output related to paracetamol poisoning and to examine the authorship pattern and the citations retrieved from the Scopus database for over a decade. Data were searched for documents with specific words regarding paracetamol poisoning as 'keywords' in the title and/or abstract. Scientific output was evaluated based on a methodology developed and used in other bibliometric studies. Research productivity was adjusted for national population and nominal gross domestic product (GDP) per capita. A total of 1721 publications worldwide met the criteria during the study period. The retrieved documents were published from 72 countries. The largest number of articles related to paracetamol poisoning was from the United States (US; 30.39%), followed by India (10.75%) and the United Kingdom (UK; 9.36%). The total number of citations at the time of data analysis was 21,109, with an average of 12.3 citations per document and a median (interquartile range) of 4 (1-14). The h-index of the retrieved documents was 57. After adjusting for economy and population power, India (124.2), Nigeria (18.6) and the US (10.5) had the highest research productivity. Countries with large economies, such as the UK, Australia, Japan, China and France, tended to rank relatively low after adjustment for GDP over the entire study period. Our study demonstrates that research productivity related to paracetamol poisoning has increased rapidly in recent years. The US clearly dominated in research productivity. However, certain smaller countries such as Nigeria have high scientific output relative to their population size and GDP. A highly noticeable increase in the contributions of Asia-Pacific and Middle East regions to scientific literature related to paracetamol poisoning was also observed. © The Author(s) 2014.
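The citation indicators quoted above are simple to compute. The sketch below shows the standard definition of the h-index (the largest h such that h documents have at least h citations each) and the arithmetic behind the reported average; the function name and the sample citation list are illustrative assumptions, not data from the study.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

# Hypothetical citation counts for five documents.
example_h = h_index([10, 8, 5, 4, 3])

# Average citations per document from the totals given in the abstract:
# 21,109 citations over 1721 documents, which rounds to the reported 12.3.
mean_citations = 21109 / 1721
```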
Mixed-handedness advantages in episodic memory obtained under conditions of intentional learning extend to incidental learning.

PubMed

Christman, Stephen D; Butler, Michael

2011-10-01

The existence of handedness differences in the retrieval of episodic memories is well-documented, but virtually all have been obtained under conditions of intentional learning. Two experiments are reported that extend the presence of such handedness differences to memory retrieval under conditions of incidental learning. Experiment 1 used Craik and Tulving's (1975) classic levels-of-processing paradigm and obtained handedness differences under incidental and intentional conditions of deep processing, but not under conditions of shallow incidental processing. Experiment 2 looked at incidental memory for distracter items from a recognition memory task and again found a mixed-handed advantage. Results are discussed in terms of the relation between interhemispheric interaction, levels of processing, and episodic memory retrieval. Copyright © 2011 Elsevier Inc. All rights reserved.
Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers.

PubMed

Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Rzepa, Henry S

2015-01-01

We describe three different procedures based on metadata standards for enabling automated retrieval of scientific data from digital repositories utilising the persistent identifier of the dataset with optional specification of the attributes of the data document such as filename or media type. The procedures are demonstrated using the JSmol molecular visualizer as a component of a web page and Avogadro as a stand-alone modelling program. We compare our methods for automated retrieval of data from a standards-compliant data repository with those currently in operation for a selection of existing molecular databases and repositories. Our methods illustrate the importance of adopting a standards-based approach of using metadata declarations to increase access to and discoverability of repository-based data.
On-Demand Associative Cross-Language Information Retrieval

NASA Astrophysics Data System (ADS)

Geraldo, André Pinto; Moreira, Viviane P.; Gonçalves, Marcos A.

This paper proposes the use of algorithms for mining association rules as an approach for Cross-Language Information Retrieval. These algorithms have been widely used to analyse market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using queries in two distinct languages, Portuguese and Finnish, to retrieve documents in English. The results show that the performance of our proposed approach is comparable to the performance of the monolingual baseline and to query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. The combination of machine translation and our approach yielded the best results, even outperforming the monolingual baseline.
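The mapping from market baskets to translation can be sketched as follows: each aligned sentence pair is treated as one "basket", and a rule source-term -> target-term is scored by its support and confidence. This is a minimal illustration under assumptions (whitespace tokenisation, a lift tie-breaker, hypothetical thresholds and sample sentences); it is not the mining algorithm used in the paper.

```python
from collections import Counter
from itertools import product

def mine_translations(pairs, min_support=1, min_confidence=0.5):
    """Map each source term to candidate target terms, best candidate first.

    pairs: list of (source_sentence, target_sentence) aligned strings.
    Each candidate is a (confidence, lift, target_term) tuple.
    """
    n = len(pairs)
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_terms, t_terms = set(src.split()), set(tgt.split())
        src_count.update(s_terms)
        tgt_count.update(t_terms)
        joint.update(product(s_terms, t_terms))

    rules = {}
    for (s, t), support in joint.items():
        confidence = support / src_count[s]
        if support < min_support or confidence < min_confidence:
            continue
        # Lift discounts target terms that are frequent everywhere.
        lift = confidence / (tgt_count[t] / n)
        rules.setdefault(s, []).append((confidence, lift, t))
    return {s: sorted(c, reverse=True) for s, c in rules.items()}

# Hypothetical Portuguese-English sentence pairs.
corpus = [
    ("casa branca", "white house"),
    ("casa grande", "big house"),
    ("carro branco", "white car"),
]
rules = mine_translations(corpus)
best_casa = rules["casa"][0][2]
best_carro = rules["carro"][0][2]
```

With so little data, several rules tie on confidence; a realistic parallel corpus and higher thresholds would be needed before the top candidates could be trusted for query translation.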
Stackfile Database

NASA Technical Reports Server (NTRS)

deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

2013-01-01

This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.
Cluster-based query expansion using external collections in medical information retrieval.

PubMed

Oh, Heung-Seon; Jung, Yuchul

2015-12-01

Utilizing external collections to improve retrieval performance is challenging research because various test collections are created for different purposes. Improving medical information retrieval has also gained much attention as various types of medical documents have become available to researchers ever since they started storing them in machine processable formats. In this paper, we propose an effective method of utilizing external collections based on the pseudo relevance feedback approach. Our method incorporates the structure of external collections in estimating individual components in the final feedback model. Extensive experiments on three medical collections (TREC CDS, CLEF eHealth, and OHSUMED) were performed, and the results were compared with a representative expansion approach utilizing the external collections to show the superiority of our method. Copyright © 2015 Elsevier Inc. All rights reserved.
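For readers unfamiliar with pseudo relevance feedback, the basic loop is: run the query, assume the top-ranked documents are relevant, and add their most salient terms to the query. The sketch below shows only that generic loop, with a deliberately naive term-count scorer; all names and sample documents are hypothetical, and the paper's feedback model built from the structure of external collections is not reproduced.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Naive relevance score: total occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def expand_query(query_terms, docs, top_k=2, n_expansion=2):
    """Append the most frequent non-query terms from the top_k ranked documents."""
    ranked = sorted(range(len(docs)), key=lambda i: -score(query_terms, docs[i]))
    feedback = Counter()
    for i in ranked[:top_k]:
        feedback.update(t for t in docs[i] if t not in query_terms)
    return list(query_terms) + [t for t, _ in feedback.most_common(n_expansion)]

# Hypothetical tokenised documents.
docs = [
    ["heart", "attack", "aspirin", "aspirin"],
    ["heart", "attack", "aspirin", "statin"],
    ["football", "match"],
]
expanded = expand_query(["heart", "attack"], docs)
```

When the feedback documents come from an external collection rather than the target collection, the same loop applies, but the expansion terms are only as good as the match between the two collections.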
Web image retrieval using an effective topic and content-based technique

NASA Astrophysics Data System (ADS)

Lee, Ching-Cheng; Prabhakara, Rashmi

2005-03-01

There has been an exponential growth in the amount of image data available on the World Wide Web since the early development of the Internet. With such a large amount of useful information and images available, an effective image retrieval system is greatly needed. In this paper, we present an effective approach with both image matching and indexing techniques that improve on existing integrated image retrieval methods. This technique follows a two-phase approach, integrating query by topic and query by example specification methods. In the first phase, the topic-based image retrieval is performed by using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. This technique consists of a focused crawler that allows the user to enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. In the second phase, we use query by example specification to perform a low-level content-based image match in order to retrieve a smaller set of results that are relatively closer to the example image. For this, information related to the image features is automatically extracted from the query image. The main objective of our approach is to develop a functional image search and indexing technique and to demonstrate that better retrieval results can be achieved.
Care episode retrieval: distributional semantic models for information retrieval in the clinical domain.

PubMed

Moen, Hans; Ginter, Filip; Marsi, Erwin; Peltonen, Laura-Maria; Salakoski, Tapio; Salanterä, Sanna

2015-01-01

Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a (possibly unfinished) care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the-art search engine (Lucene) on the retrieval task.
Care episode retrieval: distributional semantic models for information retrieval in the clinical domain

PubMed Central

2015-01-01

Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a (possibly unfinished) care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the-art search engine (Lucene) on the retrieval task. PMID:26099735
An architecture for diversity-aware search for medical web content.

PubMed

Denecke, K

2012-01-01

The Web provides a huge source of information, including on medical and health-related issues. In particular, the content of medical social media data can be diverse due to the background of an author, the source or the topic. Diversity in this context means that a document covers different aspects of a topic or a topic is described in different ways. In this paper, we introduce an approach that makes it possible to consider the diverse aspects of a search query when providing retrieval results to a user. We introduce a system architecture for a diversity-aware search engine that allows retrieving medical information from the web. The diversity of retrieval results is assessed by calculating diversity measures that rely upon semantic information derived from a mapping to concepts of a medical terminology. Considering these measures, the result set is diversified by ranking more diverse texts higher. The methods and system architecture are implemented in a retrieval engine for medical web content. The diversity measures reflect the diversity of aspects considered in a text and its type of information content. They are used for result presentation, filtering and ranking. In a user evaluation we assess the user satisfaction with an ordering of retrieval results that considers the diversity measures. The evaluation shows that diversity-aware retrieval, which considers diversity measures in ranking, could increase user satisfaction with retrieval results.
Independent Orbiter Assessment (IOA): Analysis of the remote manipulator system

NASA Technical Reports Server (NTRS)

Tangorra, F.; Grasmeder, R. F.; Montgomery, A. D.

1987-01-01

The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items (PCIs). To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. The independent analysis results for the Orbiter Remote Manipulator System (RMS) are documented. The RMS hardware and software are primarily required for deploying and/or retrieving up to five payloads during a single mission, capturing and retrieving free-flying payloads, and performing Manipulator Foot Restraint operations. Specifically, the RMS hardware consists of the following components: end effector; displays and controls; manipulator controller interface unit; arm based electronics; and the arm. The IOA analysis process utilized available RMS hardware drawings, schematics and documents for defining hardware assemblies, components and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. Of the 574 failure modes analyzed, 413 were determined to be PCIs.
A novel architecture for information retrieval system based on semantic web

NASA Astrophysics Data System (ADS)

Zhang, Hui

2011-12-01

Nowadays, the web has enabled an explosive growth of information sharing (there are currently over 4 billion pages covering most areas of human endeavor), so that the web now faces a new challenge of information overload. The challenge before us is not only to help people locate relevant information precisely but also to access and aggregate a variety of information from different resources automatically. Current web documents are in human-oriented formats that are suitable for presentation, but machines cannot understand the meaning of a document. To address this issue, Berners-Lee proposed the concept of the semantic web. With semantic web technology, web information can be understood and processed by machines. It provides new possibilities for automatic web information processing. A main problem of semantic web information retrieval is that when such an information retrieval system lacks sufficient knowledge, it returns a large number of meaningless results to users because of the huge volume of information. In this paper, we present the architecture of an information retrieval system based on the semantic web. In addition, our system employs an inference engine to decide whether a query should be posed to the Keyword-based Search Engine or to the Semantic Search Engine.
Video Information Communication and Retrieval/Image Based Information System (VICAR/IBIS)

NASA Technical Reports Server (NTRS)

Wherry, D. B.

1981-01-01

The acquisition, operation, and planning stages of installing a VICAR/IBIS system are described. The system operates in an IBM mainframe environment, and provides image processing of raster data. System support problems with software and documentation are discussed.
Mission analysis for cross-site transfer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Riesenweber, S.D.; Fritz, R.L.; Shipley, L.E.

1995-11-01

The Mission Analysis Report describes the requirements and constraints associated with the Transfer Waste Function as necessary to support the Manage Tank Waste, Retrieve Waste, and Process Tank Waste Functions described in WHC-SD-WM-FRD-020, Tank Waste Remediation System (TWRS) Functions and Requirements Document and DOE/RL-92-60, Revision 1, TWRS Functions and Requirements Document, March 1994. It further assesses the ability of the "initial state" (or current cross-site transfer system) to meet the requirements and constraints.
Architectural Design Document for the Technology Demonstration of the Joint Network Defence and Management System (JNDMS) Project

DTIC Science & Technology

2009-09-21

specified by contract no. W7714-040875/001/SV. This document contains the design of the JNDMS software to the system architecture level. Other...alternative for the presentation functions. ASP, Java, ActiveX , DLL, HTML, DHTML, SOAP, .NET HTML, DHTML, XML, Jscript, VBScript, SOAP, .NET...retrieved through the network, typically by a network management console. Information is contained in a Management Information Base (MIB), which is a data

Commercial applications for optical data storage

NASA Astrophysics Data System (ADS)

Tas, Jeroen

1991-03-01

Optical data storage has spurred the market for document imaging systems. These systems are increasingly being used to electronically manage the processing, storage and retrieval of documents. Applications range from straightforward archives to sophisticated workflow management systems. The technology is developing rapidly and within a few years optical imaging facilities will be incorporated in most of the office information systems. This paper gives an overview of the status of the market, the applications and the trends of optical imaging systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

IRIS is a search tool plug-in that is used to implement latent topic feedback for enhancing text navigation. It accepts a list of returned documents from an information retrieval system that is generated from keyword search queries. Data is pulled directly from a topic information database and processed by IRIS to determine the most prominent and relevant topics, along with topic-ngrams, associated with the list of returned documents. User-selected topics are then used to expand the query and presumably refine the search results.
Health search engine with e-document analysis for reliable search results.

PubMed

Gaudinat, Arnaud; Ruch, Patrick; Joubert, Michel; Uziel, Philippe; Strauss, Anne; Thonnet, Michèle; Baud, Robert; Spahni, Stéphane; Weber, Patrick; Bonal, Juan; Boyer, Celia; Fieschi, Marius; Geissbuhler, Antoine

2006-01-01

After a review of the existing practical solutions available to citizens for retrieving eHealth documents, the paper describes an original specialized search engine, WRAPIN. WRAPIN uses advanced cross-lingual information retrieval technologies to check information quality by synthesizing medical concepts, conclusions and references contained in the health literature, to identify accurate, relevant sources. Thanks to MeSH terminology [1] (Medical Subject Headings from the U.S. National Library of Medicine) and advanced approaches such as conclusion extraction from structured documents and reformulation of the query, WRAPIN offers the user privileged access to navigate through multilingual documents without language or medical prerequisites. The results of an evaluation conducted on the WRAPIN prototype show that results of the WRAPIN search engine are perceived by users as informative (65%, versus 59% for a general-purpose search engine) and as reliable and trustworthy (72%, versus 41% for the other engine). But it leaves room for improvement, such as increased database coverage, explanation of the original functionalities, and adaptability to the audience. Thanks to the evaluation outcomes, WRAPIN is now in operation on the HON web site (http://www.healthonnet.org), free of charge. Intended for citizens, it is a good alternative to general-purpose search engines when the user looks up trustworthy health and medical information or wants to automatically check doubtful content on a Web page.
Transuranic Waste Program Framework Agreement - December Deliverable July 2012

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, Patricia

Framework agreement deliverables are: (1) 'DOE/NNSA commits to complete removal of all non-cemented above-ground EM Legacy TRU and newly generated TRU currently-stored at Area G as of October 1, 2011, by no later than June 30, 2014. This inventory of above-ground TRU is defined as 3706 cubic meters of material.' (2) 'DOE commits to the complete removal of all newly generated TRU received in Area G during FY 2012 and 2013 by no later than December 31, 2014.' (3) 'Based on projected funding profiles, DOE/NNSA will develop by December 31, 2012, a schedule, including pacing milestones, for disposition of the below-ground TRU requiring retrieval at Area G.' Objectives are to: (1) restore the 'Core Team' to develop the December, 2012 deliverable; (2) obtain agreement on the strategy for below ground water disposition; and (3) establish timeline for completion of the deliverable. Below Grade Waste Strategy is to: (1) Perform an evaluation on below grade waste currently considered retrievable TRU; (2) Only commit to retrieve waste that must be retrieved; (3) Develop the Deliverable including Pacing Milestones based on planned commitments; (4) Align all Regulatory Documents for Consistency; and (5) answer these 3 primary questions, is the waste TRU; is the waste retrievable, can retrieval cause more harm than benefit?
Validating the AIRS Version 5 CO Retrieval with DACOM In Situ Measurements During INTEX-A and -B

NASA Technical Reports Server (NTRS)

McMillan, Wallace W.; Evans, Keith D.; Barnet, Christopher D.; Maddy, Eric; Sachse, Glen W.; Diskin, Glenn S.

2011-01-01

Herein we provide a description of the atmospheric infrared sounder (AIRS) version 5 (v5) carbon monoxide (CO) retrieval algorithm and its validation with the DACOM in situ measurements during the INTEX-A and -B campaigns. All standard and support products in the AIRS v5 CO retrieval algorithm are documented. Building on prior publications, we describe the convolution of in situ measurements with the AIRS v5 CO averaging kernel and first-guess CO profile as required for proper validation. Validation is accomplished through comparison of AIRS CO retrievals with convolved in situ CO profiles acquired during the NASA Intercontinental Chemical Transport Experiments (INTEX) in 2004 and 2006. From 143 profiles in the northern mid-latitudes during these two experiments, we find AIRS v5 CO retrievals are biased high by 6% to 10% between 900 and 300 hPa with a root-mean-square error of 8% to 12%. No significant differences were found between validation using spiral profiles coincident with AIRS overpasses and in-transit profiles under the satellite track but up to 13 h off in time. Similarly, no significant differences in validation results were found for ocean versus land, day versus night, or with respect to retrieved cloud top pressure or cloud fraction.
Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016

PubMed Central

Cieslewicz, Artur; Dutkiewicz, Jakub; Jedrzejek, Czeslaw

2018-01-01

Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate the research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge, a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team utilizes the Terrier 4.2 search platform enhanced by a query expansion method using word embeddings. This approach, after post-challenge modifications and improvements (with particular regard to assigning proper weights for original and expanded terms), allowed us to achieve the second best infNDCG measure (0.4539) compared with the challenge results and an infAP of 0.3978. This demonstrates that proper utilization of word embeddings can be a valuable addition to the information retrieval process. Some analysis is provided on related work involving other bioCADDIE contributions. We discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data PMID:29688372
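The idea of expanding a query with embedding neighbours, while weighting original and expanded terms differently, can be illustrated with a minimal sketch. This is not the authors' Terrier-based system: the toy vectors, the function name expand_query, and the weights (1.0 for original terms, 0.3 scaled by similarity for expansions) are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings; the values are hypothetical and only
# chosen so that related biomedical terms point in similar directions.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.8, 0.2, 0.1],
    "carcinoma": [0.85, 0.15, 0.05],
    "gene": [0.1, 0.9, 0.2],
    "protein": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, original_weight=1.0, expansion_weight=0.3):
    """Return {term: weight} with the k nearest neighbours of each query term added.

    Original terms keep original_weight; each expansion term gets
    expansion_weight scaled by its cosine similarity to the source term.
    """
    weighted = {t: original_weight for t in terms}
    for t in terms:
        if t not in embeddings:
            continue  # no vector available, so nothing to expand from
        neighbours = sorted(
            (
                (cosine(embeddings[t], vec), w)
                for w, vec in embeddings.items()
                if w != t and w not in terms
            ),
            reverse=True,
        )[:k]
        for sim, w in neighbours:
            weighted[w] = max(weighted.get(w, 0.0), expansion_weight * sim)
    return weighted


expanded = expand_query(["cancer"], EMBEDDINGS)
```

Keeping the expansion weight well below the original weight is one simple way to stop expanded terms from dominating the query; the actual weights used in the paper are not given in the abstract.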
Information Clustering Based on Fuzzy Multisets.

ERIC Educational Resources Information Center

Miyamoto, Sadaaki

2003-01-01

Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Autobiographical Memory Specificity among Preschool-Aged Children

ERIC Educational Resources Information Center

Nuttall, Amy K.; Valentino, Kristin; Comas, Michelle; McNeill, Anne T.; Stey, Paul C.

2014-01-01

"Overgeneral memory" refers to difficulty retrieving specific autobiographical memories and is consistently associated with depression and/or trauma. The present study developed a downward extension of the Autobiographical Memory Test (AMT; Williams & Broadbent, 1986) given the need to document normative developmental changes in…
Selected Mechanized Scientific and Technical Information Systems.

ERIC Educational Resources Information Center

Ackerman, Lynn, Ed.; And Others

The publication describes the following thirteen computer-based, operational systems designed primarily for the announcement, storage, retrieval and secondary distribution of scientific and technical reports: Defense Documentation Center; Highway Research Board; National Aeronautics and Space Administration; National Library of Medicine; U.S.…
CCIS Experiment : Comparing Transit Information Retrieval Modes at the Southern California Rapid Transit District

DOT National Transportation Integrated Search

1984-03-01

This report documents the results of a controlled experiment performed in the Telephone Information Section of the Marketing Department at the Southern California Rapid Transit District (SCRTD) in Los Angeles. The Telephone Information Section is the...
The design, implementation, and use of a statewide land use inventory: The New York experience

NASA Technical Reports Server (NTRS)

Hardy, E. E.

1975-01-01

The New York State land use and natural resource inventory is described with emphasis on its design, implementation, and user requirements. Other topics discussed include: classification, data acquisition, geographic referencing, data storage, data retrieval, and documentation.
On the Application of Syntactic Methodologies in Automatic Text Analysis.

ERIC Educational Resources Information Center

Salton, Gerard; And Others

1990-01-01

Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…
36 CFR 1222.26 - What are the general recordkeeping requirements for agency programs?

Code of Federal Regulations, 2011 CFR

2011-07-01

... NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT CREATION AND MAINTENANCE OF FEDERAL RECORDS... to document program policies, procedures, functions, activities, and transactions; (b) The office... administrator responsible for ensuring authenticity, protection, and ready retrieval of electronic records; (c...
36 CFR 1222.26 - What are the general recordkeeping requirements for agency programs?

Code of Federal Regulations, 2012 CFR

2012-07-01

... NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT CREATION AND MAINTENANCE OF FEDERAL RECORDS... to document program policies, procedures, functions, activities, and transactions; (b) The office... administrator responsible for ensuring authenticity, protection, and ready retrieval of electronic records; (c...
36 CFR 1222.26 - What are the general recordkeeping requirements for agency programs?

Code of Federal Regulations, 2014 CFR

2014-07-01

... NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT CREATION AND MAINTENANCE OF FEDERAL RECORDS... to document program policies, procedures, functions, activities, and transactions; (b) The office... administrator responsible for ensuring authenticity, protection, and ready retrieval of electronic records; (c...
Semantic retrieval and navigation in clinical document collections.

PubMed

Kreuzthaler, Markus; Daumke, Philipp; Schulz, Stefan

2015-01-01

Patients with chronic diseases undergo numerous in- and outpatient treatment periods, and therefore many documents accumulate in their electronic records. We report on an on-going project focussing on the semantic enrichment of medical texts, in order to support recall-oriented navigation across a patient's complete documentation. A document pool of 1,696 de-identified discharge summaries was used for prototyping. A natural language processing toolset for document annotation (based on the text-mining framework UIMA) and indexing (Solr) was used to support a browser-based platform for document import, search and navigation. The integrated search engine combines free text and concept-based querying, supported by dynamically generated facets (diagnoses, procedures, medications, lab values, and body parts). The prototype demonstrates the feasibility of semantic document enrichment within document collections of a single patient. Originally conceived as an add-on for the clinical workplace, this technology could also be adapted to support personalised health record platforms, as well as cross-patient search for cohort building and other secondary use scenarios.
An exponentiation method for XML element retrieval.

PubMed

Wichaiwong, Tanakorn

2014-01-01

XML documents are now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example in e-Commerce. XML data-centric collections require query terms that allow users to specify constraints on the document structure; mapping structure queries and assigning weights are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown that the structural information improves the effectiveness of the search system by up to 52.60% over the baseline BM25 at MAP.
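For readers unfamiliar with the BM25 baseline mentioned above, the following is a minimal sketch of one common BM25 variant (with a non-negative IDF). It does not reproduce MEXIR or the Exponentiation function, whose definition is not given in the abstract; the function name bm25_scores, the sample documents, and the parameter defaults k1=1.2 and b=0.75 are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in a non-empty list against the query with BM25.

    Uses whitespace tokenization and the IDF form
    log((N - df + 0.5) / (df + 0.5) + 1), which is never negative.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "xml element retrieval with structure",
    "cooking pasta at home",
    "xml xml structure queries",
]
scores = bm25_scores("xml structure", docs)
```

In this toy collection the third document repeats a query term and is shorter than the first, so it receives the higher score, while the unrelated second document scores zero.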
Development of an information retrieval tool for biomedical patents.

PubMed

Alves, Tiago; Rodrigues, Rúben; Costa, Hugo; Rocha, Miguel

2018-06-01

The volume of biomedical literature has been increasing in recent years. Patent documents have also followed this trend, being important sources of biomedical knowledge, technical details and curated data, which are put together along the granting process. The field of Biomedical text mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes the search of information a challenging task. Several BioTM techniques can be applied to patents. Among those, Information Retrieval (IR) includes processes where relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks. The pipeline was developed within @Note2, an open-source computational framework for BioTM, adding a number of modules to the core libraries, including patent metadata and full text retrieval, PDF to text conversion and optical character recognition. Also, user interfaces were developed for the main operations materialized in a new @Note2 plug-in. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including tasks from Information Extraction, such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from BioCreative challenges. Also, we show the use of the plug-in with a user query related to the production of vanillin. This work makes available all the relevant content from patents to the scientific community, decreasing drastically the time required for this task, and provides graphical interfaces to ease the use of these tools. Copyright © 2018 Elsevier B.V. All rights reserved.
User-Centered Indexing for Adaptive Information Access

NASA Technical Reports Server (NTRS)

Chen, James R.; Mathe, Nathalie

1996-01-01

We are focusing on information access tasks characterized by large volumes of hypermedia-connected technical documents, a need for rapid and effective access to familiar information, and long-term interaction with evolving information. The problem for technical users is to build and maintain a personalized task-oriented model of the information to quickly access relevant information. We propose a solution which provides user-centered adaptive information retrieval and navigation. This solution supports users in customizing information access over time. It is complementary to information discovery methods which provide access to new information, since it lets users customize future access to previously found information. It relies on a technique, called Adaptive Relevance Network, which creates and maintains a complex indexing structure to represent a user's personal information access maps organized by concepts. This technique is integrated within the Adaptive HyperMan system, which helps NASA Space Shuttle flight controllers organize and access large amounts of information. It allows users to select and mark any part of a document as interesting, and to index that part with user-defined concepts. Users can then do subsequent retrieval of marked portions of documents. This functionality allows users to define and access personal collections of information, which are dynamically computed. The system also supports collaborative review by letting users share group access maps. The adaptive relevance network provides long-term adaptation based both on usage and on explicit user input. The indexing structure is dynamic and evolves over time. Leading and generalization support flexible retrieval of information under similar concepts. The network is geared towards more recent information access, and automatically manages its size in order to maintain rapid access when scaling up to a large hypermedia space. We present results of simulated learning experiments.
Document creation, linking, and maintenance system

DOEpatents

Claghorn, Ronald [Pasco, WA]

2011-02-15

A document creation and citation system designed to maintain a database of reference documents. The content of a selected document may be automatically scanned and indexed by the system. The selected documents may also be manually indexed by a user prior to the upload. The indexed documents may be uploaded and stored within a database for later use. The system allows a user to generate new documents by selecting content within the reference documents stored within the database and inserting the selected content into a new document. The system allows the user to customize and augment the content of the new document. The system also generates citations to the selected content retrieved from the reference documents. The citations may be inserted into the new document in the appropriate location and format, as directed by the user. The new document may be uploaded into the database and included with the other reference documents. The system also maintains the database of reference documents so that when changes are made to a reference document, the author of a document referencing the changed document will be alerted to make appropriate changes to his document. The system also allows visual comparison of documents so that the user may see differences in the text of the documents.

Six sigma for revenue retrieval.

PubMed

Plonien, Cynthia

2013-01-01

Deficiencies in revenue retrieval due to failures in obtaining charges have contributed to a negative bottom line for numerous hospitals. Improving documentation practices through a Six Sigma process improvement initiative can minimize opportunities for errors through reviews and instill structure for compliance and consistency. Commitment to the Six Sigma principles with continuous monitoring of outcomes and constant communication of results to departments, management, and payers is a strong approach to reducing the financial impact of denials on an organization's revenues and expenses. Using Six Sigma tools can help improve the organization's financial performance not only for today, but also for health care's uncertain future.
The NASA ADS Abstract Service and the Distributed Astronomy Digital Library [and] Project Soup: Comparing Evaluations of Digital Collection Efforts [and] Cross-Organizational Access Management: A Digital Library Authentication and Authorization Architecture [and] BibRelEx: Exploring Bibliographic Databases by Visualization of Annotated Content-based Relations [and] Semantics-Sensitive Retrieval for Digital Picture Libraries [and] Encoded Archival Description: An Introduction and Overview.

ERIC Educational Resources Information Center

Kurtz, Michael J.; Eichorn, Guenther; Accomazzi, Alberto; Grant, Carolyn S.; Demleitner, Markus; Murray, Stephen S.; Jones, Michael L. W.; Gay, Geri K.; Rieger, Robert H.; Millman, David; Bruggemann-Klein, Anne; Klein, Rolf; Landgraf, Britta; Wang, James Ze; Li, Jia; Chan, Desmond; Wiederhold, Gio; Pitti, Daniel V.

1999-01-01

Includes six articles that discuss a digital library for astronomy; comparing evaluations of digital collection efforts; cross-organizational access management of Web-based resources; searching scientific bibliographic databases based on content-based relations between documents; semantics-sensitive retrieval for digital picture libraries; and…
Data discretization for novel resource discovery in large medical data sets.

PubMed Central

Benoît, G.; Andrews, J. E.

2000-01-01

This paper is motivated by the problems of dealing with large data sets in information retrieval. The authors suggest an information retrieval framework based on mathematical principles to organize and permit end-user manipulation of a retrieval set. By adjusting through the interface the weights and types of relationships between query and set members, it is possible to expose unanticipated, novel relationships between the query/document pair. The retrieval set as a whole is parsed into discrete concept-oriented subsets (based on within-set similarity measures) and displayed on screen as interactive "graphic nodes" in an information space, distributed at first based on the vector model (similarity measure of set to query). The result is a visualized map wherein it is possible to identify main concept regions and multiple sub-regions as dimensions of the same data. Users may examine the membership within sub-regions. Based on this framework, a data visualization user interface was designed to encourage users to work with the data on multiple levels to find novel relationships between the query and retrieval set members. Space constraints prohibit addressing all aspects of this project. PMID:11079845
Technical Review of Retrieval and Closure Plans for the INEEL INTEC Tank Farm Facility

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bamberger, Judith A; Burks, Barry L; Quigley, Keith D

2001-09-28

The purpose of this report is to document the conclusions of a technical review of retrieval and closure plans for the Idaho National Energy and Environmental Laboratory (INEEL) Idaho Nuclear Technology and Engineering Center (INTEC) Tank Farm Facility. In addition to reviewing retrieval and closure plans for these tanks, the review process served as an information exchange mechanism so that staff in the INEEL High Level Waste (HLW) Program could become more familiar with retrieval and closure approaches that have been completed or are planned for underground storage tanks at the Oak Ridge National Laboratory (ORNL) and Hanford sites. This review focused not only on evaluation of the technical feasibility and appropriateness of the approach selected by INEEL but also on technology gaps that could be addressed through utilization of technologies or performance data available at other DOE sites and in the private sector. The reviewers, Judith Bamberger of Pacific Northwest National Laboratory (PNNL) and Dr. Barry Burks of The Providence Group Applied Technology, have extensive experience in the development and application of tank waste retrieval technologies for nuclear waste remediation.
Supplemental design requirements document, Multifunction Waste Tank Facility, Project W-236A. Revision 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Groth, B.D.

The Multi-Function Waste Tank Facility (MWTF) consists of four, nominal 1 million gallon, underground double-shell tanks, located in the 200-East area, and two tanks of the same capacity in the 200-West area. MWTF will provide environmentally safe storage capacity for wastes generated during remediation/retrieval activities of existing waste storage tanks. This document delineates in detail the information to be used for effective implementation of the Functional Design Criteria requirements.
DRR is a teenager

NASA Astrophysics Data System (ADS)

Nagy, George

2008-01-01

The fifteenth anniversary of the first SPIE symposium (titled Character Recognition Technologies) on Document Recognition and Retrieval provides an opportunity to examine DRR's contributions to the development of document technologies. Many of the tools taken for granted today, including workable general purpose OCR, large-scale, semi-automatic forms processing, inter-format table conversion, and text mining, followed research presented at this venue. This occasion also affords an opportunity to offer tribute to the conference organizers and proceedings editors and to the coterie of professionals who regularly participate in DRR.
Unified System Of Data On Materials And Processes

NASA Technical Reports Server (NTRS)

Key, Carlo F.

1989-01-01

Wide-ranging sets of data for aerospace industry described. Document describes Materials and Processes Technical Information System (MAPTIS), computerized set of integrated data bases for use by NASA and aerospace industry. Stores information in standard format for fast retrieval in searches and surveys of data. Helps engineers select materials and verify their properties. Promotes standardized nomenclature as well as standardized tests and presentation of data. Format of document of photographic projection slides used in lectures. Presents examples of reports from various data bases.
IRRA at TREC 2009: Index Term Weighting based on Divergence From Independence Model

DTIC Science & Technology

2009-11-01

weighting scheme (Salton and Buckley, 1988), where TF stands for the term frequency and IDF stands for the inverse document frequency. In contrast to TF...IDF is a collection dependent factor, which identifies the terms that concentrate in a few documents of the collection. Salton and Buckley (1988...chapter 4, pages 35–56. Butterworths, Oxford, UK, 1981. G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. In Information Processing and Management, pages 513–523, 1988. 15
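The TF and IDF factors named in the excerpt above can be made concrete with a minimal sketch of the classical weight tf(t, d) * log(N / df(t)). This illustrates the traditional scheme the excerpt refers to, not the Divergence From Independence weighting the report itself proposes; the function name tf_idf and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using raw TF times log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # df[t] counts the documents that contain term t at least once.
    df = Counter(term for d in tokenized for term in set(d))
    weights = []
    for d in tokenized:
        tf = Counter(d)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = [
    "term weighting for retrieval",
    "retrieval of documents",
    "divergence from independence for retrieval",
]
w = tf_idf(docs)
```

A term that occurs in every document (here "retrieval") gets an IDF of zero, while a term concentrated in a single document (here "weighting") gets the largest weight, which is the collection-dependent behaviour the excerpt describes.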
Managing Content in a Matter of Minutes

NASA Technical Reports Server (NTRS)

2004-01-01

NASA software created to help scientists expeditiously search and organize their research documents is now aiding compliance personnel, law enforcement investigators, and the general public in their efforts to search, store, manage, and retrieve documents more efficiently. Developed at Ames Research Center, NETMARK software was designed to manipulate vast amounts of unstructured and semi-structured NASA documents. NETMARK is both a relational and object-oriented technology built on an Oracle enterprise-wide database. To ensure easy user access, Ames constructed NETMARK as a Web-enabled platform utilizing the latest in Internet technology. One of the significant benefits of the program was its ability to store and manage mission-critical data.
Assessment and application of AirMSPI high-resolution multiangle imaging photo-polarimetric observations for atmospheric correction

NASA Astrophysics Data System (ADS)

Kalashnikova, O. V.; Xu, F.; Garay, M. J.; Seidel, F. C.; Diner, D. J.

2016-02-01

Water-leaving radiance comprises less than 10% of the signal measured from space, making correction for absorption and scattering by the intervening atmosphere imperative. Modern improvements have been developed in ocean color retrieval algorithms to handle absorbing aerosols such as urban particulates in coastal areas and transported desert dust over the open ocean. In addition, imperfect knowledge of the absorbing aerosol optical properties or their height distribution results in well-documented sources of error in the retrieved water-leaving radiance. Multi-angle spectro-polarimetric measurements have been advocated as an additional tool to better understand and retrieve the aerosol properties needed for atmospheric correction for ocean color retrievals. The Airborne Multiangle SpectroPolarimetric Imager-1 (AirMSPI-1) has been flying aboard the NASA ER-2 high altitude aircraft since October 2010. AirMSPI typically acquires observations of a target area at 9 view angles between ±67° at 10 m resolution. AirMSPI spectral channels are centered at 355, 380, 445, 470, 555, 660, and 865 nm, with 470, 660, and 865 reporting linear polarization. We have developed a retrieval code that employs a coupled Markov Chain (MC) and adding/doubling radiative transfer method for joint retrieval of aerosol properties and water-leaving radiance from AirMSPI polarimetric observations. We tested prototype retrievals by comparing the retrieved aerosol concentration, size distribution, water-leaving radiance, and chlorophyll concentrations to values reported by the USC SeaPRISM AERONET-OC site off the coast of California. The retrieval then was applied to a variety of coastal regions in California to evaluate variability in the water-leaving radiance under different atmospheric conditions. We will present results, and will discuss algorithm sensitivity and potential applications for future space-borne coastal monitoring.
Monitored retrievable storage submission to Congress: Volume 2, Environmental assessment for a monitored retrievable storage facility. [Contains glossary

DOE Office of Scientific and Technical Information (OSTI.GOV)

None

1986-02-01

This Environmental Assessment (EA) supports the DOE proposal to Congress to construct and operate a facility for monitored retrievable storage (MRS) of spent fuel at a site on the Clinch River in the Roane County portion of Oak Ridge, Tennessee. The first part of this document is an assessment of the value of, need for, and feasibility of an MRS facility as an integral component of the waste management system. The second part is an assessment and comparison of the potential environmental impacts projected for each of six site-design combinations. The MRS facility would be centrally located with respect to existing reactors, and would receive and canister spent fuel in preparation for shipment to and disposal in a geologic repository. 207 refs., 57 figs., 132 tabs.
Knowledge Modeling in Prior Art Search

NASA Astrophysics Data System (ADS)

Graf, Erik; Frommholz, Ingo; Lalmas, Mounia; van Rijsbergen, Keith

This study explores the benefits of integrating knowledge representations in prior art patent retrieval. Key to the introduced approach is the utilization of human judgment available in the form of classifications assigned to patent documents. The paper first outlines in detail how a methodology for the extraction of knowledge from such a hierarchical classification system can be established. Further, potential ways of integrating this knowledge with existing Information Retrieval paradigms in a scalable and flexible manner are investigated. Finally, based on these integration strategies, the effectiveness in terms of recall and precision is evaluated in the context of a prior art search task for European patents. As a result of this evaluation it can be established that in general the proposed knowledge expansion techniques are particularly beneficial to recall and, with respect to optimizing field retrieval settings, further result in significant precision gains.
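The recall and precision measures used in the evaluation above can be stated in a few lines. This is a minimal set-based sketch under stated assumptions: the function name precision_recall and the patent identifiers are hypothetical, and the study's actual evaluation protocol is not described in the abstract.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a relevant list.

    Precision is the fraction of retrieved items that are relevant;
    recall is the fraction of relevant items that were retrieved.
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical patent identifiers, used only to exercise the function.
p, r = precision_recall(["EP1", "EP2", "EP3", "EP4"], ["EP2", "EP4", "EP9"])
```

Here two of the four retrieved patents are relevant (precision 0.5) and two of the three relevant patents are found (recall 2/3); expanding a query typically raises the second figure at some risk to the first.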
On-Line Databases in Mexico.

ERIC Educational Resources Information Center

Molina, Enzo

1986-01-01

Use of online bibliographic databases in Mexico is provided through Servicio de Consulta a Bancos de Informacion, a public service that provides information retrieval, document delivery, translation, technical support, and training services. Technical infrastructure is based on a public packet-switching network and institutional users may receive…
A Phrase-Based Matching Function.

ERIC Educational Resources Information Center

Galbiati, Giulia

1991-01-01

Describes the development of an information retrieval system designed for nonspecialist users that is based on the binary vector model. The syntactic structure of phrases used for indexing is examined, queries using an experimental collection of documents are described, and precision values are examined. (19 references) (LRW)
Distributed Multimedia Computing: An Assessment of the State of the Art.

ERIC Educational Resources Information Center

Williams, Neil; And Others

1991-01-01

Describes multimedia computing and the characteristics of multimedia information. Trends in information technology are reviewed; distributed multimedia computing is explained; media types are described, including digital media; and multimedia applications are examined, including office systems, documents, information storage and retrieval,…
76 FR 49805 - Submission for OMB Review; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-11

... request for extension of the previously approved collection of information discussed below. Regulation S-T... submission of documents on the Electronic Data Gathering, Analysis and Retrieval (``EDGAR'') system... any information collection requirements. An agency may not conduct or sponsor, and a person is not...
Montana Faxnet Project. Final Report.

ERIC Educational Resources Information Center

Brander, Linda L.

This report summarizes the activities and accomplishments of the Montana Faxnet Project, which was created to design and demonstrate a statewide document delivery network utilizing telefacsimile equipment that would create equitable access for all Montanans accessing and retrieving information, and reduce the waiting time for requested materials…
Document Delivery from Full-Text Online Files: A Pilot Project.

ERIC Educational Resources Information Center

Gillikin, David P.

1990-01-01

Describes the Electronic Journal Retrieval Project (EJRP) developed at the University of Tennessee, Knoxville Libraries, to provide full-text journal articles from online systems. Highlights include costs of various search strategies; implications for library services; collection development and interlibrary loan considerations; and suggestions…
Bibliography on Liquefied Natural Gas (LNG) safety

NASA Technical Reports Server (NTRS)

Ordin, P. M.

1976-01-01

Approximately 600 citations concerning safety of liquefied natural gas and liquid methane are presented. Each entry includes the title, author, abstract, source, description of figures, key references, and major descriptors for retrieving the document. An author index is provided as well as an index of descriptors.
Institutional Foundations for Cyber Security: Current Responses and New Challenges

DTIC Science & Technology

2010-09-01

endowed with regional authority, they remain restricted in their capacity to respond to cyber criminals. National CERTs occupy a first-line responder role...economiccrime/cybercrime/Documents/CountryProfiles/default_en.asp Federal Bureau of Investigation. (2006). Netting cyber criminals. Retrieved on February

32 CFR 701.21 - Electronic record.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 32 National Defense 5 2013-07-01 2013-07-01 false Electronic record. 701.21 Section 701.21... THE NAVY DOCUMENTS AFFECTING THE PUBLIC FOIA Definitions and Terms § 701.21 Electronic record. Records (including e-mail) which are created, stored, and retrieved by electronic means. ...
32 CFR 701.21 - Electronic record.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 32 National Defense 5 2012-07-01 2012-07-01 false Electronic record. 701.21 Section 701.21... THE NAVY DOCUMENTS AFFECTING THE PUBLIC FOIA Definitions and Terms § 701.21 Electronic record. Records (including e-mail) which are created, stored, and retrieved by electronic means. ...
32 CFR 701.21 - Electronic record.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 32 National Defense 5 2010-07-01 2010-07-01 false Electronic record. 701.21 Section 701.21... THE NAVY DOCUMENTS AFFECTING THE PUBLIC FOIA Definitions and Terms § 701.21 Electronic record. Records (including e-mail) which are created, stored, and retrieved by electronic means. ...
32 CFR 701.21 - Electronic record.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 32 National Defense 5 2011-07-01 2011-07-01 false Electronic record. 701.21 Section 701.21... THE NAVY DOCUMENTS AFFECTING THE PUBLIC FOIA Definitions and Terms § 701.21 Electronic record. Records (including e-mail) which are created, stored, and retrieved by electronic means. ...
32 CFR 701.21 - Electronic record.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 32 National Defense 5 2014-07-01 2014-07-01 false Electronic record. 701.21 Section 701.21... THE NAVY DOCUMENTS AFFECTING THE PUBLIC FOIA Definitions and Terms § 701.21 Electronic record. Records (including e-mail) which are created, stored, and retrieved by electronic means. ...
The Cybernetics of Bibliographic Control: Toward a Theory of Document Retrieval Systems.

ERIC Educational Resources Information Center

Wellisch, Hans H.

1980-01-01

Explores the concept of cataloging, analyzes its functions and operations, and holds that as a control system bibliographic organization is subject to the laws of cybernetics. The role of relevance and the limitations of some regulatory devices are examined. (FM)
Faculty Research and Publication Practices

ERIC Educational Resources Information Center

Zoellner, Kate; Hines, Samantha; Keenan, Teressa; Samson, Sue

2015-01-01

Understanding faculty work practices can translate into improved library services. This study documents how education and behavioral science faculty locate, retrieve, and use information resources for research and writing and how they publish and store their research materials. The authors interviewed twelve professors using a structured interview…
Database in Artificial Intelligence.

ERIC Educational Resources Information Center

Wilkinson, Julia

1986-01-01

Describes a specialist bibliographic database of literature in the field of artificial intelligence created by the Turing Institute (Glasgow, Scotland) using the BRS/Search information retrieval software. The subscription method for end-users--i.e., annual fee entitles user to unlimited access to database, document provision, and printed awareness…
10 CFR 51.61 - Environmental report-independent spent fuel storage installation (ISFSI) or monitored retrievable...

Code of Federal Regulations, 2010 CFR

2010-01-01

... NUCLEAR REGULATORY COMMISSION (CONTINUED) ENVIRONMENTAL PROTECTION REGULATIONS FOR DOMESTIC LICENSING AND RELATED REGULATORY FUNCTIONS National Environmental Policy Act-Regulations Implementing Section 102(2... Control Desk, Director, Office of Nuclear Material Safety and Safeguards, a separate document entitled...
Mentorship: A Joint Perspective from a Deployed Environment

DTIC Science & Technology

2010-03-01

Registered Nurses (AORN) defines mentorship as: “The developmental relationship between an experienced person and a less-experienced person referred to as...Room Nurses. AORN Journal, 88(2), 175-6. Retrieved October 24, 2009, from Research Library. (Document ID: 1538599441). 17 Lori Leffler, Mentorship
Application of portable CDA for secure clinical-document exchange.

PubMed

Huang, Kuo-Hsuan; Hsieh, Sung-Huai; Chang, Yuan-Jen; Lai, Feipei; Hsieh, Sheau-Ling; Lee, Hsiu-Hui

2010-08-01

The Health Level Seven (HL7) organization published the Clinical Document Architecture (CDA) for exchanging documents among heterogeneous systems and improving medical quality based on the design method in CDA. In practice, although the HL7 organization tried to make medical messages exchangeable, it is still hard to exchange medical messages. There are many issues when two hospitals want to exchange clinical documents, such as patient privacy, network security, budget, and the strategies of the hospital. In this article, we propose a method for the exchange and sharing of clinical documents in an offline model based on the CDA: the Portable CDA. This allows the physician to retrieve the patient's medical record stored in a portable device, rather than through the Internet in real time. The security and privacy of CDA data will also be considered.
Acquisition plan for Digital Document Storage (DDS) prototype system

NASA Technical Reports Server (NTRS)

1990-01-01

NASA Headquarters maintains a continuing interest in and commitment to exploring the use of new technology to support productivity improvements in meeting service requirements tasked to the NASA Scientific and Technical Information (STI) Facility, and to support cost effective approaches to the development and delivery of enhanced levels of service provided by the STI Facility. The DDS project has been pursued with this interest and commitment in mind. It is believed that DDS will provide improved archival blowback quality and service for ad hoc requests for paper copies of documents archived and serviced centrally at the STI Facility. It will also develop an operating capability to scan, digitize, store, and reproduce paper copies of 5000 NASA technical reports archived annually at the STI Facility and serviced to the user community. Additionally, it will provide NASA Headquarters and field installations with on-demand, remote, electronic retrieval of digitized, bilevel, bit mapped report images along with branched, nonsequential retrieval of report subparts.
Structuring Broadcast Audio for Information Access

NASA Astrophysics Data System (ADS)

Gauvain, Jean-Luc; Lamel, Lori

2003-12-01

One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used for searching large audiovisual document collections. Audio indexing must take into account the specificities of audio data such as needing to deal with the continuous data stream and an imperfect word transcription. Other important considerations are dealing with language specificities and facilitating language portability. At Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.
Experiments with a novel content-based image retrieval software: can we eliminate classification systems in adolescent idiopathic scoliosis?

PubMed

Menon, K Venugopal; Kumar, Dinesh; Thomas, Tessamma

2014-02-01

Study Design Preliminary evaluation of new tool. Objective To ascertain whether the newly developed content-based image retrieval (CBIR) software can be used successfully to retrieve images of similar cases of adolescent idiopathic scoliosis (AIS) from a database to help plan treatment without adhering to a classification scheme. Methods Sixty-two operated cases of AIS were entered into the newly developed CBIR database. Five new cases of different curve patterns were used as query images. The images were fed into the CBIR database that retrieved similar images from the existing cases. These were analyzed by a senior surgeon for conformity to the query image. Results Within the limits of variability set for the query system, all the resultant images conformed to the query image. One case had no similar match in the series. The other four retrieved several images that were matching with the query. No matching case was left out in the series. The postoperative images were then analyzed to check for surgical strategies. Broad guidelines for treatment could be derived from the results. More precise query settings, inclusion of bending films, and a larger database will enhance accurate retrieval and better decision making. Conclusion The CBIR system is an effective tool for accurate documentation and retrieval of scoliosis images. Broad guidelines for surgical strategies can be made from the postoperative images of the existing cases without adhering to any classification scheme.
Collaborative Information Retrieval Method among Personal Repositories

NASA Astrophysics Data System (ADS)

Kamei, Koji; Yukawa, Takashi; Yoshida, Sen; Kuwabara, Kazuhiro

In this paper, we describe a collaborative information retrieval method among personal repositories and an implementation of the method on a personal agent framework. We propose a framework for personal agents that aims to enable the sharing and exchange of information resources that are distributed unevenly among individuals. The kernel of a personal agent framework is an RDF (Resource Description Framework)-based information repository for storing, retrieving and manipulating privately collected information, such as documents the user read and/or wrote, email he/she exchanged, web pages he/she browsed, etc. The repository also collects annotations to information resources that describe relationships among information resources and records of interaction between the user and information resources. Since the information resources in a personal repository and their structure are personalized, information retrieval from other users' repositories is an important application of the personal agent. A vector space model with a personalized concept-base is employed as an information retrieval mechanism in a personal repository. Since a personalized concept-base is constructed from information resources in a personal repository, it reflects its user's knowledge and interests. On the other hand, it leads to another problem while querying other users' personal repositories; that is, simply transferring query requests does not provide desirable results. To solve this problem, we propose a query equalization scheme based on a relevance feedback method for collaborative information retrieval between personalized concept-bases. In this paper, we describe an implementation of the collaborative information retrieval method and its user interface on the personal agent framework.
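The abstract above does not spell out its query equalization scheme. As a rough illustration of what relevance feedback on a vector space query can look like, here is a minimal sketch of the classic Rocchio update, which is a stand-in technique and not the authors' method. The function name, the sparse dict representation of vectors, and the default weights are all assumptions made for this example.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update on sparse {term: weight} vectors (illustrative only).

    The query is moved toward the centroid of documents judged relevant and
    away from the centroid of documents judged non-relevant.
    """
    new = defaultdict(float)
    for term, weight in query.items():
        new[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new[term] += coeff * weight / len(docs)
    # Terms pushed to zero or below are dropped from the revised query.
    return {term: weight for term, weight in new.items() if weight > 0}
```

In a setting like the one described, the feedback documents would presumably come from the other user's repository, so that the revised query reflects that repository's vocabulary; that reading is an interpretation, not something the abstract states.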
Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

PubMed

Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

2017-03-01

The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
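The expansion step described above can be pictured with a very small sketch. It assumes a hand-written synonym table in place of the MetaMap concept extraction and UMLS synonym variants the study used, so the table contents and the function name `expand_query` are hypothetical and chosen only for illustration.

```python
def expand_query(query, synonyms):
    """Return the query terms followed by any listed synonyms, without duplicates.

    `synonyms` maps a lower-cased term to a list of interchangeable terms.
    """
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

# Hypothetical synonym table; a real system would derive this from a
# terminology resource and from historical search logs.
SYNONYMS = {"mi": ["myocardial infarction", "heart attack"]}
suggestions = expand_query("MI aspirin", SYNONYMS)
```

A real recommender would present the candidates to the user for selection rather than silently adding them, which matches the abstract's description of the feature as a recommendation.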
Artificial Intelligence and Expert Systems Research and Their Possible Impact on Information Science.

ERIC Educational Resources Information Center

Borko, Harold

1985-01-01

Defines artificial intelligence (AI) and expert systems; describes library applications utilizing AI to automate creation of document representations, request formulations, and design and modify search strategies for information retrieval systems; discusses expert system development for information services; and reviews impact of these…
Evaluation of an Automated Keywording System.

ERIC Educational Resources Information Center

Malone, Linda C.; And Others

1990-01-01

Discussion of automated indexing techniques focuses on ways to statistically document improvements in the development of an automated keywording system over time. The system developed by the Joint Chiefs of Staff to automate the storage, categorization, and retrieval of information from military exercises is explained, and performance measures are…
International Patent Information: The Role of the World Intellectual Property Organization.

ERIC Educational Resources Information Center

Sviridov, Felix A.

1978-01-01

Discusses two facets of the multi-aspect program of the World Intellectual Property Organization (WIPO) aimed at international cooperation with a view to standardizing documents and elaborating new patent information retrieval methods, while stressing the role of three international patent information organizations. (CWM)
Training Ideas. Premiere Issue. Aug/Sept Issue. Apr/May Issue.

ERIC Educational Resources Information Center

Training Ideas, 1984

1984-01-01

This document contains three issues of "Training Ideas," a bimonthly publication of instructional materials and articles dealing with human resource development. The premiere issue (1984) includes the following articles: "Information Retrieval: Finding That Lost Article" by Patrick Suessmuth; "Increasing Learning in Printed Materials through the…

Distributed Non-Parametric Representations for Vital Filtering: UW at TREC KBA 2014

DTIC Science & Technology

2014-11-01

formation about the entity, every new document would drive an update to the entity profile, strongly suggesting vitalness. Figure 3 represents...of the Twenty-Second Text REtrieval Conference (TREC 2013), 2013. Caruana, Richard. Multitask Learning: A Knowledge-Based Source of Inductive Bias. In
Contingency Contractor Optimization Phase 3 Sustainment Database Design Document - Contingency Contractor Optimization Tool - Prototype

DOE Office of Scientific and Technical Information (OSTI.GOV)

Frazier, Christopher Rawls; Durfee, Justin David; Bandlow, Alisa

The Contingency Contractor Optimization Tool – Prototype (CCOT-P) database is used to store input and output data for the linear program model described in [1]. The database supports queries that retrieve this data, as well as updates to existing input data and insertion of new input data.
The Bartlesville System; TGISS Software Documentation.

ERIC Educational Resources Information Center

Roberts, Tommy L.; And Others

TGISS (Total Guidance Information Support System) is an information storage and retrieval system specifically designed to meet the needs and requirements of a counselor in the Bartlesville Public School environment. The system, which is a combination of man/machine capabilities, includes the hardware and software necessary to extend the…
Computer aided modeling of soil mix designs to predict characteristics and properties of stabilized road bases.

DOT National Transportation Integrated Search

2009-07-01

"Considerable data exists for soils that were tested and documented, both for native properties and : properties with pozzolan stabilization. While the data exists there was no database for the Nebraska : Department of Roads to retrieve this data for...
Day two post retrieval 1500 IUI hCG bolus, progesterone-free luteal support post GnRH agonist trigger - a proof of concept study.

PubMed

Vanetik, Sharon; Segal, Linoy; Breizman, Tatiana; Kol, Shahar

2018-02-01

A small dose of hCG (1500 IU) on the day of oocyte retrieval, followed by daily progesterone administration, is currently the preferred way to secure adequate luteal support following GnRH agonist trigger. In the current proof-of-concept study, we explored the possibility that a bolus of 1500 IU hCG, given two days after oocyte retrieval, may be sufficient to sustain adequate luteal support without additional progesterone treatment. From February 2015 to August 2016, we obtained 44 pregnancies following GnRHa trigger followed by day 2 hCG (1500 IU) support only (study group). Data from these 44 cycles were compared with the latest 44 pregnancies obtained following hCG (6500 IU) trigger followed by conventional progesterone luteal support (control group). Mean progesterone levels (14 days post oocyte retrieval) in the study and control groups were 197 nmol/l and 173 nmol/l, respectively (NS). The mean E2 level (14 days post oocyte retrieval) in the study group was 6937 pmol/l, significantly higher (p < .001) than in the control group (3.276 pmol/l). We conclude that a bolus of 1500 IU hCG, administered 2 days after retrieval, can provide excellent support, without the need to further supplement with progesterone.
Partial Automation of Requirements Tracing

NASA Technical Reports Server (NTRS)

Hayes, Jane; Dekhtyar, Alex; Sundaram, Senthil; Vadlamudi, Sravanthi

2006-01-01

Requirements Tracing on Target (RETRO) is software for after-the-fact tracing of textual requirements to support independent verification and validation of software. RETRO applies one of three user-selectable information-retrieval techniques: (1) term frequency/inverse document frequency (TF/IDF) vector retrieval, (2) TF/IDF vector retrieval with simple thesaurus, or (3) keyword extraction. One component of RETRO is the graphical user interface (GUI) for use in initiating a requirements-tracing project (a pair of artifacts to be traced to each other, such as a requirements spec and a design spec). Once the artifacts have been specified and the IR technique chosen, another component constructs a representation of the artifact elements and stores it on disk. Next, the IR technique is used to produce a first list of candidate links (potential matches between the two artifact levels). This list, encoded in Extensible Markup Language (XML), is optionally processed by a filtering component designed to make the list somewhat smaller without sacrificing accuracy. Through the GUI, the user examines a number of links and returns decisions (yes, these are links; no, these are not links). Coded in XML, these decisions are provided to a "feedback processor" component that prepares the data for the next application of the IR technique. The feedback reduces the incidence of erroneous candidate links. Unlike related prior software, RETRO does not require the user to assign keywords, and automatically builds a document index.
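To make the first of the three techniques concrete, the following is a minimal sketch of TF/IDF vector retrieval used to propose candidate links between two artifact levels. It is not RETRO's implementation: the whitespace tokenizer, the `tf * log(N / df)` weighting, the cosine measure, the threshold value, and all function names are assumptions made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplistic tokenizer; a real tool would also stem and remove stop words.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse {term: tf * log(N / df)} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def candidate_links(high, low, threshold=0.1):
    """Rank (high_index, low_index, score) pairs whose similarity meets the threshold."""
    vectors = tfidf_vectors(high + low)
    high_vecs, low_vecs = vectors[:len(high)], vectors[len(high):]
    links = [
        (i, j, cosine(h, l))
        for i, h in enumerate(high_vecs)
        for j, l in enumerate(low_vecs)
    ]
    return sorted((x for x in links if x[2] >= threshold), key=lambda x: -x[2])

requirements = ["system shall encrypt stored passwords", "system shall log user logins"]
design = ["password encryption module uses hashing for stored passwords",
          "login audit logger records user logins"]
links = candidate_links(requirements, design)
```

The analyst's yes/no decisions on such a list are what the abstract's feedback processor would consume before the next retrieval pass; that loop is not modeled here.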
Tank Waste Retrieval Lessons Learned at the Hanford Site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dodd, R.A.

One of the environmental remediation challenges facing the nation is the retrieval and permanent disposal of approximately 90 million gallons of radioactive waste stored in underground tanks at the U.S. Department of Energy (DOE) facilities. The Hanford Site is located in southeastern Washington State and stores roughly 60 percent of this waste. An estimated 53 million gallons of high-level, transuranic, and low-level radioactive waste is stored underground in 149 single-shell tanks (SSTs) and 28 newer double-shell tanks (DSTs) at the Hanford Site. These SSTs range in capacity from 55,000 gallons to 1,000,000 gallons. Approximately 30 million gallons of this waste is stored in SSTs. The SSTs were constructed between 1943 and 1964 and all have exceeded the nominal 20-year design life. Sixty-seven SSTs are known or suspected to have leaked an estimated 1,000,000 gallons of waste to the surrounding soil. The risk of additional SST leakage has been greatly reduced by removing more than 3 million gallons of interstitial liquids and supernatant and transferring this waste to the DST system. Retrieval of SST salt-cake and sludge waste is underway to further reduce risks and stage feed materials for the Hanford Site Waste Treatment Plant. Regulatory requirements for SST waste retrieval and tank farm closure are established in the Hanford Federal Facility Agreement and Consent Order (HFFACO), better known as the Tri-Party Agreement, or TPA. The HFFACO was signed by the DOE, the State of Washington Department of Ecology (Ecology), and U.S. Environmental Protection Agency (EPA) and requires retrieval of as much waste as technically possible, with waste residues not to exceed 360 ft³ in 530,000 gallon or larger tanks; 30 ft³ in 55,000 gallon or smaller tanks; or the limit of waste retrieval technology, whichever is less.
If residual waste volume requirements cannot be achieved, then HFFACO Appendix H provisions can be invoked to request Ecology and EPA approval of an exception to the waste retrieval criteria for a specific tank. Tank waste retrieval has been conducted at the Hanford Site over the last few decades using a method referred to as Past Practice Hydraulic Sluicing. Past Practice Hydraulic Sluicing employs large volumes of DST supernatant and water to dislodge, dissolve, mobilize, and retrieve tank waste. Concern over the leak integrity of SSTs resulted in the need for tank waste retrieval methods capable of using smaller volumes of liquid in a more controlled manner. Retrieval of SST waste in accordance with HFFACO requirements was initiated at the Hanford Site in April 2003. New and innovative tank waste retrieval methods that minimize and control the use of liquids are being implemented for the first time. These tank waste retrieval methods replace Past Practice Hydraulic Sluicing and employ modified sluicing, vacuum retrieval, and in-tank vehicle techniques. Waste retrieval has been completed in seven Hanford Site SSTs (C-106, C-103, C-201, C-202, C-203, C-204, and S-112) in accordance with HFFACO requirements. Three additional tanks are currently in the process of being retrieved (C-108, C-109 and S-102). Preparation for retrieval of two additional SSTs (C-104 and C-110) is ongoing with retrieval operations forecasted to start in calendar year 2008. Tank C-106 was retrieved to a residual waste volume of 470 ft³ using oxalic acid dissolution and modified sluicing. An Appendix H exception request for Tank C-106 is undergoing review. Tank C-103 was retrieved to a residual volume of 351 ft³ using a modified sluicing technology. This approach was successful at reaching the TPA limits for this tank of less than 360 ft³ and the limits of the technology.
Tanks C-201, C-202, C-203, and C-204 are smaller (55,000 gallon) tanks and waste removal was completed in accordance with HFFACO requirements using a vacuum retrieval system. Residual waste volumes in each of these four tanks were less than 25 ft³. Tank S-112 retrieval was completed February 28, 2007, meeting the TPA Limits of less than 360 cu ft using salt-cake dissolution, modified sluicing, in-tank vehicle with high pressure water spray and caustic dissolution. Tanks C-108 and C-109 have been retrieved to 90% and 85% respectively. Modified sluicing was no longer effective at retrieving the remaining 5,000 to 10,000 gallons of residual. A Mobile Retrieval Tool (FoldTrac) is scheduled for installation early in 2008 to assist in breaking up chunks of waste and mobilizing the waste for transfer. Lessons learned from application of new tank waste retrieval methods are being documented and incorporated into future retrieval operations. They address all phases of retrieval including process design, equipment procurement and installation, supporting documentation, and system operations. Information is obtained through interviews with retrieval project personnel, focused workshops, review of problem evaluation requests, and evaluation of retrieval performance data. This paper presents current retrieval successes and lessons learned from retrieval of tank waste at the Hanford Site and discusses how this information is used to optimize retrieval system efficiency, improve overall cost effectiveness of retrieval operations, and ensure that HFFACO requirements are met. (authors)
Adapting Document Similarity Measures for Ligand-Based Virtual Screening.

PubMed

Himmat, Mubarak; Salim, Naomie; Al-Dabbagh, Mohammed Mumtaz; Saeed, Faisal; Ahmed, Ali

2016-04-13

Quantifying the similarity of molecules is considered one of the major tasks in virtual screening. Many similarity measures have been proposed for this purpose, some of which have been derived from document and text retrieval, since methods that give good results in document retrieval can often achieve good results in virtual screening as well. In this work, we propose a similarity measure for ligand-based virtual screening, which has been derived from a text processing similarity measure. It has been adapted to be suitable for virtual screening; we call this proposed measure the Adapted Similarity Measure of Text Processing (ASMTP). To evaluate and test the proposed ASMTP, we conducted several experiments on two different benchmark datasets: the Maximum Unbiased Validation (MUV) and the MDL Drug Data Report (MDDR). The experiments were conducted by randomly choosing 10 reference structures from each class as queries and evaluating them by recall at cut-offs of 1% and 5%. The overall results are compared with some similarity methods including the Tanimoto coefficient, which is considered the conventional and standard similarity coefficient for fingerprint-based similarity calculations. The results show that ligand-based virtual screening with the proposed measure outperforms the Tanimoto coefficient and the other methods.
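For readers unfamiliar with the baseline named above, the sketch below shows the Tanimoto coefficient on binary fingerprints and a recall-at-cut-off measure of the kind the abstract mentions. It does not implement ASMTP, whose definition is not given here. Fingerprints are represented as sets of "on" bit positions, and the function names and the rounding rule for the cut-off are assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(query, library):
    """Return library keys sorted by decreasing Tanimoto similarity to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)

def recall_at_cutoff(ranked, actives, fraction):
    """Fraction of known actives found in the top `fraction` of the ranked list."""
    k = max(1, int(len(ranked) * fraction))
    hits = sum(1 for name in ranked[:k] if name in actives)
    return hits / len(actives) if actives else 0.0
```

For example, fingerprints with bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a coefficient of 0.5.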
An Exponentiation Method for XML Element Retrieval

PubMed Central

2014-01-01

XML documents are now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, in e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning weights are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown that the structural information improves the effectiveness of the search system by up to 52.60% over the baseline BM25 at MAP. PMID:24696643
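The result above is reported in terms of MAP (mean average precision). As a reminder of how that measure is computed, here is a minimal sketch; the function names and the toy rankings are assumptions for illustration and have no connection to the MEXIR experiments.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """`runs` is a list of (ranked_list, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

With a ranking of a, b, c, d and relevant items a and c, precision is 1/1 at rank 1 and 2/3 at rank 3, so the average precision is 5/6.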
Computer integrated documentation

NASA Technical Reports Server (NTRS)

Boy, Guy

1991-01-01

The main technical issues of the Computer Integrated Documentation (CID) project are presented. The problem of automation of documents management and maintenance is analyzed both from an artificial intelligence viewpoint and from a human factors viewpoint. Possible technologies for CID are reviewed: conventional approaches to indexing and information retrieval; hypertext; and knowledge based systems. A particular effort was made to provide an appropriate representation for contextual knowledge. This representation is used to generate context on hypertext links. Thus, indexing in CID is context sensitive. The implementation of the current version of CID is described. It includes a hypertext data base, a knowledge based management and maintenance system, and a user interface. A series of theoretical considerations is also presented, such as navigation in hyperspace, acquisition of indexing knowledge, generation and maintenance of large documentation, and relation to other work.
Web information retrieval for health professionals.

PubMed

Ting, S L; See-To, Eric W K; Tse, Y K

2013-06-01

This paper presents a Web Information Retrieval System (WebIRS), which is designed to assist healthcare professionals in obtaining up-to-date medical knowledge and information via the World Wide Web (WWW). The system leverages document classification and text summarization techniques to deliver highly correlated medical information to physicians. The system architecture of the proposed WebIRS is first discussed, and then a case study on an application of the proposed system in a Hong Kong medical organization is presented to illustrate the adoption process. A questionnaire is administered to collect feedback on the operation and performance of WebIRS in comparison with conventional information retrieval on the WWW. A prototype system has been constructed and implemented on a trial basis in a medical organization. It has proven to be of benefit to healthcare professionals through its automatic classification and summarization of the medical information that physicians need and are interested in. The results of the case study show that, with the use of the proposed WebIRS, a significant reduction in searching time and effort, together with retrieval of highly relevant materials, can be attained.
Unified modeling language and design of a case-based retrieval system in medical imaging.

PubMed Central

LeBozec, C.; Jaulent, M. C.; Zapletal, E.; Degoulet, P.

1998-01-01

One goal of artificial intelligence research into case-based reasoning (CBR) systems is to develop approaches for designing useful and practical interactive case-based environments. Explaining each step of the design of the case-base and of the retrieval process is critical for the application of case-based systems to the real world. We describe herein our approach to the design of IDEM--Images and Diagnosis from Examples in Medicine--a medical image case-based retrieval system for pathologists. Our approach is based on the expressiveness of an object-oriented modeling language standard: the Unified Modeling Language (UML). We created a set of diagrams in UML notation illustrating the steps of the CBR methodology we used. The key aspect of this approach was selecting the relevant objects of the system according to user requirements and making visualization of cases and of the components of the case retrieval process. Further evaluation of the expressiveness of the design document is required but UML seems to be a promising formalism, improving the communication between the developers and users. PMID:9929346
Passive microwave algorithm development and evaluation

NASA Technical Reports Server (NTRS)

Petty, Grant W.

1995-01-01

The scientific objectives of this grant are: (1) thoroughly evaluate, both theoretically and empirically, all available Special Sensor Microwave Imager (SSM/I) retrieval algorithms for column water vapor, column liquid water, and surface wind speed; (2) where both appropriate and feasible, develop, validate, and document satellite passive microwave retrieval algorithms that offer significantly improved performance compared with currently available algorithms; and (3) refine and validate a novel physical inversion scheme for retrieving rain rate over the ocean. This report summarizes work accomplished or in progress during the first year of a three-year grant. The emphasis during the first year has been on the validation and refinement of the rain rate algorithm published by Petty and on the analysis of independent data sets that can be used to help evaluate the performance of rain rate algorithms over remote areas of the ocean. Two articles in the area of global oceanic precipitation are attached.
Toward Medical Documentation That Enhances Situational Awareness Learning

PubMed Central

Lenert, Leslie A.

2016-01-01

The purpose of writing medical notes in a computer system goes beyond documentation for medical-legal purposes or billing. The structure of documentation is a checklist that serves as a cognitive aid and a potential index to retrieve information for learning from the record. For the past 50 years, one of the primary organizing structures for physicians’ clinical documentation has been the SOAP note (Subjective, Objective, Assessment, Plan). The cognitive checklist is well-suited to differential diagnosis but may not support detection of changes in systems and/or learning from cases. We describe an alternative cognitive checklist called the OODA Loop (Observe, Orient, Decide, Act). Through incorporation of projections of anticipated course events with and without treatment and by making “Decisions” an explicit category of documentation in the medical record in the context of a variable temporal cycle for observations, OODA may enhance opportunities to learn from clinical care. PMID:28269872
Data Entry. ERIC Processing Manual, Section IX.

ERIC Educational Resources Information Center

Weller, Carolyn R., Ed.

Documents and journal articles acquired by the ERIC Clearinghouses are processed (cataloged, indexed, abstracted/annotated) for retrieval and use by the educational community. The bibliographic data resulting from this processing are provided by the ERIC Clearinghouses on a regular basis to the ERIC Processing and Reference Facility. The ERIC…
USER'S GUIDE FOR GLOED VERSION 1.0 - THE GLOBAL EMISSIONS DATABASE

EPA Science Inventory

The document is a user's guide for the EPA-developed, powerful software package, Global Emissions Database (GloED). GloED is a user-friendly, menu-driven tool for storing and retrieving emissions factors and activity data on a country-specific basis. Data can be selected from dat...
Television and Human Behavior.

ERIC Educational Resources Information Center

Comstock, George; And Others

To compile a comprehensive review of English language scientific literature regarding the effects of television on human behavior, the authors of this book evaluated more than 2,500 books, articles, reports, and other documents. Rather than taking a traditional approach, the authors followed a new model for the retrieval and synthesis of…
Trademark Status & Document Retrieval

Science.gov Websites

TSDR now includes a Post Registration Maintenance Tab. When viewing a Registered mark, users will now find a new 3rd tab providing Post Registration information next to the " mark is not registered.
Report on Information Retrieval and Library Automation Studies.

ERIC Educational Resources Information Center

Alberta Univ., Edmonton. Dept. of Computing Science.

Short abstracts of works in progress or completed in the Department of Computing Science at the University of Alberta are presented under five major headings. The five categories are: Storage and search techniques for document data bases, Automatic classification, Study of indexing and classification languages through computer manipulation of data…
