The present status and problems in document retrieval system : document input type retrieval system
NASA Astrophysics Data System (ADS)
Inagaki, Hirohito
The office-automation (OA) made many changes. Many documents were begun to maintained in an electronic filing system. Therefore, it is needed to establish efficient document retrieval system to extract useful information. Current document retrieval systems are using simple word-matching, syntactic-matching, semantic-matching to obtain high retrieval efficiency. On the other hand, the document retrieval systems using special hardware devices, such as ISSP, were developed for aiming high speed retrieval. Since these systems can accept a single sentence or keywords as input, it is difficult to explain searcher's request. We demonstrated document input type retrieval system, which can directly accept document as an input, and can search similar documents from document data-base.
An Intelligent System for Document Retrieval in Distributed Office Environments.
ERIC Educational Resources Information Center
Mukhopadhyay, Uttam; And Others
1986-01-01
MINDS (Multiple Intelligent Node Document Servers) is a distributed system of knowledge-based query engines for efficiently retrieving multimedia documents in an office environment of distributed workstations. By learning document distribution patterns and user interests and preferences during system usage, it customizes document retrievals for…
INFORMATION STORAGE AND RETRIEVAL, REPORTS ON EVALUATION PROCEDURES AND RESULTS 1965-1967.
ERIC Educational Resources Information Center
SALTON, GERALD
A DETAILED ANALYSIS OF THE RETRIEVAL EVALUATION RESULTS OBTAINED WITH THE AUTOMATIC SMART DOCUMENT RETRIEVAL SYSTEM FOR DOCUMENT COLLECTIONS IN THE FIELDS OF AERODYNAMICS, COMPUTER SCIENCE, AND DOCUMENTATION IS GIVEN IN THIS REPORT. THE VARIOUS COMPONENTS OF FULLY AUTOMATIC DOCUMENT RETRIEVAL SYSTEMS ARE DISCUSSED IN DETAIL, INCLUDING THE FORMS OF…
Cognitive Process as a Basis for Intelligent Retrieval Systems Design.
ERIC Educational Resources Information Center
Chen, Hsinchun; Dhar, Vasant
1991-01-01
Two studies of the cognitive processes involved in online document-based information retrieval were conducted. These studies led to the development of five computational models of online document retrieval which were incorporated into the design of an "intelligent" document-based retrieval system. Both the system and the broader implications of…
Query Expansion for Noisy Legal Documents
2008-11-01
9] G. Salton (ed). The SMART retrieval system experiments in automatic document processing. 1971. [10] H. Schutze and J . Pedersen. A cooccurrence...Language Modeling and Information Retrieval. http://www.lemurproject.org. [2] J . Baron, D. Lewis, and D. Oard. TREC 2006 legal track overview. In...Retrieval, 1993. [8] J . Rocchio. Relevance feedback in information retrieval. In The SMART retrieval system experiments in automatic document processing, 1971
Information Retrieval: A Sequential Learning Process.
ERIC Educational Resources Information Center
Bookstein, Abraham
1983-01-01
Presents decision-theoretic models which intrinsically include retrieval of multiple documents whereby system responds to request by presenting documents to patron in sequence, gathering feedback, and using information to modify future retrievals. Document independence model, set retrieval model, sequential retrieval model, learning model,…
ERIC Educational Resources Information Center
Cornell Univ., Ithaca, NY. Dept. of Computer Science.
On-line retrieval system design is discussed in the two papers which make up Part Five of this report on Salton's Magical Automatic Retriever of Texts (SMART) project report. The first paper: "A Prototype On-Line Document Retrieval System" by D. Williamson and R. Williamson outlines a design for a SMART on-line document retrieval system…
Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation
ERIC Educational Resources Information Center
Rajagopal, Prabha; Ravana, Sri Devi
2017-01-01
Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
Predicting Document Retrieval System Performance: An Expected Precision Measure.
ERIC Educational Resources Information Center
Losee, Robert M., Jr.
1987-01-01
Describes an expected precision (EP) measure designed to predict document retrieval performance. Highlights include decision theoretic models; precision and recall as measures of system performance; EP graphs; relevance feedback; and computing the retrieval status value of a document for two models, the Binary Independent Model and the Two Poisson…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choo, Jaegul; Kim, Hannah; Clarkson, Edward
In this paper, we present an interactive visual information retrieval and recommendation system, called VisIRR, for large-scale document discovery. VisIRR effectively combines the paradigms of (1) a passive pull through query processes for retrieval and (2) an active push that recommends items of potential interest to users based on their preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, representing the relationships among the topics along with their keyword summary. In addition, based on interactive personalized preference feedback with regard to documents,more » VisIRR provides document recommendations from the entire corpus, which are beyond the retrieved sets. Such recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods, which make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works by providing detailed usage scenarios. Finally, we present preliminary user study results for evaluating the effectiveness of the system.« less
Choo, Jaegul; Kim, Hannah; Clarkson, Edward; ...
2018-01-31
In this paper, we present an interactive visual information retrieval and recommendation system, called VisIRR, for large-scale document discovery. VisIRR effectively combines the paradigms of (1) a passive pull through query processes for retrieval and (2) an active push that recommends items of potential interest to users based on their preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, representing the relationships among the topics along with their keyword summary. In addition, based on interactive personalized preference feedback with regard to documents,more » VisIRR provides document recommendations from the entire corpus, which are beyond the retrieved sets. Such recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods, which make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works by providing detailed usage scenarios. Finally, we present preliminary user study results for evaluating the effectiveness of the system.« less
Using Induction to Refine Information Retrieval Strategies
NASA Technical Reports Server (NTRS)
Baudin, Catherine; Pell, Barney; Kedar, Smadar
1994-01-01
Conceptual information retrieval systems use structured document indices, domain knowledge and a set of heuristic retrieval strategies to match user queries with a set of indices describing the document's content. Such retrieval strategies increase the set of relevant documents retrieved (increase recall), but at the expense of returning additional irrelevant documents (decrease precision). Usually in conceptual information retrieval systems this tradeoff is managed by hand and with difficulty. This paper discusses ways of managing this tradeoff by the application of standard induction algorithms to refine the retrieval strategies in an engineering design domain. We gathered examples of query/retrieval pairs during the system's operation using feedback from a user on the retrieved information. We then fed these examples to the induction algorithm and generated decision trees that refine the existing set of retrieval strategies. We found that (1) induction improved the precision on a set of queries generated by another user, without a significant loss in recall, and (2) in an interactive mode, the decision trees pointed out flaws in the retrieval and indexing knowledge and suggested ways to refine the retrieval strategies.
Markó, K; Schulz, S; Hahn, U
2005-01-01
We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain. Documents, as well as queries, are mapped to a language-independent conceptual layer on which retrieval operations are performed. We contrast this approach with the direct translation of German queries to English ones which, subsequently, are matched against English documents. We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. A substantial benefit for interlingua-based document retrieval using German queries on English texts is found, which amounts to 93% of the (monolingual) English baseline. Most state-of-the-art cross-language information retrieval systems translate user queries to the language(s) of the target documents. In contra-distinction to this approach, translating both documents and user queries into a language-independent, concept-like representation format is more beneficial to enhance cross-language retrieval performance.
Computer program and user documentation medical data tape retrieval system
NASA Technical Reports Server (NTRS)
Anderson, J.
1971-01-01
This volume provides several levels of documentation for the program module of the NASA medical directorate mini-computer storage and retrieval system. A biomedical information system overview describes some of the reasons for the development of the mini-computer storage and retrieval system. It briefly outlines all of the program modules which constitute the system.
Topology of Document Retrieval Systems.
ERIC Educational Resources Information Center
Everett, Daniel M.; Cater, Steven C.
1992-01-01
Explains the use of a topological structure to examine the closeness between documents in retrieval systems and analyzes the topological structure of a vector-space model, a fuzzy-set model, an extended Boolean model, a probabilistic model, and a TIRS (Topological Information Retrieval System) model. Proofs for the results are appended. (17…
Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.
ERIC Educational Resources Information Center
Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald
2001-01-01
Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…
AP-102/104 Retrieval control system qualification test procedure
DOE Office of Scientific and Technical Information (OSTI.GOV)
RIECK, C.A.
1999-05-18
This Qualification Test Procedure documents the results of the qualification testing that was performed on the Project W-211, ''Initial Tank Retrieval Systems,'' retrieval control system (RCS) for tanks 241-AP-102 and 241-AP-104. The results confirm that the RCS has been programmed correctly and that the two related hardware enclosures have been assembled in accordance with the design documents.
ERIC Educational Resources Information Center
Bennertz, Richard K.
The document highlights in nontechnical language the development of the Defense Documentation Center (DDC) Remote On-Line Retrieval System from its inception in 1967 to what is planned. It describes in detail the current operating system, equipment configuration and associated costs, user training and system evaluation and may be of value to other…
Analyzing Document Retrievability in Patent Retrieval Settings
NASA Astrophysics Data System (ADS)
Bashir, Shariq; Rauber, Andreas
Most information retrieval settings, such as web search, are typically precision-oriented, i.e. they focus on retrieving a small number of highly relevant documents. However, in specific domains, such as patent retrieval or law, recall becomes more relevant than precision: in these cases the goal is to find all relevant documents, requiring algorithms to be tuned more towards recall at the cost of precision. This raises important questions with respect to retrievability and search engine bias: depending on how the similarity between a query and documents is measured, certain documents may be more or less retrievable in certain systems, up to some documents not being retrievable at all within common threshold settings. Biases may be oriented towards popularity of documents (increasing weight of references), towards length of documents, favour the use of rare or common words; rely on structural information such as metadata or headings, etc. Existing accessibility measurement techniques are limited as they measure retrievability with respect to all possible queries. In this paper, we improve accessibility measurement by considering sets of relevant and irrelevant queries for each document. This simulates how recall oriented users create their queries when searching for relevant information. We evaluate retrievability scores using a corpus of patents from US Patent and Trademark Office.
Development of a full-text information retrieval system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keizo Oyama; AKira Miyazawa, Atsuhiro Takasu; Kouji Shibano
The authors have executed a project to realize a full-text information retrieval system. The system is designed to deal with a document database comprising full text of a large number of documents such as academic papers. The document structures are utilized in searching and extracting appropriate information. The concept of structure handling and the configuration of the system are described in this paper.
Document Indexing for Image-Based Optical Information Systems.
ERIC Educational Resources Information Center
Thiel, Thomas J.; And Others
1991-01-01
Discussion of image-based information retrieval systems focuses on indexing. Highlights include computerized information retrieval; multimedia optical systems; optical mass storage and personal computers; and a case study that describes an optical disk system which was developed to preserve, access, and disseminate military documents. (19…
NASA Astrophysics Data System (ADS)
Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.
1999-10-01
Litton PRC and Litton Data Systems Division are developing a system, the Imaged Document Optical Correlation and Conversion System (IDOCCS), to provide a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides the search and retrieval of information from imaged documents. IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited; e.g., imaged documents containing an agency's seal or logo can be singled out. In this paper, we present a description of IDOCCS as well as preliminary performance results and theoretical projections.
The JPL Library information retrieval system
NASA Technical Reports Server (NTRS)
Walsh, J.
1975-01-01
The development, capabilities, and products of the computer-based retrieval system of the Jet Propulsion Laboratory Library are described. The system handles books and documents, produces a book catalog, and provides a machine search capability. Programs and documentation are available to the public through NASA's computer software dissemination program.
Autocorrelation and Regularization of Query-Based Information Retrieval Scores
2008-02-01
of the most general information retrieval models [ Salton , 1968]. By treating a query as a very short document, documents and queries can be rep... Salton , 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a...a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation
Web Mining for Web Image Retrieval.
ERIC Educational Resources Information Center
Chen, Zheng; Wenyin, Liu; Zhang, Feng; Li, Mingjing; Zhang, Hongjiang
2001-01-01
Presents a prototype system for image retrieval from the Internet using Web mining. Discusses the architecture of the Web image retrieval prototype; document space modeling; user log mining; and image retrieval experiments to evaluate the proposed system. (AEF)
ERIC Educational Resources Information Center
Lynch, Michael F.; Willett, Peter
1987-01-01
Discusses research into chemical information and document retrieval systems at the University of Sheffield. Highlights include the use of cluster analysis methods for document retrieval and drug design, representation and searching of files of generic chemical structures, and the application of parallel computer hardware to information retrieval.…
Recent Experiments with INQUERY
1995-11-01
were conducted with version of the INQUERY information retrieval system INQUERY is based on the Bayesian inference network retrieval model It is...corpus based query expansion For TREC a subset of of the adhoc document set was used to build the InFinder database None of the...experiments that showed signi cant improvements in retrieval eectiveness when document rankings based on the entire document text are combined with
A tutorial on information retrieval: basic terms and concepts
Zhou, Wei; Smalheiser, Neil R; Yu, Clement
2006-01-01
This informal tutorial is intended for investigators and students who would like to understand the workings of information retrieval systems, including the most frequently used search engines: PubMed and Google. Having a basic knowledge of the terms and concepts of information retrieval should improve the efficiency and productivity of searches. As well, this knowledge is needed in order to follow current research efforts in biomedical information retrieval and text mining that are developing new systems not only for finding documents on a given topic, but extracting and integrating knowledge across documents. PMID:16722601
Computer-Assisted Search Of Large Textual Data Bases
NASA Technical Reports Server (NTRS)
Driscoll, James R.
1995-01-01
"QA" denotes high-speed computer system for searching diverse collections of documents including (but not limited to) technical reference manuals, legal documents, medical documents, news releases, and patents. Incorporates previously available and emerging information-retrieval technology to help user intelligently and rapidly locate information found in large textual data bases. Technology includes provision for inquiries in natural language; statistical ranking of retrieved information; artificial-intelligence implementation of semantics, in which "surface level" knowledge found in text used to improve ranking of retrieved information; and relevance feedback, in which user's judgements of relevance of some retrieved documents used automatically to modify search for further information.
Strong Similarity Measures for Ordered Sets of Documents in Information Retrieval.
ERIC Educational Resources Information Center
Egghe, L.; Michel, Christine
2002-01-01
Presents a general method to construct ordered similarity measures in information retrieval based on classical similarity measures for ordinary sets. Describes a test of some of these measures in an information retrieval system that extracted ranked document sets and discuses the practical usability of the ordered similarity measures. (Author/LRW)
Performance Considerations for an Optical Jukebox in Document Archival/Retrieval Applications.
ERIC Educational Resources Information Center
Spenser, Peter
1991-01-01
Discusses the use of an optical jukebox in a retrieval-intensive application--i.e., for a law firm's litigation support--and examines factors affecting the performance of the jukebox. The imaging system's configuration is explained, document access from workstations is described, and expectations of retrieval times are discussed. (LRW)
Document Storage and Retrieval in the Electronic Office.
ERIC Educational Resources Information Center
Ashford, John
1985-01-01
Proposals are made for practical approaches to the design of electronic office systems to provide for the effective storage and retrieval of the documents that they generate. Problems of records management and requirements to be met by the designer of an electronic office system are highlighted. Nineteen references are cited. (EJS)
A comparison of Boolean-based retrieval to the WAIS system for retrieval of aeronautical information
NASA Technical Reports Server (NTRS)
Marchionini, Gary; Barlow, Diane
1994-01-01
An evaluation of an information retrieval system using a Boolean-based retrieval engine and inverted file architecture and WAIS, which uses a vector-based engine, was conducted. Four research questions in aeronautical engineering were used to retrieve sets of citations from the NASA Aerospace Database which was mounted on a WAIS server and available through Dialog File 108 which served as the Boolean-based system (BBS). High recall and high precision searches were done in the BBS and terse and verbose queries were used in the WAIS condition. Precision values for the WAIS searches were consistently above the precision values for high recall BBS searches and consistently below the precision values for high precision BBS searches. Terse WAIS queries gave somewhat better precision performance than verbose WAIS queries. In every case, a small number of relevant documents retrieved by one system were not retrieved by the other, indicating the incomplete nature of the results from either retrieval system. Relevant documents in the WAIS searches were found to be randomly distributed in the retrieved sets rather than distributed by ranks. Advantages and limitations of both types of systems are discussed.
Information retrieval for a document writing assistance program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corral, M.L.; Simon, A.; Julien, C.
This paper presents an Information Retrieval mechanism to facilitate the writing of technical documents in the space domain. To address the need for document exchange between partners in a given project, documents are standardized. The writing of a new document requires the re-use of existing documents or parts thereof. These parts can be identified by {open_quotes}tagging{close_quotes} the logical structure of documents and restored by means of a purpose-built Information Retrieval System (I.R.S.). The I.R.S. implemented in our writing assistance tool uses natural language queries and is based on a statistical linguistic approach which is enhanced by the use of documentmore » structure module.« less
Topological Aspects of Information Retrieval.
ERIC Educational Resources Information Center
Egghe, Leo; Rousseau, Ronald
1998-01-01
Discusses topological aspects of theoretical information retrieval, including retrieval topology; similarity topology; pseudo-metric topology; document spaces as topological spaces; Boolean information retrieval as a subsystem of any topological system; and proofs of theorems. (LRW)
Information Storage and Retrieval. Reports on Analysis, Search, and Iterative Retrieval.
ERIC Educational Resources Information Center
Salton, Gerard
As the fourteenth report in a series describing research in automatic information storage and retrieval, this document covers work carried out on the SMART project for approximately one year (summer 1967 to summer 1968). The document is divided into four main parts: (1) SMART systems design, (2) analysis and search experiments, (3) user feedback…
An Optical Disk-Based Information Retrieval System.
ERIC Educational Resources Information Center
Bender, Avi
1988-01-01
Discusses a pilot project by the Nuclear Regulatory Commission to apply optical disk technology to the storage and retrieval of documents related to its high level waste management program. Components and features of the microcomputer-based system which provides full-text and image access to documents are described. A sample search is included.…
QCS : a system for querying, clustering, and summarizing documents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunlavy, Daniel M.
2006-08-01
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test setsmore » from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence ''trimming'', and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.« less
QCS: a system for querying, clustering and summarizing documents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunlavy, Daniel M.; Schlesinger, Judith D.; O'Leary, Dianne P.
2006-10-01
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test setsmore » from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence 'trimming', and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.« less
Natural language information retrieval in digital libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.
In this paper we report on some recent developments in joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and buildmore » a conceptual hierarchy specific to the database domain, and (3) process user`s natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications`s Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system have been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.« less
The Effect of Indexing Exhaustivity on Retrieval Performance.
ERIC Educational Resources Information Center
Burgin, Robert
1991-01-01
Describes results of a study that investigated the effect of variations in indexing exhaustivity on retrieval performance in a vector space retrieval system. The test collection of documents in the National Library of Medicine's Medline file indexed under cystic fibrosis is described, and use of the SMART information retrieval system is discussed.…
The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.
ERIC Educational Resources Information Center
Peat, Helen J.; Willett, Peter
1991-01-01
Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…
The "Generality" Effect and the Retrieval Evaluation for Large Collections
ERIC Educational Resources Information Center
Salton, Gerard
1972-01-01
The role of the generality effect in retrieval system evaluation is assessed, and evaluation results are given for the comparison of several document collections of distinct size and generality in the areas of documentation and aerodynamics. (14 references) (Author)
Categorizing document by fuzzy C-Means and K-nearest neighbors approach
NASA Astrophysics Data System (ADS)
Priandini, Novita; Zaman, Badrus; Purwanti, Endah
2017-08-01
Increasing of technology had made categorizing documents become important. It caused by increasing of number of documents itself. Managing some documents by categorizing is one of Information Retrieval application, because it involve text mining on its process. Whereas, categorization technique could be done both Fuzzy C-Means (FCM) and K-Nearest Neighbors (KNN) method. This experiment would consolidate both methods. The aim of the experiment is increasing performance of document categorize. First, FCM is in order to clustering training documents. Second, KNN is in order to categorize testing document until the output of categorization is shown. Result of the experiment is 14 testing documents retrieve relevantly to its category. Meanwhile 6 of 20 testing documents retrieve irrelevant to its category. Result of system evaluation shows that both precision and recall are 0,7.
ERIC Educational Resources Information Center
Vasarhelyi, Paul
The new data retrieval system for the social sciences which has recently been installed in the UNESCO Secretariat in Paris is described in this comprehensive report. The computerized system is designed to facilitate the existing storage systems in the circulation of information, data retrieval, and indexing services. Basically, this report…
Sarrouti, Mourad; Ouatik El Alaoui, Said
2017-04-01
Passage retrieval, the identification of top-ranked passages that may contain the answer for a given biomedical question, is a crucial component for any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge widely studied over the last decades. However, it still requires further efforts in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on Stanford CoreNLP sentence/passage length, probabilistic information retrieval (IR) model and UMLS concepts. In the proposed method, we first use our document retrieval system based on PubMed search engine and UMLS similarity to retrieve relevant documents to a given biomedical question. We then take the abstracts from the retrieved documents and use Stanford CoreNLP for sentence splitter to make a set of sentences, i.e., candidate passages. Using stemmed words and UMLS concepts as features for the BM25 model, we finally compute the similarity scores between the biomedical question and each of the candidate passages and keep the N top-ranked ones. Experimental evaluations performed on large standard datasets, provided by the BioASQ challenge, show that the proposed method achieves good performances compared with the current state-of-the-art methods. The proposed method significantly outperforms the current state-of-the-art methods by an average of 6.84% in terms of mean average precision (MAP). We have proposed an efficient passage retrieval method which can be used to retrieve relevant passages in biomedical QA systems with high mean average precision. Copyright © 2017 Elsevier Inc. All rights reserved.
The JPL Library Information Retrieval System
ERIC Educational Resources Information Center
Walsh, Josephine
1975-01-01
The development, capabilities, and products of the computer-based retrieval system of the Jet Propulsion Laboratory Library are described. The system handles books and documents, produces a book catalog, and provides a machine search capability. (Author)
Learned Vector-Space Models for Document Retrieval.
ERIC Educational Resources Information Center
Caid, William R.; And Others
1995-01-01
The Latent Semantic Indexing and MatchPlus systems examine similar contexts in which words appear and create representational models that capture the similarity of meaning of terms and then use the representation for retrieval. Text Retrieval Conference experiments using these systems demonstrate the computational feasibility of using…
Facilitating access to information in large documents with an intelligent hypertext system
NASA Technical Reports Server (NTRS)
Mathe, Nathalie
1993-01-01
Retrieving specific information from large amounts of documentation is not an easy task. It could be facilitated if information relevant in the current problem solving context could be automatically supplied to the user. As a first step towards this goal, we have developed an intelligent hypertext system called CID (Computer Integrated Documentation) and tested it on the Space Station Freedom requirement documents. The CID system enables integration of various technical documents in a hypertext framework and includes an intelligent context-sensitive indexing and retrieval mechanism. This mechanism utilizes on-line user information requirements and relevance feedback either to reinforce current indexing in case of success or to generate new knowledge in case of failure. This allows the CID system to provide helpful responses, based on previous usage of the documentation, and to improve its performance over time.
Documents Similarity Measurement Using Field Association Terms.
ERIC Educational Resources Information Center
Atlam, El-Sayed; Fuketa, M.; Morita, K.; Aoe, Jun-ichi
2003-01-01
Discussion of text analysis and information retrieval and measurement of document similarity focuses on a new text manipulation system called FA (field association)-Sim that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. Discusses recall and precision, automatic indexing…
STATISTICAL DATA ON CHEMICAL COMPOUNDS.
DATA STORAGE SYSTEMS, FEASIBILITY STUDIES, COMPUTERS, STATISTICAL DATA , DOCUMENTS, ARMY...CHEMICAL COMPOUNDS, INFORMATION RETRIEVAL), (*INFORMATION RETRIEVAL, CHEMICAL COMPOUNDS), MOLECULAR STRUCTURE, BIBLIOGRAPHIES, DATA PROCESSING
iSMART: Ontology-based Semantic Query of CDA Documents
Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue
2009-01-01
The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical document. With the rich ontological references in CDA documents, the ontology-based semantic query could be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents will be extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using the real clinical documents collected from a large hospital in southern China. PMID:20351883
Information Retrieval Using UMLS-based Structured Queries
Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith
2001-01-01
During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
ERIC Educational Resources Information Center
Liu, Chang
2012-01-01
When using information retrieval (IR) systems, users often pose short and ambiguous query terms. It is critical for IR systems to obtain more accurate representation of users' information need, their document preferences, and the context they are working in, and then incorporate them into the design of the systems to tailor retrieval to…
Data collection and preparation of authoritative reviews on space food and nutrition research
NASA Technical Reports Server (NTRS)
1972-01-01
The collection and classification of information for a manually operated information retrieval system on the subject of space food and nutrition research are described. The system as it currently exists is designed for retrieval of documents, either in hard copy or on microfiche, from the technical files of the MSC Food and Nutrition Section by accession number, author, and/or subject. The system could readily be extended to include retrieval by affiliation, report and contract number, and sponsoring agency should the need arise. It can also be easily converted to computerized retrieval. At present the information retrieval system contains nearly 3000 documents which consist of technical papers, contractors' reports, and reprints obtained from the food and nutrition files at MSC, Technical Library, the library at the Texas Medical Center in Houston, the BMI Technical Libraries, Dr. E. B. Truitt at MBI, and the OSU Medical Libraries. Additional work was done to compile 18 selected bibliographies on subjects of immediate interest on the MSC Food and Nutrition Section.
Imaged Document Optical Correlation and Conversion System (IDOCCS)
NASA Astrophysics Data System (ADS)
Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.
1999-03-01
Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, patents, etc.). In addition, many organizations are converting their paper archives to electronic images, which are stored in a computer database. Because of this, there is a need to efficiently organize this data into comprehensive and accessible information resources. The Imaged Document Optical Correlation and Conversion System (IDOCCS) provides a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides the search and retrieval capability of document images. The IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives and can even determine the types of languages contained within a document. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited, e.g., imaged documents containing an agency's seal or logo, or documents with a particular individual's signature block, can be singled out. With this dual capability, IDOCCS outperforms systems that rely on optical character recognition as a basis for indexing and storing only the textual content of documents for later retrieval.
Computer Program and User Documentation Medical Data Input System
NASA Technical Reports Server (NTRS)
Anderson, J.
1971-01-01
Several levels of documentation are presented for the program module of the NASA medical directorate minicomputer storage and retrieval system. The biomedical information system overview gives reasons for the development of the minicomputer storage and retrieval system. It briefly describes all of the program modules which constitute the system. A technical discussion oriented to the programmer is given. Each subroutine is described in enough detail to permit in-depth understanding of the routines and to facilitate program modifications. The program utilization section may be used as a users guide.
Robust keyword retrieval method for OCRed text
NASA Astrophysics Data System (ADS)
Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu
2011-01-01
Document management systems have become important because of the growing popularity of electronic filing of documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, the insertion of noise characters and dropping of characters in the keyword retrieval enables robustness against character segmentation errors, and character substitution in the keyword of the recognition candidate for each character in OCR or any other character enables robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
Information Storage and Retrieval, Scientific Report No. ISR-15.
ERIC Educational Resources Information Center
Salton, Gerard
Several algorithms were investigated which would allow a user to interact with an automatic document retrieval system by requesting relevance judgments on selected sets of documents. Two viewpoints were taken in evaluation. One measured the movement of queries toward the optimum query as defined by Rocchio; the other measured the retrieval…
Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar
2016-07-25
Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant and which are not to the user's needs. Under the current regime, users cannot precisely construct queries in an accurate way to retrieve particular pieces of data from large reserves of data. Basic information retrieval systems are producing low-quality search results. In our proposed system for this paper we present a new technique to refine Information Retrieval searches to better represent the user's information need in order to enhance the performance of information retrieval by using different query expansion techniques and apply a linear combinations between them, where the combinations was linearly between two expansion results at one time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. The retrieval performance is measured by some variants of MAP (Mean Average Precision) and according to our experimental results, the combination of best results of query expansion is enhanced the retrieved documents and outperforms our baseline by 21.06 %, even it outperforms a previous study by 7.12 %. We propose several query expansion techniques and their combinations (linearly) to make user queries more cognizable to search engines and to produce higher-quality search results.
ERIC Educational Resources Information Center
Proceedings of the ASIS Annual Meeting, 1993
1993-01-01
Presents abstracts of 34 special interest group (SIG) sessions. Highlights include humanities scholars and electronic texts; information retrieval and indexing systems design; automated indexing; domain analysis; query expansion in document retrieval systems; thesauri; business intelligence; Americans with Disabilities Act; management;…
Management of technical date in Nihon Doro kodan
NASA Astrophysics Data System (ADS)
Hanada, Jun'ichi
Nihon Doro Kodan Laboratory has collected and contributed technical data (microfiches, aerial photographs, books and literature) on plans, designs, constructions and maintenance of the national expressways and the ordinary toll roads since 1968. This work is systematized on computer to retrieve and contribute data faster. Now Laboratory operates Technical Data Management System which manages all of technical data and Technical Document Management System which manages technical documents. These systems stand on users' on-line retrieval and data accumuration by microfiches and optical disks.
FAPA: Faculty Appointment Policy Archive, 1998. [CD-ROM.
ERIC Educational Resources Information Center
Trower, C. Ann
This CD-ROM presents 220 documents collected in Harvard University's Faculty Appointment Policy Archive (FAPA), the ZyFIND search and retrieval system, and instructions for their use. The FAPA system and ZyFIND allow browsing through documents, inserting bookmarks in documents, attaching notes to documents without modifying them, and selecting…
An integrated information retrieval and document management system
NASA Technical Reports Server (NTRS)
Coles, L. Stephen; Alvarez, J. Fernando; Chen, James; Chen, William; Cheung, Lai-Mei; Clancy, Susan; Wong, Alexis
1993-01-01
This paper describes the requirements and prototype development for an intelligent document management and information retrieval system that will be capable of handling millions of pages of text or other data. Technologies for scanning, Optical Character Recognition (OCR), magneto-optical storage, and multiplatform retrieval using a Standard Query Language (SQL) will be discussed. The semantic ambiguity inherent in the English language is somewhat compensated-for through the use of coefficients or weighting factors for partial synonyms. Such coefficients are used both for defining structured query trees for routine queries and for establishing long-term interest profiles that can be used on a regular basis to alert individual users to the presence of relevant documents that may have just arrived from an external source, such as a news wire service. Although this attempt at evidential reasoning is limited in comparison with the latest developments in AI Expert Systems technology, it has the advantage of being commercially available.
NASA Technical Reports Server (NTRS)
Driscoll, James N.
1994-01-01
The high-speed data search system developed for KSC incorporates existing and emerging information retrieval technology to help a user intelligently and rapidly locate information found in large textual databases. This technology includes: natural language input; statistical ranking of retrieved information; an artificial intelligence concept called semantics, where 'surface level' knowledge found in text is used to improve the ranking of retrieved information; and relevance feedback, where user judgements about viewed information are used to automatically modify the search for further information. Semantics and relevance feedback are features of the system which are not available commercially. The system further demonstrates focus on paragraphs of information to decide relevance; and it can be used (without modification) to intelligently search all kinds of document collections, such as collections of legal documents medical documents, news stories, patents, and so forth. The purpose of this paper is to demonstrate the usefulness of statistical ranking, our semantic improvement, and relevance feedback.
Combining approaches to on-line handwriting information retrieval
NASA Astrophysics Data System (ADS)
Peña Saldarriaga, Sebastián; Viard-Gaudin, Christian; Morin, Emmanuel
2010-01-01
In this work, we propose to combine two quite different approaches for retrieving handwritten documents. Our hypothesis is that different retrieval algorithms should retrieve different sets of documents for the same query. Therefore, significant improvements in retrieval performances can be expected. The first approach is based on information retrieval techniques carried out on the noisy texts obtained through handwriting recognition, while the second approach is recognition-free using a word spotting algorithm. Results shows that for texts having a word error rate (WER) lower than 23%, the performances obtained with the combined system are close to the performances obtained on clean digital texts. In addition, for poorly recognized texts (WER > 52%), an improvement of nearly 17% can be observed with respect to the best available baseline method.
ERIC Educational Resources Information Center
Vickers, P. H.
1983-01-01
Examination of management information systems of three manufacturing firms highlights principal characteristics, document types and functions, main information flows, storage and retrieval systems, and common problems (corporate memory failure, records management, management information systems, general management). A literature review and…
The Electronic Documentation Project in the NASA mission control center environment
NASA Technical Reports Server (NTRS)
Wang, Lui; Leigh, Albert
1994-01-01
NASA's space programs like many other technical programs of its magnitude is supported by a large volume of technical documents. These documents are not only diverse but also abundant. Management, maintenance, and retrieval of these documents is a challenging problem by itself; but, relating and cross-referencing this wealth of information when it is all on a medium of paper is an even greater challenge. The Electronic Documentation Project (EDP) is to provide an electronic system capable of developing, distributing and controlling changes for crew/ground controller procedures and related documents. There are two primary motives for the solution. The first motive is to reduce the cost of maintaining the current paper based method of operations by replacing paper documents with electronic information storage and retrieval. And, the other is to improve the efficiency and provide enhanced flexibility in document usage. Initially, the current paper based system will be faithfully reproduced in an electronic format to be used in the document viewing system. In addition, this metaphor will have hypertext extensions. Hypertext features support basic functions such as full text searches, key word searches, data retrieval, and traversal between nodes of information as well as speeding up the data access rate. They enable related but separate documents to have relationships, and allow the user to explore information naturally through non-linear link traversals. The basic operational requirements of the document viewing system are to: provide an electronic corollary to the current method of paper based document usage; supplement and ultimately replace paper-based documents; maintain focused toward control center operations such as Flight Data File, Flight Rules and Console Handbook viewing; and be available NASA wide.
The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data.
ERIC Educational Resources Information Center
Popovic, Mirko; Willett, Peter
1992-01-01
Reports on the use of stemming for Slovene language documents and queries in free-text retrieval systems and demonstrates that an appropriate stemming algorithm results in an increase in retrieval effectiveness when compared with nonstemming processing. A comparison is made with stemming of English versions of the same documents and queries. (24…
ERIC Educational Resources Information Center
Fox, Edward A.
1987-01-01
Discusses the CODER system, which was developed to investigate the application of artificial intelligence methods to increase the effectiveness of information retrieval systems, particularly those involving heterogeneous documents. Highlights include the use of PROLOG programing, blackboard-based designs, knowledge engineering, lexicological…
Get It Right First Time: A Beginner's Guide to Document Management.
ERIC Educational Resources Information Center
Hayes, Mike
1997-01-01
Document management (DM) systems capture, store, index, retrieve, route, distribute, and archive information in organizations. Discusses "passive" electronic libraries and "active" systems; characteristics of effective systems; implementing a system; fitting a new system to an existing infrastructure; budgets; system…
ERIC Educational Resources Information Center
Parker, Edwin B.
The third annual report (covering the 18-month period from January 1969 to June 1970) of the Stanford Physics Information REtrieval System (SPIRES) project, which is developing an augmented bibliographic retrieval capability, is presented in this document. A first section describes the background of the project and its association with Project…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guyer, H.B.; McChesney, C.A.
The overall primary Objective of HDAR is to create a repository of historical personnel security documents and provide the functionality needed for archival and retrieval use by other software modules and application users of the DISS/ET system. The software product to be produced from this specification is the Historical Document Archival and Retrieval Subsystem The product will provide the functionality to capture, retrieve and manage documents currently contained in the personnel security folders in DOE Operations Offices vaults at various locations across the United States. The long-term plan for DISS/ET includes the requirement to allow for capture and storage ofmore » arbitrary, currently undefined, clearance-related documents that fall outside the scope of the ``cradle-to-grave`` electronic processing provided by DISS/ET. However, this requirement is not within the scope of the requirements specified in this document.« less
Comparing the Document Representations of Two IR-Systems: CLARIT and TOPIC.
ERIC Educational Resources Information Center
Paijmans, Hans
1993-01-01
Compares two information retrieval systems, CLARIT and TOPIC, in terms of assigned versus derived and precoordinate versus postcoordinate indexing. Models of information retrieval systems are discussed, and a test of the systems using a demonstration database of full-text articles from the "Wall Street Journal" is described. (Contains 21…
ERIC Educational Resources Information Center
Kurtz, Peter; And Others
This report is concerned with the implementation of two interrelated computer systems: an automatic document analysis and classification package, and an on-line interactive information retrieval system which utilizes the information gathered during the automatic classification phase. Well-known techniques developed by Salton and Dennis have been…
A hypertext system that learns from user feedback
NASA Technical Reports Server (NTRS)
Mathe, Nathalie
1994-01-01
Retrieving specific information from large amounts of documentation is not an easy task. It could be facilitated if information relevant in the current problem solving context could be automatically supplied to the user. As a first step towards this goal, we have developed an intelligent hypertext system called CID (Computer Integrated Documentation). Besides providing an hypertext interface for browsing large documents, the CID system automatically acquires and reuses the context in which previous searches were appropriate. This mechanism utilizes on-line user information requirements and relevance feedback either to reinforce current indexing in case of success or to generate new knowledge in case of failure. Thus, the user continually augments and refines the intelligence of the retrieval system. This allows the CID system to provide helpful responses, based on previous usage of the documentation, and to improve its performance over time. We successfully tested the CID system with users of the Space Station Freedom requirements documents. We are currently extending CID to other application domains (Space Shuttle operations documents, airplane maintenance manuals, and on-line training). We are also exploring the potential commercialization of this technique.
Electronic Document Delivery: OCLC's Prototype System.
ERIC Educational Resources Information Center
Hickey, Thomas B.; Calabrese, Andrew M.
1986-01-01
Describes development of system for retrieval of documents from magnetic storage that uses stored font definition codes to control an inexpensive laser printer in the production of copies that closely resemble original document. Trends in information equipment and printing industries that will govern future application of this technology are…
User centered and ontology based information retrieval system for life sciences.
Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent
2012-01-25
Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help.
User centered and ontology based information retrieval system for life sciences
2012-01-01
Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. Conclusions The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help. PMID:22373375
The Effects of Noisy Data on Text Retrieval.
ERIC Educational Resources Information Center
Taghva, Kazem; And Others
1994-01-01
Discusses the use of optical character recognition (OCR) for inputting documents in an information retrieval system and describes a study that used an OCR-generated database and its corresponding corrected version to examine query evaluation in the presence of noisy data. Scanning technology, recognition technology, and retrieval technology are…
An evaluation of information retrieval accuracy with simulated OCR output
DOE Office of Scientific and Technical Information (OSTI.GOV)
Croft, W.B.; Harding, S.M.; Taghva, K.
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to obtain this data, we have carried out evaluation using simulated OCR output on a variety of databases. The results show that high quality OCR devices have little effect on the accuracy of retrieval, but low quality devices used with databases of short documents canmore » result in significant degradation.« less
ERIC Educational Resources Information Center
Burton, Adrian P.
1995-01-01
Discusses accessing online electronic documents at the European Telecommunications Satellite Organization (EUTELSAT). Highlights include off-site paper document storage, the document management system, benefits, the EUTELSAT Standard IBM Access software, implementation, the development process, and future enhancements. (AEF)
SDC DOCUMENTS APPLICABLE TO STATE AND LOCAL GOVERNMENT PROBLEMS.
Public administration , Urban and regional planning, The administration of justice, Bio-medical systems, Educational systems, Computer program systems, The development and management of computer-based systems, Information retrieval, Simulation. AD numbers are provided for those documents which can be obtained from the Defense Documentation Center or the Department of Commerce’s Clearinghouse for Federal Scientific and Technical Information.
Kellogg Library and Archive Retrieval System (KLARS) Document Capture Manual. Draft Version.
ERIC Educational Resources Information Center
Hugo, Jane
This manual is designed to supply background information for Kellogg Library and Archive Retrieval System (KLARS) processors and others who might work with the system, outline detailed policies and procedures for processors who prepare and enter data into the adult education database on KLARS, and inform general readers about the system. KLARS is…
Quarantine document system indexing procedure
NASA Technical Reports Server (NTRS)
1972-01-01
The Quarantine Document System (QDS) is described including the indexing procedures and thesaurus of indexing terms. The QDS consists of these functional elements: acquisition, cataloging, indexing, storage, and retrieval. A complete listing of the collection, and the thesaurus are included.
Document image retrieval through word shape coding.
Lu, Shijian; Li, Linlin; Tan, Chew Lim
2008-11-01
This paper presents a document retrieval technique that is capable of searching document images without OCR (optical character recognition). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
An automatic indexing method for medical documents.
Wagner, M. M.
1991-01-01
This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported. PMID:1807564
The Document Management Alliance.
ERIC Educational Resources Information Center
Fay, Chuck
1998-01-01
Describes the Document Management Alliance, a standards effort for document management systems that manages and tracks changes to electronic documents created and used by collaborative teams, provides secure access, and facilitates online information retrieval via the Internet and World Wide Web. Future directions are also discussed. (LRW)
Automated storage and retrieval of data obtained in the Interkosmos project
NASA Technical Reports Server (NTRS)
Ziolkovski, K.; Pakholski, V.
1975-01-01
The formation of a data bank and information retrieval system for scientific data is described. The stored data can be digital or documentation data. Data classification methods are discussed along with definition and compilation of the dictionary utilized, definition of the indexing scheme, and definition of the principles used in constructing a file for documents, data blocks, and tapes. Operating principles are also presented.
NASA Technical Reports Server (NTRS)
1977-01-01
Components of a videotape storage and retrieval system originally developed for NASA have been adapted as a tool for law enforcement agencies. Ampex Corp., Redwood City, Cal., built a unique system for NASA-Marshall. The first application of professional broadcast technology to computerized record-keeping, it incorporates new equipment for transporting tapes within the system. After completing the NASA system, Ampex continued development, primarily to improve image resolution. The resulting advanced system, known as the Ampex Videofile, offers advantages over microfilm for filing, storing, retrieving, and distributing large volumes of information. The system's computer stores information in digital code rather than in pictorial form. While microfilm allows visual storage of whole documents, it requires a step before usage--developing the film. With Videofile, the actual document is recorded, complete with photos and graphic material, and a picture of the document is available instantly.
INFORMATION RETRIEVAL EXPERIMENT. FINAL REPORT.
ERIC Educational Resources Information Center
SELYE, HANS
THIS REPORT IS A BRIEF REVIEW OF RESULTS OF AN EXPERIMENT TO DETERMINE THE INFORMATION RETRIEVAL EFFICIENCY OF A MANUAL SPECIALIZED INFORMATION SYSTEM BASED ON 700,000 DOCUMENTS IN THE FIELDS OF ENDOCRINOLOGY, STRESS, MAST CELLS, AND ANAPHYLACTOID REACTIONS. THE SYSTEM RECEIVES 30,000 PUBLICATIONS ANNUALLY. DETAILED INFORMATION IS REPRESENTED BY…
Jaccard Similarity Leads to the Marczewski-Steinhaus Topology for Information Retrieval.
ERIC Educational Resources Information Center
Rousseau, Ronald
1998-01-01
Demonstrates that if the similarity function of a retrieval system leads to a (pseudo-) metric, the retrieval, similarity and Everett-Cater metric topology coincide and are different from the discrete topology; this is the case if documents are represented by lists, using the Jaccard similarity measure. The corresponding metric is the…
What Friends Are For: Collaborative Intelligence Analysis and Search
2014-06-01
14. SUBJECT TERMS Intelligence Community, information retrieval, recommender systems , search engines, social networks, user profiling, Lucene...improvements over existing search systems . The improvements are shown to be robust to high levels of human error and low similarity between users ...precision NOLH nearly orthogonal Latin hypercubes P@ precision at documents RS recommender systems TREC Text REtrieval Conference USM user
A STORAGE AND RETRIEVAL SYSTEM FOR DOCUMENTS IN INSTRUCTIONAL RESOURCES. REPORT NO. 13.
ERIC Educational Resources Information Center
DIAMOND, ROBERT M.; LEE, BERTA GRATTAN
IN ORDER TO IMPROVE INSTRUCTION WITHIN TWO-YEAR LOWER DIVISION COURSES, A COMPREHENSIVE RESOURCE LIBRARY WAS DEVELOPED AND A SIMPLIFIED CATALOGING AND INFORMATION RETRIEVAL SYSTEM WAS APPLIED TO IT. THE ROYAL MCBEE "KEYDEX" SYSTEM, CONTAINING THREE MAJOR COMPONENTS--A PUNCH MACHINE, FILE CARDS, AND A LIGHT BOX--WAS USED. CARDS WERE HEADED WITH KEY…
Microsoft Research at TREC 2009. Web and Relevance Feedback Tracks
2009-11-01
Information Processing Systems, pages 193–200, 2006. [2] J . M. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. of the 9th...Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. of the 3rd Text REtrieval Conference, 1994. [8] J . J . Rocchio. Relevance...feedback in information retrieval. In Gerard Salton , editor, The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice Hall
An XML-based system for the flexible classification and retrieval of clinical practice guidelines.
Ganslandt, T.; Mueller, M. L.; Krieglstein, C. F.; Senninger, N.; Prokosch, H. U.
2002-01-01
Beneficial effects of clinical practice guidelines (CPGs) have not yet reached expectations due to limited routine adoption. Electronic distribution and reminder systems have the potential to overcome implementation barriers. Existing electronic CPG repositories like the National Guideline Clearinghouse (NGC) provide individual access but lack standardized computer-readable interfaces necessary for automated guideline retrieval. The aim of this paper was to facilitate automated context-based selection and presentation of CPGs. Using attributes from the NGC classification scheme, an XML-based metadata repository was successfully implemented, providing document storage, classification and retrieval functionality. Semi-automated extraction of attributes was implemented for the import of XML guideline documents using XPath. A hospital information system interface was exemplarily implemented for diagnosis-based guideline invocation. Limitations of the implemented system are discussed and possible future work is outlined. Integration of standardized computer-readable search interfaces into existing CPG repositories is proposed. PMID:12463831
Words, concepts, or both: optimal indexing units for automated information retrieval.
Hersh, W. R.; Hickam, D. H.; Leone, T. J.
1992-01-01
What is the best way to represent the content of documents in an information retrieval system? This study compares the retrieval effectiveness of five different methods for automated (machine-assigned) indexing using three test collections. The consistently best methods are those that use indexing based on the words that occur in the available text of each document. Methods used to map text into concepts from a controlled vocabulary showed no advantage over the word-based methods. This study also looked at an approach to relevance feedback which showed benefit for both word-based and concept-based methods. PMID:1482951
ERIC Educational Resources Information Center
Girill, T. R.
1991-01-01
This article continues the description of DFT (Document, Find, Theseus), an online documentation system that provides computer-managed on-demand printing of software manuals as well as the interactive retrieval of reference passages. Document boundaries in the hypertext database are discussed, search vocabulary complexities are described, and text…
NASA Technical Reports Server (NTRS)
Perry, Charleen M.; Vansteenberg, Michael E.
1992-01-01
The National Space Science Data Center (NSSDC) has developed an automated data retrieval request service utilizing our Data Archive and Distribution Service (NDADS) computer system. NDADS currently has selected project data written to optical disk platters with the disks residing in a robotic 'jukebox' near-line environment. This allows for rapid and automated access to the data with no staff intervention required. There are also automated help information and user services available that can be accessed. The request system permits an average-size data request to be completed within minutes of the request being sent to NSSDC. A mail message, in the format described in this document, retrieves the data and can send it to a remote site. Also listed in this document are the data currently available.
Medical Language Processing for Knowledge Representation and Retrievals
Lyman, Margaret; Sager, Naomi; Chi, Emile C.; Tick, Leo J.; Nhan, Ngo Thanh; Su, Yun; Borst, Francois; Scherrer, Jean-Raoul
1989-01-01
The Linguistic String Project-Medical Language Processor, a system for computer analysis of narrative patient documents in English, is being adapted for French Lettres de Sortie. The system converts the free-text input to a semantic representation which is then mapped into a relational database. Retrievals of clinical data from the database are described.
ERIC Educational Resources Information Center
Buchholz, James L.
This document summarizes the selection, configuration, implementation, and evaluation of BiblioFile, a CD-ROM based bibliographic retrieval system used to catalog and process library materials for 103 school centers in the Palm Beach County Schools (Florida). Technical processing included the production of spine labels, check-out cards and…
Enhancing Retrieval with Hyperlinks: A General Model Based on Propositional Argumentation Systems.
ERIC Educational Resources Information Center
Picard, Justin; Savoy, Jacques
2003-01-01
Discusses the use of hyperlinks for improving information retrieval on the World Wide Web and proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems. Topics include propositional logic, knowledge, and uncertainty; assumptions; using hyperlinks to modify document score and rank; and estimating the popularity of a…
ERIC Educational Resources Information Center
Salton, G.
1972-01-01
The author emphasized that one cannot conclude from the experiments reported upon that term clusters (or equivalently, keyword classifications or thesauruses) are not useful in retrieval. (2 references) (Author)
Image/text automatic indexing and retrieval system using context vector approach
NASA Astrophysics Data System (ADS)
Qing, Kent P.; Caid, William R.; Ren, Clara Z.; McCabe, Patrick
1995-11-01
Thousands of documents and images are generated daily both on and off line on the information superhighway and other media. Storage technology has improved rapidly to handle these data but indexing this information is becoming very costly. HNC Software Inc. has developed a technology for automatic indexing and retrieval of free text and images. This technique is demonstrated and is based on the concept of `context vectors' which encode a succinct representation of the associated text and features of sub-image. In this paper, we will describe the Automated Librarian System which was designed for free text indexing and the Image Content Addressable Retrieval System (ICARS) which extends the technique from the text domain into the image domain. Both systems have the ability to automatically assign indices for a new document and/or image based on the content similarities in the database. ICARS also has the capability to retrieve images based on similarity of content using index terms, text description, and user-generated images as a query without performing segmentation or object recognition.
NASA Astrophysics Data System (ADS)
Taira, Ricky K.; Wong, Clement; Johnson, David; Bhushan, Vikas; Rivera, Monica; Huang, Lu J.; Aberle, Denise R.; Cardenas, Alfonso F.; Chu, Wesley W.
1995-05-01
With the increase in the volume and distribution of images and text available in PACS and medical electronic health-care environments it becomes increasingly important to maintain indexes that summarize the content of these multi-media documents. Such indices are necessary to quickly locate relevant patient cases for research, patient management, and teaching. The goal of this project is to develop an intelligent document retrieval system that allows researchers to request for patient cases based on document content. Thus we wish to retrieve patient cases from electronic information archives that could include a combined specification of patient demographics, low level radiologic findings (size, shape, number), intermediate-level radiologic findings (e.g., atelectasis, infiltrates, etc.) and/or high-level pathology constraints (e.g., well-differentiated small cell carcinoma). The cases could be distributed among multiple heterogeneous databases such as PACS, RIS, and HIS. Content- based retrieval systems go beyond the capabilities of simple key-word or string-based retrieval matching systems. These systems require a knowledge base to comprehend the generality/specificity of a concept (thus knowing the subclasses or related concepts to a given concept) and knowledge of the various string representations for each concept (i.e., synonyms, lexical variants, etc.). We have previously reported on a data integration mediation layer that allows transparent access to multiple heterogeneous distributed medical databases (HIS, RIS, and PACS). The data access layer of our architecture currently has limited query processing capabilities. Given a patient hospital identification number, the access mediation layer collects all documents in RIS and HIS and returns this information to a specified workstation location. In this paper we report on our efforts to extend the query processing capabilities of the system by creation of custom query interfaces, an intelligent query processing engine, and a document-content index that can be generated automatically (i.e., no manual authoring or changes to the normal clinical protocols).
26 CFR 1.1471-1 - Scope of chapter 4 and definitions.
Code of Federal Regulations, 2013 CFR
2013-04-01
... an image retrieval system (such as portable document format (.pdf) or scanned documents). (35) Entity..., custodial institution, or specified insurance company. (124) TIN. The term TIN means the tax identifying...
26 CFR 1.1471-1 - Scope of chapter 4 and definitions.
Code of Federal Regulations, 2014 CFR
2014-04-01
... an image retrieval system (such as portable document format (.pdf) or scanned documents). (39) Entity..., custodial institution, or specified insurance company. (133) TIN. The term TIN means the tax identifying...
A Re-Unification of Two Competing Models for Document Retrieval.
ERIC Educational Resources Information Center
Bodoff, David
1999-01-01
Examines query-oriented versus document-oriented information retrieval and feedback learning. Highlights include a reunification of the two approaches for probabilistic document retrieval and for vector space model (VSM) retrieval; learning in VSM and in probabilistic models; multi-dimensional scaling; and ongoing field studies. (LRW)
Let Documents Talk to Each Other: A Computer Model for Connection of Short Documents.
ERIC Educational Resources Information Center
Chen, Z.
1993-01-01
Discusses the integration of scientific texts through the connection of documents and describes a computer model that can connect short documents. Information retrieval and artificial intelligence are discussed; a prototype system of the model is explained; and the model is compared to other computer models. (17 references) (LRW)
Automated Management Of Documents
NASA Technical Reports Server (NTRS)
Boy, Guy
1995-01-01
Report presents main technical issues involved in computer-integrated documentation. Problems associated with automation of management and maintenance of documents analyzed from perspectives of artificial intelligence and human factors. Technologies that may prove useful in computer-integrated documentation reviewed: these include conventional approaches to indexing and retrieval of information, use of hypertext, and knowledge-based artificial-intelligence systems.
NASA Technical Reports Server (NTRS)
Lee, T.; Boland, D. F., Jr.
1980-01-01
This document presents the results of an extensive survey and comparative evaluation of current atmosphere and wind models for inclusion in the Langley Atmospheric Information Retrieval System (LAIRS). It includes recommended models for use in LAIRS, estimated accuracies for the recommended models, and functional specifications for the development of LAIRS.
Medication order communication using fax and document-imaging technologies.
Simonian, Armen I
2008-03-15
The implementation of fax and document-imaging technology to electronically communicate medication orders from nursing stations to the pharmacy is described. The evaluation of a commercially available pharmacy order imaging system to improve order communication and to make document retrieval more efficient led to the selection and customization of a system already licensed and used in seven affiliated hospitals. The system consisted of existing fax machines and document-imaging software that would capture images of written orders and send them from nursing stations to a central database server. Pharmacists would then retrieve the images and enter the orders in an electronic medical record system. The pharmacy representatives from all seven hospitals agreed on the configuration and functionality of the custom application. A 30-day trial of the order imaging system was successfully conducted at one of the larger institutions. The new system was then implemented at the remaining six hospitals over a period of 60 days. The transition from a paper-order system to electronic communication via a standardized pharmacy document management application tailored to the specific needs of this health system was accomplished. A health system with seven affiliated hospitals successfully implemented electronic communication and the management of inpatient paper-chart orders by using faxes and document-imaging technology. This standardized application eliminated the problems associated with the hand delivery of paper orders, the use of the pneumatic tube system, and the printing of traditional faxes.
Langley Atmospheric Information Retrieval System (LAIRS): System description and user's guide
NASA Technical Reports Server (NTRS)
Boland, D. E., Jr.; Lee, T.
1982-01-01
This document presents the user's guide, system description, and mathematical specifications for the Langley Atmospheric Information Retrieval System (LAIRS). It also includes a description of an optimal procedure for operational use of LAIRS. The primary objective of the LAIRS Program is to make it possible to obtain accurate estimates of atmospheric pressure, density, temperature, and winds along Shuttle reentry trajectories for use in postflight data reduction.
Aquaculture Thesaurus: Descriptors Used in the National Aquaculture Information System.
ERIC Educational Resources Information Center
Lanier, James A.; And Others
This document provides a listing of descriptors used in the National Aquaculture Information System (NAIS), a computer information storage and retrieval system on marine, brackish, and freshwater organisms. Included are an explanation of how to use the document, subject index terms, and a brief bibliography of the literature used in developing the…
ERIC Educational Resources Information Center
Egghe, L.; Michel, C.
2003-01-01
Ordered sets (OS) of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents need to be extended to these ordered sets. This is done in this article using fuzzy set techniques. The practical usability of the OS-measures is…
Document image database indexing with pictorial dictionary
NASA Astrophysics Data System (ADS)
Akbari, Mohammad; Azimi, Reza
2010-02-01
In this paper we introduce a new approach for information retrieval from Persian document image database without using Optical Character Recognition (OCR).At first an attribute called subword upper contour label is defined then, a pictorial dictionary is constructed based on this attribute for the subwords. By this approach we address two issues in document image retrieval: keyword spotting and retrieval according to the document similarities. The proposed methods have been evaluated on a Persian document image database. The results have proved the ability of this approach in document image information retrieval.
A Strategy for Reusing the Data of Electronic Medical Record Systems for Clinical Research.
Matsumura, Yasushi; Hattori, Atsushi; Manabe, Shiro; Tsuda, Tsutomu; Takeda, Toshihiro; Okada, Katsuki; Murata, Taizo; Mihara, Naoki
2016-01-01
There is a great need to reuse data stored in electronic medical records (EMR) databases for clinical research. We previously reported the development of a system in which progress notes and case report forms (CRFs) were simultaneously recorded using a template in the EMR in order to exclude redundant data entry. To make the data collection process more efficient, we are developing a system in which the data originally stored in the EMR database can be populated within a frame in a template. We developed interface plugin modules that retrieve data from the databases of other EMR applications. A universal keyword written in a template master is converted to a local code using a data conversion table, then the objective data is retrieved from the corresponding database. The template element data, which are entered by a template, are stored in the template element database. To retrieve the data entered by other templates, the objective data is designated by the template element code with the template code, or by the concept code if it is written for the element. When the application systems in the EMR generate documents, they also generate a PDF file and a corresponding document profile XML, which includes important data, and send them to the document archive server and the data sharing saver, respectively. In the data sharing server, the data are represented by an item with an item code with a document class code and its value. By linking a concept code to an item identifier, an objective data can be retrieved by designating a concept code. We employed a flexible strategy in which a unique identifier for a hospital is initially attached to all of the data that the hospital generates. The identifier is secondarily linked with concept codes. The data that are not linked with a concept code can also be retrieved using the unique identifier of the hospital. This strategy makes it possible to reuse any of a hospital's data.
Design Package for Fuel Retrieval System Fuel Handling Tool Modification
DOE Office of Scientific and Technical Information (OSTI.GOV)
TEDESCHI, D.J.
This design package documents design, fabrication, and testing of new stinger tool design. Future revisions will document further development of the stinger tool and incorporate various developmental stages, and final test results.
Inverted File Compression through Document Identifier Reassignment.
ERIC Educational Resources Information Center
Shieh, Wann-Yun; Chen, Tien-Fu; Shann, Jean Jyh-Jiun; Chung, Chung-Ping
2003-01-01
Discusses the use of inverted files in information retrieval systems and proposes a document identifier reassignment method to reduce the average gap values in an inverted file. Highlights include the d-gap technique; document similarity; heuristic algorithms; file compression; and performance evaluation from a simulation environment. (LRW)
A Novel Navigation Paradigm for XML Repositories.
ERIC Educational Resources Information Center
Azagury, Alain; Factor, Michael E.; Maarek, Yoelle S.; Mandler, Benny
2002-01-01
Discusses data exchange over the Internet and describes the architecture and implementation of an XML document repository that promotes a navigation paradigm for XML documents based on content and context. Topics include information retrieval and semistructured documents; and file systems as information storage infrastructure, particularly XMLFS.…
Automated documentation generator for advanced protein crystal growth
NASA Technical Reports Server (NTRS)
Maddux, Gary A.; Provancha, Anna; Chattam, David
1994-01-01
To achieve an environment less dependent on the flow of paper, automated techniques of data storage and retrieval must be utilized. This software system, 'Automated Payload Experiment Tool,' seeks to provide a knowledge-based, hypertext environment for the development of NASA documentation. Once developed, the final system should be able to guide a Principal Investigator through the documentation process in a more timely and efficient manner, while supplying more accurate information to the NASA payload developer. The current system is designed for the development of the Science Requirements Document (SRD), the Experiment Requirements Document (ERD), the Project Plan, and the Safety Requirements Document.
Electronic Document Management Systems: Where Are They Today?
ERIC Educational Resources Information Center
Koulopoulos, Thomas M.; Frappaolo, Carl
1993-01-01
Discusses developments in document management systems based on a survey of over 400 corporations and government agencies. Text retrieval and imaging markets, architecture and integration, purchasing plans, and vendor market leaders are covered. Five graphs present data on user preferences for improvements. A sidebar article reviews the development…
Case retrieval in medical databases by fusing heterogeneous information.
Quellec, Gwénolé; Lamard, Mathieu; Cazuguel, Guy; Roux, Christian; Cochener, Béatrice
2011-01-01
A novel content-based heterogeneous information retrieval framework, particularly well suited to browse medical databases and support new generation computer aided diagnosis (CADx) systems, is presented in this paper. It was designed to retrieve possibly incomplete documents, consisting of several images and semantic information, from a database; more complex data types such as videos can also be included in the framework. The proposed retrieval method relies on image processing, in order to characterize each individual image in a document by their digital content, and information fusion. Once the available images in a query document are characterized, a degree of match, between the query document and each reference document stored in the database, is defined for each attribute (an image feature or a metadata). A Bayesian network is used to recover missing information if need be. Finally, two novel information fusion methods are proposed to combine these degrees of match, in order to rank the reference documents by decreasing relevance for the query. In the first method, the degrees of match are fused by the Bayesian network itself. In the second method, they are fused by the Dezert-Smarandache theory: the second approach lets us model our confidence in each source of information (i.e., each attribute) and take it into account in the fusion process for a better retrieval performance. The proposed methods were applied to two heterogeneous medical databases, a diabetic retinopathy database and a mammography screening database, for computer aided diagnosis. Precisions at five of 0.809 ± 0.158 and 0.821 ± 0.177, respectively, were obtained for these two databases, which is very promising.
Content Recognition and Context Modeling for Document Analysis and Retrieval
ERIC Educational Resources Information Center
Zhu, Guangyu
2009-01-01
The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval.…
ERIC Educational Resources Information Center
Melton, Jessica S.
Objectives of this project were to develop and test a method for automatically processing the text of abstracts for a document retrieval system. The test corpus consisted of 768 abstracts from the metallurgical section of Chemical Abstracts (CA). The system, based on a subject indexing rational, had two components: (1) a stored dictionary of words…
Computer Program and User Documentation Medical Data Update System
NASA Technical Reports Server (NTRS)
Anderson, J.
1971-01-01
The update system for the NASA medical data minicomputer storage and retrieval system is described. The discussion includes general and technical specifications, a subroutine list, and programming instructions.
Creating and indexing teaching files from free-text patient reports.
Johnson, D. B.; Chu, W. W.; Dionisio, J. D.; Taira, R. K.; Kangarloo, H.
1999-01-01
Teaching files based on real patient data can enhance the education of students, staff and other colleagues. Although information retrieval system can index free-text documents using keywords, these systems do not work well where content bearing terms (e.g., anatomy descriptions) frequently appears. This paper describes a system that uses multi-word indexing terms to provide access to free-text patient reports. The utilization of multi-word indexing allows better modeling of the content of medical reports, thus improving retrieval performance. The method used to select indexing terms as well as early evaluation of retrieval performance is discussed. PMID:10566473
Project W-211 initial tank retrieval systems year 2000 compliance assessment project plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
BUSSELL, J.H.
1999-08-24
This assessment describes the potential Year 2000 (Y2K) problems and describes the methods for achieving Y2K Compliance for Project W-211, Initial Tank Retrieval Systems (ITRS). The purpose of this assessment is to give an overview of the project. This document will not be updated and any dates contained in this document are estimates and may change. The scope of project W-211 is to provide systems for retrieval of radioactive wastes from ten double-shell tanks (DST). systems will be installed in tanks 102-AP, 104-AP, 105-AN, 104-AN, 102-AZ, 101-AW, 103-AN, 107-AN, 102-AY, and 102-SY. The current tank selection and sequence supports phasemore » I feed delivery to privatized processing plants. A detailed description of system dates, functions, interfaces, potential Y2K problems, and date resolutions can not be described since the project is in the definitive design phase. This assessment will describe the methods, protocols, and practices to assure that equipment and systems do not have Y2K problems.« less
How to Handle the Avalanche of Online Documentation.
ERIC Educational Resources Information Center
Nolan, Maureen P.
1981-01-01
The method of handling the printed documentation associated with online information retrieval, which is described, involves the use of a series of separate but related files: database files, system files, network files, index sheets, and equipment files. (FM)
Health consumer-oriented information retrieval.
Claveau, Vincent; Hamon, Thierry; Le Maguer, Sébastien; Grabar, Natalia
2015-01-01
While patients can freely access their Electronic Health Records or online health information, they may not be able to correctly understand the content of these documents. One of the challenges is related to the difference between expert and non-expert languages. We propose to investigate this issue within the Information Retrieval field. The patient queries have to be associated with the corresponding expert documents, that provide trustworthy information. Our approach relies on a state-of-the-art IR system called Indri and on semantic resources. Different query expansion strategies are explored. Our system shows up to 0.6740 P@10, up to 0.7610 R@10, and up to 0.6793 NDCG@10.
Dynamic reduction of dimensions of a document vector in a document search and retrieval system
Jiao, Yu; Potok, Thomas E.
2011-05-03
The method and system of the invention involves processing each new document (20) coming into the system into a document vector (16), and creating a document vector with reduced dimensionality (17) for comparison with the data model (15) without recomputing the data model (15). These operations are carried out by a first computer (11) while a second computer (12) updates the data model (18), which can be comprised of an initial large group of documents (19) and is premised on the computing an initial data model (13, 14, 15) to provide a reference point for determining document vectors from documents processed from the data stream (20).
Statistical Techniques for Efficient Indexing and Retrieval of Document Images
ERIC Educational Resources Information Center
Bhardwaj, Anurag
2010-01-01
We have developed statistical techniques to improve the performance of document image search systems where the intermediate step of OCR based transcription is not used. Previous research in this area has largely focused on challenges pertaining to generation of small lexicons for processing handwritten documents and enhancement of poor quality…
Information retrieval system utilizing wavelet transform
Brewster, Mary E.; Miller, Nancy E.
2000-01-01
A method for automatically partitioning an unstructured electronically formatted natural language document into its sub-topic structure. Specifically, the document is converted to an electronic signal and a wavelet transform is then performed on the signal. The resultant signal may then be used to graphically display and interact with the sub-topic structure of the document.
ERIC Educational Resources Information Center
Freeman, Robert R.
A set of twenty five questions was processed against a computer-stored file of 9159 document references in the field of ferrous metallurgy, representing the 1965 coverage of the Iron and Steel Institute (London) information service. A basis for evaluation of system performance characteristics and analysis of system failures was provided by using…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
The Gridded Model Information Support System (GMISS) is a data base management system for selected Regional Oxidant Model (ROM) input data and species concentrations produced by gridded photochemical air pollution models. The Model Concentration Data Retrieval Subsystem allows State and local air pollution control agencies to retrieve these hourly data for use in support of their regulatory programs. These hourly data may be used to calculate initial and boundary conditions for the Empirical Kinetics Modeling Approach (EKMA). They may be used for other modeling application needs as well as to support evaluation of regional emission controls strategies. Both temporal andmore » spatial subsets of the data may be retrieved. The document describes how to invoke and execute the Model Concentration Data Retrieval Subsystem using the full screen menus.« less
Applying Hypertext Structures to Software Documentation.
ERIC Educational Resources Information Center
French, James C.; And Others
1997-01-01
Describes a prototype system for software documentation management called SLEUTH (Software Literacy Enhancing Usefulness to Humans) being developed at the University of Virginia. Highlights include information retrieval techniques, hypertext links that are installed automatically, a WAIS (Wide Area Information Server) search engine, user…
Using Replicates in Information Retrieval Evaluation.
Voorhees, Ellen M; Samarov, Daniel; Soboroff, Ian
2017-09-01
This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions-something not possible without replicates-yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect as well as increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and the measure of effectiveness used to quantify system effectiveness.
Using Replicates in Information Retrieval Evaluation
VOORHEES, ELLEN M.; SAMAROV, DANIEL; SOBOROFF, IAN
2018-01-01
This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions—something not possible without replicates—yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect as well as increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and the measure of effectiveness used to quantify system effectiveness. PMID:29905334
An overview of selected information storage and retrieval issues in computerized document processing
NASA Technical Reports Server (NTRS)
Dominick, Wayne D. (Editor); Ihebuzor, Valentine U.
1984-01-01
The rapid development of computerized information storage and retrieval techniques has introduced the possibility of extending the word processing concept to document processing. A major advantage of computerized document processing is the relief of the tedious task of manual editing and composition usually encountered by traditional publishers through the immense speed and storage capacity of computers. Furthermore, computerized document processing provides an author with centralized control, the lack of which is a handicap of the traditional publishing operation. A survey of some computerized document processing techniques is presented with emphasis on related information storage and retrieval issues. String matching algorithms are considered central to document information storage and retrieval and are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berglin, E.J.
1996-09-17
Westinghouse Hanford Company (WHC) is exploring commercial methods for retrieving waste from the underground storage tanks at the Hanford site in south central Washington state. WHC needs data on commercial retrieval systems equipment in order to make programmatic decisions for waste retrieval. Full system testing of retrieval processes is to be demonstrated in phases through September 1997 in support of programs aimed to Acquire Commercial Technology for Retrieval (ACTR) and at the Hanford Tanks Initiative (HTI). One of the important parts of the integrated testing will be the deployment of retrieval tools using manipulator-based systems. WHC requires an assessment ofmore » a number of commercial deployment systems that have been identified by the ACTR program as good candidates to be included in an integrated testing effort. Included in this assessment should be an independent evaluation of manipulator tests performed to date, so that WHC can construct an integrated test based on these systems. The objectives of this document are to provide a description of the need, requirements, and constraints for a manipulator-based retrieval system; to evaluate manipulator-based concepts and testing performed to date by a number of commercial organizations; and to identify issues to be resolved through testing and/or analysis for each concept.« less
ERIC Educational Resources Information Center
Girill, T. R.; And Others
1991-01-01
Describes enhancements made to a hypertext information retrieval system at the National Energy Research Supercomputer Center (NERSC) called DFT (Document, Find, and Theseus). The enrichment of DFT's entry vocabulary is described, DFT and other hypertext systems are compared, and problems that occur due to the need for frequent updates are…
Topic Models in Information Retrieval
2007-08-01
Information Processing Systems, Cambridge, MA, MIT Press, 2004. Brown, P.F., Della Pietra, V.J., deSouza, P.V., Lai, J.C. and Mercer, R.L., Class-based...2003. http://www.wkap.nl/prod/b/1-4020-1216-0. Croft, W.B., Lucia , T.J., Cringean, J., and Willett, P., Retrieving Documents By Plausible Inference
Using Taxonomic Indexing Trees to Efficiently Retrieve SCORM-Compliant Documents in e-Learning Grids
ERIC Educational Resources Information Center
Shih, Wen-Chung; Tseng, Shian-Shyong; Yang, Chao-Tung
2008-01-01
With the flourishing development of e-Learning, more and more SCORM-compliant teaching materials are developed by institutes and individuals in different sites. In addition, the e-Learning grid is emerging as an infrastructure to enhance traditional e-Learning systems. Therefore, information retrieval schemes supporting SCORM-compliant documents…
A Study of Adaptive Relevance Feedback - UIUC TREC-2008 Relevance Feedback Experiments
2008-11-01
terms. Journal of the American Society for Information Science, 27(3):129–146, 1976. [7] J . J . Rocchio. Relevance feedback in information retrieval. In...In The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313–323. Prentice-Hall Inc., 1971. [8] Gerard Salton and Chris
75 FR 72829 - Los Alamos Historical Document Retrieval and Assessment (LAHDRA) Project
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-26
... Historical Document Retrieval and Assessment (LAHDRA) Project The Centers for Disease Control and Prevention... release of the Final Report of the Los Alamos Historical Document Retrieval and Assessment (LAHDRA)Project... information about historical chemical or radionuclide releases from facilities at the Los Alamos National...
ERIC Educational Resources Information Center
Cox, John
This paper documents the program used in the application of the INFO system for data storage and retrieval in schools, from the viewpoints of both the unsophisticated user and the experienced programmer interested in using the INFO system or modifying it for use within an existing school's computer system. The opening user's guide presents simple…
Scalable ranked retrieval using document images
NASA Astrophysics Data System (ADS)
Jain, Rajiv; Oard, Douglas W.; Doermann, David
2013-12-01
Despite the explosion of text on the Internet, hard copy documents that have been scanned as images still play a significant role for some tasks. The best method to perform ranked retrieval on a large corpus of document images, however, remains an open research question. The most common approach has been to perform text retrieval using terms generated by optical character recognition. This paper, by contrast, examines whether a scalable segmentation-free image retrieval algorithm, which matches sub-images containing text or graphical objects, can provide additional benefit in satisfying a user's information needs on a large, real world dataset. Results on 7 million scanned pages from the CDIP v1.0 test collection show that content based image retrieval finds a substantial number of documents that text retrieval misses, and that when used as a basis for relevance feedback can yield improvements in retrieval effectiveness.
Information retrieval system utilizing wavelet transform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brewster, M.E.; Miller, N.E.
A method is disclosed for automatically partitioning an unstructured electronically formatted natural language document into its sub-topic structure. Specifically, the document is converted to an electronic signal and a wavelet transform is then performed on the signal. The resultant signal may then be used to graphically display and interact with the sub-topic structure of the document.
A Study and Model of Machine-Like Indexing Behavior by Human Indexers.
ERIC Educational Resources Information Center
McAllister, Caryl
Although a large part of a document retrieval system's resources are devoted to indexing, the question of how people do subject indexing has been the subject of much conjecture and only a little experimentation. This dissertation examines the relationships between a document being indexed and the index terms assigned to that document in an attempt…
Font adaptive word indexing of modern printed documents.
Marinai, Simone; Marino, Emanuele; Soda, Giovanni
2006-08-01
We propose an approach for the word-level indexing of modern printed documents which are difficult to recognize using current OCR engines. By means of word-level indexing, it is possible to retrieve the position of words in a document, enabling queries involving proximity of terms. Web search engines implement this kind of indexing, allowing users to retrieve Web pages on the basis of their textual content. Nowadays, digital libraries hold collections of digitized documents that can be retrieved either by browsing the document images or relying on appropriate metadata assembled by domain experts. Word indexing tools would therefore increase the access to these collections. The proposed system is designed to index homogeneous document collections by automatically adapting to different languages and font styles without relying on OCR engines for character recognition. The approach is based on three main ideas: the use of Self Organizing Maps (SOM) to perform unsupervised character clustering, the definition of one suitable vector-based word representation whose size depends on the word aspect-ratio, and the run-time alignment of the query word with indexed words to deal with broken and touching characters. The most appropriate applications are for processing modern printed documents (17th to 19th centuries) where current OCR engines are less accurate. Our experimental analysis addresses six data sets containing documents ranging from books of the 17th century to contemporary journals.
Full-scale system impact analysis: Digital document storage project
NASA Technical Reports Server (NTRS)
1989-01-01
The Digital Document Storage Full Scale System can provide cost effective electronic document storage, retrieval, hard copy reproduction, and remote access for users of NASA Technical Reports. The desired functionality of the DDS system is highly dependent on the assumed requirements for remote access used in this Impact Analysis. It is highly recommended that NASA proceed with a phased, communications requirement analysis to ensure that adequate communications service can be supplied at a reasonable cost in order to validate recent working assumptions upon which the success of the DDS Full Scale System is dependent.
Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease
Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.
1998-01-01
The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this mark-up process is time consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.
Pilot production system cost/benefit analysis: Digital document storage project
NASA Technical Reports Server (NTRS)
1989-01-01
The Digital Document Storage (DDS)/Pilot Production System (PPS) will provide cost effective electronic document storage, retrieval, hard copy reproduction, and remote access for users of NASA Technical Reports. The DDS/PPS will result in major benefits, such as improved document reproduction quality within a shorter time frame than is currently possible. In addition, the DDS/PPS will provide an important strategic value through the construction of a digital document archive. It is highly recommended that NASA proceed with the DDS Prototype System and a rapid prototyping development methodology in order to validate recent working assumptions upon which the success of the DDS/PPS is dependent.
A Survey in Indexing and Searching XML Documents.
ERIC Educational Resources Information Center
Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James
2002-01-01
Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…
JANE, A new information retrieval system for the Radiation Shielding Information Center
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trubey, D.K.
A new information storage and retrieval system has been developed for the Radiation Shielding Information Center (RSIC) at Oak Ridge National Laboratory to replace mainframe systems that have become obsolete. The database contains citations and abstracts of literature which were selected by RSIC analysts and indexed with terms from a controlled vocabulary. The database, begun in 1963, has been maintained continuously since that time. The new system, called JANE, incorporates automatic indexing techniques and on-line retrieval using the RSIC Data General Eclipse MV/4000 minicomputer, Automatic indexing and retrieval techniques based on fuzzy-set theory allow the presentation of results in ordermore » of Retrieval Status Value. The fuzzy-set membership function depends on term frequency in the titles and abstracts and on Term Discrimination Values which indicate the resolving power of the individual terms. These values are determined by the Cover Coefficient method. The use of a commercial database base to store and retrieve the indexing information permits rapid retrieval of the stored documents. Comparisons of the new and presently-used systems for actual searches of the literature indicate that it is practical to replace the mainframe systems with a minicomputer system similar to the present version of JANE. 18 refs., 10 figs.« less
An overview of the National Space Science data Center Standard Information Retrieval System (SIRS)
NASA Technical Reports Server (NTRS)
Shapiro, A.; Blecher, S.; Verson, E. E.; King, M. L. (Editor)
1974-01-01
A general overview is given of the National Space Science Data Center (NSSDC) Standard Information Retrieval System. A description, in general terms, the information system that contains the data files and the software system that processes and manipulates the files maintained at the Data Center. Emphasis is placed on providing users with an overview of the capabilities and uses of the NSSDC Standard Information Retrieval System (SIRS). Examples given are taken from the files at the Data Center. Detailed information about NSSDC data files is documented in a set of File Users Guides, with one user's guide prepared for each file processed by SIRS. Detailed information about SIRS is presented in the SIRS Users Guide.
Imaged document information location and extraction using an optical correlator
NASA Astrophysics Data System (ADS)
Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.
1999-12-01
Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, patents, etc.). Many of these organizations are converting their paper archives to electronic images, which are then stored in a computer database. Because of this, there is a need to efficiently organize this data into comprehensive and accessible information resources and provide for rapid access to the information contained within these imaged documents. To meet this need, Litton PRC and Litton Data Systems Division are developing a system, the Imaged Document Optical Correlation and Conversion System (IDOCCS), to provide a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provide a means for the search and retrieval of information from imaged documents. IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives and has the potential to determine the types of languages contained within a document. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited, e.g., imaged documents containing an agency's seal or logo can be singled out. In this paper, we present a description of IDOCCS as well as preliminary performance results and theoretical projections.
The integration of system specifications and program coding
NASA Technical Reports Server (NTRS)
Luebke, W. R.
1970-01-01
Experience in maintaining up-to-date documentation for one module of the large-scale Medical Literature Analysis and Retrieval System 2 (MEDLARS 2) is described. Several innovative techniques were explored in the development of this system's data management environment, particularly those that use PL/I as an automatic documenter. The PL/I data description section can provide automatic documentation by means of a master description of data elements that has long and highly meaningful mnemonic names and a formalized technique for the production of descriptive commentary. The techniques discussed are practical methods that employ the computer during system development in a manner that assists system implementation, provides interim documentation for customer review, and satisfies some of the deliverable documentation requirements.
Video Information Communication and Retrieval/Image Based Information System (VICAR/IBIS)
NASA Technical Reports Server (NTRS)
Wherry, D. B.
1981-01-01
The acquisition, operation, and planning stages of installing a VICAR/IBIS system are described. The system operates in an IBM mainframe environment, and provides image processing of raster data. System support problems with software and documentation are discussed.
Automated payload experiment tool feasibility study
NASA Technical Reports Server (NTRS)
Maddux, Gary A.; Clark, James; Delugach, Harry; Hammons, Charles; Logan, Julie; Provancha, Anna
1991-01-01
To achieve an environment less dependent on the flow of paper, automated techniques of data storage and retrieval must be utilized. The prototype under development seeks to demonstrate the ability of a knowledge-based, hypertext computer system. This prototype is concerned with the logical links between two primary NASA support documents, the Science Requirements Document (SRD) and the Engineering Requirements Document (ERD). Once developed, the final system should have the ability to guide a principal investigator through the documentation process in a more timely and efficient manner, while supplying more accurate information to the NASA payload developer.
Personal Information Management for Nurses Returning to School.
Bowman, Katherine
2015-12-01
Registered nurses with a diploma or an associate's degree are encouraged to return to school to earn a Bachelor of Science in Nursing degree. Until they return to school, many RNs have little need to regularly write, store, and retrieve work-related papers, but they are expected to complete the majority of assignments using a computer when in the student role. Personal information management (PIM) is a system of organizing and managing electronic information that will reduce computer clutter, while enhancing time use, task management, and productivity. This article introduces three PIM strategies for managing school work. Nesting is the creation of a system of folders to form a hierarchy for storing and retrieving electronic documents. Each folder, subfolder, and document must be given a meaningful unique name. Numbering is used to create different versions of the same paper, while preserving the original document. Copyright 2015, SLACK Incorporated.
ERIC Educational Resources Information Center
Zull, Carolyn Gifford, Ed.; And Others
This third volume of the Comparative Systems Laboratory (CSL) Final Technical Report is a collection of relatively independent studies performed on CSL materials. Covered in this document are studies on: (1) properties of files, including a study of the growth rate of a dictionary of index terms as influenced by number of documents in the file and…
Commercial applications for optical data storage
NASA Astrophysics Data System (ADS)
Tas, Jeroen
1991-03-01
Optical data storage has spurred the market for document imaging systems. These systems are increasingly being used to electronically manage the processing, storage and retrieval of documents. Applications range from straightforward archives to sophisticated workflow management systems. The technology is developing rapidly and within a few years optical imaging facilities will be incorporated in most of the office information systems. This paper gives an overview of the status of the market, the applications and the trends of optical imaging systems.
A framework for biomedical figure segmentation towards image-based document retrieval
2013-01-01
The figures included in many of the biomedical publications play an important role in understanding the biological experiments and facts described within. Recent studies have shown that it is possible to integrate the information that is extracted from figures in classical document classification and retrieval tasks in order to improve their accuracy. One important observation about the figures included in biomedical publications is that they are often composed of multiple subfigures or panels, each describing different methodologies or results. The use of these multimodal figures is a common practice in bioscience, as experimental results are graphically validated via multiple methodologies or procedures. Thus, for a better use of multimodal figures in document classification or retrieval tasks, as well as for providing the evidence source for derived assertions, it is important to automatically segment multimodal figures into subfigures and panels. This is a challenging task, however, as different panels can contain similar objects (i.e., barcharts and linecharts) with multiple layouts. Also, certain types of biomedical figures are text-heavy (e.g., DNA sequences and protein sequences images) and they differ from traditional images. As a result, classical image segmentation techniques based on low-level image features, such as edges or color, are not directly applicable to robustly partition multimodal figures into single modal panels. In this paper, we describe a robust solution for automatically identifying and segmenting unimodal panels from a multimodal figure. Our framework starts by robustly harvesting figure-caption pairs from biomedical articles. We base our approach on the observation that the document layout can be used to identify encoded figures and figure boundaries within PDF files. Taking into consideration the document layout allows us to correctly extract figures from the PDF document and associate their corresponding caption. We combine pixel-level representations of the extracted images with information gathered from their corresponding captions to estimate the number of panels in the figure. Thus, our approach simultaneously identifies the number of panels and the layout of figures. In order to evaluate the approach described here, we applied our system on documents containing protein-protein interactions (PPIs) and compared the results against a gold standard that was annotated by biologists. Experimental results showed that our automatic figure segmentation approach surpasses pure caption-based and image-based approaches, achieving a 96.64% accuracy. To allow for efficient retrieval of information, as well as to provide the basis for integration into document classification and retrieval systems among other, we further developed a web-based interface that lets users easily retrieve panels containing the terms specified in the user queries. PMID:24565394
Large Scale Document Inversion using a Multi-threaded Computing System
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2018-01-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2017-06-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
System Description for Tank 241-AZ-101 Waste Retrieval Data Acquisition System
DOE Office of Scientific and Technical Information (OSTI.GOV)
ROMERO, S.G.
2000-02-14
The proposed activity provides the description of the Data Acquisition System for Tank 241-AZ-101. This description is documented in HNF-5572, Tank 241-AZ-101 Waste Retrieval Data Acquisition System (DAS). This activity supports the planned mixer pump tests for Tank 241-AZ-101. Tank 241-AZ-101 has been selected for the first full-scale demonstration of a mixer pump system. The tank currently holds over 960,000 gallons of neutralized current acid waste, including approximately 12.7 inches of settling solids (sludge) at the bottom of the tank. As described in Addendum 4 of the FSAR (LMHC 2000a), two 300 HP mixer pumps with associated measurement and monitoringmore » equipment have been installed in Tank 241-AZ-101. The purpose of the Tank 241-AZ-101 retrieval system Data Acquisition System (DAS) is to provide monitoring and data acquisition of key parameters in order to confirm the effectiveness of the mixer pumps utilized for suspending solids in the tank. The suspension of solids in Tank 241-AZ-101 is necessary for pretreatment of the neutralized current acid waste and eventual disposal as glass via the Hanford Waste Vitrification Plant. HNF-5572 provides a basic description of the Tank 241-AZ-101 retrieval system DAS, including the field instrumentation and application software. The DAS is provided to fulfill requirements for data collection and monitoring. This document is not an operations procedure or is it intended to describe the mixing operation. This USQ screening provides evaluation of HNF-5572 (Revision 1) including the changes as documented on ECN 654001. The changes include (1) add information on historical trending and data backup, (2) modify DAS I/O list in Appendix E to reflect actual conditions in the field, and (3) delete IP address in Appendix F per Lockheed Martin Services, Inc. request.« less
Implementation of the common phrase index method on the phrase query for information retrieval
NASA Astrophysics Data System (ADS)
Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah
2017-08-01
As the development of technology, the process of finding information on the news text is easy, because the text of the news is not only distributed in print media, such as newspapers, but also in electronic media that can be accessed using the search engine. In the process of finding relevant documents on the search engine, a phrase often used as a query. The number of words that make up the phrase query and their position obviously affect the relevance of the document produced. As a result, the accuracy of the information obtained will be affected. Based on the outlined problem, the purpose of this research was to analyze the implementation of the common phrase index method on information retrieval. This research will be conducted in English news text and implemented on a prototype to determine the relevance level of the documents produced. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. Then the system will display the document search results in a sequence, based on the cosine similarity. Furthermore, system testing will be conducted using 100 documents and 20 queries. That result is then used for the evaluation stage. First, determine the relevant documents using kappa statistic calculation. Second, determine the system success rate using precision, recall, and F-measure calculation. In this research, the result of kappa statistic calculation was 0.71, so that the relevant documents are eligible for the system evaluation. Then the calculation of precision, recall, and F-measure produces precision of 0.37, recall of 0.50, and F-measure of 0.43. From this result can be said that the success rate of the system to produce relevant documents is low.
'SON-GO-KU' : a dream of automated library
NASA Astrophysics Data System (ADS)
Sato, Mamoru; Kishimoto, Juji
In the process of automating libraries, the retrieval of books through the browsing of shelves is being overlooked. The telematic library is a document based DBMS which can deliver the content of books by simulating the browsing process. The retrieval actually simulates the process a person would use in selecting a book in a real library, where a visual presentation using a graphic display is substituted. The characteristics of prototype system "Son-Go-Ku" for such retrieval implemented in 1988 are mentioned.
FUB at TREC 2008 Relevance Feedback Track: Extending Rocchio with Distributional Term Analysis
2008-11-01
starting point is the improved version [ Salton and Buckley 1990] of the original Rocchio’s formula [Rocchio 1971]: newQ = α ⋅ origQ + β R r r∈R ∑ − γR...earlier studies about the low effect of the main relevance feedback parameters on retrieval performance (e.g., Salton and Buckley 1990), while they seem...Relevance feedback in information retrieval. In The SMART retrieval system - experiments in automatic document processing, Salton , G., Ed., Prentice Hall
A knowledgebase system to enhance scientific discovery: Telemakus
Fuller, Sherrilynne S; Revere, Debra; Bugni, Paul F; Martin, George M
2004-01-01
Background With the rapid expansion of scientific research, the ability to effectively find or integrate new domain knowledge in the sciences is proving increasingly difficult. Efforts to improve and speed up scientific discovery are being explored on a number of fronts. However, much of this work is based on traditional search and retrieval approaches and the bibliographic citation presentation format remains unchanged. Methods Case study. Results The Telemakus KnowledgeBase System provides flexible new tools for creating knowledgebases to facilitate retrieval and review of scientific research reports. In formalizing the representation of the research methods and results of scientific reports, Telemakus offers a potential strategy to enhance the scientific discovery process. While other research has demonstrated that aggregating and analyzing research findings across domains augments knowledge discovery, the Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports. Conclusion Based on how scientists conduct research and read the literature, the Telemakus KnowledgeBase System brings together three innovations in analyzing, displaying and summarizing research reports across a domain: (1) research report schema, a document surrogate of extracted research methods and findings presented in a consistent and structured schema format which mimics the research process itself and provides a high-level surrogate to facilitate searching and rapid review of retrieved documents; (2) research findings, used to index the documents, allowing searchers to request, for example, research studies which have studied the relationship between neoplasms and vitamin E; and (3) visual exploration interface of linked relationships for interactive querying of research findings across the knowledgebase and graphical displays of what is known as well as, through gaps in the map, what is yet to be tested. The rationale and system architecture are described and plans for the future are discussed. PMID:15507158
Can Visualizing Document Space Improve Users' Information Foraging?
ERIC Educational Resources Information Center
Song, Min
1998-01-01
This study shows how users access relevant information in a visualized document space and determine whether BiblioMapper, a visualization tool, strengthens an information retrieval (IR) system and makes it more usable. BiblioMapper, developed for a CISI collection, was evaluated by accuracy, time, and user satisfaction. Users' navigation…
A Synchronous Search for Documents
An algorithm is described of a synchronous search in a complex system of selective retrieval of documents, with an allowance for exclusion of...stored on a magnetic tape. The number of topics served by the synchronous search goes into thousands; a search within 500-600 topics is performed without additional access to the tape.
Semi-Automated Methods for Refining a Domain-Specific Terminology Base
2011-02-01
only as a resource for written and oral translation, but also for Natural Language Processing ( NLP ) applications, text retrieval, document indexing...Natural Language Processing ( NLP ) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this...also for Natural Language Processing ( NLP ) applications, text retrieval (1), document indexing, and other knowledge management tasks. The National
Radiology-led Follow-up System for IVC Filters: Effects on Retrieval Rates and Times
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, L.; Taylor, J.; Munneke, G.
Purpose: Successful IVC filter retrieval rates fall with time. Serious complications have been reported following attempts to remove filters after 3-18 months. Failed retrieval may be associated with adverse clinical sequelae. This study explored whether retrieval rates are improved if interventional radiologists organize patient follow-up, rather than relying on the referring clinicians. Methods: Proactive follow-up of patients who undergo filter placement was implemented in May 2008. At the time of filter placement, a report was issued to the referring consultant notifying them of the advised timeframe for filter retrieval. Clinicians were contacted to arrange retrieval within 30 days. We comparedmore » this with our practice for the preceding year. Results: The numbers of filters inserted during the two time periods was similar, as were the numbers of retrieval attempts and the time scale at which they occurred. The rate of successful retrievals increased but not significantly. The major changes were better documentation of filter types and better clinical follow-up. After the change in practice, only one patient was lost to follow-up compared with six the preceding year. Conclusions: Although there was no significant improvement in retrieval rates, the proactive, radiology-led approach improved follow-up and documentation, ensuring that a clinical decision was made about how long the filter was required and whether retrieval should be attempted and ensuring that patients were not lost to follow-up.« less
ERIC Educational Resources Information Center
Commission des Communautes Europeennes (Luxembourg).
The conference proceedings contained in this document include invited papers, transcripts of discussions following those papers, and the reports of topical committees that met during the three day conference held in Luxembourg, May 1973. The focus of the conference was on the design and use of information retrieval and data base systems in various…
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
Automated documentation generator for advanced protein crystal growth
NASA Technical Reports Server (NTRS)
Maddux, Gary A.; Provancha, Anna; Chattam, David; Ford, Ronald
1993-01-01
The System Management and Production Laboratory at the Research Institute, the University of Alabama in Huntsville (UAH), was tasked by the Microgravity Experiment Projects (MEP) Office of the Payload Projects Office (PPO) at Marshall Space Flight Center (MSFC) to conduct research in the current methods of written documentation control and retrieval. The goals of this research were to determine the logical interrelationships within selected NASA documentation, and to expand on a previously developed prototype system to deliver a distributable, electronic knowledge-based system. This computer application would then be used to provide a paperless interface between the appropriate parties for the required NASA document.
Scaling Up High-Value Retrieval to Medium-Volume Data
NASA Astrophysics Data System (ADS)
Cunningham, Hamish; Hanbury, Allan; Rüger, Stefan
We summarise the scientific work presented at the first Information Retrieval Facility Conference [3] and argue that high-value retrieval with medium-volume data, exemplified by patent search, is a thriving topic in a multidisciplinary area that sits between Information Retrieval, Natural Language Processing and Semantic Web Technologies. We analyse the parameters that condition choices of retrieval technology for different sizes and values of document space, and we present the patent document space and some of its characteristics for retrieval work.
Automated MeSH indexing of the World-Wide Web.
Fowler, J.; Kouramajian, V.; Maram, S.; Devadhar, V.
1995-01-01
To facilitate networked discovery and information retrieval in the biomedical domain, we have designed a system for automatic assignment of Medical Subject Headings to documents retrieved from the World-Wide Web. Our prototype implementations show significant promise. We describe our methods and discuss the further development of a completely automated indexing tool called the "Web-MeSH Medibot." PMID:8563421
ERIC Educational Resources Information Center
Wilder, Dolores J., Comp.; Hines, Rella, Comp.
The Tennessee Research Coordinating Unit (RCU) has implemented a computerized information retrieval system known as "Query," which allows for the retrieval of documents indexed in Research in Education (RIE), Current Index to Journals in Education (CIJE), and Abstracts of Instructional and Research Materials (AIM/ARM). The document…
LDEF: 69 Months in Space. Second Post-Retrieval Symposium, part 2
NASA Technical Reports Server (NTRS)
Levine, Arlene S. (Editor)
1993-01-01
This document is a compilation of papers presented at the Second Long Duration Exposure Facility (LDEF) Post-Retrieval Symposium. The papers represent the data analysis of the 57 experiments flown on the LDEF. The experiments include materials, coatings, thermal systems, power and propulsion, science (cosmic ray, interstellar gas, heavy ions, micrometeoroid, etc.), electronics, optics, and life science.
HARV ANSER Flight Test Data Retrieval and Processing Procedures
NASA Technical Reports Server (NTRS)
Yeager, Jessie C.
1997-01-01
Under the NASA High-Alpha Technology Program the High Alpha Research Vehicle (HARV) was used to conduct flight tests of advanced control effectors, advanced control laws, and high-alpha design guidelines for future super-maneuverable fighters. The High-Alpha Research Vehicle is a pre-production F/A-18 airplane modified with a multi-axis thrust-vectoring system for augmented pitch and yaw control power and Actuated Nose Strakes for Enhanced Rolling (ANSER) to augment body-axis yaw control power. Flight testing at the Dryden Flight Research Center (DFRC) began in July 1995 and continued until May 1996. Flight data will be utilized to evaluate control law performance and aircraft dynamics, determine aircraft control and stability derivatives using parameter identification techniques, and validate design guidelines. To accomplish these purposes, essential flight data parameters were retrieved from the DFRC data system and stored on the Dynamics and Control Branch (DCB) computer complex at Langley. This report describes the multi-step task used to retrieve and process this data and documents the results of these tasks. Documentation includes software listings, flight information, maneuver information, time intervals for which data were retrieved, lists of data parameters and definitions, and example data plots.
Repository of not readily available documents for project W-320
DOE Office of Scientific and Technical Information (OSTI.GOV)
Conner, J.C.
1997-04-18
The purpose of this document is to provide a readily available source of the technical reports needed for the development of the safety documentation provided for the waste retrieval sluicing system (WRSS), designed to remove the radioactive and chemical sludge from tank 241-C-106, and transport that material to double-shell tank 241-AY-102 via a new, temporary, shielded, encased transfer line.
Relevance similarity: an alternative means to monitor information retrieval systems
Dong, Peng; Loh, Marie; Mondry, Adrian
2005-01-01
Background Relevance assessment is a major problem in the evaluation of information retrieval systems. The work presented here introduces a new parameter, "Relevance Similarity", for the measurement of the variation of relevance assessment. In a situation where individual assessment can be compared with a gold standard, this parameter is used to study the effect of such variation on the performance of a medical information retrieval system. In such a setting, Relevance Similarity is the ratio of assessors who rank a given document same as the gold standard over the total number of assessors in the group. Methods The study was carried out on a collection of Critically Appraised Topics (CATs). Twelve volunteers were divided into two groups of people according to their domain knowledge. They assessed the relevance of retrieved topics obtained by querying a meta-search engine with ten keywords related to medical science. Their assessments were compared to the gold standard assessment, and Relevance Similarities were calculated as the ratio of positive concordance with the gold standard for each topic. Results The similarity comparison among groups showed that a higher degree of agreements exists among evaluators with more subject knowledge. The performance of the retrieval system was not significantly different as a result of the variations in relevance assessment in this particular query set. Conclusion In assessment situations where evaluators can be compared to a gold standard, Relevance Similarity provides an alternative evaluation technique to the commonly used kappa scores, which may give paradoxically low scores in highly biased situations such as document repositories containing large quantities of relevant data. PMID:16029513
Document creation, linking, and maintenance system
Claghorn, Ronald [Pasco, WA
2011-02-15
A document creation and citation system designed to maintain a database of reference documents. The content of a selected document may be automatically scanned and indexed by the system. The selected documents may also be manually indexed by a user prior to the upload. The indexed documents may be uploaded and stored within a database for later use. The system allows a user to generate new documents by selecting content within the reference documents stored within the database and inserting the selected content into a new document. The system allows the user to customize and augment the content of the new document. The system also generates citations to the selected content retrieved from the reference documents. The citations may be inserted into the new document in the appropriate location and format, as directed by the user. The new document may be uploaded into the database and included with the other reference documents. The system also maintains the database of reference documents so that when changes are made to a reference document, the author of a document referencing the changed document will be alerted to make appropriate changes to his document. The system also allows visual comparison of documents so that the user may see differences in the text of the documents.
Geiger, Linda H.
1983-01-01
The report is an update of U.S. Geological Survey Open-File Report 77-703, which described a retrieval program for administrative index of active data-collection sites in Florida. Extensive changes to the Findex system have been made since 1977 , making the previous report obsolete. A description of the data base and computer programs that are available in the Findex system are documented in this report. This system serves a vital need in the administration of the many and diverse water-data collection activities. District offices with extensive data-collection activities will benefit from the documentation of the system. Largely descriptive, the report tells how a file of computer card images has been established which contains entries for all sites in Florida at which there is currently a water-data collection activity. Entries include information such as identification number, station name, location, type of site, county, frequency of data collection, funding, and other pertinent details. The computer program FINDEX selectively retrieves entries and lists them in a format suitable for publication. The index is updated routinely. (USGS)
ERIC Educational Resources Information Center
Borko, Harold
1985-01-01
Defines artificial intelligence (AI) and expert systems; describes library applications utilizing AI to automate creation of document representations, request formulations, and design and modify search strategies for information retrieval systems; discusses expert system development for information services; and reviews impact of these…
The Bartlesville System; TGISS Software Documentation.
ERIC Educational Resources Information Center
Roberts, Tommy L.; And Others
TGISS (Total Guidance Information Support System) is an information storage and retrieval system specifically designed to meet the needs and requirements of a counselor in the Bartlesville Public School environment. The system, which is a combination of man/machine capabilities, includes the hardware and software necessary to extend the…
Mining for Evidence in Enterprise Corpora
ERIC Educational Resources Information Center
Almquist, Brian Alan
2011-01-01
The primary research aim of this dissertation is to identify the strategies that best meet the information retrieval needs as expressed in the "e-discovery" scenario. This task calls for a high-recall system that, in response to a request for all available relevant documents to a legal complaint, effectively prioritizes documents from an…
World-Wide Web: The Information Universe.
ERIC Educational Resources Information Center
Berners-Lee, Tim; And Others
1992-01-01
Describes the World-Wide Web (W3) project, which is designed to create a global information universe using techniques of hypertext, information retrieval, and wide area networking. Discussion covers the W3 data model, W3 architecture, the document naming scheme, protocols, document formats, comparison with other systems, experience with the W3…
New public dataset for spotting patterns in medieval document images
NASA Astrophysics Data System (ADS)
En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent
2017-01-01
With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.
ERIC Educational Resources Information Center
Losada, David E.; Barreiro, Alvaro
2003-01-01
Proposes an approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. Highlights include document representation and matching; incorporating term similarity into the measure of distance; new algorithms for implementation; inverse document frequency; and logical versus classical models of…
Techniques of Document Management: A Review of Text Retrieval and Related Technologies.
ERIC Educational Resources Information Center
Veal, D. C.
2001-01-01
Reviews present and possible future developments in the techniques of electronic document management, the major ones being text retrieval and scanning and OCR (optical character recognition). Also addresses document acquisition, indexing and thesauri, publishing and dissemination standards, impact of the Internet, and the document management…
LDEF grappled by remote manipulator system (RMS) during STS-32 retrieval
1990-01-20
This view taken through overhead window W7 on Columbia's, Orbiter Vehicle (OV) 102's, aft flight deck shows the Long Duration Exposure Facility (LDEF) in the grasp of the remote manipulator system (RMS) during STS-32 retrieval activities. Other cameras at eye level were documenting the bus-sized spacecraft at various angles as the RMS manipulated LDEF for a lengthy photo survey. The glaring celestial body in the upper left is the sun with the Earth's surface visible below.
CAESAR : an expert system for evaluation of scour and stream stability
DOT National Transportation Integrated Search
1999-01-01
This report documents the development and testing of a field-deployable, knowledge-based decision support system that assists bridge inspectors by acquiring, cataloging, storing, and retrieving information necessary for the evaluation of a bridge for...
A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track)
2008-11-01
retrieve relevant documents. For the Opinion Retrieval subtask, we propose a hybrid model of lexicon-based approach and machine learning approach for...estimating and ranking the opinionated documents. For the Polarized Opinion Retrieval subtask, we employ machine learning for predicting the polarity...and linear combination technique for ranking polar documents. The hybrid model which utilize both lexicon-based approach and machine learning approach
Organ donation in the ICU: A document analysis of institutional policies, protocols, and order sets.
Oczkowski, Simon J W; Centofanti, John E; Durepos, Pamela; Arseneau, Erika; Kelecevic, Julija; Cook, Deborah J; Meade, Maureen O
2018-04-01
To better understand how local policies influence organ donation rates. We conducted a document analysis of our ICU organ donation policies, protocols and order sets. We used a systematic search of our institution's policy library to identify documents related to organ donation. We used Mindnode software to create a publication timeline, basic statistics to describe document characteristics, and qualitative content analysis to extract document themes. Documents were retrieved from Hamilton Health Sciences, an academic hospital system with a high volume of organ donation, from database inception to October 2015. We retrieved 12 active organ donation documents, including six protocols, two policies, two order sets, and two unclassified documents, a majority (75%) after the introduction of donation after circulatory death in 2006. Four major themes emerged: organ donation process, quality of care, patient and family-centred care, and the role of the institution. These themes indicate areas where documented institutional standards may be beneficial. Further research is necessary to determine the relationship of local policies, protocols, and order sets to actual organ donation practices, and to identify barriers and facilitators to improving donation rates. Copyright © 2017 Elsevier Ltd. All rights reserved.
Exploiting salient semantic analysis for information retrieval
NASA Astrophysics Data System (ADS)
Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui
2016-11-01
Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representation for words or documents. However, its feasibility and effectiveness in information retrieval is mostly unknown. In this paper, we study how to efficiently use SSA to improve the information retrieval performance, and propose a SSA-based retrieval method under the language model framework. First, SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations can be used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard text retrieval conference (TREC) collections. Experiment results on standard TREC collections show the proposed models consistently outperform the existing Wikipedia-based retrieval methods.
Automatic generation of stop word lists for information retrieval and analysis
Rose, Stuart J
2013-01-08
Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
User Manual for the AZ-101 Data Acquisition System (AS-101 DAS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
BRAYTON, D.D.
2000-02-17
User manual for the TK AZ-101 Waste Retrieval System Data Acquisition System. The purpose of this document is to describe use of the AZ-101 Data Acquisition System (AZ-101 DAS). The AZ-101 DAS is provided to fulfill the requirements for data collection and monitoring as defined in Letters of Instruction (LOI) from Numatec Hanford Corporation (NHC) to Fluor Federal Services (FFS). For a complete description of the system, including design, please refer to the AZ-101 DAS System Description document, RPP-5572.
Dynamic "inline" images: context-sensitive retrieval and integration of images into Web documents.
Kahn, Charles E
2008-09-01
Integrating relevant images into web-based information resources adds value for research and education. This work sought to evaluate the feasibility of using "Web 2.0" technologies to dynamically retrieve and integrate pertinent images into a radiology web site. An online radiology reference of 1,178 textual web documents was selected as the set of target documents. The ARRS GoldMiner image search engine, which incorporated 176,386 images from 228 peer-reviewed journals, retrieved images on demand and integrated them into the documents. At least one image was retrieved in real-time for display as an "inline" image gallery for 87% of the web documents. Each thumbnail image was linked to the full-size image at its original web site. Review of 20 randomly selected Collaborative Hypertext of Radiology documents found that 69 of 72 displayed images (96%) were relevant to the target document. Users could click on the "More" link to search the image collection more comprehensively and, from there, link to the full text of the article. A gallery of relevant radiology images can be inserted easily into web pages on any web server. Indexing by concepts and keywords allows context-aware image retrieval, and searching by document title and subject metadata yields excellent results. These techniques allow web developers to incorporate easily a context-sensitive image gallery into their documents.
TES: A Text Extraction System.
ERIC Educational Resources Information Center
Goh, A.; Hui, S. C.
1996-01-01
Describes how TES, a text extraction system, is able to electronically retrieve a set of sentences from a document to form an indicative abstract. Discusses various text abstraction techniques and related work in the area, provides an overview of the TES system, and compares system results against manually produced abstracts. (LAM)
Jahn, Michelle A; Porter, Brian W; Patel, Himalaya; Zillich, Alan J; Simon, Steven R; Russ, Alissa L
2018-04-01
Web-based patient portals feature secure messaging systems that enable health care providers and patients to communicate information. However, little is known about the usability of these systems for clinical document sharing. This article evaluates the usability of a secure messaging system for providers and patients in terms of its ability to support sharing of electronic clinical documents. We conducted usability testing with providers and patients in a human-computer interaction laboratory at a Midwestern U.S. hospital. Providers sent a medication list document to a fictitious patient via secure messaging. Separately, patients retrieved the clinical document from a secure message and returned it to a fictitious provider. We collected use errors, task completion, task time, and satisfaction. Twenty-nine individuals participated: 19 providers (6 physicians, 6 registered nurses, and 7 pharmacists) and 10 patients. Among providers, 11 (58%) attached and sent the clinical document via secure messaging without requiring assistance, in a median (range) of 4.5 (1.8-12.7) minutes. No patients completed tasks without moderator assistance. Patients accessed the secure messaging system within 3.6 (1.2-15.0) minutes; retrieved the clinical document within 0.8 (0.5-5.7) minutes; and sent the attached clinical document in 6.3 (1.5-18.1) minutes. Although median satisfaction ratings were high, with 5.8 for providers and 6.0 for patients (scale, 0-7), we identified 36 different use errors. Physicians and pharmacists requested additional features to support care coordination via health information technology, while nurses requested features to support efficiency for their tasks. This study examined the usability of clinical document sharing, a key feature of many secure messaging systems. Our results highlight similarities and differences between provider and patient end-user groups, which can inform secure messaging design to improve learnability and efficiency. The observations suggest recommendations for improving the technical aspects of secure messaging for clinical document sharing. Schattauer GmbH Stuttgart.
Concept-Based Retrieval from Critical Incident Reports.
Denecke, Kerstin
2017-01-01
Critical incident reporting systems (CIRS) are used as a means to collect anonymously entered information of incidents that occurred for example in a hospital. Analyzing this information helps to identify among others problems in the workflow, in the infrastructure or in processes. The entire potential of these sources of experiential knowledge remains often unconsidered since retrieval of relevant reports and their analysis is difficult and time-consuming, and the reporting systems often do not provide support for these tasks. The objective of this work is to develop a method for retrieving reports from the CIRS related to a specific user query. atural language processing (NLP) and information retrieval (IR) methods are exploited for realizing the retrieval. We compare standard retrieval methods that rely upon frequency of words with an approach that includes a semantic mapping of natural language to concepts of a medical ontology. By an evaluation, we demonstrate the feasibility of semantic document enrichment to improve recall in incident reporting retrieval. It is shown that a combination of standard keyword-based retrieval with semantic search results in highly satisfactory recall values. In future work, the evaluation should be repeated on a larger data set and real-time user evaluation need to be performed to assess user satisfactory with the system and results.
Beyond Information Retrieval—Medical Question Answering
Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong
2006-01-01
Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large and makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians’ information needs and described the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long term goal is to enable MedQA to answer all types of medical questions, currently, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385
Agent-based method for distributed clustering of textual information
Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN
2010-09-28
A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.
Evaluation of an Automated Keywording System.
ERIC Educational Resources Information Center
Malone, Linda C.; And Others
1990-01-01
Discussion of automated indexing techniques focuses on ways to statistically document improvements in the development of an automated keywording system over time. The system developed by the Joint Chiefs of Staff to automate the storage, categorization, and retrieval of information from military exercises is explained, and performance measures are…
Repetition and Diversification in Multi-Session Task Oriented Search
ERIC Educational Resources Information Center
Tyler, Sarah K.
2013-01-01
As the number of documents and the availability of information online grows, so to can the difficulty in sifting through documents to find what we're searching for. Traditional Information Retrieval (IR) systems consider the query as the representation of the user's needs, and as such are limited to the user's ability to describe the information…
Web document ranking via active learning and kernel principal component analysis
NASA Astrophysics Data System (ADS)
Cai, Fei; Chen, Honghui; Shu, Zhen
2015-09-01
Web document ranking arises in many information retrieval (IR) applications, such as the search engine, recommendation system and online advertising. A challenging issue is how to select the representative query-document pairs and informative features as well for better learning and exploring new ranking models to produce an acceptable ranking list of candidate documents of each query. In this study, we propose an active sampling (AS) plus kernel principal component analysis (KPCA) based ranking model, viz. AS-KPCA Regression, to study the document ranking for a retrieval system, i.e. how to choose the representative query-document pairs and features for learning. More precisely, we fill those documents gradually into the training set by AS such that each of which will incur the highest expected DCG loss if unselected. Then, the KPCA is performed via projecting the selected query-document pairs onto p-principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and depress the impact incurred by noise simultaneously. To the best of our knowledge, we are the first to perform the document ranking via dimension reductions in two dimensions, namely, the number of documents and features simultaneously. Our experiments demonstrate that the performance of our approach is better than that of the baseline methods on the public LETOR 4.0 datasets. Our approach brings an improvement against RankBoost as well as other baselines near 20% in terms of MAP metric and less improvements using P@K and NDCG@K, respectively. Moreover, our approach is particularly suitable for document ranking on the noisy dataset in practice.
Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.
Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin
2017-05-18
Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.
NASA Technical Reports Server (NTRS)
Ambur, Manjula Y.; Adams, David L.; Trinidad, P. Paul
1997-01-01
NASA Langley Technical Library has been involved in developing systems for full-text information delivery of NACA/NASA technical reports since 1991. This paper will describe the two prototypes it has developed and the present production system configuration. The prototype systems are a NACA CD-ROM of thirty-three classic paper NACA reports and a network-based Full-text Electronic Reports Documents System (FEDS) constructed from both paper and electronic formats of NACA and NASA reports. The production system is the DigiDoc System (DIGItal Documents) presently being developed based on the experiences gained from the two prototypes. DigiDoc configuration integrates the on-line catalog database World Wide Web interface and PDF technology to provide a powerful and flexible search and retrieval system. It describes in detail significant achievements and lessons learned in terms of data conversion, storage technologies, full-text searching and retrieval, and image databases. The conclusions from the experiences of digitization and full- text access and future plans for DigiDoc system implementation are discussed.
Selected Mechanized Scientific and Technical Information Systems.
ERIC Educational Resources Information Center
Ackerman, Lynn, Ed.; And Others
The publication describes the following thirteen computer-based, operational systems designed primarily for the announcement, storage, retrieval and secondary distribution of scientific and technical reports: Defense Documentation Center; Highway Research Board; National Aeronautics and Space Administration; National Library of Medicine; U.S.…
ERIC Educational Resources Information Center
Illinois Univ., Urbana. Coordinated Science Lab.
In contrast to conventional information storage and retrieval systems in which a body of knowledge is thought of as an indexed codex of documents to which access is obtained by an appropriately indexed query, this interdisciplinary study aims at an understanding of what is "knowledge" as distinct from a "data file," how this knowledge is acquired,…
A LDA-based approach to promoting ranking diversity for genomics information retrieval.
Chen, Yan; Yin, Xiaoshi; Li, Zhoujun; Hu, Xiaohua; Huang, Jimmy Xiangji
2012-06-11
In the biomedical domain, there are immense data and tremendous increase of genomics and biomedical relevant publications. The wealth of information has led to an increasing amount of interest in and need for applying information retrieval techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the desired information of a query asked by biologists is a list of a certain type of entities covering different aspects that are related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important of a biomedical IR system to be able to provide relevant and diverse answers to fulfill biologists' information needs. However traditional IR model only concerns with the relevance between retrieved documents and user query, but does not take redundancy between retrieved documents into account. This will lead to high redundancy and low diversity in the retrieval ranked lists. In this paper, we propose an approach which employs a topic generative model called Latent Dirichlet Allocation (LDA) to promoting ranking diversity for biomedical information retrieval. Different from other approaches or models which consider aspects on word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We present LDA model to discover topic distribution of retrieval passages and word distribution of each topic dimension, and then re-rank retrieval results with topic distribution similarity between passages based on N-size slide window. We perform our approach on TREC 2007 Genomics collection and two distinctive IR baseline runs, which can achieve 8% improvement over the highest Aspect MAP reported in TREC 2007 Genomics track. The proposed method is the first study of adopting topic model to genomics information retrieval, and demonstrates its effectiveness in promoting ranking diversity as well as in improving relevance of ranked lists of genomics search. Moreover, we proposes a distance measure to quantify how much a passage can increase topical diversity by considering both topical importance and topical coefficient by LDA, and the distance measure is a modified Euclidean distance.
COSMOS (County of San Mateo Online System). A Searcher's Manual.
ERIC Educational Resources Information Center
San Mateo County Superintendent of Schools, Redwood City, CA. Educational Resources Center.
Operating procedures are explained for COSMOS (County of San Mateo Online System), a computerized information retrieval system designed for the San Mateo Educational Resources Center (SMERC), which provides interactive access to both ERIC and a local file of fugitive documents. COSMOS hardware and modem compatibility requirements are reviewed,…
Bibliometric analysis of global migration health research in peer-reviewed literature (2000-2016).
Sweileh, Waleed M; Wickramage, Kolitha; Pottie, Kevin; Hui, Charles; Roberts, Bayard; Sawalha, Ansam F; Zyoud, Saed H
2018-06-20
The health of migrants has become an important issue in global health and foreign policy. Assessing the current status of research activity and identifying gaps in global migration health (GMH) is an important step in mapping the evidence-base and on advocating health needs of migrants and mobile populations. The aim of this study was to analyze globally published peer-reviewed literature in GMH. A bibliometric analysis methodology was used. The Scopus database was used to retrieve documents in peer-reviewed journals in GMH for the study period from 2000 to 2016. A group of experts in GMH developed the needed keywords and validated the final search strategy. The number of retrieved documents was 21,457. Approximately one third (6878; 32.1%) of the retrieved documents were published in the last three years of the study period. In total, 5451 (25.4%) documents were about refugees and asylum seekers, while 1328 (6.2%) were about migrant workers, 440 (2.1%) were about international students, 679 (3.2%) were about victims of human trafficking/smuggling, 26 (0.1%) were about patients' mobility across international borders, and the remaining documents were about unspecified categories of migrants. The majority of the retrieved documents (10,086; 47.0%) were in psychosocial and mental health domain, while 2945 (13.7%) documents were in infectious diseases, 6819 (31.8%) documents were in health policy and systems, 2759 (12.8%) documents were in maternal and reproductive health, and 1918 (8.9%) were in non-communicable diseases. The contribution of authors and institutions in Asian countries, Latin America, Africa, Middle East, and Eastern European countries was low. Literature in GMH represents the perspectives of high-income migrant destination countries. Our heat map of research output shows that despite the ever-growing prominence of human mobility across the globe, and Sustainable Development Goals of leaving no one behind, research output on migrants' health is not consistent with the global migration pattern. A stronger evidence base is needed to enable authorities to make evidence-informed decisions on migration health policy and practice. Research collaboration and networks should be encouraged to prioritize research in GMH.
Information Retrieval and Graph Analysis Approaches for Book Recommendation.
Benkoussas, Chahinez; Bellot, Patrice
2015-01-01
A combination of multiple information retrieval approaches is proposed for the purpose of book recommendation. In this paper, book recommendation is based on complex user's query. We used different theoretical retrieval models: probabilistic as InL2 (Divergence from Randomness model) and language model and tested their interpolated combination. Graph analysis algorithms such as PageRank have been successful in Web environments. We consider the application of this algorithm in a new retrieval approach to related document network comprised of social links. We called Directed Graph of Documents (DGD) a network constructed with documents and social information provided from each one of them. Specifically, this work tackles the problem of book recommendation in the context of INEX (Initiative for the Evaluation of XML retrieval) Social Book Search track. A series of reranking experiments demonstrate that combining retrieval models yields significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments.
Information Retrieval and Graph Analysis Approaches for Book Recommendation
Benkoussas, Chahinez; Bellot, Patrice
2015-01-01
A combination of multiple information retrieval approaches is proposed for the purpose of book recommendation. In this paper, book recommendation is based on complex user's query. We used different theoretical retrieval models: probabilistic as InL2 (Divergence from Randomness model) and language model and tested their interpolated combination. Graph analysis algorithms such as PageRank have been successful in Web environments. We consider the application of this algorithm in a new retrieval approach to related document network comprised of social links. We called Directed Graph of Documents (DGD) a network constructed with documents and social information provided from each one of them. Specifically, this work tackles the problem of book recommendation in the context of INEX (Initiative for the Evaluation of XML retrieval) Social Book Search track. A series of reranking experiments demonstrate that combining retrieval models yields significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments. PMID:26504899
A novel architecture for information retrieval system based on semantic web
NASA Astrophysics Data System (ADS)
Zhang, Hui
2011-12-01
Nowadays, the web has enabled an explosive growth of information sharing (there are currently over 4 billion pages covering most areas of human endeavor) so that the web has faced a new challenge of information overhead. The challenge that is now before us is not only to help people locating relevant information precisely but also to access and aggregate a variety of information from different resources automatically. Current web document are in human-oriented formats and they are suitable for the presentation, but machines cannot understand the meaning of document. To address this issue, Berners-Lee proposed a concept of semantic web. With semantic web technology, web information can be understood and processed by machine. It provides new possibilities for automatic web information processing. A main problem of semantic web information retrieval is that when these is not enough knowledge to such information retrieval system, the system will return to a large of no sense result to uses due to a huge amount of information results. In this paper, we present the architecture of information based on semantic web. In addiction, our systems employ the inference Engine to check whether the query should pose to Keyword-based Search Engine or should pose to the Semantic Search Engine.
Translation lexicon acquisition from bilingual dictionaries
NASA Astrophysics Data System (ADS)
Doermann, David S.; Ma, Huanfeng; Karagol-Ayan, Burcu; Oard, Douglas W.
2001-12-01
Bilingual dictionaries hold great potential as a source of lexical resources for training automated systems for optical character recognition, machine translation and cross-language information retrieval. In this work we describe a system for extracting term lexicons from printed copies of bilingual dictionaries. We describe our approach to page and definition segmentation and entry parsing. We have used the approach to parse a number of dictionaries and demonstrate the results for retrieval using a French-English Dictionary to generate a translation lexicon and a corpus of English queries applied to French documents to evaluation cross-language IR.
Schurr, K.M.; Cox, S.E.
1994-01-01
The Pesticide-Application Data-Base Management System was created as a demonstration project and was tested with data submitted to the Washington State Department of Agriculture by pesticide applicators from a small geographic area. These data were entered into the Department's relational data-base system and uploaded into the system's ARC/INFO files. Locations for pesticide applica- tions are assigned within the Public Land Survey System grids, and ARC/INFO programs in the Pesticide-Application Data-Base Management System can subdivide each survey section into sixteen idealized quarter-quarter sections for display map grids. The system provides data retrieval and geographic information system plotting capabilities from a menu of seven basic retrieval options. Additionally, ARC/INFO coverages can be created from the retrieved data when required for particular applications. The Pesticide-Application Data-Base Management System, or the general principles used in the system, could be adapted to other applica- tions or to other states.
Névéol, Aurélie; Pereira, Suzanne; Kerdelhué, Gaetan; Dahamna, Badisse; Joubert, Michel; Darmoni, Stéfan J
2007-01-01
The growing number of resources to be indexed in the catalogue of online health resources in French (CISMeF) calls for curating strategies involving automatic indexing tools while maintaining the catalogue's high indexing quality standards. To develop a simple automatic tool that retrieves MeSH descriptors from documents titles. In parallel to research on advanced indexing methods, a bag-of-words tool was developed for timely inclusion in CISMeF's maintenance system. An evaluation was carried out on a corpus of 99 documents. The indexing sets retrieved by the automatic tool were compared to manual indexing based on the title and on the full text of resources. 58% of the major main headings were retrieved by the bag-of-words algorithm and the precision on main heading retrieval was 69%. Bag-of-words indexing has effectively been used on selected resources to be included in CISMeF since August 2006. Meanwhile, on going work aims at improving the current version of the tool.
The Cybernetics of Bibliographic Control: Toward a Theory of Document Retrieval Systems.
ERIC Educational Resources Information Center
Wellisch, Hans H.
1980-01-01
Explores the concept of cataloging, analyzes its functions and operations, and holds that as a control system bibliographic organization is subject to the laws of cybernetics. The role of relevance and the limitations of some regulatory devices are examined. (FM)
Nurses using futuristic technology in today's healthcare setting.
Wolf, Debra M; Kapadia, Amar; Kintzel, Jessie; Anton, Bonnie B
2009-01-01
Human computer interaction (HCI) equates nurses using voice assisted technology within a clinical setting to document patient care real time, retrieve patient information from care plans, and complete routine tasks. This is a reality currently utilized by clinicians today in acute and long term care settings. Voice assisted documentation provides hands & eyes free accurate documentation while enabling effective communication and task management. The speech technology increases the accuracy of documentation, while interfacing directly into the electronic health record (EHR). Using technology consisting of a light weight headset and small fist size wireless computer, verbal responses to easy to follow cues are converted into a database systems allowing staff to obtain individualized care status reports on demand. To further assist staff in their daily process, this innovative technology allows staff to send and receive pages as needed. This paper will discuss how leading edge and award winning technology is being integrated within the United States. Collaborative efforts between clinicians and analyst will be discussed reflecting the interactive design and build functionality. Features such as the system's voice responses and directed cues will be shared and how easily data can be documented, viewed and retrieved. Outcome data will be presented on how the technology impacted organization's quality outcomes, financial reimbursement, and employee's level of satisfaction.
Mission analysis for cross-site transfer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Riesenweber, S.D.; Fritz, R.L.; Shipley, L.E.
1995-11-01
The Mission Analysis Report describes the requirements and constraints associated with the Transfer Waste Function as necessary to support the Manage Tank Waste, Retrieve Waste, and Process Tank Waste Functions described in WHC-SD-WM-FRD-020, Tank Waste Remediation System (TWRS) Functions and Requirements Document and DOE/RL-92-60, Revision 1, TWRS Functions and Requirements Document, March 1994. It further assesses the ability of the ``initial state`` (or current cross-site transfer system) to meet the requirements and constraints.
2009-09-21
specified by contract no. W7714-040875/001/SV. This document contains the design of the JNDMS software to the system architecture level. Other...alternative for the presentation functions. ASP, Java, ActiveX , DLL, HTML, DHTML, SOAP, .NET HTML, DHTML, XML, Jscript, VBScript, SOAP, .NET...retrieved through the network, typically by a network management console. Information is contained in a Management Information Base (MIB), which is a data
Web information retrieval for health professionals.
Ting, S L; See-To, Eric W K; Tse, Y K
2013-06-01
This paper presents a Web Information Retrieval System (WebIRS), which is designed to assist the healthcare professionals to obtain up-to-date medical knowledge and information via the World Wide Web (WWW). The system leverages the document classification and text summarization techniques to deliver the highly correlated medical information to the physicians. The system architecture of the proposed WebIRS is first discussed, and then a case study on an application of the proposed system in a Hong Kong medical organization is presented to illustrate the adoption process and a questionnaire is administrated to collect feedback on the operation and performance of WebIRS in comparison with conventional information retrieval in the WWW. A prototype system has been constructed and implemented on a trial basis in a medical organization. It has proven to be of benefit to healthcare professionals through its automatic functions in classification and summarizing the medical information that the physicians needed and interested. The results of the case study show that with the use of the proposed WebIRS, significant reduction of searching time and effort, with retrieval of highly relevant materials can be attained.
Unified modeling language and design of a case-based retrieval system in medical imaging.
LeBozec, C.; Jaulent, M. C.; Zapletal, E.; Degoulet, P.
1998-01-01
One goal of artificial intelligence research into case-based reasoning (CBR) systems is to develop approaches for designing useful and practical interactive case-based environments. Explaining each step of the design of the case-base and of the retrieval process is critical for the application of case-based systems to the real world. We describe herein our approach to the design of IDEM--Images and Diagnosis from Examples in Medicine--a medical image case-based retrieval system for pathologists. Our approach is based on the expressiveness of an object-oriented modeling language standard: the Unified Modeling Language (UML). We created a set of diagrams in UML notation illustrating the steps of the CBR methodology we used. The key aspect of this approach was selecting the relevant objects of the system according to user requirements and making visualization of cases and of the components of the case retrieval process. Further evaluation of the expressiveness of the design document is required but UML seems to be a promising formalism, improving the communication between the developers and users. Images Figure 6 Figure 7 PMID:9929346
Unified modeling language and design of a case-based retrieval system in medical imaging.
LeBozec, C; Jaulent, M C; Zapletal, E; Degoulet, P
1998-01-01
One goal of artificial intelligence research into case-based reasoning (CBR) systems is to develop approaches for designing useful and practical interactive case-based environments. Explaining each step of the design of the case-base and of the retrieval process is critical for the application of case-based systems to the real world. We describe herein our approach to the design of IDEM--Images and Diagnosis from Examples in Medicine--a medical image case-based retrieval system for pathologists. Our approach is based on the expressiveness of an object-oriented modeling language standard: the Unified Modeling Language (UML). We created a set of diagrams in UML notation illustrating the steps of the CBR methodology we used. The key aspect of this approach was selecting the relevant objects of the system according to user requirements and making visualization of cases and of the components of the case retrieval process. Further evaluation of the expressiveness of the design document is required but UML seems to be a promising formalism, improving the communication between the developers and users.
Document retrieval on repetitive string collections.
Gagie, Travis; Hartikainen, Aleksi; Karhu, Kalle; Kärkkäinen, Juha; Navarro, Gonzalo; Puglisi, Simon J; Sirén, Jouni
2017-01-01
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists , that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top- k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
PubMed Interact: an Interactive Search Application for MEDLINE/PubMed
Muin, Michael; Fontelo, Paul; Ackerman, Michael
2006-01-01
Online search and retrieval systems are important resources for medical literature research. Progressive Web 2.0 technologies provide opportunities to improve search strategies and user experience. Using PHP, Document Object Model (DOM) manipulation and Asynchronous JavaScript and XML (Ajax), PubMed Interact allows greater functionality so users can refine search parameters with ease and interact with the search results to retrieve and display relevant information and related articles. PMID:17238658
Signature detection and matching for document image retrieval.
Zhu, Guangyu; Zheng, Yefeng; Doermann, David; Jaeger, Stefan
2009-11-01
As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from clustered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.
DOT National Transportation Integrated Search
2011-08-01
GeoGIS is a web-based geotechnical database management system that is being developed for the Alabama : Department of Transportation (ALDOT). The purpose of GeoGIS is to facilitate the efficient storage and retrieval of : geotechnical documents for A...
Accident Analyses in Support of the Sludge Water System Safety Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
FINFROCK, S.H.
This document quantifies the potential health effects of the unmitigated hazards identified Hey (2002) for retrieval of sludge from the KE basin. It also identifies potential controls and any supporting mitigative analyses.
CDAPubMed: a browser extension to retrieve EHR-based biomedical literature.
Perez-Rey, David; Jimenez-Castellanos, Ana; Garcia-Remesal, Miguel; Crespo, Jose; Maojo, Victor
2012-04-05
Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems.
CDAPubMed: a browser extension to retrieve EHR-based biomedical literature
2012-01-01
Background Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems. PMID:22480327
Mining knowledge from corpora: an application to retrieval and indexing.
Soualmia, Lina F; Dahamna, Badisse; Darmoni, Stéfan
2008-01-01
The present work aims at discovering new associations between medical concepts to be exploited as input in retrieval and indexing. Association rules method is applied to documents. The process is carried out on three major document categories referring to e-health information consumers: health professionals, students and lay people. Association rules evaluation is founded on statistical measures combined with domain knowledge. Association rules represent existing relations between medical concepts (60.62%) and new knowledge (54.21%). Based on observations, 463 expert rules are defined by medical librarians for retrieval and indexing. Association rules bear out existing relations, produce new knowledge and support users and indexers in document retrieval and indexing.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-01-13
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention Study Team for the Los Alamos Historical Document Retrieval and Assessment (LAHDRA) Project The Centers for Disease... the following meeting. Name: Public Meeting of the Study Team for the Los Alamos Historical Document...
ERIC Educational Resources Information Center
Roberts, Tommy L.; And Others
The Total Guidance Information Support System (TGISS), is an information storage and retrieval system for counselors. The total TGISS, including hardware and software, extends the counselor's capabilities by providing ready access to student information under secure conditions. The hardware required includes: (1) IBM 360/50 central processing…
ERIC Educational Resources Information Center
Dia, Ahmed
The guide to the computer management system for individualized instructional strategy associated with the clinical teacher curriculum at Florida State University is presented. The system is described in terms of 27 Cobol programs and the Multiple Access and Retrieval System (MARS VI), which were adapted to requirements of the clinical teacher…
Instance-Based Question Answering
2006-12-01
answer clustering, composition, and scoring. Moreover, with the effort dedicated to improving monolingual system performance, system parameters are...text collections: document type, manual or automatic annotations (if any), and stylistic and notational differences in technical terms. Monolingual ...forum in which cross language retrieval systems and question answering systems are tested for various Eu- ropean languages. The CLEF QA monolingual task
A semantic medical multimedia retrieval approach using ontology information hiding.
Guo, Kehua; Zhang, Shigeng
2013-01-01
Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches.
Essie: A Concept-based Search Engine for Structured Biomedical Text
Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina
2007-01-01
This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729
Unified System Of Data On Materials And Processes
NASA Technical Reports Server (NTRS)
Key, Carlo F.
1989-01-01
Wide-ranging sets of data for aerospace industry described. Document describes Materials and Processes Technical Information System (MAPTIS), computerized set of integrated data bases for use by NASA and aerospace industry. Stores information in standard format for fast retrieval in searches and surveys of data. Helps engineers select materials and verify their properties. Promotes standardized nomenclature as well as standarized tests and presentation of data. Format of document of photographic projection slides used in lectures. Presents examples of reports from various data bases.
Takeda, Toshihiro; Ueda, Kanayo; Manabe, Shiro; Teramoto, Kei; Mihara, Naoki; Matsumura, Yasushi
2013-01-01
Standard Japanese electronic medical record (EMR) systems are associated with major shortcomings. For example, they do not assure lifelong readability of records because each document requires its own viewing software program, a system that is difficult to maintain over long periods of time. It can also be difficult for users to comprehend a patient's clinical history because different classes of documents can only be accessed from their own window. To address these problems, we developed a document-based electronic medical record that aggregates all documents for a patient in a PDF or DocuWorks format. We call this system the Document Archiving and Communication System (DACS). There are two types of viewers in the DACS: the Matrix View, which provides a time line of a patient's history, and the Tree View, which stores the documents in hierarchical document classes. We placed 2,734 document classes into 11 categories. A total of 22,3972 documents were entered per month. The frequency of use of the DACS viewer was 268,644 instances per month. The DACS viewer was used to assess a patient's clinical history.
Information Retrieval and Text Mining Technologies for Chemistry.
Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso
2017-06-28
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Crawler Acquisition and Testing Demonstration Project Management Plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
DEFIGH-PRICE, C.
2000-10-23
If the crawler based retrieval system is selected, this project management plan identifies the path forward for acquiring a crawler/track pump waste retrieval system, and completing sufficient testing to support deploying the crawler for as part of a retrieval technology demonstration for Tank 241-C-104. In the balance of the document, these activities will be referred to as the Crawler Acquisition and Testing Demonstration. During recent Tri-Party Agreement negotiations, TPA milestones were proposed for a sludge/hard heel waste retrieval demonstration in tank C-104. Specifically one of the proposed milestones requires completion of a cold demonstration of sufficient scale to support finalmore » design and testing of the equipment (M-45-03G) by 6/30/2004. A crawler-based retrieval system was one of the two options evaluated during the pre-conceptual engineering for C-104 retrieval (RPP-6843 Rev. 0). The alternative technology procurement initiated by the Hanford Tanks Initiative (HTI) project, combined with the pre-conceptual engineering for C-104 retrieval provide an opportunity to achieve compliance with the proposed TPA milestone M-45-03H. This Crawler Acquisition and Testing Demonstration project management plan identifies the plans, organizational interfaces and responsibilities, management control systems, reporting systems, timeline and requirements for the acquisition and testing of the crawler based retrieval system. This project management plan is complimentary to and supportive of the Project Management Plan for Retrieval of C-104 (RPP-6557). This project management plan focuses on utilizing and completing the efforts initiated under the Hanford Tanks Initiative (HTI) to acquire and cold test a commercial crawler based retrieval system. The crawler-based retrieval system will be purchased on a schedule to support design of the waste retrieval from tank C-104 (project W-523) and to meet the requirement of proposed TPA milestone M-45-03H. This Crawler Acquisition and Testing Demonstration project management plan includes the following: (1) Identification of acquisition strategy and plan to obtain a crawler based retrieval system; (2) Plan for sufficient cold testing to make a decision for W-523 and to comply with TPA Milestone M-45-03H; (3) Cost and schedule for path forward; (4) Responsibilities of the participants; and (5) The plan is supported by updated Level 1 logics, a Relative Order of Magnitude cost estimate and preliminary project schedule.« less
Context as a Factor in Personal Information Management Systems.
ERIC Educational Resources Information Center
Barreau, Deborah K.
1995-01-01
Examines context as a factor in personal information management systems to suggest how it may influence classification decisions and ultimately retrieval. A study of seven managers is described that explored the factors that influence the way individuals manage electronic documents, and results are compared with an earlier study of physical…
A Natural Documentation Retrieval System for Macromolecular Chemistry
ERIC Educational Resources Information Center
Ulbrich, Raimund; Wierer, Jutta
1972-01-01
An indexing system for chemistry and technology of macromolecular substances is sketched out, whose characteristics are convenience of use and low cost. The selection mechanism consists of a set of optical coincidence cards. The selection is a result of 15 years experience in the German Plastics Institute. (13 references) (Author)
A Nugget-Based Test Collection Construction Paradigm
ERIC Educational Resources Information Center
Rajput, Shahzad K.
2012-01-01
The problem of building test collections is central to the development of information retrieval systems such as search engines. The primary use of test collections is the evaluation of IR systems. The widely employed "Cranfield paradigm" dictates that the information relevant to a topic be encoded at the level of documents, therefore…
ERIC Educational Resources Information Center
Debons, Anthony; and Others
A proposed classification system was studied to determine its efficacy to the Air Force Control-Display Area. Based on negative outcomes from a logical assessment of the proposed system, an alternate system was proposed to include the coordinate index concept. Upon development of a thesaurus and an index system for 106 documents on VSTOL/VTOL…
Improving program documentation quality through the application of continuous improvement processes.
Lovlien, Cheryl A; Johansen, Martha; Timm, Sandra; Eversman, Shari; Gusa, Dorothy; Twedell, Diane
2007-01-01
Maintaining the integrity of record keeping and retrievable information related to the provision of continuing education credit creates challenges for a large organization. Accurate educational program documentation is vital to support the knowledge and professional development of nursing staff. Quality review and accurate documentation of programs for nursing staff development occurred at one institution through the use of continuous improvement principles. Integration of the new process into the current system maintains the process of providing quality record keeping.
Content-based retrieval of historical Ottoman documents stored as textual images.
Saykol, Ediz; Sinop, Ali Kemal; Güdükbay, Ugur; Ulusoy, Ozgür; Cetin, A Enis
2004-03-01
There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
Leveraging Terminologies for Retrieval of Radiology Reports with Critical Imaging Findings
Warden, Graham I.; Lacson, Ronilda; Khorasani, Ramin
2011-01-01
Introduction: Communication of critical imaging findings is an important component of medical quality and safety. A fundamental challenge includes retrieval of radiology reports that contain these findings. This study describes the expressiveness and coverage of existing medical terminologies for critical imaging findings and evaluates radiology report retrieval using each terminology. Methods: Four terminologies were evaluated: National Cancer Institute Thesaurus (NCIT), Radiology Lexicon (RadLex), Systemized Nomenclature of Medicine (SNOMED-CT), and International Classification of Diseases (ICD-9-CM). Concepts in each terminology were identified for 10 critical imaging findings. Three findings were subsequently selected to evaluate document retrieval. Results: SNOMED-CT consistently demonstrated the highest number of overall terms (mean=22) for each of ten critical findings. However, retrieval rate and precision varied between terminologies for the three findings evaluated. Conclusion: No single terminology is optimal for retrieving radiology reports with critical findings. The expressiveness of a terminology does not consistently correlate with radiology report retrieval. PMID:22195212
Support Vector Machines: Relevance Feedback and Information Retrieval.
ERIC Educational Resources Information Center
Drucker, Harris; Shahrary, Behzad; Gibbon, David C.
2002-01-01
Compares support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred. Includes nine tables. (Contains 24…
A Semantic Medical Multimedia Retrieval Approach Using Ontology Information Hiding
Guo, Kehua; Zhang, Shigeng
2013-01-01
Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches. PMID:24082915
NoSQL: collection document and cloud by using a dynamic web query form
NASA Astrophysics Data System (ADS)
Abdalla, Hemn B.; Lin, Jinzhao; Li, Guoquan
2015-07-01
Mongo-DB (from "humongous") is an open-source document database and the leading NoSQL database. A NoSQL (Not Only SQL, next generation databases, being non-relational, deal, open-source and horizontally scalable) presenting a mechanism for storage and retrieval of documents. Previously, we stored and retrieved the data using the SQL queries. Here, we use the MonogoDB that means we are not utilizing the MySQL and SQL queries. Directly importing the documents into our Drives, retrieving the documents on that drive by not applying the SQL queries, using the IO BufferReader and Writer, BufferReader for importing our type of document files to my folder (Drive). For retrieving the document files, the usage is BufferWriter from the particular folder (or) Drive. In this sense, providing the security for those storing files for what purpose means if we store the documents in our local folder means all or views that file and modified that file. So preventing that file, we are furnishing the security. The original document files will be changed to another format like in this paper; Binary format is used. Our documents will be converting to the binary format after that direct storing in one of our folder, that time the storage space will provide the private key for accessing that file. Wherever any user tries to discover the Document files means that file data are in the binary format, the document's file owner simply views that original format using that personal key from receive the secret key from the cloud.
Nosql for Storage and Retrieval of Large LIDAR Data Collections
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.
2015-08-01
Developments in LiDAR technology over the past decades have made LiDAR to become a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea for a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline. It makes therefore sense to preserve this split. A document oriented NoSQL database can easily emulate this data partitioning, by representing each tile (file) in a separate document. The document stores the metadata of the tile. The actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB a highly scalable document oriented NoSQL database for storing large LiDAR files. MongoDB like any NoSQL database allows for queries on the attributes of the document. As a specialty MongoDB also allows spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to native file system and cloud storage transfer speed.
Tags Extarction from Spatial Documents in Search Engines
NASA Astrophysics Data System (ADS)
Borhaninejad, S.; Hakimpour, F.; Hamzei, E.
2015-12-01
Nowadays the selective access to information on the Web is provided by search engines, but in the cases which the data includes spatial information the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information which lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database and user interface. In crawler component, GML documents are discovered and their text is parsed for information extraction; storage. The database component is responsible for indexing of information which is collected by crawlers. Finally the user interface component provides the interaction between system and user. We have implemented this system as a pilot system on an Application Server as a simulation of Web. Our system as a spatial search engine provided searching capability throughout the GML documents and thus an important step to improve the efficiency of search engines has been taken.
MPEG-7 audio-visual indexing test-bed for video retrieval
NASA Astrophysics Data System (ADS)
Gagnon, Langis; Foucher, Samuel; Gouaillier, Valerie; Brun, Christelle; Brousseau, Julie; Boulianne, Gilles; Osterrath, Frederic; Chapdelaine, Claude; Dutrisac, Julie; St-Onge, Francis; Champagne, Benoit; Lu, Xiaojian
2003-12-01
This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content like face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description decomposition is based on a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a web site that allows video retrieval using a proprietary XQuery-based search engine and accessible to members at the Canadian National Film Board (NFB) Cineroute site. For example, end-user will be able to ask to point on movie shots in the database that have been produced in a specific year, that contain the face of a specific actor who tells a specific word and in which there is no motion activity. Video streaming is performed over the high bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
Substance use disorders in Arab countries: research activity and bibliometric analysis
2014-01-01
Background Substance use disorders, which include substance abuse and substance dependence, are present in all regions of the world including Middle Eastern Arab countries. Bibliometric analysis is an increasingly used tool for research assessment. The main objective of this study was to assess research productivity in the field of substance use disorders in Arab countries using bibliometric indicators. Methodology Original or review research articles authored or co-authored by investigators from Arab countries about substance use disorders during the period 1900 – 2013 were retrieved using the ISI Web of Science database. Research activity was assessed by analyzing the annual research productivity, contribution of each Arab country, names of journals, citations, and types of abused substances. Results Four hundred and thirteen documents in substance use disorders were retrieved. Annual research productivity was low but showed a significant increase in the last few years. In terms of quantity, Kingdom of Saudi Arabia (83 documents) ranked first in research about substance use disorders while Lebanon (17.4 documents per million) ranked first in terms of number of documents published per million inhabitants. Retrieved documents were found in different journal titles and categories, mostly in Drug and Alcohol Dependence Journal. Authors from USA appeared in 117 documents published by investigators from Arab countries. Citation analysis of retrieved documents showed that the average citation per document was 10.76 and the h - index was 35. The majority of retrieved documents were about tobacco and smoking (175 documents) field while alcohol consumption and abuse research was the least with 69 documents. Conclusion The results obtained suggest that research in this field was largely neglected in the past. However, recent research interest was observed. Research output on tobacco and smoking was relatively high compared to other substances of abuse like illicit drugs and medicinal agents. Governmental funding for academics and mental health graduate programs to do research in the field of substance use disorders is highly recommended. PMID:25148888
Spatial Paradigm for Information Retrieval and Exploration
DOE Office of Scientific and Technical Information (OSTI.GOV)
The SPIRE system consists of software for visual analysis of primarily text based information sources. This technology enables the content analysis of text documents without reading all the documents. It employs several algorithms for text and word proximity analysis. It identifies the key themes within the text documents. From this analysis, it projects the results onto a visual spatial proximity display (Galaxies or Themescape) where items (documents and/or themes) visually close to each other are known to have content which is close to each other. Innovative interaction techniques then allow for dynamic visual analysis of large text based information spaces.
SPIRE1.03. Spatial Paradigm for Information Retrieval and Exploration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, K.J.; Bohn, S.; Crow, V.
The SPIRE system consists of software for visual analysis of primarily text based information sources. This technology enables the content analysis of text documents without reading all the documents. It employs several algorithms for text and word proximity analysis. It identifies the key themes within the text documents. From this analysis, it projects the results onto a visual spatial proximity display (Galaxies or Themescape) where items (documents and/or themes) visually close to each other are known to have content which is close to each other. Innovative interaction techniques then allow for dynamic visual analysis of large text based information spaces.
An exponentiation method for XML element retrieval.
Wichaiwong, Tanakorn
2014-01-01
XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown the structural information improve the effectiveness of the search system up to 52.60% over the baseline BM25 at MAP.
The SAPHIRE server: a new algorithm and implementation.
Hersh, W.; Leone, T. J.
1995-01-01
SAPHIRE is an experimental information retrieval system implemented to test new approaches to automated indexing and retrieval of medical documents. Due to limitations in its original concept-matching algorithm, a modified algorithm has been implemented which allows greater flexibility in partial matching and different word order within concepts. With the concomitant growth in client-server applications and the Internet in general, the new algorithm has been implemented as a server that can be accessed via other applications on the Internet. PMID:8563413
Issues and solutions for storage, retrieval, and searching of MPEG-7 documents
NASA Astrophysics Data System (ADS)
Chang, Yuan-Chi; Lo, Ming-Ling; Smith, John R.
2000-10-01
The ongoing MPEG-7 standardization activity aims at creating a standard for describing multimedia content in order to facilitate the interpretation of the associated information content. Attempting to address a broad range of applications, MPEG-7 has defined a flexible framework consisting of Descriptors, Description Schemes, and Description Definition Language. Descriptors and Description Schemes describe features, structure and semantics of multimedia objects. They are written in the Description Definition Language (DDL). In the most recent revision, DDL applies XML (Extensible Markup Language) Schema with MPEG-7 extensions. DDL has constructs that support inclusion, inheritance, reference, enumeration, choice, sequence, and abstract type of Description Schemes and Descriptors. In order to enable multimedia systems to use MPEG-7, a number of important problems in storing, retrieving and searching MPEG-7 documents need to be solved. This paper reports on initial finding on issues and solutions of storing and accessing MPEG-7 documents. In particular, we discuss the benefits of using a virtual document management framework based on XML Access Server (XAS) in order to bridge the MPEG-7 multimedia applications and database systems. The need arises partly because MPEG-7 descriptions need customized storage schema, indexing and search engines. We also discuss issues arising in managing dependence and cross-description scheme search.
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.
ERIC Educational Resources Information Center
Stewart, Mark; Willett, Peter
1987-01-01
Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
ERIC Educational Resources Information Center
Schultz, Louise, Ed.
The 31 papers in this proceedings cover social as well as technical issues, mathematical models and formal logic systems, applications descriptions, program design, cost analysis, and predictions. The papers are grouped into sessions including: privacy and information technology, describing documents, information dissemination systems, public and…
76 FR 4431 - Privacy Act of 1974; Report of Modified or Altered System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-25
..., records on biological specimens (e.g. blood, urine, etc.), and related documents such as [[Page 4433... Disease Control and Prevention (CDC), for laboratory analysis of samples and for collaborative efforts (i... PRACTICES FOR STORING, RETRIEVING, ACCESSING, RETAINING, AND DISPOSING OF RECORDS IN THE SYSTEM: STORAGE...
Implementation of Imaging Technology for Recordkeeping at the World Bank.
ERIC Educational Resources Information Center
Smith, Clive D.
1997-01-01
Describes the evolution of an electronic document management system for the World Bank, including record-keeping components, and how the Pittsburgh requirements for evidence in record keeping were used to evaluate it. Discusses imaging technology for scanning paper records, metadata for retrieval and record keeping, and extending the system to…
Love, Erika; Butzin, Diane; Robinson, Robert E.; Lee, Soo
1971-01-01
A project to recatalog and reclassify the book collection of the Bowman Gray School of Medicine Library utilizing the Magnetic Tape/Selectric Typwriter system for simultaneous catalog card production and computer stored data acquisition marks the beginning of eventual computerization of all library operations. A keyboard optical display system will be added by late 1970. Major input operations requiring the creation of “hard copy” will continue via the MTST system. Updating, editing and retrieval operations as well as input without hard copy production will be done through the “on-line” keyboard optical display system. Once the library's first data bank, the book catalog, has been established the computer may be consulted directly for library holdings from any optical display terminal throughout the medical center. Three basic information retrieval operations may be carried out through “on-line” optical display terminals. Output options include the reproduction of part or all of a given document, or the generation of statistical data, which are derived from two Acquisition Code lines. The creation of a central bibliographic record of Bowman Gray Faculty publications patterned after the cataloging program is presently under way. The cataloging and computer storage of serial holdings records will begin after completion of the reclassification project. All acquisitions added to the collection since October 1967 are computer-stored and fully retrievable. Reclassification of older titles will be completed in early 1971. PMID:5542915
2015-01-01
Background PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users’ knowledge of information retrieval (IR) functions and their usage in query formulation. Objective The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. Methods To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. Results To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Conclusions Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length. PMID:26139516
Information Retrieval in Biomedical Research: From Articles to Datasets
ERIC Educational Resources Information Center
Wei, Wei
2017-01-01
Information retrieval techniques have been applied to biomedical research for a variety of purposes, such as textual document retrieval and molecular data retrieval. As biomedical research evolves over time, information retrieval is also constantly facing new challenges, including the growing number of available data, the emerging new data types,…
Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis.
Segura Bedmar, Isabel; Martínez, Paloma; Carruana Martín, Adrián
2017-12-01
Biomedical semantic indexing is a very useful support tool for human curators in their efforts for indexing and cataloging the biomedical literature. The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited the document similarity by using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term into the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that says if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by this ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. Our experiments show promising results with an F1 of 69% on the test dataset. To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch becomes a real solution to index large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time. ©Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.12.2017.
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
2006-01-01
The Effect of Bilingual Term List Size on Dictionary -Based Cross-Language Information Retrieval Dina Demner-Fushman Department of Computer Science... dictionary -based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that...in which the documents are written. In dictionary -based CLIR techniques, the princi- pal source of translation knowledge is a translation lexicon
Third Annual Symposium on Document Analysis and Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This document presents papers of the Third Annual Symposium on Document Analysis and Information Retrieval at the Information Science Research-l Institute at the University of Nevada, Las Vegas (UNLV/ISRI). Of the 60 papers submitted, 25 were accepted for oral presentation and 9 as poster papers. Both oral presentations and poster papers are included in these Proceedings. The individual papers have been cataloged separately.
SLUDGE PARTICLE SEPAPATION EFFICIENCIES DURING SETTLER TANK RETRIEVAL INTO SCS-CON-230
DOE Office of Scientific and Technical Information (OSTI.GOV)
DEARING JI; EPSTEIN M; PLYS MG
2009-07-16
The purpose of this document is to release, into the Hanford Document Control System, FA1/0991, Sludge Particle Separation Efficiencies for the Rectangular SCS-CON-230 Container, by M. Epstein and M. G. Plys, Fauske & Associates, LLC, June 2009. The Sludge Treatment Project (STP) will retrieve sludge from the 105-K West Integrated Water Treatment System (IWTS) Settler Tanks and transfer it to container SCS-CON-230 using the Settler Tank Retrieval System (STRS). The sludge will enter the container through two distributors. The container will have a filtration system that is designed to minimize the overflow of sludge fines from the container to themore » basin. FAI/09-91 was performed to quantify the effect of the STRS on sludge distribution inside of and overflow out of SCS-CON-230. Selected results of the analysis and a system description are discussed. The principal result of the analysis is that the STRS filtration system reduces the overflow of sludge from SCS-CON-230 to the basin by roughly a factor of 10. Some turbidity can be expected in the center bay where the container is located. The exact amount of overflow and subsequent turbidity is dependent on the density of the sludge (which will vary with location in the Settler Tanks) and the thermal gradient between the SCS-CON-230 and the basin. Attachment A presents the full analytical results. These results are applicable specifically to SCS-CON-230 and the STRS filtration system's expected operating duty cycles.« less
NASA Technical Reports Server (NTRS)
Chang, L. Aron
1995-01-01
This document describes the progress of the task of the Millimeter-wave Imaging Radiometer (MIR) data processing and the development of water vapor retrieval algorithms, for the second six-month performing period. Aircraft MIR data from two 1995 field experiments were collected and processed with a revised data processing software. Two revised versions of water vapor retrieval algorithm were developed, one for the execution of retrieval on a supercomputer platform, and one for using pressure as the vertical coordinate. Two implementations of incorporating products from other sensors into the water vapor retrieval system, one from the Special Sensor Microwave Imager (SSM/I), the other from the High-resolution Interferometer Sounder (HIS). Water vapor retrievals were performed for both airborne MIR data and spaceborne SSM/T-2 data, during field experiments of TOGA/COARE, CAMEX-1, and CAMEX-2. The climatology of water vapor during TOGA/COARE was examined by SSM/T-2 soundings and conventional rawinsonde.
Web information retrieval based on ontology
NASA Astrophysics Data System (ADS)
Zhang, Jian
2013-03-01
The purpose of the Information Retrieval (IR) is to find a set of documents that are relevant for a specific information need of a user. Traditional Information Retrieval model commonly used in commercial search engine is based on keyword indexing system and Boolean logic queries. One big drawback of traditional information retrieval is that they typically retrieve information without an explicitly defined domain of interest to the users so that a lot of no relevance information returns to users, which burden the user to pick up useful answer from these no relevance results. In order to tackle this issue, many semantic web information retrieval models have been proposed recently. The main advantage of Semantic Web is to enhance search mechanisms with the use of Ontology's mechanisms. In this paper, we present our approach to personalize web search engine based on ontology. In addition, key techniques are also discussed in our paper. Compared to previous research, our works concentrate on the semantic similarity and the whole process including query submission and information annotation.
Challenges and methodology for indexing the computerized patient record.
Ehrler, Frédéric; Ruch, Patrick; Geissbuhler, Antoine; Lovis, Christian
2007-01-01
Patient records contain most crucial documents for managing the treatments and healthcare of patients in the hospital. Retrieving information from these records in an easy, quick and safe way helps care providers to save time and find important facts about their patient's health. This paper presents the scalability issues induced by the indexing and the retrieval of the information contained in the patient records. For this study, EasyIR, an information retrieval tool performing full text queries and retrieving the related documents has been used. An evaluation of the performance reveals that the indexing process suffers from overhead consequence of the particular structure of the patient records. Most IR tools are designed to manage very large numbers of documents in a single index whereas in our hypothesis, one index per record, which usually implies few documents, has been imposed. As the number of modifications and creations of patient records are significant in a day, using a specialized and efficient indexation tool is required.
Functions and requirements for tank farm restoration and safe operations, Project W-314. Revision 3
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrison, R.C.
1995-02-01
This Functions and Requirements document (FRD) establishes the basic performance criteria for Project W-314, in accordance with the guidance outlined in the letter from R.W. Brown, RL, to President, WHC, ``Tank Waste Remediation System (TWRS) Project Documentation Methodology,`` 94-PRJ-018, dated 3/18/94. The FRD replaces the Functional Design Criteria (FDC) as the project technical baseline documentation. Project W-314 will improve the reliability of safety related systems, minimize onsite health and safety hazards, and support waste retrieval and disposal activities by restoring and/or upgrading existing Tank Farm facilities and systems. The scope of Project W-314 encompasses the necessary restoration upgrades of themore » Tank Farms` instrumentation, ventilation, electrical distribution, and waste transfer systems.« less
NASA Technical Reports Server (NTRS)
1986-01-01
The Johnson Space Center Management Information System (JSCMIS) is an interface to computer data bases at NASA Johnson which allows an authorized user to browse and retrieve information from a variety of sources with minimum effort. This issue gives requirements definition and design specifications for versions 2.1 and 2.1.1, along with documented test scenario environments, and security object design and specifications.
Dinh, Duy; Tamine, Lynda; Boubekeur, Fatiha
2013-02-01
The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. We propose a multi-terminology based concept extraction approach to selecting best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We particularly focus on the effect of integrating terminologies into a biomedical IR process, and the utility of using voting techniques for combining the extracted concepts from each document in order to provide a list of unique concepts. Experimental studies conducted on the TREC Genomics collections show that our multi-terminology IR approach based on voting techniques are statistically significant compared to the baseline. For example, tested on the 2005 TREC Genomics collection, our multi-terminology based IR approach provides an improvement rate of +6.98% in terms of MAP (mean average precision) (p<0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms in combination with query expansion using terms from top ranked expanded documents improve the biomedical IR effectiveness. We have evaluated several voting models for combining concepts issued from multiple terminologies. Through this study, we presented many factors affecting the effectiveness of biomedical IR system including term weighting, query expansion, and document expansion models. The appropriate combination of those factors could be useful to improve the IR performance. Copyright © 2012 Elsevier B.V. All rights reserved.
On the Application of Syntactic Methodologies in Automatic Text Analysis.
ERIC Educational Resources Information Center
Salton, Gerard; And Others
1990-01-01
Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…
An Approach to a Digital Library of Newspapers.
ERIC Educational Resources Information Center
Arambura Cabo, Maria Jose; Berlanga Llavori, Rafael
1997-01-01
Presents a new application for retrieving news from a large electronic bank of newspapers that is intended to manage past issues of newspapers. Highlights include a data model for newspapers, including metadata and metaclasses; document definition language; document retrieval language; and memory organization and indexes. (Author/LRW)
Querying and Ranking XML Documents.
ERIC Educational Resources Information Center
Schlieder, Torsten; Meuss, Holger
2002-01-01
Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
Menon, K Venugopal; Kumar, Dinesh; Thomas, Tessamma
2014-02-01
Study Design Preliminary evaluation of new tool. Objective To ascertain whether the newly developed content-based image retrieval (CBIR) software can be used successfully to retrieve images of similar cases of adolescent idiopathic scoliosis (AIS) from a database to help plan treatment without adhering to a classification scheme. Methods Sixty-two operated cases of AIS were entered into the newly developed CBIR database. Five new cases of different curve patterns were used as query images. The images were fed into the CBIR database that retrieved similar images from the existing cases. These were analyzed by a senior surgeon for conformity to the query image. Results Within the limits of variability set for the query system, all the resultant images conformed to the query image. One case had no similar match in the series. The other four retrieved several images that were matching with the query. No matching case was left out in the series. The postoperative images were then analyzed to check for surgical strategies. Broad guidelines for treatment could be derived from the results. More precise query settings, inclusion of bending films, and a larger database will enhance accurate retrieval and better decision making. Conclusion The CBIR system is an effective tool for accurate documentation and retrieval of scoliosis images. Broad guidelines for surgical strategies can be made from the postoperative images of the existing cases without adhering to any classification scheme.
Poster — Thur Eve — 52: A Web-based Platform for Collaborative Document Management in Radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kildea, J.; Joseph, A.
We describe DepDocs, a web-based platform that we have developed to manage the committee meetings, policies, procedures and other documents within our otherwise paperless radiotherapy clinic. DepDocs is essentially a document management system based on the popular Drupal content management software. For security and confidentiality, it is hosted on a linux server internal to our hospital network such that documents are never sent to the cloud or outside of the hospital firewall. We used Drupal's in-built role-based user rights management system to assign a role, and associated document editing rights, to each user. Documents are accessed for viewing using eithermore » a simple Google-like search or by generating a list of related documents from a taxonomy of categorization terms. Our system provides document revision tracking and an document review and approval mechanism for all official policies and procedures. Committee meeting schedules, agendas and minutes are maintained by committee chairs and are restricted to committee members. DepDocs has been operational within our department for over six months and has already 45 unique users and an archive of over 1000 documents, mostly policies and procedures. Documents are easily retrievable from the system using any web browser within our hospital's network.« less
The computer integrated documentation project: A merge of hypermedia and AI techniques
NASA Technical Reports Server (NTRS)
Mathe, Nathalie; Boy, Guy
1993-01-01
To generate intelligent indexing that allows context-sensitive information retrieval, a system must be able to acquire knowledge directly through interaction with users. In this paper, we present the architecture for CID (Computer Integrated Documentation). CID is a system that enables integration of various technical documents in a hypertext framework and includes an intelligent browsing system that incorporates indexing in context. CID's knowledge-based indexing mechanism allows case based knowledge acquisition by experimentation. It utilizes on-line user information requirements and suggestions either to reinforce current indexing in case of success or to generate new knowledge in case of failure. This allows CID's intelligent interface system to provide helpful responses, based on previous experience (user feedback). We describe CID's current capabilities and provide an overview of our plans for extending the system.
Final Inventory Work-Off Plan for ORNL transuranic wastes (1986 version)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dickerson, L.S.
1988-05-01
The Final Inventory Work-Off Plan (IWOP) for ORNL Transuranic Wastes addresses ORNL's strategy for retrieval, certification, and shipment of its stored and newly generated contact-handled (CH) and remote-handled (RH) transuranic (TRU) wastes to the Waste Isolation Pilot Plant (WIPP), the proposed geologic repository near Carlsbad, New Mexico. This document considers certification compliance with the WIPP waste acceptance criteria (WAC) and is consistent with the US Department of Energy's Long-Range Master Plan for Defense Transuranic Waste Management. This document characterizes Oak Ridge National Laboratory's (ORNL's) TRU waste by type and estimates the number of shipments required to dispose of it; describesmore » the methods, facilities, and systems required for its certification and shipment; presents work-off strategies and schedules for retrieval, certification, and transportation; discusses the resource needs and additions that will be required for the effort and forecasts costs for the long-term TRU waste management program; and lists public documentation required to support certification facilities and strategies. 22 refs., 6 figs., 10 tabs.« less
Independent Orbiter Assessment (IOA): Analysis of the remote manipulator system
NASA Technical Reports Server (NTRS)
Tangorra, F.; Grasmeder, R. F.; Montgomery, A. D.
1987-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items (PCIs). To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. The independent analysis results for the Orbiter Remote Manipulator System (RMS) are documented. The RMS hardware and software are primarily required for deploying and/or retrieving up to five payloads during a single mission, capture and retrieve free-flying payloads, and for performing Manipulator Foot Restraint operations. Specifically, the RMS hardware consists of the following components: end effector; displays and controls; manipulator controller interface unit; arm based electronics; and the arm. The IOA analysis process utilized available RMS hardware drawings, schematics and documents for defining hardware assemblies, components and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. Of the 574 failure modes analyzed, 413 were determined to be PCIs.
NASA Astrophysics Data System (ADS)
Henze, F.; Magdalinski, N.; Schwarzbach, F.; Schulze, A.; Gerth, Ph.; Schäfer, F.
2013-07-01
Information systems play an important role in historical research as well as in heritage documentation. As part of a joint research project of the German Archaeological Institute, the Brandenburg University of Technology Cottbus and the Dresden University of Applied Sciences a web-based documentation system is currently being developed, which can easily be adapted to the needs of different projects with individual scientific concepts, methods and questions. Based on open source and standardized technologies it will focus on open and well-documented interfaces to ease the dissemination and re-use of its content via web-services and to communicate with desktop applications for further evaluation and analysis. Core of the system is a generic data model that represents a wide range of topics and methods of archaeological work. By the provision of a concerted amount of initial themes and attributes a cross project analysis of research data will be possible. The development of enhanced search and retrieval functionalities will simplify the processing and handling of large heterogeneous data sets. To achieve a high degree of interoperability with existing external data, systems and applications, standardized interfaces will be integrated. The analysis of spatial data shall be possible through the integration of web-based GIS functions. As an extension to this, customized functions for storage, processing and provision of 3D geo data are being developed. As part of the contribution system requirements and concepts will be presented and discussed. A particular focus will be on introducing the generic data model and the derived database schema. The research work on enhanced search and retrieval capabilities will be illustrated by prototypical developments, as well as concepts and first implementations for an integrated 2D/3D Web-GIS.
Automation of the CAS Document Delivery Service.
ERIC Educational Resources Information Center
Steensland, M. C.; Soukup, K. M.
1986-01-01
The automation of online order retrieval for Chemical Abstracts Service Document Delivery Service was accomplished by shifting to an order retrieval/dispatch process linked to a Unix network. The Unix-based environment, its terminal emulation, page-break, and user-friendly interface software, and later enhancements are reviewed. Resultant increase…
A Vector Space Model for Automatic Indexing.
ERIC Educational Resources Information Center
Salton, G.; And Others
In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other, or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; that is, retrieval performance correlates inversely…
ASSOCIATIVE ADJUSTMENTS TO REDUCE ERRORS IN DOCUMENT SEARCHING.
ERIC Educational Resources Information Center
BRYANT, EDWARD C.; AND OTHERS
ASSOCIATIVE ADJUSTMENTS TO A DOCUMENT FILE ARE CONSIDERED AS A MEANS FOR IMPROVING RETRIEVAL. A THEORETICAL INVESTIGATION OF THE STATISTICAL PROPERTIES OF A GENERALIZED MISMATCH MEASURE WAS CARRIED OUT AND IMPROVEMENTS IN RETRIEVAL RESULTING FROM PERFORMING ASSOCIATIVE REGRESSION ADJUSTMENTS ON DATA FILE WERE EXAMINED BOTH FROM THE THEORETICAL AND…
An Exponentiation Method for XML Element Retrieval
2014-01-01
XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown the structural information improve the effectiveness of the search system up to 52.60% over the baseline BM25 at MAP. PMID:24696643
Computer integrated documentation
NASA Technical Reports Server (NTRS)
Boy, Guy
1991-01-01
The main technical issues of the Computer Integrated Documentation (CID) project are presented. The problem of automation of documents management and maintenance is analyzed both from an artificial intelligence viewpoint and from a human factors viewpoint. Possible technologies for CID are reviewed: conventional approaches to indexing and information retrieval; hypertext; and knowledge based systems. A particular effort was made to provide an appropriate representation for contextual knowledge. This representation is used to generate context on hypertext links. Thus, indexing in CID is context sensitive. The implementation of the current version of CID is described. It includes a hypertext data base, a knowledge based management and maintenance system, and a user interface. A series is also presented of theoretical considerations as navigation in hyperspace, acquisition of indexing knowledge, generation and maintenance of a large documentation, and relation to other work.
Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches
Wang, Yanshan; Shen, Feichen; Liu, Sijia; Rastegar-Mojarad, Majid; Wang, Liwei
2017-01-01
Background Self-management is crucial to diabetes care and providing expert-vetted content for answering patients’ questions is crucial in facilitating patient self-management. Objective The aim is to investigate the use of information retrieval techniques in recommending patient education materials for diabetic questions of patients. Methods We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic group (semantic group-based model), with the baseline retrieval models, vector space model (VSM), in recommending diabetic patient education materials to diabetic questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset consisting of 50 randomly selected diabetic questions where the relevancy of diabetic education materials to the questions was manually assigned by two experts. The performance was assessed using precision of top-ranked documents. Results We retrieved 7510 diabetic questions on the forum and 144 diabetic patient educational materials from the patient education database at Mayo Clinic. The mapping rate of words in each corpus mapped to the Unified Medical Language System (UMLS) was significantly different (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. Conclusions This study demonstrated that topic modeling can mitigate the vocabulary difference and it achieved the best performance in recommending education materials for answering patients’ questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and online forums. PMID:29038097
A Phrase-Based Matching Function.
ERIC Educational Resources Information Center
Galbiati, Giulia
1991-01-01
Describes the development of an information retrieval system designed for nonspecialist users that is based on the binary vector model. The syntactic structure of phrases used for indexing is examined, queries using an experimental collection of documents are described, and precision values are examined. (19 references) (LRW)
Distributed Multimedia Computing: An Assessment of the State of the Art.
ERIC Educational Resources Information Center
Williams, Neil; And Others
1991-01-01
Describes multimedia computing and the characteristics of multimedia information. Trends in information technology are reviewed; distributed multimedia computing is explained; media types are described, including digital media; and multimedia applications are examined, including office systems, documents, information storage and retrieval,…
76 FR 49805 - Submission for OMB Review; Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-11
... request for extension of the previously approved collection of information discussed below. Regulation S-T... submission of documents on the Electronic Data Gathering, Analysis and Retrieval (``EDGAR'') system... any information collection requirements. An agency may not conduct or sponsor, and a person is not...
Document Delivery from Full-Text Online Files: A Pilot Project.
ERIC Educational Resources Information Center
Gillikin, David P.
1990-01-01
Describes the Electronic Journal Retrieval Project (EJRP) developed at the University of Tennessee, Knoxville Libraries, to provide full-text journal articles from online systems. Highlights include costs of various search strategies; implications for library services; collection development and interlibrary loan considerations; and suggestions…
Framing Electronic Medical Records as Polylingual Documents in Query Expansion
Huang, Edward W; Wang, Sheng; Lee, Doris Jung-Lin; Zhang, Runshun; Liu, Baoyan; Zhou, Xuezhong; Zhai, ChengXiang
2017-01-01
We present a study of electronic medical record (EMR) retrieval that emulates situations in which a doctor treats a new patient. Given a query consisting of a new patient’s symptoms, the retrieval system returns the set of most relevant records of previously treated patients. However, due to semantic, functional, and treatment synonyms in medical terminology, queries are often incomplete and thus require enhancement. In this paper, we present a topic model that frames symptoms and treatments as separate languages. Our experimental results show that this method improves retrieval performance over several baselines with statistical significance. These baselines include methods used in prior studies as well as state-of-the-art embedding techniques. Finally, we show that our proposed topic model discovers all three types of synonyms to improve medical record retrieval. PMID:29854161
Structuring Broadcast Audio for Information Access
NASA Astrophysics Data System (ADS)
Gauvain, Jean-Luc; Lamel, Lori
2003-12-01
One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used for searching large audiovisual document collections. Audio indexing must take into account the specificities of audio data such as needing to deal with the continuous data stream and an imperfect word transcription. Other important considerations are dealing with language specificities and facilitating language portability. At Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.
Software Assists in Responding to Anomalous Conditions
NASA Technical Reports Server (NTRS)
James, Mark; Kronbert, F.; Weiner, A.; Morgan, T.; Stroozas, B.; Girouard, F.; Hopkins, A.; Wong, L.; Kneubuhl, J.; Malina, R.
2004-01-01
Fault Induced Document Retrieval Officer (FIDO) is a computer program that reduces the need for a large and costly team of engineers and/or technicians to monitor the state of a spacecraft and associated ground systems and respond to anomalies. FIDO includes artificial-intelligence components that imitate the reasoning of human experts with reference to a knowledge base of rules that represent failure modes and to a database of engineering documentation. These components act together to give an unskilled operator instantaneous expert assistance and access to information that can enable resolution of most anomalies, without the need for highly paid experts. FIDO provides a system state summary (a configurable engineering summary) and documentation for diagnosis of a potentially failing component that might have caused a given error message or anomaly. FIDO also enables high-level browsing of documentation by use of an interface indexed to the particular error message. The collection of available documents includes information on operations and associated procedures, engineering problem reports, documentation of components, and engineering drawings. FIDO also affords a capability for combining information on the state of ground systems with detailed, hierarchically-organized, hypertext- enabled documentation.
Sampling criteria in multicollection searching.
NASA Astrophysics Data System (ADS)
Gilio, A.; Scozzafava, R.; Marchetti, P. G.
In the first stage of the document retrieval process, no information concerning relevance of a particular document is available. On the other hand, computer implementation requires that the analysis be made only for a sample of retrieved documents. This paper addresses the significance and suitability of two different sampling criteria for a multicollection online search facility. The inevitability of resorting to a logarithmic criterion in order to achieve a "spread of representativeness" from the multicollection is demonstrated.
Cieslewicz, Artur; Dutkiewicz, Jakub; Jedrzejek, Czeslaw
2018-01-01
Abstract Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate the research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge—a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team utilizes the Terrier 4.2 search platform enhanced by a query expansion method using word embeddings. This approach, after post-challenge modifications and improvements (with particular regard to assigning proper weights for original and expanded terms), allowed us achieving the second best infNDCG measure (0.4539) compared with the challenge results and infAP 0.3978. This demonstrates that proper utilization of word embeddings can be a valuable addition to the information retrieval process. Some analysis is provided on related work involving other bioCADDIE contributions. We discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data PMID:29688372
An architecture for diversity-aware search for medical web content.
Denecke, K
2012-01-01
The Web provides a huge source of information, also on medical and health-related issues. In particular the content of medical social media data can be diverse due to the background of an author, the source or the topic. Diversity in this context means that a document covers different aspects of a topic or a topic is described in different ways. In this paper, we introduce an approach that allows to consider the diverse aspects of a search query when providing retrieval results to a user. We introduce a system architecture for a diversity-aware search engine that allows retrieving medical information from the web. The diversity of retrieval results is assessed by calculating diversity measures that rely upon semantic information derived from a mapping to concepts of a medical terminology. Considering these measures, the result set is diversified by ranking more diverse texts higher. The methods and system architecture are implemented in a retrieval engine for medical web content. The diversity measures reflect the diversity of aspects considered in a text and its type of information content. They are used for result presentation, filtering and ranking. In a user evaluation we assess the user satisfaction with an ordering of retrieval results that considers the diversity measures. It is shown through the evaluation that diversity-aware retrieval considering diversity measures in ranking could increase the user satisfaction with retrieval results.
Mann, G; Birkmann, C; Schmidt, T; Schaeffler, V
1999-01-01
Introduction Present solutions for the representation and retrieval of medical information from online sources are not very satisfying. Either the retrieval process lacks of precision and completeness the representation does not support the update and maintenance of the represented information. Most efforts are currently put into improving the combination of search engines and HTML based documents. However, due to the current shortcomings of methods for natural language understanding there are clear limitations to this approach. Furthermore, this approach does not solve the maintenance problem. At least medical information exceeding a certain complexity seems to afford approaches that rely on structured knowledge representation and corresponding retrieval mechanisms. Methods Knowledge-based information systems are based on the following fundamental ideas. The representation of information is based on ontologies that define the structure of the domain's concepts and their relations. Views on domain models are defined and represented as retrieval schemata. Retrieval schemata can be interpreted as canonical query types focussing on specific aspects of the provided information (e.g. diagnosis or therapy centred views). Based on these retrieval schemata it can be decided which parts of the information in the domain model must be represented explicitly and formalised to support the retrieval process. As representation language propositional logic is used. All other information can be represented in a structured but informal way using text, images etc. Layout schemata are used to assign layout information to retrieved domain concepts. Depending on the target environment HTML or XML can be used. Results Based on this approach two knowledge-based information systems have been developed. The 'Ophthalmologic Knowledge-based Information System for Diabetic Retinopathy' (OKIS-DR) provides information on diagnoses, findings, examinations, guidelines, and reference images related to diabetic retinopathy. OKIS-DR uses combinations of findings to specify the information that must be retrieved. The second system focuses on nutrition related allergies and intolerances. Information on allergies and intolerances of a patient are used to retrieve general information on the specified combination of allergies and intolerances. As a special feature the system generates tables showing food types and products that are tolerated or not tolerated by patients. Evaluation by external experts and user groups showed that the described approach of knowledge-based information systems increases the precision and completeness of knowledge retrieval. Due to the structured and non-redundant representation of information the maintenance and update of the information can be simplified. Both systems are available as WWW based online knowledge bases and CD-ROMs (cf. http://mta.gsf.de topic: products).
Automation of Design Engineering Processes
NASA Technical Reports Server (NTRS)
Torrey, Glenn; Sawasky, Gerald; Courey, Karim
2004-01-01
A method, and a computer program that helps to implement the method, have been developed to automate and systematize the retention and retrieval of all the written records generated during the process of designing a complex engineering system. It cannot be emphasized strongly enough that all the written records as used here is meant to be taken literally: it signifies not only final drawings and final engineering calculations but also such ancillary documents as minutes of meetings, memoranda, requests for design changes, approval and review documents, and reports of tests. One important purpose served by the method is to make the records readily available to all involved users via their computer workstations from one computer archive while eliminating the need for voluminous paper files stored in different places. Another important purpose served by the method is to facilitate the work of engineers who are charged with sustaining the system and were not involved in the original design decisions. The method helps the sustaining engineers to retrieve information that enables them to retrace the reasoning that led to the original design decisions, thereby helping them to understand the system better and to make informed engineering choices pertaining to maintenance and/or modifications of the system. The software used to implement the method is written in Microsoft Access. All of the documents pertaining to the design of a given system are stored in one relational database in such a manner that they can be related to each other via a single tracking number.
Automatic generation of Web mining environments
NASA Astrophysics Data System (ADS)
Cibelli, Maurizio; Costagliola, Gennaro
1999-02-01
The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
Toward Medical Documentation That Enhances Situational Awareness Learning
Lenert, Leslie A.
2016-01-01
The purpose of writing medical notes in a computer system goes beyond documentation for medical-legal purposes or billing. The structure of documentation is a checklist that serves as a cognitive aid and a potential index to retrieve information for learning from the record. For the past 50 years, one of the primary organizing structures for physicians’ clinical documentation have been the SOAP note (Subjective, Objective, Assessment, Plan). The cognitive check list is well-suited to differential diagnosis but may not support detection of changes in systems and/or learning from cases. We describe an alternative cognitive checklist called the OODA Loop (Observe, Orient, Decide, Act. Through incorporation of projections of anticipated course events with and without treatment and by making “Decisions” an explicit category of documentation in the medical record in the context of a variable temporal cycle for observations, OODA may enhance opportunities to learn from clinical care. PMID:28269872
36 CFR 1238.12 - What documentation is required for microfilmed records?
Code of Federal Regulations, 2011 CFR
2011-07-01
... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR § 1238.12 - What documentation is required for microfilmed records?
Code of Federal Regulations, 2013 CFR
2013-07-01
... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR 1238.12 - What documentation is required for microfilmed records?
Code of Federal Regulations, 2014 CFR
2014-07-01
... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR 1238.12 - What documentation is required for microfilmed records?
Code of Federal Regulations, 2012 CFR
2012-07-01
... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
36 CFR 1238.12 - What documentation is required for microfilmed records?
Code of Federal Regulations, 2010 CFR
2010-07-01
... microforms capture all information contained on the source documents and that they can be used for the... retrieval and use. Agencies must: (a) Arrange, describe, and index the filmed records to permit retrieval of... titling target or header. For fiche, place the titling information in the first frame if the information...
2010-02-01
1 Charles E. Wilhelm, Expeditionary Warfare.marine corps gazette, 79(6), 28-30. Retrieved October 15, 2009, from Career and Technical Education . (Document...Expeditionary warfare.marine corps gazette, 79(6), 28- 30. Retrieved October 15, 2009, from Career and Technical Education . (Document ID: 4455650
On the Delusiveness of Adopting a Common Space for Modeling IR Objects: Are Queries Documents?
ERIC Educational Resources Information Center
Bollmann-Sdorra, Peter; Raghavan, Vjay V.
1993-01-01
Proposes that document space and query space have different structures in information retrieval and discusses similarity measures, term independence, and linear structure. Examples are given using the retrieval functions of dot-product, the cosine measure, the coefficient of Jaccard, and the overlap function. (Contains 28 references.) (LRW)
NASA Technical Reports Server (NTRS)
Suarez, Max J. (Editor); daSilva, Arlindo; Dee, Dick; Bloom, Stephen; Bosilovich, Michael; Pawson, Steven; Schubert, Siegfried; Wu, Man-Li; Sienkiewicz, Meta; Stajner, Ivanka
2005-01-01
This document describes the structure and validation of a frozen version of the Goddard Earth Observing System Data Assimilation System (GEOS DAS): GEOS-4.0.3. Significant features of GEOS-4 include: version 3 of the Community Climate Model (CCM3) with the addition of a finite volume dynamical core; version two of the Community Land Model (CLM2); the Physical-space Statistical Analysis System (PSAS); and an interactive retrieval system (iRET) for assimilating TOVS radiance data. Upon completion of the GEOS-4 validation in December 2003, GEOS-4 became operational on 15 January 2004. Products from GEOS-4 have been used in supporting field campaigns and for reprocessing several years of data for CERES.
Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval
2014-11-01
the same score, another singal will be used to rank these documents to break the ties , but the relative orders of other documents against these...documents remain the same. The tie- breaking step above is repeatedly applied to further break ties until all candidate signals are applied and the ranking...searched it on the Yahoo! search engine, which returned some query sug- gestions for the query. The original queries as well as their query suggestions
Finding Information on the World Wide Web: The Retrieval Effectiveness of Search Engines.
ERIC Educational Resources Information Center
Pathak, Praveen; Gordon, Michael
1999-01-01
Describes a study that examined the effectiveness of eight search engines for the World Wide Web. Calculated traditional information-retrieval measures of recall and precision at varying numbers of retrieved documents to use as the bases for statistical comparisons of retrieval effectiveness. Also examined the overlap between search engines.…
A novel methodology for querying web images
NASA Astrophysics Data System (ADS)
Prabhakara, Rashmi; Lee, Ching Cheng
2005-01-01
Ever since the advent of Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improvises on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that not only provides for the user to enter the keyword for the topic-based search but also, the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match for the retrieval of smaller and relatively closer results of the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A technique that is not computationally intensive based on color feature is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.
A novel methodology for querying web images
NASA Astrophysics Data System (ADS)
Prabhakara, Rashmi; Lee, Ching Cheng
2004-12-01
Ever since the advent of Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improvises on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that not only provides for the user to enter the keyword for the topic-based search but also, the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match for the retrieval of smaller and relatively closer results of the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A technique that is not computationally intensive based on color feature is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.
The Ecological Approach to Text Visualization.
ERIC Educational Resources Information Center
Wise, James A.
1999-01-01
Presents both theoretical and technical bases on which to build a "science of text visualization." The Spatial Paradigm for Information Retrieval and Exploration (SPIRE) text-visualization system, which images information from free-text documents as natural terrains, serves as an example of the "ecological approach" in its visual metaphor, its…
Incorporating Non-Relevance Information in the Estimation of Query Models
2008-11-01
experiments in relevance feedback. In Salton , G., editor, The SMART Retrieval System – Exper- iments in Automatic Document Processing, pages 337– 354...W. (2001). Relevance based lan- guage models. In SIGIR ’01. Rocchio, J. (1971). Relevance feedback in information re- trieval. In Salton , G., editor
Luo, Jake; Chen, Weiheng; Wu, Min; Weng, Chunhua
2018-01-01
Background Prior studies of clinical trial planning indicate that it is crucial to search and screen recruitment sites before starting to enroll participants. However, currently there is no systematic method developed to support clinical investigators to search candidate recruitment sites according to their interested clinical trial factors. Objective In this study, we aim at developing a new approach to integrating the location data of over one million heterogeneous recruitment sites that are stored in clinical trial documents. The integrated recruitment location data can be searched and visualized using a map-based information retrieval method. The method enables systematic search and analysis of recruitment sites across a large amount of clinical trials. Methods The location data of more than 1.4 million recruitment sites of over 183,000 clinical trials was normalized and integrated using a geocoding method. The integrated data can be used to support geographic information retrieval of recruitment sites. Additionally, the information of over 6000 clinical trial target disease conditions and close to 4000 interventions was also integrated into the system and linked to the recruitment locations. Such data integration enabled the construction of a novel map-based query system. The system will allow clinical investigators to search and visualize candidate recruitment sites for clinical trials based on target conditions and interventions. Results The evaluation results showed that the coverage of the geographic location mapping for the 1.4 million recruitment sites was 99.8%. The evaluation of 200 randomly retrieved recruitment sites showed that the correctness of geographic information mapping was 96.5%. The recruitment intensities of the top 30 countries were also retrieved and analyzed. The data analysis results indicated that the recruitment intensity varied significantly across different countries and geographic areas. Conclusion This study contributed a new data processing framework to extract and integrate the location data of heterogeneous recruitment sites from clinical trial documents. The developed system can support effective retrieval and analysis of potential recruitment sites using target clinical trial factors. PMID:29132636
Luo, Jake; Chen, Weiheng; Wu, Min; Weng, Chunhua
2017-12-01
Prior studies of clinical trial planning indicate that it is crucial to search and screen recruitment sites before starting to enroll participants. However, currently there is no systematic method developed to support clinical investigators to search candidate recruitment sites according to their interested clinical trial factors. In this study, we aim at developing a new approach to integrating the location data of over one million heterogeneous recruitment sites that are stored in clinical trial documents. The integrated recruitment location data can be searched and visualized using a map-based information retrieval method. The method enables systematic search and analysis of recruitment sites across a large amount of clinical trials. The location data of more than 1.4 million recruitment sites of over 183,000 clinical trials was normalized and integrated using a geocoding method. The integrated data can be used to support geographic information retrieval of recruitment sites. Additionally, the information of over 6000 clinical trial target disease conditions and close to 4000 interventions was also integrated into the system and linked to the recruitment locations. Such data integration enabled the construction of a novel map-based query system. The system will allow clinical investigators to search and visualize candidate recruitment sites for clinical trials based on target conditions and interventions. The evaluation results showed that the coverage of the geographic location mapping for the 1.4 million recruitment sites was 99.8%. The evaluation of 200 randomly retrieved recruitment sites showed that the correctness of geographic information mapping was 96.5%. The recruitment intensities of the top 30 countries were also retrieved and analyzed. The data analysis results indicated that the recruitment intensity varied significantly across different countries and geographic areas. This study contributed a new data processing framework to extract and integrate the location data of heterogeneous recruitment sites from clinical trial documents. The developed system can support effective retrieval and analysis of potential recruitment sites using target clinical trial factors. Copyright © 2017 Elsevier B.V. All rights reserved.
Recognition techniques for extracting information from semistructured documents
NASA Astrophysics Data System (ADS)
Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna
2000-12-01
Archives of optical documents are more and more massively employed, the demand driven also by the new norms sanctioning the legal value of digital documents, provided they are stored on supports that are physically unalterable. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those for magnetic memories. The remaining bottleneck in these systems is the indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine supported to a large degree with evident advantages both in the organization of the work, and in extracting information, providing data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. This information, in our prototype application, is distributed among the database fields of sender, addressee, subject, date, and body of the document.
Lin, Jimmy
2008-01-01
Background Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web. Results We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics. Conclusion The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain. PMID:18538027
Improve Biomedical Information Retrieval using Modified Learning to Rank Methods.
Xu, Bo; Lin, Hongfei; Lin, Yuan; Ma, Yunlong; Yang, Liang; Wang, Jian; Yang, Zhihao
2016-06-14
In these years, the number of biomedical articles has increased exponentially, which becomes a problem for biologists to capture all the needed information manually. Information retrieval technologies, as the core of search engines, can deal with the problem automatically, providing users with the needed information. However, it is a great challenge to apply these technologies directly for biomedical retrieval, because of the abundance of domain specific terminologies. To enhance biomedical retrieval, we propose a novel framework based on learning to rank. Learning to rank is a series of state-of-the-art information retrieval techniques, and has been proved effective in many information retrieval tasks. In the proposed framework, we attempt to tackle the problem of the abundance of terminologies by constructing ranking models, which focus on not only retrieving the most relevant documents, but also diversifying the searching results to increase the completeness of the resulting list for a given query. In the model training, we propose two novel document labeling strategies, and combine several traditional retrieval models as learning features. Besides, we also investigate the usefulness of different learning to rank approaches in our framework. Experimental results on TREC Genomics datasets demonstrate the effectiveness of our framework for biomedical information retrieval.
A data-management system for detailed areal interpretive data
Ferrigno, C.F.
1986-01-01
A data storage and retrieval system has been developed to organize and preserve areal interpretive data. This system can be used by any study where there is a need to store areal interpretive data that generally is presented in map form. This system provides the capability to grid areal interpretive data for input to groundwater flow models at any spacing and orientation. The data storage and retrieval system is designed to be used for studies that cover small areas such as counties. The system is built around a hierarchically structured data base consisting of related latitude-longitude blocks. The information in the data base can be stored at different levels of detail, with the finest detail being a block of 6 sec of latitude by 6 sec of longitude (approximately 0.01 sq mi). This system was implemented on a mainframe computer using a hierarchical data base management system. The computer programs are written in Fortran IV and PL/1. The design and capabilities of the data storage and retrieval system, and the computer programs that are used to implement the system are described. Supplemental sections contain the data dictionary, user documentation of the data-system software, changes that would need to be made to use this system for other studies, and information on the computer software tape. (Lantz-PTT)
Preparing a collection of radiology examinations for distribution and retrieval.
Demner-Fushman, Dina; Kohli, Marc D; Rosenman, Marc B; Shooshan, Sonya E; Rodriguez, Laritza; Antani, Sameer; Thoma, George R; McDonald, Clement J
2016-03-01
Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database. The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically and then the automatic de-identification was manually verified. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval. The automatic de-identification of the narrative was aggressive and achieved 100% precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was not quite as perfect. Images for two of 3996 patients (0.05%) showed protected health information. Manual encoding of findings improved retrieval precision. Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and needs special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/). Published by Oxford University Press on behalf of the American Medical Informatics Association 2015. This work is written by US Government employees and is in the public domain in the US.
Clinician search behaviors may be influenced by search engine design.
Lau, Annie Y S; Coiera, Enrico; Zrimec, Tatjana; Compton, Paul
2010-06-30
Searching the Web for documents using information retrieval systems plays an important part in clinicians' practice of evidence-based medicine. While much research focuses on the design of methods to retrieve documents, there has been little examination of the way different search engine capabilities influence clinician search behaviors. Previous studies have shown that use of task-based search engines allows for faster searches with no loss of decision accuracy compared with resource-based engines. We hypothesized that changes in search behaviors may explain these differences. In all, 75 clinicians (44 doctors and 31 clinical nurse consultants) were randomized to use either a resource-based or a task-based version of a clinical information retrieval system to answer questions about 8 clinical scenarios in a controlled setting in a university computer laboratory. Clinicians using the resource-based system could select 1 of 6 resources, such as PubMed; clinicians using the task-based system could select 1 of 6 clinical tasks, such as diagnosis. Clinicians in both systems could reformulate search queries. System logs unobtrusively capturing clinicians' interactions with the systems were coded and analyzed for clinicians' search actions and query reformulation strategies. The most frequent search action of clinicians using the resource-based system was to explore a new resource with the same query, that is, these clinicians exhibited a "breadth-first" search behaviour. Of 1398 search actions, clinicians using the resource-based system conducted 401 (28.7%, 95% confidence interval [CI] 26.37-31.11) in this way. In contrast, the majority of clinicians using the task-based system exhibited a "depth-first" search behavior in which they reformulated query keywords while keeping to the same task profiles. Of 585 search actions conducted by clinicians using the task-based system, 379 (64.8%, 95% CI 60.83-68.55) were conducted in this way. This study provides evidence that different search engine designs are associated with different user search behaviors.
User-Centered Indexing for Adaptive Information Access
NASA Technical Reports Server (NTRS)
Chen, James R.; Mathe, Nathalie
1996-01-01
We are focusing on information access tasks characterized by large volume of hypermedia connected technical documents, a need for rapid and effective access to familiar information, and long-term interaction with evolving information. The problem for technical users is to build and maintain a personalized task-oriented model of the information to quickly access relevant information. We propose a solution which provides user-centered adaptive information retrieval and navigation. This solution supports users in customizing information access over time. It is complementary to information discovery methods which provide access to new information, since it lets users customize future access to previously found information. It relies on a technique, called Adaptive Relevance Network, which creates and maintains a complex indexing structure to represent personal user's information access maps organized by concepts. This technique is integrated within the Adaptive HyperMan system, which helps NASA Space Shuttle flight controllers organize and access large amount of information. It allows users to select and mark any part of a document as interesting, and to index that part with user-defined concepts. Users can then do subsequent retrieval of marked portions of documents. This functionality allows users to define and access personal collections of information, which are dynamically computed. The system also supports collaborative review by letting users share group access maps. The adaptive relevance network provides long-term adaptation based both on usage and on explicit user input. The indexing structure is dynamic and evolves over time. Leading and generalization support flexible retrieval of information under similar concepts. The network is geared towards more recent information access, and automatically manages its size in order to maintain rapid access when scaling up to large hypermedia space. We present results of simulated learning experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crain, Steven P.; Yang, Shuang-Hong; Zha, Hongyuan
Access to health information by consumers is ham- pered by a fundamental language gap. Current attempts to close the gap leverage consumer oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, com- mon and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations us- ing consumermore » questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.« less
Indexing and Retrieval for the Web.
ERIC Educational Resources Information Center
Rasmussen, Edie M.
2003-01-01
Explores current research on indexing and ranking as retrieval functions of search engines on the Web. Highlights include measuring search engine stability; evaluation of Web indexing and retrieval; Web crawlers; hyperlinks for indexing and ranking; ranking for metasearch; document structure; citation indexing; relevance; query evaluation;…
Theoretical and Philosophical Aspects of Knowledge Management (SIG KM).
ERIC Educational Resources Information Center
Day, Ronald E.
2000-01-01
This session abstract discusses the history, philosophy, and theories of knowledge management to better understand its social and organizational potentials and limitations. Topics include determinacy of sense, information retrieval, and the Data Retrieval Model versus the Document Retrieval Model; discussions about knowledge; and surplus…
Advanced Feedback Methods in Information Retrieval.
ERIC Educational Resources Information Center
Salton, G.; And Others
1985-01-01
In this study, automatic feedback techniques are applied to Boolean query statements in online information retrieval to generate improved query statements based on information contained in previously retrieved documents. Feedback operations are carried out using conventional Boolean logic and extended logic. Experimental output is included to…
On-Demand Associative Cross-Language Information Retrieval
NASA Astrophysics Data System (ADS)
Geraldo, André Pinto; Moreira, Viviane P.; Gonçalves, Marcos A.
This paper proposes the use of algorithms for mining association rules as an approach for Cross-Language Information Retrieval. These algorithms have been widely used to analyse market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using queries in two distinct languages: Portuguese and Finnish to retrieve documents in English. The results show that the performance of our proposed approach is comparable to the performance of the monolingual baseline and to query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. The combination between machine translation and our approach yielded the best results, even outperforming the monolingual baseline.
Deep Borehole Field Test Requirements and Controlled Assumptions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hardin, Ernest
2015-07-01
This document presents design requirements and controlled assumptions intended for use in the engineering development and testing of: 1) prototype packages for radioactive waste disposal in deep boreholes; 2) a waste package surface handling system; and 3) a subsurface system for emplacing and retrieving packages in deep boreholes. Engineering development and testing is being performed as part of the Deep Borehole Field Test (DBFT; SNL 2014a). This document presents parallel sets of requirements for a waste disposal system and for the DBFT, showing the close relationship. In addition to design, it will also inform planning for drilling, construction, and scientificmore » characterization activities for the DBFT. The information presented here follows typical preparations for engineering design. It includes functional and operating requirements for handling and emplacement/retrieval equipment, waste package design and emplacement requirements, borehole construction requirements, sealing requirements, and performance criteria. Assumptions are included where they could impact engineering design. Design solutions are avoided in the requirements discussion. Deep Borehole Field Test Requirements and Controlled Assumptions July 21, 2015 iv ACKNOWLEDGEMENTS This set of requirements and assumptions has benefited greatly from reviews by Gordon Appel, Geoff Freeze, Kris Kuhlman, Bob MacKinnon, Steve Pye, David Sassani, Dave Sevougian, and Jiann Su.« less
Waters, Keith P; Zuber, Alexandra; Willy, Rankesh M; Kiriinya, Rose N; Waudo, Agnes N; Oluoch, Tom; Kimani, Francis M; Riley, Patricia L
2013-09-01
Countries worldwide are challenged by health worker shortages, skill mix imbalances, and maldistribution. Human resources information systems (HRIS) are used to monitor and address these health workforce issues, but global understanding of such systems is minimal and baseline information regarding their scope and capability is practically non-existent. The Kenya Health Workforce Information System (KHWIS) has been identified as a promising example of a functioning HRIS. The objective of this paper is to document the impact of KHWIS data on human resources policy, planning and management. Sources for this study included semi-structured interviews with senior officials at Kenya's Ministry of Medical Services (MOMS), Ministry of Public Health and Sanitation (MOPHS), the Department of Nursing within MOMS, the Nursing Council of Kenya, Kenya Medical Practitioners and Dentists Board, Kenya's Clinical Officers Council, and Kenya Medical Laboratory Technicians and Technologists Board. Additionally, quantitative data were extracted from KHWIS databases to supplement the interviews. Health sector policy documents were retrieved from MOMS and MOPHS websites, and reviewed to assess whether they documented any changes to policy and practice as having been impacted by KHWIS data. Interviews with Kenyan government and regulatory officials cited health workforce data provided by KHWIS influenced policy, regulation, and management. Policy changes include extension of Kenya's age of mandatory civil service retirement from 55 to 60 years. Data retrieved from KHWIS document increased relicensing of professional nurses, midwives, medical practitioners and dentists, and interviewees reported this improved compliance raised professional regulatory body revenues. The review of Government records revealed few references to KHWIS; however, documentation specifically cited the KHWIS as having improved the availability of human resources for health information regarding workforce planning, management, and development. KHWIS data have impacted a range of improvements in health worker regulation, human resources management, and workforce policy and planning at Kenya's ministries of health. Published by Elsevier Ireland Ltd.
Welter, Petra; Riesmeier, Jörg; Fischer, Benedikt; Grouls, Christoph; Kuhl, Christiane; Deserno, Thomas M
2011-01-01
It is widely accepted that content-based image retrieval (CBIR) can be extremely useful for computer-aided diagnosis (CAD). However, CBIR has not been established in clinical practice yet. As a widely unattended gap of integration, a unified data concept for CBIR-based CAD results and reporting is lacking. Picture archiving and communication systems and the workflow of radiologists must be considered for successful data integration to be achieved. We suggest that CBIR systems applied to CAD should integrate their results in a picture archiving and communication systems environment such as Digital Imaging and Communications in Medicine (DICOM) structured reporting documents. A sample DICOM structured reporting template adaptable to CBIR and an appropriate integration scheme is presented. The proposed CBIR data concept may foster the promulgation of CBIR systems in clinical environments and, thereby, improve the diagnostic process.
Riesmeier, Jörg; Fischer, Benedikt; Grouls, Christoph; Kuhl, Christiane; Deserno (né Lehmann), Thomas M
2011-01-01
It is widely accepted that content-based image retrieval (CBIR) can be extremely useful for computer-aided diagnosis (CAD). However, CBIR has not been established in clinical practice yet. As a widely unattended gap of integration, a unified data concept for CBIR-based CAD results and reporting is lacking. Picture archiving and communication systems and the workflow of radiologists must be considered for successful data integration to be achieved. We suggest that CBIR systems applied to CAD should integrate their results in a picture archiving and communication systems environment such as Digital Imaging and Communications in Medicine (DICOM) structured reporting documents. A sample DICOM structured reporting template adaptable to CBIR and an appropriate integration scheme is presented. The proposed CBIR data concept may foster the promulgation of CBIR systems in clinical environments and, thereby, improve the diagnostic process. PMID:21672913
Initial retrieval sequence and blending strategy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pemwell, D.L.; Grenard, C.E.
1996-09-01
This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example,high-level waste glass volume, relative risk and schedule).Computer models were used to evaluate estimated glass volumes,process rates, retrieval dates, and blending strategy effects.The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.
A Computerized Library and Evaluation System for Integral Neutron Experiments.
ERIC Educational Resources Information Center
Hampel, Viktor E.; And Others
A computerized library of references to integral neutron experiments has been developed at the Lawrence Radiation Laboratory at Livermore. This library serves as a data base for the systematic retrieval of documents describing diverse critical and bulk nuclear experiments. The evaluation and reduction of the physical parameters of the experiments…
Recommendations for a Habitability Data Base.
ERIC Educational Resources Information Center
Illinois Univ., Urbana. Library Research Center.
A prototype Habitability Data Base was developed for the United States Army Corps of Engineers. From a review of selected Army documents, standards in the form of goals or architectural criteria were identified as significant to man-environment relations (MER). A search of appropriate information systems was conducted to retrieve a minimum of 500…
Sentence-Based Metadata: An Approach and Tool for Viewing Database Designs.
ERIC Educational Resources Information Center
Boyle, John M.; Gunge, Jakob; Bryden, John; Librowski, Kaz; Hanna, Hsin-Yi
2002-01-01
Describes MARS (Museum Archive Retrieval System), a research tool which enables organizations to exchange digital images and documents by means of a common thesaurus structure, and merge the descriptive data and metadata of their collections. Highlights include theoretical basis; searching the MARS database; and examples in European museums.…
A probabilistic NF2 relational algebra for integrated information retrieval and database systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fuhr, N.; Roelleke, T.
The integration of information retrieval (IR) and database systems requires a data model which allows for modelling documents as entities, representing uncertainty and vagueness and performing uncertain inference. For this purpose, we present a probabilistic data model based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Thus, the set of weighted index terms of a document are represented as a probabilistic subrelation. In a similar way, imprecise attribute values are modelled as a set-valued attribute. We redefine the relational operators for this type of relations such thatmore » the result of each operator is again a probabilistic NF2 relation, where the weight of a tuple gives the probability that this tuple belongs to the result. By ordering the tuples according to decreasing probabilities, the model yields a ranking of answers like in most IR models. This effect also can be used for typical database queries involving imprecise attribute values as well as for combinations of database and IR queries.« less
Chew, Peter A; Bader, Brett W
2012-10-16
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
Trustworthiness and relevance in web-based clinical question answering.
Cruchet, Sarah; Boyer, Célia; van der Plas, Lonneke
2012-01-01
Question answering systems try to give precise answers to a user's question posed in natural language. It is of utmost importance that the answers returned are relevant to the user's question. For clinical QA, the trustworthiness of answers is another important issue. Limiting the document collection to certified websites helps to improve the trustworthiness of answers. On the other hand, limited document collections are known to harm the relevancy of answers. We show, however, in a comparative evaluation, that promoting trustworthiness has no negative effect on the relevance of the retrieved answers in our clinical QA system. On the contrary, the answers found are in general more relevant.
Computer program CDCID: an automated quality control program using CDC update
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singer, G.L.; Aguilar, F.
1984-04-01
A computer program, CDCID, has been developed in coordination with a quality control program to provide a highly automated method of documenting changes to computer codes at EG and G Idaho, Inc. The method uses the standard CDC UPDATE program in such a manner that updates and their associated documentation are easily made and retrieved in various formats. The method allows each card image of a source program to point to the document which describes it, who created the card, and when it was created. The method described is applicable to the quality control of computer programs in general. Themore » computer program described is executable only on CDC computing systems, but the program could be modified and applied to any computing system with an adequate updating program.« less
Clustering document fragments using background color and texture information
NASA Astrophysics Data System (ADS)
Chanda, Sukalpa; Franke, Katrin; Pal, Umapada
2012-01-01
Forensic analysis of questioned documents sometimes can be extensively data intensive. A forensic expert might need to analyze a heap of document fragments and in such cases to ensure reliability he/she should focus only on relevant evidences hidden in those document fragments. Relevant document retrieval needs finding of similar document fragments. One notion of obtaining such similar documents could be by using document fragment's physical characteristics like color, texture, etc. In this article we propose an automatic scheme to retrieve similar document fragments based on visual appearance of document paper and texture. Multispectral color characteristics using biologically inspired color differentiation techniques are implemented here. This is done by projecting document color characteristics to Lab color space. Gabor filter-based texture analysis is used to identify document texture. It is desired that document fragments from same source will have similar color and texture. For clustering similar document fragments of our test dataset we use a Self Organizing Map (SOM) of dimension 5×5, where the document color and texture information are used as features. We obtained an encouraging accuracy of 97.17% from 1063 test images.
Seminal nanotechnology literature: a review.
Kostoff, Ronald N; Koytcheff, Raymond G; Lau, Clifford G Y
2009-11-01
This paper uses complementary text mining techniques to identify and retrieve the high impact (seminal) nanotechnology literature over a span of time. Following a brief scientometric analysis of the seminal articles retrieved, these seminal articles are then used as a basis for a comprehensive literature survey of nanoscience and nanotechnology. The paper ends with a global analysis of the relation of seminal nanotechnology document production to total nanotechnology document production.
Automatic natural acquisition of a semantic network for information retrieval systems
NASA Astrophysics Data System (ADS)
Enguehard, Chantal; Malvache, Pierre; Trigano, Philippe
1992-03-01
The amount of information is becoming greater and greater, in industries where complex processes are performed it is becoming increasingly difficult to profit from all the documents produced when fresh knowledge becomes available (reports, experiments, findings). This situation causes a considerable and expensive waste of precious time lost searching for documents or, quite simply, results in outright repeating what has been done. One solution is to transform all paper information into computerized information. We might imagine that we are in a science-fiction world and that we have the perfect computer. We tell it everything we know, we make it read all the books, and if we ask it any question, it will find the response if that response exists. But unfortunately, we are in the real world and the last four decades have taught us to minimize our expectations of computers. During the 1960s, the information retrieval systems appeared. Their purpose is to provide access to any desired documents, in response to a question about a subject, even if it is not known to exist. Here we focus on the problem of selecting items to index the documents. In 1966, Salton identified this problem as crucial when he saw that his system, Medlars, did not find a relevant text because of the wrong indexation. Faced with this problem, he imagined a guide to help authors choose the correct indexation, but he anticipated the automation of this operation with the SMART system. It was stated previously that a manual language analysis for information items by subjects experts is likely to prove impractical in the long run. After a brief survey of the existing responses to the index choice problem, we shall present the system automatic natural acquisition (ANA) which chooses items to index texts by using as little knowledge as possible- -just by learning the language. This system does not use any grammar or lexicon, so the selected indexes will be very close to the field concerned in the texts.
Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches.
Zeng, Yuqun; Liu, Xusheng; Wang, Yanshan; Shen, Feichen; Liu, Sijia; Rastegar-Mojarad, Majid; Wang, Liwei; Liu, Hongfang
2017-10-16
Self-management is crucial to diabetes care and providing expert-vetted content for answering patients' questions is crucial in facilitating patient self-management. The aim is to investigate the use of information retrieval techniques in recommending patient education materials for diabetic questions of patients. We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic group (semantic group-based model), with the baseline retrieval models, vector space model (VSM), in recommending diabetic patient education materials to diabetic questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset consisting of 50 randomly selected diabetic questions where the relevancy of diabetic education materials to the questions was manually assigned by two experts. The performance was assessed using precision of top-ranked documents. We retrieved 7510 diabetic questions on the forum and 144 diabetic patient educational materials from the patient education database at Mayo Clinic. The mapping rate of words in each corpus mapped to the Unified Medical Language System (UMLS) was significantly different (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. This study demonstrated that topic modeling can mitigate the vocabulary difference and it achieved the best performance in recommending education materials for answering patients' questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and online forums. ©Yuqun Zeng, Xusheng Liu, Yanshan Wang, Feichen Shen, Sijia Liu, Majid Rastegar Mojarad, Liwei Wang, Hongfang Liu. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 16.10.2017.
An overview of the evaluation plan for PC/MISI: PC-based Multiple Information System Interface
NASA Technical Reports Server (NTRS)
Dominick, Wayne D. (Editor); Lim, Bee Lee; Hall, Philip P.
1985-01-01
An initial evaluation plan for the personal computer multiple information system interface (PC/MISI) project is discussed. The document is intend to be used as a blueprint for the evaluation of this system. Each objective of the design project is discussed along with the evaluation parameters and methodology to be used in the evaluation of the implementation's achievement of those objectives. The potential of the system for research activities related to more general aspects of information retrieval is also discussed.
XDS-I Gateway Development for HIE Connectivity with Legacy PACS at Gil Hospital.
Simalango, Mikael Fernandus; Kim, Youngchul; Seo, Young Tae; Choi, Young Hwan; Cho, Yong Kyun
2013-12-01
The ability to support healthcare document sharing is imperative in a health information exchange (HIE). Sharing imaging documents or images, however, can be challenging, especially when they are stored in a picture archiving and communication system (PACS) archive that does not support document sharing via standard HIE protocols. This research proposes a standard-compliant imaging gateway that enables connectivity between a legacy PACS and the entire HIE. Investigation of the PACS solutions used at Gil Hospital was conducted. An imaging gateway application was then developed using a Java technology stack. Imaging document sharing capability enabled by the gateway was tested by integrating it into Gil Hospital's order communication system and its HIE infrastructure. The gateway can acquire radiology images from a PACS storage system, provide and register the images to Gil Hospital's HIE for document sharing purposes, and make the images retrievable by a cross-enterprise document sharing document viewer. Development of an imaging gateway that mediates communication between a PACS and an HIE can be considered a viable option when the PACS does not support the standard protocol for cross-enterprise document sharing for imaging. Furthermore, the availability of common HIE standards expedites the development and integration of the imaging gateway with an HIE.
XDS-I Gateway Development for HIE Connectivity with Legacy PACS at Gil Hospital
Simalango, Mikael Fernandus; Kim, Youngchul; Seo, Young Tae; Cho, Yong Kyun
2013-01-01
Objectives The ability to support healthcare document sharing is imperative in a health information exchange (HIE). Sharing imaging documents or images, however, can be challenging, especially when they are stored in a picture archiving and communication system (PACS) archive that does not support document sharing via standard HIE protocols. This research proposes a standard-compliant imaging gateway that enables connectivity between a legacy PACS and the entire HIE. Methods Investigation of the PACS solutions used at Gil Hospital was conducted. An imaging gateway application was then developed using a Java technology stack. Imaging document sharing capability enabled by the gateway was tested by integrating it into Gil Hospital's order communication system and its HIE infrastructure. Results The gateway can acquire radiology images from a PACS storage system, provide and register the images to Gil Hospital's HIE for document sharing purposes, and make the images retrievable by a cross-enterprise document sharing document viewer. Conclusions Development of an imaging gateway that mediates communication between a PACS and an HIE can be considered a viable option when the PACS does not support the standard protocol for cross-enterprise document sharing for imaging. Furthermore, the availability of common HIE standards expedites the development and integration of the imaging gateway with an HIE. PMID:24523994
Cross-language information retrieval using PARAFAC2.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bader, Brett William; Chew, Peter; Abdelali, Ahmed
A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages - or more precisely, retrieving the most similar document in one language to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language-independently', so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar tomore » other documents in the same language than they are to documents in a different language, but on the same topic. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (which is a variant of PARAFAC, a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the 'concepts' in all documents in the parallel corpus are the same regardless of language. Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem. From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in cross-language information retrieval.« less
Experiments in Multi-Lingual Information Retrieval.
ERIC Educational Resources Information Center
Salton, Gerard
A comparison was made of the performance in an automatic information retrieval environment of user queries and document abstracts available in natural language form in both English and French. The results obtained indicate that the automatic indexing and retrieval techniques actually used appear equally effective in handling the query and document…
40 CFR 792.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2012 CFR
2012-07-01
....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 33 2012-07-01 2012-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
40 CFR 792.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2011 CFR
2011-07-01
....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 32 2011-07-01 2011-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
40 CFR 792.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2013 CFR
2013-07-01
....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 33 2013-07-01 2013-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
40 CFR 792.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2014 CFR
2014-07-01
....190 Storage and retrieval of records and data. (a) All raw data, documentation, records, protocols... 40 Protection of Environment 32 2014-07-01 2014-07-01 false Storage and retrieval of records and data. 792.190 Section 792.190 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED...
Pipelining Architecture of Indexing Using Agglomerative Clustering
NASA Astrophysics Data System (ADS)
Goyal, Deepika; Goyal, Deepti; Gupta, Parul
2010-11-01
The World Wide Web is an interlinked collection of billions of documents. Ironically the huge size of this collection has become an obstacle for information retrieval. To access the information from Internet, search engine is used. Search engine retrieve the pages from indexer. This paper introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time and also clustering algorithm that aims at partitioning the set of documents into ordered clusters so that the documents within the same cluster are similar and are being assigned the closer document identifiers. After assigning to the clusters it creates the hierarchy of index so that searching is efficient. It will make the super cluster then mega cluster by itself. The pipeline architecture will create the index in such a way that it will be efficient in space and time saving manner. It will direct the search from higher level to lower level of index or higher level of clusters to lower level of cluster so that the user gets the possible match result in time saving manner. As one cluster is making by taking only two clusters so it search is limited to two clusters for lower level of index and so on. So it is efficient in time saving manner.
Della Seta, Maurella; Sellitri, Cinzia
2004-01-01
The research project "Collection and dissemination of bioethical information through an integrated electronic system", started in 2001 by the Istituto Superiore di Sanità (ISS), had among its objectives, the realization of an integrated system for data collection and exchange of documents related to bioethics. The system should act as a reference tool for those research activities impacting on citizens' health and welfare. This paper aims at presenting some initiatives, developed in the project framework, in order to establish an Italian documentation network, among which: a) exchange of ISS publications with Italian institutions active in this field; b) survey through a questionnaire aimed at assessing Italian informative resources, state-of-the-art and holdings of documentation centres and ethical committees; c) Italian Internet resources analysis. The results of the survey, together with the analysis of web sites, show that at present in Italy there are many interesting initiatives for collecting and spreading of documentation in the bioethical fields, but there is an urgent need for an integration of such resources. Ethical committees generally speaking need a larger availability of documents, while there are good potentialities for the establishment of an electronic network for document retrieval and delivery.
GSC configuration management plan
NASA Technical Reports Server (NTRS)
Withers, B. Edward
1990-01-01
The tools and methods used for the configuration management of the artifacts (including software and documentation) associated with the Guidance and Control Software (GCS) project are described. The GCS project is part of a software error studies research program. Three implementations of GCS are being produced in order to study the fundamental characteristics of the software failure process. The Code Management System (CMS) is used to track and retrieve versions of the documentation and software. Application of the CMS for this project is described and the numbering scheme is delineated for the versions of the project artifacts.
DOE Office of Scientific and Technical Information (OSTI.GOV)
DEXTER, M.L.
1999-11-15
This document serves as a notice of construction (NOC) pursuant to the requirements of Washington Administrative Code (WAC) 246 247-060, and as a request for approval to modify pursuant to 40 Code of Federal Regulations (CFR) 61 07 for the installation and operation of one waste retrieval system in the 24 1 AP-102 Tank and one waste retrieval system in the 241 AP 104 Tank Pursuant to 40 CFR 61 09 (a)( 1) this application is also intended to provide anticipated initial start up notification Its is requested that EPA approval of this application will also constitute EPA acceptance ofmore » the initial start up notification Project W 211 Initial Tank Retrieval Systems (ITRS) is scoped to install a waste retrieval system in the following double-shell tanks 241-AP 102-AP 104 AN 102, AN 103, AN-104, AN 105, AY 102 AZ 102 and SY-102 between now and the year 2011. Because of the extended installation schedules and unknowns about specific activities/designs at each tank, it was decided to submit NOCs as that information became available This NOC covers the installation and operation of a waste retrieval system in tanks 241 AP-102 and 241 AP 104 Generally this includes removal of existing equipment installation of new equipment and construction of new ancillary equipment and buildings Tanks 241 AP 102 and 241 AP 104 will provide waste feed for immobilization into a low activity waste (LAW) product (i.e. glass logs) The total effective dose equivalent (TEDE) to the offsite maximally exposed individual (MEI) from the construction activities is 0 045 millirem per year The unabated TEDE to the offsite ME1 from operation of the mixer pumps is 0 042 millirem per year.« less
Document Ranking Based upon Markov Chains.
ERIC Educational Resources Information Center
Danilowicz, Czeslaw; Balinski, Jaroslaw
2001-01-01
Considers how the order of documents in information retrieval responses are determined and introduces a method that uses a probabilistic model of a document set where documents are regarded as states of a Markov chain and where transition probabilities are directly proportional to similarities between documents. (Author/LRW)
A model for enhancing Internet medical document retrieval with "medical core metadata".
Malet, G; Munoz, F; Appleyard, R; Hersh, W
1999-01-01
Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and MEDLINE-type content descriptions. The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines.
A Model for Enhancing Internet Medical Document Retrieval with “Medical Core Metadata”
Malet, Gary; Munoz, Felix; Appleyard, Richard; Hersh, William
1999-01-01
Objective: Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. Design: The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and Medline-type content descriptions. Results: The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. Conclusions: The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines. PMID:10094069
Using Crowdsourced Geospatial Data to Aid in Nuclear Proliferation Monitoring
2016-12-01
M. Stephens, and Ronald D. Bonnell, “DAI for Document Retrieval: The MINDS Project,” in Distributed Artificial Intelligence , ed. Michael N. Huhns...Ronald D. Bonnell. “DAI for Document Retrieval: The MINDS Project,” In Distributed Artificial Intelligence , edited by Michael N. Huhns, 249–283...was for the director of National Intelligence to explore ways that crowdsourced geospatial imagery technologies could aid existing governmental
Intelligent search and retrieval of a large multimedia knowledgebase for the Hubble Space Telescope
NASA Technical Reports Server (NTRS)
Clapis, Paul J.; Byers, William S.
1990-01-01
A document-retrieval assistant (DRA) in a microcomputer format is described which incorporates hypertext and natural language capabilities. Hypertext is used to introduce an intelligent search capability, and the natural-language interface permits access to specific data without the use of keywords. The DRA can be used to access and 'browse' the large multimedia database that is composed of project documentation from the HST.
ERIC Educational Resources Information Center
Reynnells, M. Louise, Comp.
This document lists 248 federal funding programs available to rural areas. The programs were selected from the Catalog of Federal Domestic Assistance, 1995, which is available online from the Federal Assistance Programs Retrieval System (FAPRS). Entries are listed under the following federal departments or agencies: Department of Agriculture,…
NASA Technical Reports Server (NTRS)
1997-01-01
This report summarizes work done under Cooperative Agreement (CA) on the following testbed projects: TERRIERS - The development of the ground systems to support the TERRIERS satellite mission at Boston University (BU). HSTS - The application of ARC's Heuristic Scheduling Testbed System (HSTS) to the EUVE satellite mission. SELMON - The application of NASA's Jet Propulsion Laboratory's (JPL) Selective Monitoring (SELMON) system to the EUVE satellite mission. EVE - The development of the EUVE Virtual Environment (EVE), a prototype three-dimensional (3-D) visualization environment for the EUVE satellite and its sensors, instruments, and communications antennae. FIDO - The development of the Fault-Induced Document Officer (FIDO) system, a prototype application to respond to anomalous conditions by automatically searching for, retrieving, and displaying relevant documentation for an operators use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
1986-02-01
This Environmental Assessment (EA) supports the DOE proposal to Congress to construct and operate a facility for monitored retrievable storage (MRS) of spent fuel at a site on the Clinch River in the Roane County portion of Oak Ridge, Tennessee. The first part of this document is an assessment of the value of, need for, and feasibility of an MRS facility as an integral component of the waste management system. The second part is an assessment and comparison of the potential environmental impacts projected for each of six site-design combinations. The MRS facility would be centrally located with respect tomore » existing reactors, and would receive and canister spent fuel in preparation for shipment to and disposal in a geologic repository. 207 refs., 57 figs., 132 tabs.« less
Knowledge Modeling in Prior Art Search
NASA Astrophysics Data System (ADS)
Graf, Erik; Frommholz, Ingo; Lalmas, Mounia; van Rijsbergen, Keith
This study explores the benefits of integrating knowledge representations in prior art patent retrieval. Key to the introduced approach is the utilization of human judgment available in the form of classifications assigned to patent documents. The paper first outlines in detail how a methodology for the extraction of knowledge from such an hierarchical classification system can be established. Further potential ways of integrating this knowledge with existing Information Retrieval paradigms in a scalable and flexible manner are investigated. Finally based on these integration strategies the effectiveness in terms of recall and precision is evaluated in the context of a prior art search task for European patents. As a result of this evaluation it can be established that in general the proposed knowledge expansion techniques are particularly beneficial to recall and, with respect to optimizing field retrieval settings, further result in significant precision gains.
Rapid automatic keyword extraction for information retrieval and analysis
Rose, Stuart J [Richland, WA; Cowley,; E, Wendy [Richland, WA; Crow, Vernon L [Richland, WA; Cramer, Nicholas O [Richland, WA
2012-03-06
Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
The astronomical data base and retrieval system at NASA
NASA Technical Reports Server (NTRS)
Mead, J. M.; Nagy, T. A.; Hill, R. S.; Warren, W. H., Jr.
1982-01-01
More than 250 machine-readable catalogs of stars and extended celestial objects are now available at the NASA/Goddard Space Flight Center (GSFC) as the result of over a decade of catalog acquisition, verification and documentation. Retrieval programs are described which permit the user to obtain from a remote terminal bibliographical listings for stars; to find all celestial objects from a given list that are within a defined angular separation from each object in another list; to plot celestial objects on overlays for sky survey plate areas; and to search selected catalogs for objects by criteria of position, identification number, magnitude or spectral type.
Application of portable CDA for secure clinical-document exchange.
Huang, Kuo-Hsuan; Hsieh, Sung-Huai; Chang, Yuan-Jen; Lai, Feipei; Hsieh, Sheau-Ling; Lee, Hsiu-Hui
2010-08-01
Health Level Seven (HL7) organization published the Clinical Document Architecture (CDA) for exchanging documents among heterogeneous systems and improving medical quality based on the design method in CDA. In practice, although the HL7 organization tried to make medical messages exchangeable, it is still hard to exchange medical messages. There are many issues when two hospitals want to exchange clinical documents, such as patient privacy, network security, budget, and the strategies of the hospital. In this article, we propose a method for the exchange and sharing of clinical documents in an offline model based on the CDA-the Portable CDA. This allows the physician to retrieve the patient's medical record stored in a portal device, but not through the Internet in real time. The security and privacy of CDA data will also be considered.
Boyer, C; Baujard, V; Scherrer, J R
2001-01-01
Any new user to the Internet will think that to retrieve the relevant document is an easy task especially with the wealth of sources available on this medium, but this is not the case. Even experienced users have difficulty formulating the right query for making the most of a search tool in order to efficiently obtain an accurate result. The goal of this work is to reduce the time and the energy necessary in searching and locating medical and health information. To reach this goal we have developed HONselect [1]. The aim of HONselect is not only to improve efficiency in retrieving documents but to respond to an increased need for obtaining a selection of relevant and accurate documents from a breadth of various knowledge databases including scientific bibliographical references, clinical trials, daily news, multimedia illustrations, conferences, forum, Web sites, clinical cases, and others. The authors based their approach on the knowledge representation using the National Library of Medicine's Medical Subject Headings (NLM, MeSH) vocabulary and classification [2,3]. The innovation is to propose a multilingual "one-stop searching" (one Web interface to databases currently in English, French and German) with full navigational and connectivity capabilities. The user may choose from a given selection of related terms the one that best suit his search, navigate in the term's hierarchical tree, and access directly to a selection of documents from high quality knowledge suppliers such as the MEDLINE database, the NLM's ClinicalTrials.gov server, the NewsPage's daily news, the HON's media gallery, conference listings and MedHunt's Web sites [4, 5, 6, 7, 8, 9]. HONselect, developed by HON, a non-profit organisation [10], is a free online available multilingual tool based on the MeSH thesaurus to index, select, retrieve and display accurate, up to date, high-level and quality documents.
Historical Note: The Past Thirty Years in Information Retrieval.
ERIC Educational Resources Information Center
Salton, Gerard
1987-01-01
Briefly reviews early work in documentation and text processing, and predictions that were made about the creative role of computers in information retrieval. An attempt is made to explain why these predictions were not fulfilled and conclusions are drawn regarding the limits of computer power in text retrieval applications. (Author/CLB)
Engineering a Multi-Purpose Test Collection for Web Retrieval Experiments.
ERIC Educational Resources Information Center
Bailey, Peter; Craswell, Nick; Hawking, David
2003-01-01
Describes a test collection that was developed as a multi-purpose testbed for experiments on the Web in distributed information retrieval, hyperlink algorithms, and conventional ad hoc retrieval. Discusses inter-server connectivity, integrity of server holdings, inclusion of documents related to a wide spread of likely queries, and distribution of…
21 CFR 58.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2013 CFR
2013-04-01
... 21 Food and Drugs 1 2013-04-01 2013-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2014 CFR
2014-04-01
... 21 Food and Drugs 1 2014-04-01 2014-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2012 CFR
2012-04-01
... 21 Food and Drugs 1 2012-04-01 2012-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 21 Food and Drugs 1 2010-04-01 2010-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
21 CFR 58.190 - Storage and retrieval of records and data.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 21 Food and Drugs 1 2011-04-01 2011-04-01 false Storage and retrieval of records and data. 58.190...) There shall be archives for orderly storage and expedient retrieval of all raw data, documentation... GENERAL GOOD LABORATORY PRACTICE FOR NONCLINICAL LABORATORY STUDIES Records and Reports § 58.190 Storage...
Logic-Based Retrieval: Technology for Content-Oriented and Analytical Querying of Patent Data
NASA Astrophysics Data System (ADS)
Klampanos, Iraklis Angelos; Wu, Hengzhi; Roelleke, Thomas; Azzam, Hany
Patent searching is a complex retrieval task. An initial document search is only the starting point of a chain of searches and decisions that need to be made by patent searchers. Keyword-based retrieval is adequate for document searching, but it is not suitable for modelling comprehensive retrieval strategies. DB-like and logical approaches are the state-of-the-art techniques to model strategies, reasoning and decision making. In this paper we present the application of logical retrieval to patent searching. The two grand challenges are expressiveness and scalability, where high degree of expressiveness usually means a loss in scalability. In this paper we report how to maintain scalability while offering the expressiveness of logical retrieval required for solving patent search tasks. We present logical retrieval background, and how to model data-source selection and results' fusion. Moreover, we demonstrate the modelling of a retrieval strategy, a technique by which patent professionals are able to express, store and exchange their strategies and rationales when searching patents or when making decisions. An overview of the architecture and technical details complement the paper, while the evaluation reports preliminary results on how query processing times can be guaranteed, and how quality is affected by trading off responsiveness.
Optimized model tuning in medical systems.
Kléma, Jirí; Kubalík, Jirí; Lhotská, Lenka
2005-12-01
In medical systems it is often advantageous to utilize specific problem situations (cases) in addition to or instead of a general model. Decisions are then based on relevant past cases retrieved from a case memory. The reliability of such decisions depends directly on the ability to identify cases of practical relevance to the current situation. This paper discusses issues of automated tuning in order to obtain a proper definition of mutual case similarity in a specific medical domain. The main focus is on a reasonably time-consuming optimization of the parameters that determine case retrieval and further utilization in decision making/ prediction. The two case studies - mortality prediction after cardiological intervention, and resource allocation at a spa - document that the optimization process is influenced by various characteristics of the problem domain.
2014-11-01
sematic type. Injury or Poisoning inpo T037 Anatomical Abnormality anab T190 Given a document D, a concept vector = {1, 2, … , ...integrating biomedical terminology . Nucleic acids research 32, Database issue (2004), 267–270. 5. Chapman, W.W., Hillert, D., Velupillai, S., et...Conference (TREC), (2011). 9. Koopman, B. and Zuccon, G. Understanding negation and family history to improve clinical information retrieval. Proceedings
Yang, X Jessie; Wickens, Christopher D; Park, Taezoon; Fong, Liesel; Siah, Kewin T H
2015-12-01
We aimed to examine the effects of information access cost and accountability on medical residents' information retrieval strategy and performance during prehandover preparation. Prior studies observing doctors' prehandover practices witnessed the use of memory-intensive strategies when retrieving patient information. These strategies impose potential threats to patient safety as human memory is prone to errors. Of interest in this work are the underlying determinants of information retrieval strategy and the potential impacts on medical residents' information preparation performance. A two-step research approach was adopted, consisting of semistructured interviews with 21 medical residents and a simulation-based experiment with 32 medical residents. The semistructured interviews revealed that a substantial portion of medical residents (38%) relied largely on memory for preparing handover information. The simulation-based experiment showed that higher information access cost reduced information access attempts and access duration on patient documents and harmed information preparation performance. Higher accountability led to marginally longer access to patient documents. It is important to understand the underlying determinants of medical residents' information retrieval strategy and performance during prehandover preparation. We noted the criticality of easy access to patient documents in prehandover preparation. In addition, accountability marginally influenced medical residents' information retrieval strategy. Findings from this research suggested that the cost of accessing information sources should be minimized in developing handover preparation tools. © 2015, Human Factors and Ergonomics Society.
NELS 2.0 - A general system for enterprise wide information management
NASA Technical Reports Server (NTRS)
Smith, Stephanie L.
1993-01-01
NELS, the NASA Electronic Library System, is an information management tool for creating distributed repositories of documents, drawings, and code for use and reuse by the aerospace community. The NELS retrieval engine can load metadata and source files of full text objects, perform natural language queries to retrieve ranked objects, and create links to connect user interfaces. For flexibility, the NELS architecture has layered interfaces between the application program and the stored library information. The session manager provides the interface functions for development of NELS applications. The data manager is an interface between session manager and the structured data system. The center of the structured data system is the Wide Area Information Server. This system architecture provides access to information across heterogeneous platforms in a distributed environment. There are presently three user interfaces that connect to the NELS engine; an X-Windows interface, and ASCII interface and the Spatial Data Management System. This paper describes the design and operation of NELS as an information management tool and repository.
Word Spotting for Indic Documents to Facilitate Retrieval
NASA Astrophysics Data System (ADS)
Bhardwaj, Anurag; Setlur, Srirangaraj; Govindaraju, Venu
With advances in the field of digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most of the current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts has hampered information extraction from a large body of documents of cultural and historical importance. This chapter presents two relevant topics in this area. First, we describe the use of a script-specific keyword spotting for Devanagari documents that makes use of domain knowledge of the script. Second, we address the needs of a digital library to provide access to a collection of documents from multiple scripts. This requires intelligent solutions which scale across different scripts. We present a script-independent keyword spotting approach for this purpose. Experimental results illustrate the efficacy of our methods.
Acquisition plan for Digital Document Storage (DDS) prototype system
NASA Technical Reports Server (NTRS)
1990-01-01
NASA Headquarters maintains a continuing interest in and commitment to exploring the use of new technology to support productivity improvements in meeting service requirements tasked to the NASA Scientific and Technical Information (STI) Facility, and to support cost effective approaches to the development and delivery of enhanced levels of service provided by the STI Facility. The DDS project has been pursued with this interest and commitment in mind. It is believed that DDS will provide improved archival blowback quality and service for ad hoc requests for paper copies of documents archived and serviced centrally at the STI Facility. It will also develop an operating capability to scan, digitize, store, and reproduce paper copies of 5000 NASA technical reports archived annually at the STI Facility and serviced to the user community. Additionally, it will provide NASA Headquarters and field installations with on-demand, remote, electronic retrieval of digitized, bilevel, bit mapped report images along with branched, nonsequential retrieval of report subparts.
ERIC Educational Resources Information Center
Cornell Univ., Ithaca, NY. Dept. of Computer Science.
Part Two of the eighteenth report on Salton's Magical Automatic Retriever of Texts (SMART) project is composed of three papers: The first: "The Effect of Common Words and Synonyms on Retrieval Performance" by D. Bergmark discloses that removal of common words from the query and document vectors significantly increases precision and that…
An Information Retrieval and Recommendation System for Astronomical Observatories
NASA Astrophysics Data System (ADS)
Mukund, Nikhil; Thakur, Saurabh; Abraham, Sheelu; Aniyan, A. K.; Mitra, Sanjit; Sajeeth Philip, Ninan; Vaghmare, Kaustubh; Acharjya, D. P.
2018-03-01
We present a machine-learning-based information retrieval system for astronomical observatories that tries to address user-defined queries related to an instrument. In the modern instrumentation scenario where heterogeneous systems and talents are simultaneously at work, the ability to supply people with the right information helps speed up the tasks for detector operation, maintenance, and upgradation. The proposed method analyzes existing documented efforts at the site to intelligently group related information to a query and to present it online to the user. The user in response can probe the suggested content and explore previously developed solutions or probable ways to address the present situation optimally. We demonstrate natural language-processing-backed knowledge rediscovery by making use of the open source logbook data from the Laser Interferometric Gravitational Observatory (LIGO). We implement and test a web application that incorporates the above idea for LIGO Livingston, LIGO Hanford, and Virgo observatories.
ERIC Educational Resources Information Center
National Air Pollution Control Administration (DHEW), Raleigh, NC.
This two-part bibliography represents an effort to collect, condense, and organize the literature on the hydrocarbons in relation to air pollution. The approximately 2,300 documents abstracted are all included in the information storage and retrieval system of the National Air Pollution Control Administration's (NAPCA) Air Pollution Technical…
A User-Centered Approach to Adaptive Hypertext Based on an Information Relevance Model
NASA Technical Reports Server (NTRS)
Mathe, Nathalie; Chen, James
1994-01-01
Rapid and effective to information in large electronic documentation systems can be facilitated if information relevant in an individual user's content can be automatically supplied to this user. However most of this knowledge on contextual relevance is not found within the contents of documents, it is rather established incrementally by users during information access. We propose a new model for interactively learning contextual relevance during information retrieval, and incrementally adapting retrieved information to individual user profiles. The model, called a relevance network, records the relevance of references based on user feedback for specific queries and user profiles. It also generalizes such knowledge to later derive relevant references for similar queries and profiles. The relevance network lets users filter information by context of relevance. Compared to other approaches, it does not require any prior knowledge nor training. More importantly, our approach to adaptivity is user-centered. It facilitates acceptance and understanding by users by giving them shared control over the adaptation without disturbing their primary task. Users easily control when to adapt and when to use the adapted system. Lastly, the model is independent of the particular application used to access information, and supports sharing of adaptations among users.
Modeling and mining term association for improving biomedical information retrieval performance.
Hu, Qinmin; Huang, Jimmy Xiangji; Hu, Xiaohua
2012-06-11
The growth of the biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text that is structured in a way makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. Therefore, we are motivated to consider term associations within different contexts to help the models understand semantic information and use it for improving biomedical information retrieval performance. We propose a term association approach to discover term associations among the keywords from a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. The parameter settings and different indices are investigated that the sentence-based index produces the best results in terms of the document-level, the word-based index for the best results in terms of the passage-level and the paragraph-based index for the best results in terms of the passage2-level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. First, modelling term association for improving biomedical information retrieval using factor analysis, is one of the major contributions in our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than the baselines treating the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are decided by the proposed model and their term appearances in the first round retrieved passages.
Modeling and mining term association for improving biomedical information retrieval performance
2012-01-01
Background The growth of the biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text that is structured in a way makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. Therefore, we are motivated to consider term associations within different contexts to help the models understand semantic information and use it for improving biomedical information retrieval performance. Results We propose a term association approach to discover term associations among the keywords from a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. The parameter settings and different indices are investigated that the sentence-based index produces the best results in terms of the document-level, the word-based index for the best results in terms of the passage-level and the paragraph-based index for the best results in terms of the passage2-level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. Conclusions First, modelling term association for improving biomedical information retrieval using factor analysis, is one of the major contributions in our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than the baselines treating the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are decided by the proposed model and their term appearances in the first round retrieved passages. PMID:22901087
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
The Document clustering plays significant role in Information Retrieval (IR) where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed and this includes the K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to its simplicity, such an approach tends to be trapped in a local minimum during its search for an optimal solution. To address the shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates a more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
An Abstraction-Based Data Model for Information Retrieval
NASA Astrophysics Data System (ADS)
McAllister, Richard A.; Angryk, Rafal A.
Language ontologies provide an avenue for automated lexical analysis that may be used to supplement existing information retrieval methods. This paper presents a method of information retrieval that takes advantage of WordNet, a lexical database, to generate paths of abstraction, and uses them as the basis for an inverted index structure to be used in the retrieval of documents from an indexed corpus. We present this method as a entree to a line of research on using ontologies to perform word-sense disambiguation and improve the precision of existing information retrieval techniques.
Web image retrieval using an effective topic and content-based technique
NASA Astrophysics Data System (ADS)
Lee, Ching-Cheng; Prabhakara, Rashmi
2005-03-01
There has been an exponential growth in the amount of image data that is available on the World Wide Web since the early development of Internet. With such a large amount of information and image available and its usefulness, an effective image retrieval system is thus greatly needed. In this paper, we present an effective approach with both image matching and indexing techniques that improvise on existing integrated image retrieval methods. This technique follows a two-phase approach, integrating query by topic and query by example specification methods. In the first phase, The topic-based image retrieval is performed by using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. This technique consists of a focused crawler that not only provides for the user to enter the keyword for the topic-based search but also, the scope in which the user wants to find the images. In the second phase, we use query by example specification to perform a low-level content-based image match in order to retrieve smaller and relatively closer results of the example image. From this, information related to the image feature is automatically extracted from the query image. The main objective of our approach is to develop a functional image search and indexing technique and to demonstrate that better retrieval results can be achieved.
Moen, Hans; Ginter, Filip; Marsi, Erwin; Peltonen, Laura-Maria; Salakoski, Tapio; Salanterä, Sanna
2015-01-01
Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a--possibly unfinished--care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the art search engine (Lucene) on the retrieval task.
2015-01-01
Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a - possibly unfinished - care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the art search engine (Lucene) on the retrieval task. PMID:26099735
Tautomerism in chemical information management systems
NASA Astrophysics Data System (ADS)
Warr, Wendy A.
2010-06-01
Tautomerism has an impact on many of the processes in chemical information management systems including novelty checking during registration into chemical structure databases; storage of structures; exact and substructure searching in chemical structure databases; and depiction of structures retrieved by a search. The approaches taken by 27 different software vendors and database producers are compared. It is hoped that this comparison will act as a discussion document that could ultimately improve databases and software for researchers in the future.
Tank Waste Retrieval Lessons Learned at the Hanford Site
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dodd, R.A.
One of the environmental remediation challenges facing the nation is the retrieval and permanent disposal of approximately 90 million gallons of radioactive waste stored in underground tanks at the U. S. Department of Energy (DOE) facilities. The Hanford Site is located in southeastern Washington State and stores roughly 60 percent of this waste. An estimated 53 million gallons of high-level, transuranic, and low-level radioactive waste is stored underground in 149 single-shell tanks (SSTs) and 28 newer double-shell tanks (DSTs) at the Hanford Site. These SSTs range in size from 55,000 gallons to 1,000,000 gallon capacity. Approximately 30 million gallons ofmore » this waste is stored in SSTs. The SSTs were constructed between 1943 and 1964 and all have exceeded the nominal 20-year design life. Sixty-seven SSTs are known or suspected to have leaked an estimated 1,000,000 gallons of waste to the surrounding soil. The risk of additional SST leakage has been greatly reduced by removing more than 3 million gallons of interstitial liquids and supernatant and transferring this waste to the DST system. Retrieval of SST salt-cake and sludge waste is underway to further reduce risks and stage feed materials for the Hanford Site Waste Treatment Plant. Regulatory requirements for SST waste retrieval and tank farm closure are established in the Hanford Federal Facility Agreement and Consent Order (HFFACO), better known as the Tri- Party Agreement, or TPA. The HFFACO was signed by the DOE, the State of Washington Department of Ecology (Ecology), and U.S. Environmental Protection Agency (EPA) and requires retrieval of as much waste as technically possible, with waste residues not to exceed 360 ft{sup 3} in 530,000 gallon or larger tanks; 30 ft{sup 3} in 55,000 gallon or smaller tanks; or the limit of waste retrieval technology, whichever is less. If residual waste volume requirements cannot be achieved, then HFFACO Appendix H provisions can be invoked to request Ecology and EPA approval of an exception to the waste retrieval criteria for a specific tank. Tank waste retrieval has been conducted at the Hanford Site over the last few decades using a method referred to as Past Practice Hydraulic Sluicing. Past Practice Hydraulic Sluicing employs large volumes of DST supernatant and water to dislodge, dissolve, mobilize, and retrieve tank waste. Concern over the leak integrity of SSTs resulted in the need for tank waste retrieval methods capable of using smaller volumes of liquid in a more controlled manner. Retrieval of SST waste in accordance with HFFACO requirements was initiated at the Hanford Site in April 2003. New and innovative tank waste retrieval methods that minimize and control the use of liquids are being implemented for the first time. These tank waste retrieval methods replace Past Practice Hydraulic Sluicing and employ modified sluicing, vacuum retrieval, and in-tank vehicle techniques. Waste retrieval has been completed in seven Hanford Site SSTs (C-106, C-103, C-201, C-202, C-203, C-204, and S-112) in accordance with HFFACO requirements. Three additional tanks are currently in the process of being retrieved (C-108, C-109 and S-102) Preparation for retrieval of two additional SSTs (C-104 and C-110) is ongoing with retrieval operations forecasted to start in calendar year 2008. Tank C-106 was retrieved to a residual waste volume of 470 ft{sup 3} using oxalic acid dissolution and modified sluicing. An Appendix H exception request for Tank C-106 is undergoing review. Tank C-103 was retrieved to a residual volume of 351 ft{sup 3} using a modified sluicing technology. This approach was successful at reaching the TPA limits for this tank of less than 360 ft{sup 3}and the limits of the technology. Tanks C-201, C-202, C-203, and C-204 are smaller (55,000 gallon) tanks and waste removal was completed in accordance with HFFACO requirements using a vacuum retrieval system. Residual waste volumes in each of these four tanks were less than 25 ft{sup 3}. Tank S-112 retrieval was completed February 28, 2007, meeting the TPA Limits of less than 360 cu ft using salt-cake dissolution, modified sluicing, in-tank vehicle with high pressure water spray and caustic dissolution. Tanks C-108 and C-109 have been retrieved to 90% and 85% respectively. Modified sluicing was no longer effective at retrieving the remaining 5,000 to 10,000 gallons of residual. A Mobile Retrieval Tool (FoldTrac) is scheduled for installation early in 2008 to assist in breaking up chunks of waste and mobilizing the waste for transfer. Lessons learned from application of new tank waste retrieval methods are being documented and incorporated into future retrieval operations. They address all phases of retrieval including process design, equipment procurement and installation, supporting documentation, and system operations. Information is obtained through interviews with retrieval project personnel, focused workshops, review of problem evaluation requests, and evaluation of retrieval performance data. This paper presents current retrieval successes and lessons learned from retrieval of tank waste at the Hanford Site and discusses how this information is used to optimize retrieval system efficiency, improve overall cost effectiveness of retrieval operations, and ensure that HFFACO requirements are met. (authors)« less
Automatic indexing and retrieval of encounter-specific evidence for point-of-care support.
O'Sullivan, Dympna M; Wilk, Szymon A; Michalowski, Wojtek J; Farion, Ken J
2010-08-01
Evidence-based medicine relies on repositories of empirical research evidence that can be used to support clinical decision making for improved patient care. However, retrieving evidence from such repositories at local sites presents many challenges. This paper describes a methodological framework for automatically indexing and retrieving empirical research evidence in the form of the systematic reviews and associated studies from The Cochrane Library, where retrieved documents are specific to a patient-physician encounter and thus can be used to support evidence-based decision making at the point of care. Such an encounter is defined by three pertinent groups of concepts - diagnosis, treatment, and patient, and the framework relies on these three groups to steer indexing and retrieval of reviews and associated studies. An evaluation of the indexing and retrieval components of the proposed framework was performed using documents relevant for the pediatric asthma domain. Precision and recall values for automatic indexing of systematic reviews and associated studies were 0.93 and 0.87, and 0.81 and 0.56, respectively. Moreover, precision and recall for the retrieval of relevant systematic reviews and associated studies were 0.89 and 0.81, and 0.92 and 0.89, respectively. With minor modifications, the proposed methodological framework can be customized for other evidence repositories. Copyright 2010 Elsevier Inc. All rights reserved.
Spotting words in handwritten Arabic documents
NASA Astrophysics Data System (ADS)
Srihari, Sargur; Srinivasan, Harish; Babu, Pavithra; Bhole, Chetan
2006-01-01
The design and performance of a system for spotting handwritten Arabic words in scanned document images is presented. Three main components of the system are a word segmenter, a shape based matcher for words and a search interface. The user types in a query in English within a search window, the system finds the equivalent Arabic word, e.g., by dictionary look-up, locates word images in an indexed (segmented) set of documents. A two-step approach is employed in performing the search: (1) prototype selection: the query is used to obtain a set of handwritten samples of that word from a known set of writers (these are the prototypes), and (2) word matching: the prototypes are used to spot each occurrence of those words in the indexed document database. A ranking is performed on the entire set of test word images-- where the ranking criterion is a similarity score between each prototype word and the candidate words based on global word shape features. A database of 20,000 word images contained in 100 scanned handwritten Arabic documents written by 10 different writers was used to study retrieval performance. Using five writers for providing prototypes and the other five for testing, using manually segmented documents, 55% precision is obtained at 50% recall. Performance increases as more writers are used for training.
Retrieval feedback in MEDLINE.
Srinivasan, P
1996-01-01
OBJECTIVE: To investigate a new approach for query expansion based on retrieval feedback. The first objective in this study was to examine alternative query-expansion methods within the same retrieval-feedback framework. The three alternatives proposed are: expansion on the MeSH query field alone, expansion on the free-text field alone, and expansion on both the MeSH and the free-text fields. The second objective was to gain further understanding of retrieval feedback by examining possible dependencies on relevant documents during the feedback cycle. DESIGN: Comparative study of retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a MEDLINE test collection of 75 queries and 2,334 MEDLINE citations. MEASUREMENTS: Retrieval effectivenesses of the original unexpanded and the alternative expanded queries were compared using 11-point-average precision scores (11-AvgP). These are averages of precision scores obtained at 11 standard recall points. RESULTS: All three expansion strategies significantly improved the original queries in terms of retrieval effectiveness. Expansion on MeSH alone was equivalent to expansion on both MeSH and the free-text fields. Expansion on the free-text field alone improved the queries significantly less than did the other two strategies. The second part of the study indicated that retrieval-feedback-based expansion yields significant performance improvements independent of the availability of relevant documents for feedback information. CONCLUSIONS: Retrieval feedback offers a robust procedure for query expansion that is most effective for MEDLINE when applied to the MeSH field. PMID:8653452
Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy.
Tian, Yuling; Zhang, Hongxian
2016-01-01
For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic-there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions.
Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy
Tian, Yuling; Zhang, Hongxian
2016-01-01
For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic–there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions. PMID:27487242
NASA Astrophysics Data System (ADS)
Kwok, Ron; Kurtz, Nathan T.; Brucker, Ludovic; Ivanoff, Alvaro; Newman, Thomas; Farrell, Sinead L.; King, Joshua; Howell, Stephen; Webster, Melinda A.; Paden, John; Leuschen, Carl; MacGregor, Joseph A.; Richter-Menge, Jacqueline; Harbeck, Jeremy; Tschudi, Mark
2017-11-01
Since 2009, the ultra-wideband snow radar on Operation IceBridge (OIB; a NASA airborne mission to survey the polar ice covers) has acquired data in annual campaigns conducted during the Arctic and Antarctic springs. Progressive improvements in radar hardware and data processing methodologies have led to improved data quality for subsequent retrieval of snow depth. Existing retrieval algorithms differ in the way the air-snow (a-s) and snow-ice (s-i) interfaces are detected and localized in the radar returns and in how the system limitations are addressed (e.g., noise, resolution). In 2014, the Snow Thickness On Sea Ice Working Group (STOSIWG) was formed and tasked with investigating how radar data quality affects snow depth retrievals and how retrievals from the various algorithms differ. The goal is to understand the limitations of the estimates and to produce a well-documented, long-term record that can be used for understanding broader changes in the Arctic climate system. Here, we assess five retrieval algorithms by comparisons with field measurements from two ground-based campaigns, including the BRomine, Ozone, and Mercury EXperiment (BROMEX) at Barrow, Alaska; a field program by Environment and Climate Change Canada at Eureka, Nunavut; and available climatology and snowfall from ERA-Interim reanalysis. The aim is to examine available algorithms and to use the assessment results to inform the development of future approaches. We present results from these assessments and highlight key considerations for the production of a long-term, calibrated geophysical record of springtime snow thickness over Arctic sea ice.
Aquarius Salinity Retrieval Algorithm: Final Pre-Launch Version
NASA Technical Reports Server (NTRS)
Wentz, Frank J.; Le Vine, David M.
2011-01-01
This document provides the theoretical basis for the Aquarius salinity retrieval algorithm. The inputs to the algorithm are the Aquarius antenna temperature (T(sub A)) measurements along with a number of NCEP operational products and pre-computed tables of space radiation coming from the galaxy and sun. The output is sea-surface salinity and many intermediate variables required for the salinity calculation. This revision of the Algorithm Theoretical Basis Document (ATBD) is intended to be the final pre-launch version.
Neural networks for data mining electronic text collections
NASA Astrophysics Data System (ADS)
Walker, Nicholas; Truman, Gregory
1997-04-01
The use of neural networks in information retrieval and text analysis has primarily suffered from the issues of adequate document representation, the ability to scale to very large collections, dynamism in the face of new information and the practical difficulties of basing the design on the use of supervised training sets. Perhaps the most important approach to begin solving these problems is the use of `intermediate entities' which reduce the dimensionality of document representations and the size of documents collections to manageable levels coupled with the use of unsupervised neural network paradigms. This paper describes the issues, a fully configured neural network-based text analysis system--dataHARVEST--aimed at data mining text collections which begins this process, along with the remaining difficulties and potential ways forward.
Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Renganathan, Vinaitheerthan
2017-07-01
With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
An introduction to information retrieval: applications in genomics
Nadkarni, P M
2011-01-01
Information retrieval (IR) is the field of computer science that deals with the processing of documents containing free text, so that they can be rapidly retrieved based on keywords specified in a user’s query. IR technology is the basis of Web-based search engines, and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed by both the words they contain, as well as the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics. PMID:12049181
ERIC Educational Resources Information Center
Atherton, Pauline; And Others
A single issue of Nuclear Science Abstracts, containing about 2,300 abstracts, was indexed by Universal Decimal Classification (UDC) using the Special Subject Edition of UDC for Nuclear Science and Technology. The descriptive cataloging and UDC-indexing records formed a computer-stored data base. A systematic random sample of 500 additional…
Portable exhausters POR-004 SKID B, POR-005 SKID C, POR-006 SKID D storage plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nelson, O.D.
1997-09-04
This document provides a storage plan for portable exhausters POR-004 SKID B, POR-005 SKID C, AND POR-006 SKID D. The exhausters will be stored until they are needed by the TWRS (Tank Waste Remediation Systems) Saltwell Pumping Program. The storage plan provides criteria for portable exhauster storage, periodic inspections during storage, and retrieval from storage.
IGDS/TRAP Interface Program (ITIP). Software Design Document
NASA Technical Reports Server (NTRS)
Jefferys, Steve; Johnson, Wendell
1981-01-01
The preliminary design of the IGDS/TRAP Interface Program (ITIP) is described. The ITIP is implemented on the PDP 11/70 and interfaces directly with the Interactive Graphics Design System and the Data Management and Retrieval System. The program provides an efficient method for developing a network flow diagram. Performance requirements, operational rquirements, and design requirements are discussed along with sources and types of input and destination and types of output. Information processing functions and data base requirements are also covered.
Ensemble methods with simple features for document zone classification
NASA Astrophysics Data System (ADS)
Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing
2012-01-01
Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
Deep Question Answering for protein annotation
Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick
2015-01-01
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/ PMID:26384372
Deep Question Answering for protein annotation.
Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick
2015-01-01
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.
Space Communication Artificial Intelligence for Link Evaluation Terminal (SCAILET)
NASA Technical Reports Server (NTRS)
Shahidi, Anoosh K.; Schlegelmilch, Richard F.; Petrik, Edward J.; Walters, Jerry L.
1992-01-01
A software application to assist end-users of the high burst rate (HBR) link evaluation terminal (LET) for satellite communications is being developed. The HBR LET system developed at NASA Lewis Research Center is an element of the Advanced Communications Technology Satellite (ACTS) Project. The HBR LET is divided into seven major subsystems, each with its own expert. Programming scripts, test procedures defined by design engineers, set up the HBR LET system. These programming scripts are cryptic, hard to maintain and require a steep learning curve. These scripts were developed by the system engineers who will not be available for the end-users of the system. To increase end-user productivity a friendly interface needs to be added to the system. One possible solution is to provide the user with adequate documentation to perform the needed tasks. With the complexity of this system the vast amount of documentation needed would be overwhelming and the information would be hard to retrieve. With limited resources, maintenance is another reason for not using this form of documentation. An advanced form of interaction is being explored using current computer techniques. This application, which incorporates a combination of multimedia and artificial intelligence (AI) techniques to provided end-users with an intelligent interface to the HBR LET system, is comprised of an intelligent assistant, intelligent tutoring, and hypermedia documentation. The intelligent assistant and tutoring systems address the critical programming needs of the end-user.
An Experiment in Index Term Frequency
ERIC Educational Resources Information Center
Svenonius, Elaine
1972-01-01
The question is asked: Of index terms assigned to documents, which function most effectively in retrieval, the most used or popular terms, or those which are used relatively infrequently? The experiment is a retrieval experiment and uses the Cranfield-Salton data. (14 references) (Author)
Expert system for automatically correcting OCR output
NASA Astrophysics Data System (ADS)
Taghva, Kazem; Borsack, Julie; Condit, Allen
1994-03-01
This paper describes a new expert system for automatically correcting errors made by optical character recognition (OCR) devices. The system, which we call the post-processing system, is designed to improve the quality of text produced by an OCR device in preparation for subsequent retrieval from an information system. The system is composed of numerous parts: an information retrieval system, an English dictionary, a domain-specific dictionary, and a collection of algorithms and heuristics designed to correct as many OCR errors as possible. For the remaining errors that cannot be corrected, the system passes them on to a user-level editing program. This post-processing system can be viewed as part of a larger system that would streamline the steps of taking a document from its hard copy form to its usable electronic form, or it can be considered a stand alone system for OCR error correction. An earlier version of this system has been used to process approximately 10,000 pages of OCR generated text. Among the OCR errors discovered by this version, about 87% were corrected. We implement numerous new parts of the system, test this new version, and present the results.
WEBCAP: Web Scheduler for Distance Learning Multimedia Documents with Web Workload Considerations
ERIC Educational Resources Information Center
Habib, Sami; Safar, Maytham
2008-01-01
In many web applications, such as the distance learning, the frequency of refreshing multimedia web documents places a heavy burden on the WWW resources. Moreover, the updated web documents may encounter inordinate delays, which make it difficult to retrieve web documents in time. Here, we present an Internet tool called WEBCAP that can schedule…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
The Office of Civilian Radioactive Waste Management Systems Engineering Management Plan (OCRWM SEMP) specifies the technical management approach for the development of the waste management system, and specifies the approach for the development of each of the system elements -- the waste acceptance system, the transportation system, the Monitored Retrievable Storage (MRS) facility, and the mined geologic disposal system, which includes site characterization activity. The SEMP also delineates how systems engineering will be used by OCRWM to describe the system development process; it identifies responsibilities for its implementation, and specifies the minimum requirements for systems engineering. It also identifies themore » close interrelationship of system engineering and licensing processes. This SEMP, which is a combined OCRWM and M&O SEMP, is part of the top-level program documentation and is prepared in accordance with the direction provided in the Program Management System Manual (PMSM). The relationship of this document to other top level documents in the CRWMS document hierarchy is defined in the PMSM. A systems engineering management plan for each project, which specifies the actions to be taken in implementing systems engineering at the project level, shall be prepared by the respective project managers. [``Program`` refers to the CRWMS-wide activity and ``project`` refers to that level responsible for accomplishing the specific activities of that segment of the program.] The requirements for the project level SEMPs are addressed in Section 4.2.2.2. They represent the minimum set of requirements, and do not preclude the broadening of systems engineering activities to meet the specific needs of each project.« less
Toward intelligent information sysytem
NASA Astrophysics Data System (ADS)
Onodera, Natsuo
"Hypertext" means a concept of a novel computer-assisted tool for storage and retrieval of text information based on human association. Structure of knowledge in our idea processing is generally complicated and networked, but traditional paper documents merely express it in essentially linear and sequential forms. However, recent advances in work-station technology have allowed us to process easily electronic documents containing non-linear structure such as references or hierarchies. This paper describes concept, history and basic organization of hypertext, and shows the outline and features of existing main hypertext systems. Particularly, use of the hypertext database is illustrated by an example of Intermedia developed by Brown University.
Building a common pipeline for rule-based document classification.
Patterson, Olga V; Ginter, Thomas; DuVall, Scott L
2013-01-01
Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Silviera, D.J.; Aaberg, R.L.; Cushing, C.E.
This environmental document includes a discussion of the purpose of a monitored retrievable storage facility, a description of two facility design concepts (sealed storage cask and field drywell), a description of three reference sites (arid, warm-wet, and cold-wet), and a discussion and comparison of the impacts associated with each of the six site/concept combinations. This analysis is based on a 15,000-MTU storage capacity and a throughput rate of up to 1800 MTU per year.
New Term Weighting Formulas for the Vector Space Method in Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chisholm, E.; Kolda, T.G.
The goal in information retrieval is to enable users to automatically and accurately find data relevant to their queries. One possible approach to this problem i use the vector space model, which models documents and queries as vectors in the term space. The components of the vectors are determined by the term weighting scheme, a function of the frequencies of the terms in the document or query as well as throughout the collection. We discuss popular term weighting schemes and present several new schemes that offer improved performance.
Knowledge-Sparse and Knowledge-Rich Learning in Information Retrieval.
ERIC Educational Resources Information Center
Rada, Roy
1987-01-01
Reviews aspects of the relationship between machine learning and information retrieval. Highlights include learning programs that extend from knowledge-sparse learning to knowledge-rich learning; the role of the thesaurus; knowledge bases; artificial intelligence; weighting documents; work frequency; and merging classification structures. (78…
Salter, Phia S; Kelley, Nicholas J; Molina, Ludwin E; Thai, Luyen T
2017-09-01
Photographs provide critical retrieval cues for personal remembering, but few studies have considered this phenomenon at the collective level. In this research, we examined the psychological consequences of visual attention to the presence (or absence) of racially charged retrieval cues within American racial segregation photographs. We hypothesised that attention to racial retrieval cues embedded in historical photographs would increase social justice concept accessibility. In Study 1, we recorded gaze patterns with an eye-tracker among participants viewing images that contained racial retrieval cues or were digitally manipulated to remove them. In Study 2, we manipulated participants' gaze behaviour by either directing visual attention toward racial retrieval cues, away from racial retrieval cues, or directing attention within photographs where racial retrieval cues were missing. Across Studies 1 and 2, visual attention to racial retrieval cues in photographs documenting historical segregation predicted social justice concept accessibility.
Text Mining in Biomedical Domain with Emphasis on Document Clustering
2017-01-01
Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Conclusions Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise. PMID:28875048
Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts.
Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R; Bernstam, Elmer V; Cohen, Trevor
2012-01-01
The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI.
Deterministic Binary Vectors for Efficient Automated Indexing of MEDLINE/PubMed Abstracts
Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R.; Bernstam, Elmer V.; Cohen, Trevor
2012-01-01
The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI. PMID:23304369
Documents, Dialogue and the Emergence of Tertiary Orality
ERIC Educational Resources Information Center
Turner, Deborah; Allen, Warren
2013-01-01
Introduction: This investigation opens with a description of why studying non-traditional, oral documents can inform efforts to extend traditional library and information science practices, of description, storage, and retrieval, to artefacts made available through emerging media. Method: This study extends the method used to identify a document,…
Phoenix: Service Oriented Architecture for Information Management - Abstract Architecture Document
2011-09-01
implementation logic and policy if and which Information Brokering and Repository Services the information is going to be forwarded to. These service chains...descriptions are going to be retrieved. Raised Exceptions: • Exception getConsumers(sessionTrack : SessionTrack, information : Information...that exetnd the usefullness of the IM system as a whole. • Client • Event Notification • Filter • Information Discovery • Security • Service
ERIC Educational Resources Information Center
Kristiansen, Rolf
This paper suggests means of merging educational ideas with new information and communication technologies to aid individuals with disabilities. New technologies discussed include microtechnology and integrated circuits, high speed processing and retrieval of information, and light-weight equipment, among others. New technologies can be used as…
2008-08-08
Ms. Cindy E. Moran Director for Network Services 8 August 2008 DISN Forecast to Industry Report Documentation Page Form ApprovedOMB No. 0704-0188...TITLE AND SUBTITLE DISN (Defense Information system Network ) Forecast to Industry 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER...Prescribed by ANSI Std Z39-18 2 2 Integrated DISN Services by 2016: A Solid Goal Network Aware Applications Common Storage & Retrieval Shared Long
Using Photogrammetry to Estimate Tank Waste Volumes from Video
DOE Office of Scientific and Technical Information (OSTI.GOV)
Field, Jim G.
Washington River Protection Solutions (WRPS) contracted with HiLine Engineering & Fabrication, Inc. to assess the accuracy of photogrammetry tools as compared to video Camera/CAD Modeling System (CCMS) estimates. This test report documents the results of using photogrammetry to estimate the volume of waste in tank 241-C-I04 from post-retrieval videos and results using photogrammetry to estimate the volume of waste piles in the CCMS test video.
NASA Astrophysics Data System (ADS)
Sigurdson, J.; Tagerud, J.
1986-05-01
A UNIDO publication about machine tools with automatic control discusses the following: (1) numerical control (NC) machine tool perspectives, definition of NC, flexible manufacturing systems, robots and their industrial application, research and development, and sensors; (2) experience in developing a capability in NC machine tools; (3) policy issues; (4) procedures for retrieval of relevant documentation from data bases. Diagrams, statistics, bibliography are included.
Land use and land cover digital data
Fegeas, Robin G.; Claire, Robert W.; Guptill, Stephen C.; Anderson, K. Eric; Hallam, Cheryl A.
1983-01-01
The discipline of cartography is undergoing a number of profound changesthat center on the emerging influence ofdigital manipulation and analysis ofdata for the preparation of cartographic materials and for use in geographic information systems. Operational requirements have led to the development by the USGS National Mapping Division of several documents that establish in-house digital cartographic standards. In an effort to fulfill lead agency requirements for promulgation of Federal standards in the earth sciences, the documents have been edited and assembled with explanatory text into a USGS Circular. This Circular describes some of the pertinent issues relative to digital cartographic data standards, documents the digital cartographic data standards currently in use within the USGS, and details the efforts of the USGS related to the definition of national digital cartographic data standards. It consists of several chapters; the first is a general overview, and each succeeding chapter is made up from documents that establish in-house standards for one of the various types of digital cartographic data currently produced. This chapter 895-E, describes the Geographic Information Retrieval and Analysis System that is used in conjunction with the USGS land use and land cover classification system to encode, edit, manipuate, and analyze land use and land cover digital data.
A medical digital library to support scenario and user-tailored information retrieval.
Chu, W W; Johnson, D B; Kangarloo, H
2000-06-01
Current large-scale information sources are designed to support general queries and lack the ability to support scenario-specific information navigation, gathering, and presentation. As a result, users are often unable to obtain desired specific information within a well-defined subject area. Today's information systems do not provide efficient content navigation, incremental appropriate matching, or content correlation. We are developing the following innovative technologies to remedy these problems: 1) scenario-based proxies, enabling the gathering and filtering of information customized for users within a pre-defined domain; 2) context-sensitive navigation and matching, providing approximate matching and similarity links when an exact match to a user's request is unavailable; 3) content correlation of documents, creating semantic links between documents and information sources; and 4) user models for customizing retrieved information and result presentation. A digital medical library is currently being constructed using these technologies to provide customized information for the user. The technologies are general in nature and can provide custom and scenario-specific information in many other domains (e.g., crisis management).
Enhancing biomedical text summarization using semantic relation extraction.
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.
A Data Management System for International Space Station Simulation Tools
NASA Technical Reports Server (NTRS)
Betts, Bradley J.; DelMundo, Rommel; Elcott, Sharif; McIntosh, Dawn; Niehaus, Brian; Papasin, Richard; Mah, Robert W.; Clancy, Daniel (Technical Monitor)
2002-01-01
Groups associated with the design, operational, and training aspects of the International Space Station make extensive use of modeling and simulation tools. Users of these tools often need to access and manipulate large quantities of data associated with the station, ranging from design documents to wiring diagrams. Retrieving and manipulating this data directly within the simulation and modeling environment can provide substantial benefit to users. An approach for providing these kinds of data management services, including a database schema and class structure, is presented. Implementation details are also provided as a data management system is integrated into the Intelligent Virtual Station, a modeling and simulation tool developed by the NASA Ames Smart Systems Research Laboratory. One use of the Intelligent Virtual Station is generating station-related training procedures in a virtual environment, The data management component allows users to quickly and easily retrieve information related to objects on the station, enhancing their ability to generate accurate procedures. Users can associate new information with objects and have that information stored in a database.
Search and retrieval of office files using dBASE 3
NASA Technical Reports Server (NTRS)
Breazeale, W. L.; Talley, C. R.
1986-01-01
Described is a method of automating the office files retrieval process using a commercially available software package (dBASE III). The resulting product is a menu-driven computer program which requires no computer skills to operate. One part of the document is written for the potential user who has minimal computer experience and uses sample menu screens to explain the program; while a second part is oriented towards the computer literate individual and includes rather detailed descriptions of the methodology and search routines. Although much of the programming techniques are explained, this document is not intended to be a tutorial on dBASE III. It is hoped that the document will serve as a stimulus for other applications of dBASE III.
The Profile-Query Relationship.
ERIC Educational Resources Information Center
Shepherd, Michael A.; Phillips, W. J.
1986-01-01
Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…
A Bayesian Approach to Interactive Retrieval
ERIC Educational Resources Information Center
Tague, Jean M.
1973-01-01
A probabilistic model for interactive retrieval is presented. Bayesian statistical decision theory principles are applied: use of prior and sample information about the relationship of document descriptions to query relevance; maximization of expected value of a utility function, to the problem of optimally restructuring search strategies in an…
Topics in Semantic Representation
ERIC Educational Resources Information Center
Griffiths, Thomas L.; Steyvers, Mark; Tenenbaum, Joshua B.
2007-01-01
Processing language requires the retrieval of concepts from memory in response to an ongoing stream of information. This retrieval is facilitated if one can infer the gist of a sentence, conversation, or document and use that gist to predict related concepts and disambiguate words. This article analyzes the abstract computational problem…
Context-sensitive medical information retrieval.
Auerbuch, Mordechai; Karson, Tom H; Ben-Ami, Benjamin; Maimon, Oded; Rokach, Lior
2004-01-01
Substantial medical data such as pathology reports, operative reports, discharge summaries, and radiology reports are stored in textual form. Databases containing free-text medical narratives often need to be searched to find relevant information for clinical and research purposes. Terms that appear in these documents tend to appear in different contexts. The con-text of negation, a negative finding, is of special importance, since many of the most frequently described findings are those denied by the patient or subsequently "ruled out." Hence, when searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the retrieved documents will be irrelevant. The purpose of this work is to develop a methodology for automated learning of negative context patterns in medical narratives and test the effect of context identification on the performance of medical information retrieval. The algorithm presented significantly improves the performance of information retrieval done on medical narratives. The precision im-proves from about 60%, when using context-insensitive retrieval, to nearly 100%. The impact on recall is only minor. In addition, context-sensitive queries enable the user to search for terms in ways not otherwise available
On the use of the singular value decomposition for text retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Husbands, P.; Simon, H.D.; Ding, C.
2000-12-04
The use of the Singular Value Decomposition (SVD) has been proposed for text retrieval in several recent works. This technique uses the SVD to project very high dimensional document and query vectors into a low dimensional space. In this new space it is hoped that the underlying structure of the collection is revealed thus enhancing retrieval performance. Theoretical results have provided some evidence for this claim and to some extent experiments have confirmed this. However, these studies have mostly used small test collections and simplified document models. In this work we investigate the use of the SVD on large documentmore » collections. We show that, if interpreted as a mechanism for representing the terms of the collection, this technique alone is insufficient for dealing with the variability in term occurrence. Section 2 introduces the text retrieval concepts necessary for our work. A short description of our experimental architecture is presented in Section 3. Section 4 describes how term occurrence variability affects the SVD and then shows how the decomposition influences retrieval performance. A possible way of improving SVD-based techniques is presented in Section 5 and concluded in Section 6.« less
Feature extraction for document text using Latent Dirichlet Allocation
NASA Astrophysics Data System (ADS)
Prihatini, P. M.; Suryawan, I. K.; Mandia, IN
2018-01-01
Feature extraction is one of stages in the information retrieval system that used to extract the unique feature values of a text document. The process of feature extraction can be done by several methods, one of which is Latent Dirichlet Allocation. However, researches related to text feature extraction using Latent Dirichlet Allocation method are rarely found for Indonesian text. Therefore, through this research, a text feature extraction will be implemented for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling and evaluation. The evaluation is done by comparing Precision, Recall and F-Measure value between Latent Dirichlet Allocation and Term Frequency Inverse Document Frequency KMeans which commonly used for feature extraction. The evaluation results show that Precision, Recall and F-Measure value of Latent Dirichlet Allocation method is higher than Term Frequency Inverse Document Frequency KMeans method. This shows that Latent Dirichlet Allocation method is able to extract features and cluster Indonesian text better than Term Frequency Inverse Document Frequency KMeans method.
Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai
2017-03-01
The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
Névéol, Aurélie; Zeng, Kelly; Bodenreider, Olivier
2006-01-01
Objective This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. Materials and methods The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Results Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. Conclusions The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis. PMID:17238409
Neveol, Aurélie; Zeng, Kelly; Bodenreider, Olivier
2006-01-01
This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.
Implementation of a thesaurus in an electronic photograph imaging system
NASA Astrophysics Data System (ADS)
Partlow, Denise
1995-11-01
A photograph imaging system presents a unique set of requirements for indexing and retrieving images, unlike a standard imaging system for written documents. This paper presents the requirements, technical design, and development results for a hierarchical ANSI standard thesaurus embedded into a photograph archival system. The thesaurus design incorporates storage reduction techniques, permits fast searches, and contains flexible indexing methods. It can be extended to many applications other than the retrieval of photographs. When photographic images are indexed into an electronic system, they are subject to a variety of indexing problems based on what the indexer `sees.' For instance, the indexer may categorize an image as a boat when others might refer to it as a ship, sailboat, or raft. The thesaurus will allow a user to locate images containing any synonym for boat, regardless of how the image was actually indexed. In addition to indexing problems, photos may need to be retrieved based on a broad category, for instance, flowers. The thesaurus allows a search for `flowers' to locate all images containing a rose, hibiscus, or daisy, yet still allow a specific search for an image containing only a rose. The technical design and method of implementation for such a thesaurus is presented. The thesaurus is implemented using an SQL relational data base management system that supports blobs, binary large objects. The design incorporates unique compression methods for storing the thesaurus words. Words are indexed to photographs using the compressed word and allow for very rapid searches, eliminating lengthy string matches.
A Compositional Relevance Model for Adaptive Information Retrieval
NASA Technical Reports Server (NTRS)
Mathe, Nathalie; Chen, James; Lu, Henry, Jr. (Technical Monitor)
1994-01-01
There is a growing need for rapid and effective access to information in large electronic documentation systems. Access can be facilitated if information relevant in the current problem solving context can be automatically supplied to the user. This includes information relevant to particular user profiles, tasks being performed, and problems being solved. However most of this knowledge on contextual relevance is not found within the contents of documents, and current hypermedia tools do not provide any easy mechanism to let users add this knowledge to their documents. We propose a compositional relevance network to automatically acquire the context in which previous information was found relevant. The model records information on the relevance of references based on user feedback for specific queries and contexts. It also generalizes such information to derive relevant references for similar queries and contexts. This model lets users filter information by context of relevance, build personalized views of documents over time, and share their views with other users. It also applies to any type of multimedia information. Compared to other approaches, it is less costly and doesn't require any a priori statistical computation, nor an extended training period. It is currently being implemented into the Computer Integrated Documentation system which enables integration of various technical documents in a hypertext framework.
Title list of documents made publicly available, November 1-30, 1995
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1996-01-01
The Title List of Documents Made Publicly Available is a monthly publication. It contains descriptions of the information received and generated by the U.S. Nuclear Regulatory Commission (NRC). This information includes (1) docketed material associated with civilian nuclear power plants and other uses of radioactive materials and (2) nondocketed material received and generated by NRC pertinent to its role as a regulatory agency. As used here, docketed does not refer to Court dockets; it refers to the system by which NRC maintains its regulatory records. This series of documents is indexed by a Personal Author Index, a Corporate Source Indexmore » and a Report Number Index. The docketed information contained in the Title List includes the information formerly issued through the Department of Energy publication Power Reactor Docker Information, last published in January 1979. NRC documents that are publicly available may be examined without charge at the NRC Public Document Room (PDR). Duplicate copies may be obtained for a fee. Standing orders for certain categories of documents are also available. Clients may search for and order desired titles through the PDR computerized Bibliographic Retrieval System, which is accessible both at the PDR and remotely. The PDR is staffed by professional technical librarians, who provide reference assistance to users.« less
ERIC Educational Resources Information Center
Lane, Sean M.; Roussel, Cristine C.; Villa, Diane; Morita, Shelby K.
2007-01-01
Three experiments explored the issue of whether enhanced metamnemonic knowledge at retrieval can improve participants' ability to make difficult source discriminations in the context of the eyewitness suggestibility paradigm. The 1st experiment documented differences in phenomenal experience between veridical and false memories. Experiment 2…
NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.
ERIC Educational Resources Information Center
Zhou, Lina; Zhang, Dongsong
2003-01-01
Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…
The Negative Testing and Negative Generation Effects Are Eliminated by Delay
ERIC Educational Resources Information Center
Mulligan, Neil W.; Peterson, Daniel J.
2015-01-01
Although retrieval often enhances subsequent memory (the testing effect), a negative testing effect has recently been documented in which prior retrieval harms later recall compared with restudying. The negative testing effect was predicated on the negative generation effect and the item-specific-relational framework. The present experiments…
Subject Retrieval from Full-Text Databases in the Humanities
ERIC Educational Resources Information Center
East, John W.
2007-01-01
This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…
Duplicate document detection in DocBrowse
NASA Astrophysics Data System (ADS)
Chalana, Vikram; Bruce, Andrew G.; Nguyen, Thien
1998-04-01
Duplicate documents are frequently found in large databases of digital documents, such as those found in digital libraries or in the government declassification effort. Efficient duplicate document detection is important not only to allow querying for similar documents, but also to filter out redundant information in large document databases. We have designed three different algorithm to identify duplicate documents. The first algorithm is based on features extracted from the textual content of a document, the second algorithm is based on wavelet features extracted from the document image itself, and the third algorithm is a combination of the first two. These algorithms are integrated within the DocBrowse system for information retrieval from document images which is currently under development at MathSoft. DocBrowse supports duplicate document detection by allowing (1) automatic filtering to hide duplicate documents, and (2) ad hoc querying for similar or duplicate documents. We have tested the duplicate document detection algorithms on 171 documents and found that text-based method has an average 11-point precision of 97.7 percent while the image-based method has an average 11- point precision of 98.9 percent. However, in general, the text-based method performs better when the document contains enough high-quality machine printed text while the image- based method performs better when the document contains little or no quality machine readable text.
GENESIS: GPS Environmental and Earth Science Information System
NASA Technical Reports Server (NTRS)
Hajj, George
1999-01-01
This presentation reviews the GPS ENvironmental and Earth Science Information System (GENESIS). The objectives of GENESIS are outlined (1) Data Archiving, searching and distribution for science data products derived from Space borne TurboRogue Space Receivers for GPS science and other ground based GPS receivers, (2) Data browsing using integrated visualization tools, (3) Interactive web/java-based data search and retrieval, (4) Data subscription service, (5) Data migration from existing GPS archived data, (6) On-line help and documentation, and (7) participation in the WP-ESIP federation. The presentation reviews the products and services of Genesis, and the technology behind the system.
Buried waste integrated demonstration human engineered control station. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1994-09-01
This document describes the Human Engineered Control Station (HECS) project activities including the conceptual designs. The purpose of the HECS is to enhance the effectiveness and efficiency of remote retrieval by providing an integrated remote control station. The HECS integrates human capabilities, limitations, and expectations into the design to reduce the potential for human error, provides an easy system to learn and operate, provides an increased productivity, and reduces the ultimate investment in training. The overall HECS consists of the technology interface stations, supporting engineering aids, platform (trailer), communications network (broadband system), and collision avoidance system.
Effective Web and Desktop Retrieval with Enhanced Semantic Spaces
NASA Astrophysics Data System (ADS)
Daoud, Amjad M.
We describe the design and implementation of the NETBOOK prototype system for collecting, structuring and efficiently creating semantic vectors for concepts, noun phrases, and documents from a corpus of free full text ebooks available on the World Wide Web. Automatic generation of concept maps from correlated index terms and extracted noun phrases are used to build a powerful conceptual index of individual pages. To ensure scalabilty of our system, dimension reduction is performed using Random Projection [13]. Furthermore, we present a complete evaluation of the relative effectiveness of the NETBOOK system versus the Google Desktop [8].
Flowpath evaluation and reconnaissance by remote field Eddy current testing (FERRET)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smoak, A.E.; Zollinger, W.T.
1993-12-31
This document describes the design and development of FERRET (Flowpath Evaluation and Reconnaisance by Remote-field Eddy current Testing). FERRET is a system for inspecting the steel pipes which carry cooling water to underground nuclear waste storage tanks. The FERRET system has been tested in a small scale cooling pipe mock-up, an improved full scale mock-up, and in flaw detection experiments. Early prototype designs of FERRET and the FERRET launcher (a device which inserts, moves, and retrieves probes from a piping system) as well as the field-ready design are discussed.
A novel word spotting method based on recurrent neural networks.
Frinken, Volkmar; Fischer, Andreas; Manmatha, R; Bunke, Horst
2012-02-01
Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical dynamic time warping-based approach but also a modern keyword spotting system, based on hidden Markov models. Furthermore, we analyze the performance of the underlying neural networks when using them in a recognition task followed by keyword spotting on the produced transcription. We point out the advantages of keyword spotting when compared to classic text line recognition.
Building a national electronic medical record exchange system - experiences in Taiwan.
Li, Yu-Chuan Jack; Yen, Ju-Chuan; Chiu, Wen-Ta; Jian, Wen-Shan; Syed-Abdul, Shabbir; Hsu, Min-Huei
2015-08-01
There are currently 501 hospitals and about 20,000 clinics in Taiwan. The National Health Insurance (NHI) system, which is operated by the NHI Administration, uses a single-payer system and covers 99.9% of the nation's total population of 23,000,000. Taiwan's NHI provides people with a high degree of freedom in choosing their medical care options. However, there is the potential concern that the available medical resources will be overused. The number of doctor consultations per person per year is about 15. Duplication of laboratory tests and prescriptions are not rare either. Building an electronic medical record exchange system is a good method of solving these problems and of improving continuity in health care. In November 2009, Taiwan's Executive Yuan passed the 'Plan for accelerating the implementation of electronic medical record systems in medical institutions' (2010-2012; a 3-year plan). According to this plan, a patient can, at any hospital in Taiwan, by using his/her health insurance IC card and physician's medical professional IC card, upon signing a written agreement, retrieve all important medical records for the past 6 months from other participating hospitals. The focus of this plan is to establish the National Electronic Medical Record Exchange Centre (EEC). A hospital's information system will be connected to the EEC through an electronic medical record (EMR) gateway. The hospital will convert the medical records for the past 6 months in its EMR system into standardized files and save them on the EMR gateway. The most important functions of the EEC are to generate an index of all the XML files on the EMR gateways of all hospitals, and to provide search and retrieval services for hospitals and clinics. The EEC provides four standard inter-institution EMR retrieval services covering medical imaging reports, laboratory test reports, discharge summaries, and outpatient records. In this system, we adopted the Health Level 7 (HL7) Clinical Document Architecture (CDA) standards to generate clinical documents and Integrating the Healthcare Enterprise (IHE) Cross-enterprise Document Sharing (XDS) profile for the communication infrastructure. By December of 2014, the number of hospitals that provide an inter-institution EMR exchange service had reached 321. Hospitals that had not joined the service were all smaller ones with less than 100 beds. Inter-institution EMR exchange can make it much easier for people to access their own medical records, reduce the waste of medical resources, and improve the quality of medical care. The implementation of an inter-institution EMR exchange system faces many challenges. This article provides Taiwan's experiences as a reference. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1977-06-01
The pilot plant is developed for ERDA low-level contact-handled transuranic waste, ERDA remote-handled intermediate-level transuranic waste, and for high-level waste experiments. All wastes placed in the WIPP arrive at the site processed and packaged; no waste processing is done at the WIPP. All wastes placed into the WIPP are retrievable. The proposed site for WIPP lies 26 miles east of Carlsbad, New Mexico. This document includes the executive summary and a detailed description of the facilities and systems. (DLC)
Full-field optical coherence tomography used for security and document identity
NASA Astrophysics Data System (ADS)
Chang, Shoude; Mao, Youxin; Sherif, Sherif; Flueraru, Costel
2006-09-01
The optical coherence tomography (OCT) is an emerging technology for high-resolution cross-sectional imaging of 3D structures. In the past years, OCT systems have been used mainly for medical, especially ophthalmological diagnostics. Concerning the nature of OCT system being capable to explore the internal features of an object, we apply the OCT technology to directly retrieve the 2D information pre-stored in a multiple-layer information carrier. The standard depth-resolution of an OCT system is at micrometer level. If a 20mm by 20mm sampling area with a 1024 x 1024 CCD array is used in the OCT system having 10 μm, an information carrier having a volume of 20mm x 20mm x 2mm could contain 200 Mega-pixel images. Because of its tiny size and large information volume, the information carrier, with its OCT retrieving system, will have potential applications in documents security and object identification. In addition, as the information carrier can be made by low-scattering transparent material, the signal/noise ratio will be improved dramatically. As a consequence, the specific hardware and complicated software can also be greatly simplified. Owing to non-scanning along X-Y axis, the full-field OCT could be the simplest and most economic imaging system for extracting information from such a multilayer information carrier. In this paper, deign and implementation of a full-field OCT system is described and the related algorithms are introduced. In our experiments, a four layers information carrier is used, which contains 4 layers of image pattern, two text images and two fingerprint images. The extracted tomography images of each layer are also provided.
Do People Experience Cognitive Biases while Searching for Information?
Lau, Annie Y.S.; Coiera, Enrico W.
2007-01-01
Objective To test whether individuals experience cognitive biases whilst searching using information retrieval systems. Biases investigated are anchoring, order, exposure and reinforcement. Design A retrospective analysis and a prospective experiment were conducted to investigate whether cognitive biases affect the way that documentary evidence is interpreted while searching online. The retrospective analysis was conducted on the search and decision behaviors of 75 clinicians (44 doctors, 31 nurses), answering questions for 8 clinical scenarios within 80 minutes in a controlled setting. The prospective study was conducted on 227 undergraduate students, who used the same search engine to answer two of six randomly assigned consumer health questions. Measurements Frequencies of correct answers pre- and post- search, and confidence in answers were collected. The impact of reading a document on the final decision was measured by the population likelihood ratio (LR) of the frequency of reading the document and the frequency of obtaining a correct answer. Documents with a LR > 1 were most likely to be associated with a correct answer, and those with a LR < 1 were most likely to be associated with an incorrect answer to a question. Agreement between a subject and the evidence they read was estimated by a concurrence rate, which measured the frequency that subjects’ answers agreed with the likelihood ratios of a group of documents, normalized for document order, time exposure or reinforcement through repeated access. Serial position curves were plotted for the relationship between subjects’ pre-search confidence, document order, the number of times and length of time a document was accessed, and concurrence with post-search answers. Chi-square analyses tested for the presence of biases, and the Kolmogorov-Smirnov test checked for equality of distribution of evidence in the comparison populations. Results A person’s prior belief (anchoring) has a significant impact on their post-search answer (retrospective: P < 0.001; prospective: P < 0.001). Documents accessed at different positions in a search session (order effect [retrospective: P = 0.76; prospective: P = 0.026]), and documents processed for different lengths of time (exposure effect [retrospective: P = 0.27; prospective: P = 0.0081]) also influenced decision post-search more than expected in the prospective experiment but not in the retrospective analysis. Reinforcement through repeated exposure to a document did not yield statistical differences in decision outcome post-search (retrospective: P = 0.31; prospective: P = 0.81). Conclusion People may experience anchoring, exposure and order biases while searching for information, and these biases may influence the quality of decision making during and after the use of information retrieval systems. PMID:17600097
An XML Data Model for Inverted Image Indexing
NASA Astrophysics Data System (ADS)
So, Simon W.; Leung, Clement H. C.; Tse, Philip K. C.
2003-01-01
The Internet world makes increasing use of XML-based technologies. In multimedia data indexing and retrieval, the MPEG-7 standard for Multimedia Description Scheme is specified using XML. The flexibility of XML allows users to define other markup semantics for special contexts, construct data-centric XML documents, exchange standardized data between computer systems, and present data in different applications. In this paper, the Inverted Image Indexing paradigm is presented and modeled using XML Schema.
EM-31 RETRIEVAL KNOWLEDGE CENTER MEETING REPORT: MOBILIZE AND DISLODGE TANK WASTE HEELS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fellinger, A.
2010-02-16
The Retrieval Knowledge Center sponsored a meeting in June 2009 to review challenges and gaps to retrieval of tank waste heels. The facilitated meeting was held at the Savannah River Research Campus with personnel broadly representing tank waste retrieval knowledge at Hanford, Savannah River, Idaho, and Oak Ridge. This document captures the results of this meeting. In summary, it was agreed that the challenges to retrieval of tank waste heels fell into two broad categories: (1) mechanical heel waste retrieval methodologies and equipment and (2) understanding and manipulating the heel waste (physical, radiological, and chemical characteristics) to support retrieval optionsmore » and subsequent processing. Recent successes and lessons from deployments of the Sand and Salt Mantis vehicles as well as retrieval of C-Area tanks at Hanford were reviewed. Suggestions to address existing retrieval approaches that utilize a limited set of tools and techniques are included in this report. The meeting found that there had been very little effort to improve or integrate the multiple proven or new techniques and tools available into a menu of available methods for rapid insertion into baselines. It is recommended that focused developmental efforts continue in the two areas underway (low-level mixing evaluation and pumping slurries with large solid materials) and that projects to demonstrate new/improved tools be launched to outfit tank farm operators with the needed tools to complete tank heel retrievals effectively and efficiently. This document describes the results of a meeting held on June 3, 2009 at the Savannah River Site in South Carolina to identify technology gaps and potential technology solutions to retrieving high-level waste (HLW) heels from waste tanks within the complex of sites run by the U. S. Department of Energy (DOE). The meeting brought together personnel with extensive tank waste retrieval knowledge from DOE's four major waste sites - Hanford, Savannah River, Idaho, and Oak Ridge. The meeting was arranged by the Retrieval Knowledge Center (RKC), which is a technology development project sponsored by the Office of Technology Innovation & Development - formerly the Office of Engineering and Technology - within the DOE Office of Environmental Management (EM).« less
Regional information guidance system based on hypermedia concept
NASA Astrophysics Data System (ADS)
Matoba, Hiroshi; Hara, Yoshinori; Kasahara, Yutako
1990-08-01
A regional information guidance system has been developed on an image workstation. Two main features of this system are hypermedia data structure and friendly visual interface realized by the full-color frame memory system. As the hypermedia data structure manages regional information such as maps, pictures and explanations of points of interest, users can retrieve those information one by one, next to next according to their interest change. For example, users can retrieve explanation of a picture through the link between pictures and text explanations. Users can also traverse from one document to another by using keywords as cross reference indices. The second feature is to utilize a full-color, high resolution and wide space frame memory for visual interface design. This frame memory system enables real-time operation of image data and natural scene representation. The system also provides half tone representing function which enables fade-in/out presentations. This fade-in/out functions used in displaying and erasing menu and image data, makes visual interface soft for human eyes. The system we have developed is a typical example of multimedia applications. We expect the image workstation will play an important role as a platform for multimedia applications.
A strategy for electronic dissemination of NASA Langley technical publications
NASA Technical Reports Server (NTRS)
Roper, Donna G.; Mccaskill, Mary K.; Holland, Scott D.; Walsh, Joanne L.; Nelson, Michael L.; Adkins, Susan L.; Ambur, Manjula Y.; Campbell, Bryan A.
1994-01-01
To demonstrate NASA Langley Research Center's relevance and to transfer technology to external customers in a timely and efficient manner, Langley has formed a working group to study and recommend a course of action for the electronic dissemination of technical reports (EDTR). The working group identified electronic report requirements (e.g., accessibility, file format, search requirements) of customers in U.S. industry through numerous site visits and personal contacts. Internal surveys were also used to determine commonalities in document preparation methods. From these surveys, a set of requirements for an electronic dissemination system was developed. Two candidate systems were identified and evaluated against the set of requirements: the Full-Text Electronic Documents System (FEDS), which is a full-text retrieval system based on the commercial document management package Interleaf, and the Langley Technical Report Server (LTRS), which is a Langley-developed system based on the publicly available World Wide Web (WWW) software system. Factors that led to the selection of LTRS as the vehicle for electronic dissemination included searching and viewing capability, current system operability, and client software availability for multiple platforms at no cost to industry. This report includes the survey results, evaluations, a description of the LTRS architecture, recommended policy statement, and suggestions for future implementations.
Generating Models of Surgical Procedures using UMLS Concepts and Multiple Sequence Alignment
Meng, Frank; D’Avolio, Leonard W.; Chen, Andrew A.; Taira, Ricky K.; Kangarloo, Hooshang
2005-01-01
Surgical procedures can be viewed as a process composed of a sequence of steps performed on, by, or with the patient’s anatomy. This sequence is typically the pattern followed by surgeons when generating surgical report narratives for documenting surgical procedures. This paper describes a methodology for semi-automatically deriving a model of conducted surgeries, utilizing a sequence of derived Unified Medical Language System (UMLS) concepts for representing surgical procedures. A multiple sequence alignment was computed from a collection of such sequences and was used for generating the model. These models have the potential of being useful in a variety of informatics applications such as information retrieval and automatic document generation. PMID:16779094
STREAM Table Program: User's manual and program document
NASA Technical Reports Server (NTRS)
Hiles, K. H.
1981-01-01
This program was designed to be an editor for the Lewis Chemical Equilibrium program input files and is used for storage, manipulation and retrieval of the large amount of data required. The files are based on the facility name, case number, and table number. The data is easily recalled by supplying the sheet number to be displayed. The retrieval basis is a sheet defined to be all of the individual flow streams which comprise a given portion of a coal gasification system. A sheet may cover more than one page of output tables. The program allows for the insertion of a new table, revision of existing tables, deletion of existing tables, or the printing of selected tables. No calculations are performed. Only pointers are used to keep track of the data.
Evaluating Documents Reference Service and the Implications for Improvement.
ERIC Educational Resources Information Center
Parker, June D.
1996-01-01
Presents an evaluation of reference services and government document use at East Carolina University (North Carolina) library. Factors that most affect retrieval success include cataloging and technical problems, the amount of time spent in searching, and staff knowledge. (Author/AEF)
NASA Astrophysics Data System (ADS)
Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan
2010-10-01
The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth and ground based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system including data, logic and presentation tier. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets to enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial database extension is employed by the PostgreSQL database. This object-oriented database was chosen over a relational database to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local areas. While the spatial database hinders processing raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptions (SLD) and web mapping service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO19115 standard. XML structured information of the SLD and metadata are stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.
Technological Imperatives: Using Computers in Academic Debate.
ERIC Educational Resources Information Center
Ticku, Ravinder; Phelps, Greg
Intended for forensic educators and debate teams, this document details how one university debate team, at the University of Iowa, makes use of computer resources on campus to facilitate storage and retrieval of information useful to debaters. The introduction notes the problem of storing and retrieving the amount of information required by debate…
1999 Leak Detection and Monitoring and Mitigation Strategy Update
DOE Office of Scientific and Technical Information (OSTI.GOV)
OHL, P.C.
This document is a complete revision of WHC-SD-WM-ES-378, Rev 1. This update includes recent developments in Leak Detection, Leak Monitoring, and Leak Mitigation technologies, as well as, recent developments in single-shell tank retrieval technologies. In addition, a single-shell tank retrieval release protection strategy is presented.
Indexing and retrieving DICOM data in disperse and unstructured archives.
Costa, Carlos; Freitas, Filipe; Pereira, Marco; Silva, Augusto; Oliveira, José L
2009-01-01
This paper proposes an indexing and retrieval solution to gather information from distributed DICOM documents by allowing searches and access to the virtual data repository using a Google-like process. The medical imaging modalities are becoming more powerful and less expensive. The result is the proliferation of equipment acquisition by imaging centers, including the small ones. With this dispersion of data, it is not easy to take advantage of all the information that can be retrieved from these studies. Furthermore, many of these small centers do not have large enough requirements to justify the acquisition of a traditional PACS. A peer-to-peer PACS platform to index and query DICOM files over a set of distributed repositories that are logically viewed as a single federated unit. The solution is based on a public domain document-indexing engine and extends traditional PACS query and retrieval mechanisms. This proposal deals well with complex searching requirements, from a single desktop environment to distributed scenarios. The solution performance and robustness were demonstrated in trials. The characteristics of presented PACS platform make it particularly important for small institutions, including educational and research groups.
PACS and electronic health records
NASA Astrophysics Data System (ADS)
Cohen, Simona; Gilboa, Flora; Shani, Uri
2002-05-01
Electronic Health Record (EHR) is a major component of the health informatics domain. An important part of the EHR is the medical images obtained over a patient's lifetime and stored in diverse PACS. The vision presented in this paper is that future medical information systems will convert data from various medical sources -- including diverse modalities, PACS, HIS, CIS, RIS, and proprietary systems -- to HL7 standard XML documents. Then, the various documents are indexed and compiled to EHRs, upon which complex queries can be posed. We describe the conversion of data retrieved from PACS systems through DICOM to HL7 standard XML documents. This enables the EHR system to answer queries such as 'Get all chest images of patients at the age of 20-30, that have blood type 'A' and are allergic to pine trees', which a single PACS cannot answer. The integration of data from multiple sources makes our approach capable of delivering such answers. It enables the correlation of medical, demographic, clinical, and even genetic information. In addition, by fully indexing all the tagged data in DICOM objects, it becomes possible to offer access to huge amounts of valuable data, which can be better exploited in the specific radiology domain.
Space environmental effects on spacecraft: LEO materials selection guide, part 2
NASA Astrophysics Data System (ADS)
Silverman, Edward M.
1995-08-01
This document provides performance properties on major spacecraft materials and subsystems that have been exposed to the low-Earth orbit (LEO) space environment. Spacecraft materials include metals, polymers, composites, white and black paints, thermal-control blankets, adhesives, and lubricants. Spacecraft subsystems include optical components, solar cells, and electronics. Information has been compiled from LEO short-term spaceflight experiments (e.g., space shuttle) and from retrieved satellites of longer mission durations (e.g., Long Duration Exposure Facility). Major space environment effects include atomic oxygen (AO), ultraviolet radiation, micrometeoroids and debris, contamination, and particle radiation. The main objective of this document is to provide a decision tool to designers for designing spacecraft and structures. This document identifies the space environments that will affect the performance of materials and components, e.g., thermal-optical property changes of paints due to UV exposures, AO-induced surface erosion of composites, dimensional changes due to thermal cycling, vacuum-induced moisture outgassing, and surface optical changes due to AO/UV exposures. Where appropriate, relationships between the space environment and the attendant material/system effects are identified. Part 2 covers thermal control systems, power systems, optical components, electronic systems, and applications.
Space environmental effects on spacecraft: LEO materials selection guide, part 2
NASA Technical Reports Server (NTRS)
Silverman, Edward M.
1995-01-01
This document provides performance properties on major spacecraft materials and subsystems that have been exposed to the low-Earth orbit (LEO) space environment. Spacecraft materials include metals, polymers, composites, white and black paints, thermal-control blankets, adhesives, and lubricants. Spacecraft subsystems include optical components, solar cells, and electronics. Information has been compiled from LEO short-term spaceflight experiments (e.g., space shuttle) and from retrieved satellites of longer mission durations (e.g., Long Duration Exposure Facility). Major space environment effects include atomic oxygen (AO), ultraviolet radiation, micrometeoroids and debris, contamination, and particle radiation. The main objective of this document is to provide a decision tool to designers for designing spacecraft and structures. This document identifies the space environments that will affect the performance of materials and components, e.g., thermal-optical property changes of paints due to UV exposures, AO-induced surface erosion of composites, dimensional changes due to thermal cycling, vacuum-induced moisture outgassing, and surface optical changes due to AO/UV exposures. Where appropriate, relationships between the space environment and the attendant material/system effects are identified. Part 2 covers thermal control systems, power systems, optical components, electronic systems, and applications.
A simple procedure for retrieval of a cement-retained implant-supported crown: a case report.
Buzayan, Muaiyed Mahmoud; Mahmood, Wan Adida; Yunus, Norsiah Binti
2014-02-01
Retrieval of cement-retained implant prostheses can be more demanding than retrieval of screw-retained prostheses. This case report describes a simple and predictable procedure to locate the abutment screw access openings of cementretained implant-supported crowns in cases of fractured ceramic veneer. A conventional periapical radiography image was captured using a digital camera, transferred to a computer, and manipulated using Microsoft Word document software to estimate the location of the abutment screw access.
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
2003-02-01
FEB 2003 2. REPORT TYPE 3. DATES COVERED 00-00-2003 to 00-00-2003 4. TITLE AND SUBTITLE The Effect of Bilingual Term List Size on Dictionary ...298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 The Effect of Bilingual Term List Size on Dictionary -Based Cross-Language Information Retrieval Dina...are extensively used as a resource for dictionary -based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written
Machine Translation-Supported Cross-Language Information Retrieval for a Consumer Health Resource
Rosemblat, Graciela; Gemoets, Darren; Browne, Allen C.; Tse, Tony
2003-01-01
The U.S. National Institutes of Health, through its National Library of Medicine, developed ClinicalTrials.gov to provide the public with easy access to information on clinical trials on a wide range of conditions or diseases. Only English language information retrieval is currently supported. Given the growing number of Spanish speakers in the U.S. and their increasing use of the Web, we anticipate a significant increase in Spanish-speaking users. This study compares the effectiveness of two common cross-language information retrieval methods using machine translation, query translation versus document translation, using a subset of genuine user queries from ClinicalTrials.gov. Preliminary results conducted with the ClinicalTrials.gov search engine show that in our environment, query translation is statistically significantly better than document translation. We discuss possible reasons for this result and we conclude with suggestions for future work. PMID:14728236
Friedman, Carol; Hripcsak, George; Shagina, Lyuda; Liu, Hongfang
1999-01-01
Objective: To design a document model that provides reliable and efficient access to clinical information in patient reports for a broad range of clinical applications, and to implement an automated method using natural language processing that maps textual reports to a form consistent with the model. Methods: A document model that encodes structured clinical information in patient reports while retaining the original contents was designed using the extensible markup language (XML), and a document type definition (DTD) was created. An existing natural language processor (NLP) was modified to generate output consistent with the model. Two hundred reports were processed using the modified NLP system, and the XML output that was generated was validated using an XML validating parser. Results: The modified NLP system successfully processed all 200 reports. The output of one report was invalid, and 199 reports were valid XML forms consistent with the DTD. Conclusions: Natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to portions of the original textual report. This integrated document model provides a representation where documents containing specific information can be accurately and efficiently retrieved by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted. Using an XML model of tagging provides an additional benefit in that software tools that manipulate XML documents are readily available. PMID:9925230
Hierarchic Agglomerative Clustering Methods for Automatic Document Classification.
ERIC Educational Resources Information Center
Griffiths, Alan; And Others
1984-01-01
Considers classifications produced by application of single linkage, complete linkage, group average, and word clustering methods to Keen and Cranfield document test collections, and studies structure of hierarchies produced, extent to which methods distort input similarity matrices during classification generation, and retrieval effectiveness…
Project W-320, 241-C-106 sluicing HVAC calculations, Volume 1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, J.W.
1998-08-07
This supporting document has been prepared to make the FDNW calculations for Project W-320, readily retrievable. The report contains the following calculations: Exhaust airflow sizing for Tank 241-C-106; Equipment sizing and selection recirculation fan; Sizing high efficiency mist eliminator; Sizing electric heating coil; Equipment sizing and selection of recirculation condenser; Chiller skid system sizing and selection; High efficiency metal filter shielding input and flushing frequency; and Exhaust skid stack sizing and fan sizing.
Research highlights of the global modeling and simulation branch for 1986-1987
NASA Technical Reports Server (NTRS)
Baker, Wayman (Editor); Susskind, Joel (Editor); Pfaendtner, James (Editor); Randall, David (Editor); Atlas, Robert (Editor)
1988-01-01
This document provides a summary of the research conducted in the Global Modeling and Simulation Branch and highlights the most significant accomplishments in 1986 to 1987. The Branch has been the focal point for global weather and climate prediction research in the Laboratory for Atmospheres through the retrieval and use of satellite data, the development of global models and data assimilation techniques, the simulation of future observing systems, and the performance of atmospheric diagnostic studies.
Clinical dashboards: impact on workflow, care quality, and patient safety.
Egan, Marie
2006-01-01
There is a vast array of technical data that is continuously generated within the intensive care unit environment. In addition to physiological monitors, there is information being captured by the ventilator, intravenous infusion pumps, medication dispensing units, and even the patient's bed. The ability to retrieve and synchronize data is essential for both clinical documentation and real-time problem solving for individual patients and the intensive care unit population as a whole. Technical advances that permit the integration of all relevant data into a singular display or "dashboard" may improve staff efficiency, accelerate decisions, streamline workflow processes, and reduce oversights and errors in clinical practice. Critical care nurses must coordinate all aspects of care for one or more patients. Clinical data are constantly being retrieved, documented, analyzed, and communicated to others, all within the daily routine of nursing care. In addition, many bedside monitors and devices have alarms systems that must be evaluated throughout the workday, and actions taken on the basis of the patient's condition and other data. It is obvious that the complexity within such care processes presents many potential opportunities for overlooking important details. The capability to systematically and logically link physiological monitors and other selected data sets into a cohesive dashboard system holds tremendous promise for improving care quality, patient safety, and clinical outcomes in the intensive care unit.
A hierarchical SVG image abstraction layer for medical imaging
NASA Astrophysics Data System (ADS)
Kim, Edward; Huang, Xiaolei; Tan, Gang; Long, L. Rodney; Antani, Sameer
2010-03-01
As medical imaging rapidly expands, there is an increasing need to structure and organize image data for efficient analysis, storage and retrieval. In response, a large fraction of research in the areas of content-based image retrieval (CBIR) and picture archiving and communication systems (PACS) has focused on structuring information to bridge the "semantic gap", a disparity between machine and human image understanding. An additional consideration in medical images is the organization and integration of clinical diagnostic information. As a step towards bridging the semantic gap, we design and implement a hierarchical image abstraction layer using an XML based language, Scalable Vector Graphics (SVG). Our method encodes features from the raw image and clinical information into an extensible "layer" that can be stored in a SVG document and efficiently searched. Any feature extracted from the raw image including, color, texture, orientation, size, neighbor information, etc., can be combined in our abstraction with high level descriptions or classifications. And our representation can natively characterize an image in a hierarchical tree structure to support multiple levels of segmentation. Furthermore, being a world wide web consortium (W3C) standard, SVG is able to be displayed by most web browsers, interacted with by ECMAScript (standardized scripting language, e.g. JavaScript, JScript), and indexed and retrieved by XML databases and XQuery. Using these open source technologies enables straightforward integration into existing systems. From our results, we show that the flexibility and extensibility of our abstraction facilitates effective storage and retrieval of medical images.