Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James
2014-05-13
An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities (people, organizations, and locations) as well as relationships and events from text documents is described herein.
Automatic information extraction from unstructured mammography reports using distributed semantics.
Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L
2018-02-01
To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines dependency-based parse trees with distributed semantics to generate structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system achieves an F1-score of 0.94 in terms of completeness of the content in the information frames, outperforming a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.
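The hybrid of dependency parsing and distributed semantics described above can be sketched minimally as follows; this is an illustration, not the authors' implementation. It assumes spaCy with the en_core_web_md model (which ships word vectors), and the report sentence, finding term, and seed concept are invented.

```python
# Hedged sketch: a dependency parse attaches modifiers to a finding term,
# and distributed word vectors score each modifier against a seed concept.
import spacy

nlp = spacy.load("en_core_web_md")  # assumption: a model with word vectors

def finding_frame(sentence, finding_lemma="mass", seed_word="shape"):
    doc = nlp(sentence)
    seed = nlp(seed_word)[0]
    frame = {}
    for tok in doc:
        if tok.lemma_ == finding_lemma:
            for child in tok.children:
                # dependency label -> (modifier text, similarity to seed)
                frame[child.dep_] = (child.text, round(child.similarity(seed), 2))
    return frame

print(finding_frame("An irregular mass is seen in the left breast."))
```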
Information extraction from multi-institutional radiology reports.
Hassanpour, Saeed; Langlotz, Curtis P
2016-01-01
The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
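As one concrete reading of "discriminative sequence classifiers for named-entity recognition", here is a minimal CRF sketch using the third-party sklearn-crfsuite package; the paper does not prescribe this library, and the one-sentence corpus and BIO label set are toy stand-ins for an annotated report collection.

```python
# Toy CRF sequence labeler for report phrases (illustrative only).
import sklearn_crfsuite

def features(sent, i):
    w = sent[i]
    return {"lower": w.lower(), "is_title": w.istitle(),
            "suffix3": w[-3:], "prev": sent[i - 1].lower() if i else "BOS"}

train_sents = [["No", "pleural", "effusion", "is", "seen", "."]]
train_tags = [["O", "B-OBSERVATION", "I-OBSERVATION", "O", "O", "O"]]

X = [[features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X)[0])
```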
Integrating Information Extraction Agents into a Tourism Recommender System
NASA Astrophysics Data System (ADS)
Esparcia, Sergio; Sánchez-Anguix, Víctor; Argente, Estefanía; García-Fornes, Ana; Julián, Vicente
Recommender systems face two main problems. On the one hand, their information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be worthwhile to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques to automatically extract and classify information from the Web. Its goal is to keep the system updated and to obtain information about third-party services that are not offered by service providers inside the system.
Longitudinal Analysis of New Information Types in Clinical Notes
Zhang, Rui; Pakhomov, Serguei; Melton, Genevieve B.
2014-01-01
It is increasingly recognized that redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous, significant, and may negatively impact the secondary use of these notes for research and patient care. We investigated several automated methods to identify redundant versus relevant new information in clinical reports. These methods may provide a valuable approach to extract clinically pertinent information and further improve the accuracy of clinical information extraction systems. In this study, we used UMLS semantic types to extract several types of new information, including problems, medications, and laboratory information. Automatically identified new information correlated highly with manual reference standard annotations. Methods to identify different types of new information can potentially help build more robust information extraction systems for clinical researchers, as well as aid clinicians and researchers in navigating clinical notes more effectively and in quickly identifying information pertaining to changes in health states. PMID:25717418
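The core operation, comparing semantically typed concepts in a new note against earlier notes, can be illustrated with a toy snippet; the concept sets are hand-coded stand-ins for the output of a UMLS concept-extraction step.

```python
# Flag "new information" per semantic type by set difference against prior notes.
prior_notes = {"Problem": {"diabetes mellitus"}, "Medication": {"metformin"}}
current_note = {
    "Problem": {"diabetes mellitus", "peripheral neuropathy"},
    "Medication": {"metformin", "gabapentin"},
}

for sem_type, concepts in current_note.items():
    new = concepts - prior_notes.get(sem_type, set())
    if new:
        print(f"New {sem_type}: {sorted(new)}")
# New Problem: ['peripheral neuropathy']
# New Medication: ['gabapentin']
```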
Smart Extraction and Analysis System for Clinical Research.
Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung
2017-05-01
With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issues of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for descriptive and predictive analysis that compiles survival statistics and predicts future survival. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS was evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system is 99% for semistructured text and 97% for unstructured text. Furthermore, automated extraction from unstructured text has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients were found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one lifestyle risk factor was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic information extraction and survival prediction. SEAS has reduced the time and effort that human resources previously spent on manual tasks.
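The SAS subsystem's CART model can be sketched with scikit-learn's DecisionTreeClassifier, a common CART implementation; the features, encodings, and data below are invented for illustration.

```python
# Toy CART-style vital-status predictor (illustrative, not the SEAS model).
from sklearn.tree import DecisionTreeClassifier

# columns: clinical stage (1-4), smoking (0/1), alcohol (0/1)
X = [[1, 0, 0], [2, 1, 0], [4, 1, 1], [4, 0, 1], [3, 1, 1]]
y = [1, 1, 0, 0, 1]  # 1 = alive at follow-up, 0 = deceased

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(cart.predict([[4, 1, 0]]))        # predicted vital status
print(cart.predict_proba([[4, 1, 0]]))  # estimated survival probability
```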
Conception of Self-Construction Production Scheduling System
NASA Astrophysics Data System (ADS)
Xue, Hai; Zhang, Xuerui; Shimizu, Yasuhiro; Fujimura, Shigeru
With the rapid innovation of information technology, many production scheduling systems have been developed. However, they require extensive customization for each production environment, making a large investment in development and maintenance indispensable. The direction in which scheduling systems are constructed should therefore change. The final objective of this research is to develop a system that builds itself by automatically extracting scheduling techniques from daily production scheduling work, so that this investment is reduced. The extraction mechanism should be applicable to various production processes for interoperability. Using the master information extracted by the system, production scheduling operators can be supported in carrying out scheduling work easily and accurately, without any restriction on scheduling operations. With this extraction mechanism in place, a scheduling system can be introduced without great expense for customization. In this paper, a model for expressing a scheduling problem is first proposed. Guidelines for extracting the scheduling information and using the extracted information are then presented, and some applied functions based on them are also proposed.
Hunter, Lawrence; Lu, Zhiyong; Firby, James; Baumgartner, William A; Johnson, Helen L; Ogren, Philip V; Cohen, K Bretonnel
2008-01-01
Background: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results: OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 to .72 (precision .39 to .85, recall .16 to .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion: OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available. PMID:18237434
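A toy matcher conveys the flavor of a pattern language that mixes literal syntax with ontology-backed semantic classes; this is not OpenDMAP itself (and omits its variable ordering), and the mini-ontology is invented.

```python
# Compile "[class]" slots into alternations over ontology members.
import re

ontology = {"protein": {"BRCA1", "p53"}, "cell_part": {"nucleus", "membrane"}}

def compile_pattern(pattern):
    parts = re.split(r"\[(\w+)\]", pattern)
    out = []
    for i, part in enumerate(parts):
        if i % 2:  # odd indices are semantic class names
            alts = "|".join(map(re.escape, sorted(ontology[part])))
            out.append(f"(?P<{part}>{alts})")
        else:      # even indices are literal syntactic material
            out.append(re.escape(part))
    return re.compile("".join(out))

rx = compile_pattern("[protein] is transported to the [cell_part]")
print(rx.search("We show that p53 is transported to the nucleus.").groupdict())
# {'protein': 'p53', 'cell_part': 'nucleus'}
```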
CRL/Brandeis: Description of the DIDEROT System as Used for MUC-5
1993-01-01
This report describes the major developments over the last six months in completing the Diderot information extraction system for the MUC-5 evaluation. Diderot is an information extraction system built at CRL and Brandeis University over the past two years. The system had already been evaluated in the 4th Message Understanding Conference (MUC-4), where it was required to extract information from 200 texts on South American terrorism. (Contact: jamesp@cs.brandeis.edu)
An information extraction framework for cohort identification using electronic health records.
Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G
2013-01-01
Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high-performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report a knowledge-driven IE framework for cohort identification using EHRs, developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.
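The "externalized knowledge" idea can be sketched as extraction logic that stays generic while experts edit only a declarative resource; the resource shown inline (and all concept names) are invented, and in practice it would live in a file the framework loads.

```python
# Generic extractor driven entirely by an editable knowledge resource.
import re

knowledge = {
    "diabetes": ["type 2 diabetes", "t2dm", "diabetes mellitus"],
    "hypertension": ["hypertension", "elevated blood pressure", "htn"],
}

def extract_concepts(text, resource):
    found = set()
    for concept, variants in resource.items():
        if any(re.search(rf"\b{re.escape(v)}\b", text, re.IGNORECASE)
               for v in variants):
            found.add(concept)
    return found

note = "Patient with T2DM and elevated blood pressure, on metformin."
print(extract_concepts(note, knowledge))  # {'diabetes', 'hypertension'}
```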
Extracting laboratory test information from biomedical text
Kang, Yanna Shen; Kayaalp, Mehmet
2013-01-01
Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058
Information Extraction from Unstructured Text for the Biodefense Knowledge Center
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samatova, N F; Park, B; Krishnamurthy, R
2005-04-29
The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow early detection of emerging biological threats to homeland security. It requires highly structured information extracted from a variety of data sources. However, the quantity of new and vital information available from everyday sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are much needed. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas the two key phrase extraction technologies developed during the course of the project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. We are also enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free-text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. In order to address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.
Extracting and standardizing medication information in clinical text - the MedEx-UIMA system.
Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C; Xu, Hua
2014-01-01
Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents with both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing their results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.
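The frequency-normalization step can be illustrated with a toy mapping from Latin dosing abbreviations to ISO 8601 durations between doses; the table is an assumption for illustration, not the system's actual resource.

```python
# Toy frequency normalizer in the spirit of the MedEx-UIMA encoding module.
FREQ_TO_ISO = {
    "qd": "P1D",     # once daily -> one-day dosing interval (assumed mapping)
    "bid": "PT12H",  # twice daily
    "tid": "PT8H",   # three times daily
    "qid": "PT6H",   # four times daily
}

def normalize_frequency(raw):
    key = raw.lower().replace(".", "").replace(" ", "")
    return FREQ_TO_ISO.get(key)  # None signals an unmapped frequency

print(normalize_frequency("b.i.d."))  # PT12H
print(normalize_frequency("QD"))      # P1D
```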
CMS-2 Reverse Engineering and ENCORE/MODEL Integration
1992-05-01
Automated extraction of design information from an existing software system written in CMS-2 can be used to document that system as-built. The extracted information is provided by a commercially available CASE tool: information describing the software system design is automatically extracted, as shown in the displays in Figures 1, 2, and 3.
Models Extracted from Text for System-Software Safety Analyses
NASA Technical Reports Server (NTRS)
Malin, Jane T.
2010-01-01
This presentation describes extraction and integration of requirements information and safety information in visualizations to support early review of completeness, correctness, and consistency of lengthy and diverse system safety analyses. Software tools have been developed and extended to perform the following tasks: 1) extract model parts and safety information from text in interface requirements documents, failure modes and effects analyses and hazard reports; 2) map and integrate the information to develop system architecture models and visualizations for safety analysts; and 3) provide model output to support virtual system integration testing. This presentation illustrates the methods and products with a rocket motor initiation case.
NASA Astrophysics Data System (ADS)
Chen, Lei; Li, Dehua; Yang, Jie
2007-12-01
Constructing a virtual international strategy environment needs many kinds of information, covering the economy, politics, the military, diplomacy, culture, science, etc. It is therefore very important to build a highly efficient management system for automatic information extraction, classification, recombination, and analysis as the foundation and a component of a military strategy hall. This paper first uses an improved Boost algorithm to classify the obtained initial information, and then uses a strategy intelligence extraction algorithm to extract strategy intelligence from the initial information to help strategists analyze it.
Extraction of Data from a Hospital Information System to Perform Process Mining.
Neira, Ricardo Alfredo Quintano; de Vries, Gert-Jan; Caffarel, Jennifer; Stretton, Erin
2017-01-01
The aim of this work is to share our experience in extracting relevant data from a hospital information system in preparation for a research study using process mining techniques. The steps performed were: research definition, mapping the normative processes, identification of the table and field names of the database, and extraction of data. We then offer lessons learned during the data extraction phase. Any errors made in the extraction phase will propagate and have implications on subsequent analyses. Thus, it is essential to take the time needed and devote sufficient attention to detail to perform all activities with the goal of ensuring high quality of the extracted data. We hope this work will be informative for other researchers planning and executing data extraction for process mining research studies.
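The end product of such an extraction is typically a minimal event log, one row per event with a case identifier, activity, and timestamp, which process-mining tools consume; the rows and column names below are invented.

```python
# Shape extracted hospital records into a per-case, time-ordered event log.
import pandas as pd

rows = [
    {"case_id": "P001", "activity": "Admission", "timestamp": "2016-03-01 08:10"},
    {"case_id": "P001", "activity": "Lab order", "timestamp": "2016-03-01 09:02"},
    {"case_id": "P001", "activity": "Discharge", "timestamp": "2016-03-03 14:30"},
    {"case_id": "P002", "activity": "Admission", "timestamp": "2016-03-02 11:45"},
]

log = pd.DataFrame(rows)
log["timestamp"] = pd.to_datetime(log["timestamp"])
log = log.sort_values(["case_id", "timestamp"])  # events must be ordered per case
print(log)
```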
Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.
Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia
2015-01-01
Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.
Table Extraction from Web Pages Using Conditional Random Fields to Extract Toponym Related Data
NASA Astrophysics Data System (ADS)
Luthfi Hanifah, Hayyu'; Akbar, Saiful
2017-01-01
Table is one of the ways to visualize information on web pages. The abundant number of web pages that compose the World Wide Web has been the motivation of information extraction and information retrieval research, including research on table extraction. Besides, there is a need for a system designed specifically to handle location-related information. Based on this background, this research is conducted to provide a way to extract location-related data from web tables so that it can be used in the development of a Geographic Information Retrieval (GIR) system. The location-related data are identified by the toponym (location name). In this research, a rule-based approach with a gazetteer is used to recognize toponyms in web tables. Meanwhile, to extract data from a table, a combination of a rule-based approach and a statistical approach is used. In the statistical approach, a Conditional Random Fields (CRF) model is used to understand the schema of the table. The result of table extraction is presented in JSON format. If a web table contains a toponym, a field is added to the JSON document to store the toponym values. This field can be used to index the table data according to the toponym, which can then be used in the development of the GIR system.
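A toy version of the gazetteer step and the JSON output illustrates the added toponym field; the gazetteer and table content are invented, and the CRF schema-recognition step is not shown.

```python
# Match table cells against a toponym gazetteer; emit JSON with a toponym field.
import json

GAZETTEER = {"Jakarta", "Bandung", "Surabaya"}

table = [["City", "Population"],
         ["Jakarta", "10562088"],
         ["Bandung", "2444160"]]

header, *rows = table
records = [dict(zip(header, row)) for row in rows]
toponyms = sorted({cell for row in rows for cell in row if cell in GAZETTEER})

print(json.dumps({"records": records, "toponym": toponyms}, indent=2))
```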
1986-05-01
The DoD Gateway Information System (DGIS) is being developed to provide the DoD community with a modern tool for accessing diverse databases and extracting information products from them. Since the Defense Technical Information Center's DROLS is among these resources, DGIS can serve as an adjunct to DROLS results. The study therefore centered around obtaining background information inside the unit on that unit's users who request DROLS searches.
Fine-grained information extraction from German transthoracic echocardiography reports.
Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank
2015-11-12
Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that enables recognition of almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated on documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with f1 = .989 (micro average) and f1 = .963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout. The developed terminology and the proposed information extraction system allow extraction of fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
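Attribute-value extraction of the kind described can be sketched with two hand-written patterns; the patterns, attribute names, and the report snippet are invented, whereas the real system is driven by the curated fine-grained terminology.

```python
# Toy attribute-value extraction for echocardiography phrases.
import re

PATTERNS = {
    "LVEF_percent": re.compile(r"\b(?:LVEF|EF)\s*[:=]?\s*(\d{1,2})\s*%"),
    "LA_diameter_mm": re.compile(r"\bLA\s*[:=]?\s*(\d{2})\s*mm"),
}

report = "LA: 42 mm. Gute systolische Funktion, LVEF 60 %."

frame = {attr: m.group(1) for attr, rx in PATTERNS.items()
         if (m := rx.search(report))}
print(frame)  # {'LVEF_percent': '60', 'LA_diameter_mm': '42'}
```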
Quiñones, Karin D; Su, Hua; Marshall, Byron; Eggers, Shauna; Chen, Hsinchun
2007-09-01
Explosive growth in biomedical research has made automated information extraction, knowledge integration, and visualization increasingly important and critically needed. The Arizona BioPathway (ABP) system extracts and displays biological regulatory pathway information from the abstracts of journal articles. This study uses relations extracted from more than 200 PubMed abstracts presented in a tabular and graphical user interface with built-in search and aggregation functionality. This paper presents a task-centered assessment of the usefulness and usability of the ABP system focusing on its relation aggregation and visualization functionalities. Results suggest that our graph-based visualization is more efficient in supporting pathway analysis tasks and is perceived as more useful and easier to use as compared to a text-based literature-viewing method. Relation aggregation significantly contributes to knowledge-acquisition efficiency. Together, the graphic and tabular views in the ABP Visualizer provide a flexible and effective interface for pathway relation browsing and analysis. Our study contributes to pathway-related research and biological information extraction by assessing the value of a multiview, relation-based interface that supports user-controlled exploration of pathway information across multiple granularities.
Considering context: reliable entity networks through contextual relationship extraction
NASA Astrophysics Data System (ADS)
David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.
2016-05-01
Existing information extraction techniques can only partially address the problem of exploiting unreadably large amounts of text. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on whether it is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) more precise event and relationship interpretation, 2) more detailed information about extracted events and relationships, and 3) more reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.
CRL/Brandeis: The DIDEROT System
1993-01-01
Diderot is an information extraction system built at CRL and Brandeis University over the past two years. It was produced as part of our efforts in the Tipster project. The system has already been evaluated in the 4th Message Understanding Conference (MUC-4), where it was required to extract information from 200 texts on South American terrorism.
Tags Extraction from Spatial Documents in Search Engines
NASA Astrophysics Data System (ADS)
Borhaninejad, S.; Hakimpour, F.; Hamzei, E.
2015-12-01
Nowadays, selective access to information on the Web is provided by search engines, but in cases where the data include spatial information, the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information that lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database, and user interface. In the crawler component, GML documents are discovered and their text is parsed for information extraction. The database component is responsible for indexing the information collected by the crawlers. Finally, the user interface component provides the interaction between the system and the user. We have implemented this system as a pilot on an application server as a simulation of the Web. As a spatial search engine, our system provides search capability over GML documents, and thus an important step toward improving the efficiency of search engines has been taken.
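Since GML is XML, the crawler's parsing step can be sketched with standard XML tooling; the document fragment below is invented but uses the common GML namespace.

```python
# Pull feature names and positions out of a GML fragment for indexing.
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"
doc = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <gml:name>Azadi Tower</gml:name>
    <gml:pos>35.6997 51.3380</gml:pos>
  </gml:featureMember>
</gml:FeatureCollection>"""

root = ET.fromstring(doc)
for member in root.iter(GML + "featureMember"):
    name = member.find(GML + "name").text
    lat, lon = map(float, member.find(GML + "pos").text.split())
    print(name, lat, lon)  # hand off to the database component for indexing
```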
NASA Astrophysics Data System (ADS)
Zhang, Xiaowen; Chen, Bingfeng
2017-08-01
Based on a frequent subtree mining algorithm, this paper proposes a scheme for constructing a web page comment information extraction system, referred to as the FSM system. We briefly introduce the overall system architecture and its modules, then describe the core of the system in detail, and finally present a system prototype.
eGARD: Extracting associations between genomic anomalies and drug responses from text
Rao, Shruti; McGarvey, Peter; Wu, Cathy; Madhavan, Subha; Vijay-Shanker, K.
2017-01-01
Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially those that describe therapeutic implications of biomarkers and therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for ‘best-fit’ therapies and readily generate hypotheses for new clinical trials. PMID:29261751
Rapid Training of Information Extraction with Local and Global Data Views
2012-05-01
This work develops a relation type extension system based on active learning, a relation type extension system based on semi-supervised learning, and a cross-domain bootstrapping system for domain-adaptive named entity extraction. The active learning procedure adopts features extracted at the sentence level as the local view.
Semantic extraction and processing of medical records for patient-oriented visual index
NASA Astrophysics Data System (ADS)
Zheng, Weilin; Dong, Wenjie; Chen, Xiangjiao; Zhang, Jianguo
2012-02-01
To gain a comprehensive and complete understanding of a patient's healthcare status, doctors need to search for the patient's medical records in different healthcare information systems, such as PACS, RIS, HIS, and USIS, as a reference for diagnosis and treatment decisions. However, these procedures are time-consuming and tedious. To solve this kind of problem, we developed a patient-oriented visual index system (VIS) that uses visualization technology to show health status and to retrieve the patient's examination information stored in each system through a 3D human model. In this presentation, we present a new approach for extracting semantic and characteristic information from medical record systems such as RIS/USIS to create the 3D visual index. This approach includes the following steps: (1) building a medical characteristic semantic knowledge base; (2) developing a natural language processing (NLP) engine to perform semantic analysis and logical judgment on text-based medical records; (3) applying the knowledge base and NLP engine to medical records to extract medical characteristics (e.g., positive focus information), and then mapping the extracted information to the related organs/parts of the 3D human model to create the visual index. We performed testing on 559 samples of radiological reports containing 853 focuses and correctly extracted 828 focuses' information, a focus extraction success rate of about 97.1%.
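Step (3), mapping an extracted positive focus onto the 3D model, reduces to a lookup from organ mentions to model node identifiers; all names below are invented.

```python
# Toy mapping from an extracted finding to a 3D-model node to highlight.
ORGAN_MODEL_IDS = {
    "liver": "model/abdomen/liver",
    "left kidney": "model/abdomen/kidney_L",
    "right lung": "model/thorax/lung_R",
}

finding = {"text": "hypoechoic lesion", "organ": "liver", "positive": True}

if finding["positive"]:
    node = ORGAN_MODEL_IDS[finding["organ"]]
    print(f"highlight {node}: {finding['text']}")  # drives the visual index
```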
Construction of Green Tide Monitoring System and Research on its Key Techniques
NASA Astrophysics Data System (ADS)
Xing, B.; Li, J.; Zhu, H.; Wei, P.; Zhao, Y.
2018-04-01
As a kind of marine natural disaster, green tide has appeared every year along the Qingdao coast since the large-scale bloom in 2008, bringing great losses to the region. It is therefore of great value to obtain real-time dynamic information about green tide distribution. In this study, both optical and microwave remote sensing are employed for green tide monitoring. A specific remote sensing data processing flow and a green tide information extraction algorithm are designed according to the different characteristics of the optical and microwave data. For extracting the spatial distribution of green tide, an automatic extraction algorithm for green tide distribution boundaries is designed based on the principle of mathematical morphology dilation/erosion, and key issues in information extraction, including the division of green tide regions, the derivation of basic distributions, the delimitation of distribution boundaries, and the elimination of islands, have been solved. The automatic generation of green tide distribution boundaries from the results of remote sensing information extraction is thus realized. Finally, a green tide monitoring system is built based on IDL/GIS secondary development in an integrated RS and GIS environment, achieving the integration of RS monitoring and information extraction.
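The dilation/erosion boundary idea can be shown in a few lines with scipy.ndimage: dilating a binary green-tide mask and removing its erosion leaves the distribution boundary. The mask is a toy stand-in for a classified remote sensing image.

```python
# Morphological boundary of a binary green-tide mask (illustrative).
import numpy as np
from scipy import ndimage

mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:6] = True  # detected green-tide patch

boundary = ndimage.binary_dilation(mask) & ~ndimage.binary_erosion(mask)
print(boundary.astype(int))
```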
The utility of an automated electronic system to monitor and audit transfusion practice.
Grey, D E; Smith, V; Villanueva, G; Richards, B; Augustson, B; Erber, W N
2006-05-01
Transfusion laboratories with transfusion committees have a responsibility to monitor transfusion practice and generate improvements in clinical decision-making and red cell usage. However, this can be problematic and expensive because data cannot be readily extracted from most laboratory information systems. To overcome this problem, we developed and introduced a system to electronically extract and collate extensive amounts of data from two laboratory information systems and to link it with ICD10 clinical codes in a new database using standard information technology. Three data files were generated from two laboratory information systems, ULTRA (version 3.2) and TM, using standard information technology scripts. These were patient pre- and post-transfusion haemoglobin, blood group and antibody screen, and cross-match and transfusion data. These data, together with ICD10 codes for surgical cases, were imported into an MS Access database and linked by means of a unique laboratory number. Queries were then run to extract the relevant information, which was processed in Microsoft Excel for graphical presentation. We assessed the utility of this data extraction system to audit transfusion practice in a 600-bed adult tertiary hospital over an 18-month period. A total of 52 MB of data were extracted from the two laboratory information systems for the 18-month period and, together with 2.0 MB of theatre ICD10 data, enabled case-specific transfusion information to be generated. The audit evaluated 15,992 blood group and antibody screens, 25,344 cross-matched red cell units and 15,455 transfused red cell units. Data evaluated included cross-matched to transfusion ratios and pre- and post-transfusion haemoglobin levels for a range of clinical diagnoses. Data showed significant differences between clinical units and by ICD10 code. This method to electronically extract large amounts of data and link them with clinical databases has provided a powerful and sustainable tool for monitoring transfusion practice. It has been successfully used to identify areas requiring education, training and clinical guidance and allows for comparison with national haemoglobin-based transfusion guidelines.
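One of the audit metrics described, the cross-matched to transfused (C:T) ratio per clinical unit, is a simple aggregation once the laboratory data are linked; the data below are invented.

```python
# Per-unit C:T ratio from linked cross-match and transfusion records.
import pandas as pd

df = pd.DataFrame({
    "unit":         ["Cardiac", "Cardiac", "Ortho", "Ortho", "Ortho"],
    "crossmatched": [4, 2, 3, 2, 4],
    "transfused":   [2, 2, 0, 1, 2],
})

ct = df.groupby("unit")[["crossmatched", "transfused"]].sum()
ct["CT_ratio"] = ct["crossmatched"] / ct["transfused"]
print(ct)  # ratios well above ~2 are commonly read as over-ordering
```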
Information Extraction for System-Software Safety Analysis: Calendar Year 2008 Year-End Report
NASA Technical Reports Server (NTRS)
Malin, Jane T.
2009-01-01
This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
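Task 3's graph analysis, finding paths from hazard sources to vulnerable entities, can be sketched with networkx once components and connections are extracted; node names are invented.

```python
# Query hazard-to-target paths in an extracted architecture graph.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("pyro_valve", "fuel_line"),
    ("fuel_line", "motor_igniter"),
    ("sensor_bus", "flight_computer"),
])

src, target = "pyro_valve", "motor_igniter"
if nx.has_path(g, src, target):
    for path in nx.all_simple_paths(g, src, target):
        print(" -> ".join(path))  # candidate scenario for integration testing
```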
Hsiao, Mei-Yu; Chen, Chien-Chung; Chen, Jyh-Horng
2009-10-01
With rapid progress in the field, a great many fMRI studies are published every year, to the extent that it is now becoming difficult for researchers to keep up with the literature, since reading papers is extremely time-consuming and labor-intensive. Thus, automatic information extraction has become an important issue. In this study, we used the Unified Medical Language System (UMLS) to construct a hierarchical concept-based dictionary of brain functions. To the best of our knowledge, this is the first generalized dictionary of this kind. We also developed an information extraction system for recognizing, mapping and classifying terms relevant to human brain study. The precision and recall of our system were on a par with those of human experts in term recognition, term mapping and term classification. The approach presented in this paper offers an alternative to the more laborious, manual-entry approach to information extraction.
Automated extraction of family history information from clinical notes.
Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S; Winden, Tamara J; Carter, Elizabeth W; Melton, Genevieve B
2014-01-01
Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication ("indicator phrases"), the latter of which was used to establish relationships between observations and family member. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications.
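The predication step, linking a relative to an observation via an "indicator phrase" and attaching attributes, can be imitated with a single regular expression; the pattern and sentence are invented and far simpler than the UIMA module.

```python
# Toy family-history predication: relative + indicator phrase + observation.
import re

rx = re.compile(
    r"(?P<relative>mother|father|brother|sister)"
    r"\s+(?:was\s+)?diagnosed with\s+(?P<obs>[a-z ]+?)(?:\.|,| at)")

sent = "Her mother was diagnosed with breast cancer at age 52."
m = rx.search(sent)
if m:
    print({"relative": m.group("relative"),
           "observation": m.group("obs").strip(),
           "negated": bool(re.search(r"\bno\b|\bdenies\b", sent))})
# {'relative': 'mother', 'observation': 'breast cancer', 'negated': False}
```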
Jenke, Dennis
2007-01-01
Leaching of plastic materials, packaging, or containment systems by finished drug products and/or their related solutions can happen when contact occurs between such materials, systems, and products. While the drug product vendor has the regulatory/legal responsibility to demonstrate that such leaching does not affect the safety, efficacy, and/or compliance of the finished drug product, the plastic's supplier can facilitate that demonstration by providing the drug product vendor with appropriate and relevant information. Although it is a reasonable expectation that suppliers would possess and share such facilitating information, it is not reasonable for vendors to expect suppliers to (1) reveal confidential information without appropriate safeguards and (2) possess information specific to the vendor's finished drug product. Any potential conflict between the vendor's desire for information and the supplier's willingness to either generate or supply such information can be mitigated if the roles and responsibilities of these two stakeholders are established up front. The vendor of the finished drug product is responsible for supplying regulators with a full and complete leachables assessment for its finished drug product. To facilitate (but not take the place of) the vendor's leachables assessment, suppliers of the materials, components, or systems can provide the vendor with a full and complete extractables assessment for their material/system. The vendor and supplier share the responsibility for reconciling or correlating the extractables and leachables data. While this document establishes the components of a full and complete extractables assessment, specifying the detailed process by which a full and complete extractables assessment is performed is beyond its scope.
Ben-Shaul, Yoram
2015-01-01
Across all sensory modalities, stimuli can vary along multiple dimensions. Efficient extraction of information requires sensitivity to those stimulus dimensions that provide behaviorally relevant information. To derive social information from chemosensory cues, sensory systems must embed information about the relationships between behaviorally relevant traits of individuals and the distributions of the chemical cues that are informative about these traits. In simple cases, the mere presence of one particular compound is sufficient to guide appropriate behavior. However, more generally, chemosensory information is conveyed via relative levels of multiple chemical cues, in non-trivial ways. The computations and networks needed to derive information from multi-molecule stimuli are distinct from those required by single molecule cues. Our current knowledge about how socially relevant information is encoded by chemical blends, and how it is extracted by chemosensory systems is very limited. This manuscript explores several scenarios and the neuronal computations required to identify them. PMID:26635515
Framework for automatic information extraction from research papers on nanocrystal devices.
Dieb, Thaer M; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C
2015-01-01
To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called "NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.
A Real-Time System for Lane Detection Based on FPGA and DSP
NASA Astrophysics Data System (ADS)
Xiao, Jing; Li, Shutao; Sun, Bin
2016-12-01
This paper presents a real-time lane detection system, comprising an edge detection and improved Hough Transform based lane detection algorithm together with its hardware implementation on a field programmable gate array (FPGA) and a digital signal processor (DSP). First, gradient amplitude and direction information are combined to extract lane edge information. This information is then used to determine the region of interest. Finally, the lanes are extracted using the improved Hough Transform. The image processing module of the system consists of the FPGA and the DSP. In particular, the algorithms implemented on the FPGA are pipelined and process data in parallel, so the system can run in real time. The DSP performs lane line extraction and display using the improved Hough Transform. Experimental results show that the proposed system detects lanes efficiently and effectively under different road conditions.
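The pipeline described here (gradient-based edge extraction, a region of interest, then a Hough transform for lines) can be sketched in a few lines of Python with OpenCV. This is only an illustrative software analogue of the paper's FPGA/DSP design; the thresholds, ROI geometry, and Hough parameters are assumptions, not values from the paper:

```python
import cv2
import numpy as np

def detect_lanes(frame_bgr):
    """Toy lane detector: gradient edges -> ROI mask -> Hough line segments."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Canny thresholds on gradient amplitude internally, standing in for
    # the paper's combined amplitude/direction edge extraction.
    edges = cv2.Canny(gray, 50, 150)

    # Keep a trapezoidal region of interest in the lower half of the image,
    # where lane markings are expected.
    h, w = edges.shape
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w // 2 - 50, h // 2),
                     (w // 2 + 50, h // 2), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)
    edges = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform extracts candidate lane segments.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=40, maxLineGap=20)
    return [] if lines is None else [tuple(l[0]) for l in lines]
```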
Information extraction from Italian medical reports: An ontology-driven approach.
Viani, Natalia; Larizza, Cristiana; Tibollo, Valentina; Napolitano, Carlo; Priori, Silvia G; Bellazzi, Riccardo; Sacchi, Lucia
2018-03-01
In this work, we propose an ontology-driven approach to identify events and their attributes from episodes of care included in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible. The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports. The proposed approach performs well on the considered Italian medical corpus, with a percentage of correct annotations above 90% for most considered clinical events. We also assessed the possibility to adapt the system to the analysis of another language (i.e., English), with promising results. Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the considered task. In the future, it could be successfully used to automatically populate the TRIAD database. Copyright © 2017 Elsevier B.V. All rights reserved.
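To make the ontology-driven design concrete, here is a minimal sketch in Python of the core idea: each event class in a toy, hypothetical ontology carries the regular expressions used to spot the event and its attributes in a sentence. The ontology content shown is invented for illustration, not taken from the paper:

```python
import re

# Hypothetical miniature "ontology": each event class carries the regular
# expressions used to spot it and its attributes in report text.
ONTOLOGY = {
    "syncope": {
        "event": re.compile(r"\bsyncop\w*", re.I),
        "attributes": {"date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b")},
    },
}

def extract_events(sentence):
    """Return (event, attributes) pairs found in one sentence."""
    results = []
    for event_name, spec in ONTOLOGY.items():
        if spec["event"].search(sentence):
            attrs = {name: rx.search(sentence).group(1)
                     for name, rx in spec["attributes"].items()
                     if rx.search(sentence)}
            results.append((event_name, attrs))
    return results

print(extract_events("Episode of syncope on 12/03/2014 during exercise."))
```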
Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar
2017-01-01
Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel based approaches such as linear kernel, tree kernel, graph kernel and combination of multiple kernels has achieved promising results in PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vectors information known as Distributed Smoothed Tree kernel (DSTK). DSTK comprises of distributed trees with syntactic information along with distributional semantic vectors representing semantic information of the sentences or phrases. To generate robust machine learning model composition of feature based kernel and DSTK were combined using ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves better f-score with five different corpora compared to other state-of-the-art systems. PMID:29099838
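The full DSTK requires parse trees and distributed semantic vectors, but the ensemble idea of combining a feature-based kernel with a second, semantic kernel can be sketched with scikit-learn's precomputed-kernel SVM. The random data and the choice of linear kernels are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for the two representations: surface features and
# distributed semantic vectors of candidate sentence pairs.
X_feat = rng.normal(size=(40, 30))
X_sem = rng.normal(size=(40, 50))
y = rng.integers(0, 2, size=40)

def linear_kernel(A, B):
    return A @ B.T

# Composite kernel: weighted sum of the feature kernel and the semantic
# kernel (a sum of valid kernels is itself a valid kernel).
K = 0.5 * linear_kernel(X_feat, X_feat) + 0.5 * linear_kernel(X_sem, X_sem)

clf = SVC(kernel="precomputed").fit(K, y)
print("train accuracy:", clf.score(K, y))
```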
Support patient search on pathology reports with interactive online learning based data extraction.
Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng
2015-01-01
Structured reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entry, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interactions with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections to these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on the extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnoses, genetic markers, and procedures. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that combines online machine learning based data extraction with knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
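The interaction loop at the heart of such a system — propose annotations, accept the user's correction, update the model incrementally — can be sketched with scikit-learn's `partial_fit` interface. This is not IDEAL-X's implementation; the vectorizer, classifier, labels, and example texts are assumptions for illustration:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)
clf = SGDClassifier()                      # incremental linear classifier
classes = ["diagnosis", "other"]

# Simulated review loop: the system proposes a label for each snippet,
# then the user's (possibly corrected) label updates the model in place.
stream = [("infiltrating ductal carcinoma", "diagnosis"),
          ("specimen received in formalin", "other"),
          ("high grade glioma", "diagnosis")]

for text, user_label in stream:
    X = vec.transform([text])
    if hasattr(clf, "coef_"):              # model initialised after first update
        print(text, "->", clf.predict(X)[0])   # proposal shown to the user
    clf.partial_fit(X, [user_label], classes=classes)
```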
PKDE4J: Entity and relation extraction for public knowledge discovery.
Song, Min; Kim, Won Chul; Lee, Dahee; Heo, Go Eun; Kang, Keun Young
2015-10-01
Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrated its competitive performance by evaluating it on several corpora and found that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction. Copyright © 2015 Elsevier Inc. All rights reserved.
A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records.
Tahsin, Tasnia; Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela
2016-09-01
The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
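The final disambiguation-and-mapping step amounts to resolving an extracted place name against a gazetteer at the most specific level available, falling back to coarser levels otherwise. A minimal sketch, with an invented two-entry gazetteer standing in for the external geographical databases:

```python
# Hypothetical gazetteer mapping place names to (latitude, longitude);
# a real system would back this with a geographical database.
GAZETTEER = {
    ("vietnam", "ha noi"): (21.028, 105.804),
    ("vietnam", None): (14.058, 108.277),
}

def resolve(country, province=None):
    """Prefer the most specific entry; fall back to the country centroid."""
    key = (country.lower(), province.lower() if province else None)
    if key in GAZETTEER:
        return GAZETTEER[key]
    return GAZETTEER.get((country.lower(), None))

print(resolve("Vietnam", "Ha Noi"))   # (21.028, 105.804)
print(resolve("Vietnam"))             # country-level fallback
```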
NASA Astrophysics Data System (ADS)
Lu, Mujie; Shang, Wenjie; Ji, Xinkai; Hua, Mingzhuang; Cheng, Kuo
2015-12-01
Intelligent transportation systems (ITS) have become the new direction of transportation development, and traffic data, as their foundation, play an increasingly crucial role. In recent years, video observation technology has been widely used for traffic information collection. Traffic flow information contained in video data has many advantages: it is comprehensive and can be stored for a long time. However, collecting it still suffers from problems such as low precision and high cost. To address these problems, this paper proposes a broadly applicable traffic target detection method. Based on three different sources of video data, namely aerial photography, fixed cameras, and handheld cameras, we developed intelligent analysis software that extracts macroscopic and microscopic traffic flow information from video, which can then be used for traffic analysis and transportation planning. For road intersections, the system uses the frame difference method to extract traffic information; for freeway sections, it uses the optical flow method to track vehicles. The system was applied in Nanjing, Jiangsu province, and the application shows that it extracts different types of traffic flow information with high accuracy, meets the needs of traffic engineering observation, and has good application prospects.
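For the intersection case, the frame difference method reduces to thresholding the pixel-wise difference of consecutive grayscale frames and counting large moving blobs. A minimal OpenCV sketch (OpenCV 4.x API; the file name and the intensity/area thresholds are illustrative assumptions):

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")        # hypothetical input video
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read video")
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixels that changed between consecutive frames mark moving vehicles.
    diff = cv2.absdiff(gray, prev)
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [c for c in contours if cv2.contourArea(c) > 500]
    print("moving objects in frame:", len(vehicles))
    prev = gray
cap.release()
```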
Uncovering the essential links in online commercial networks
NASA Astrophysics Data System (ADS)
Zeng, Wei; Fang, Meiling; Shao, Junming; Shang, Mingsheng
2016-09-01
Recommender systems are designed to effectively support individuals' decision-making process on various web sites. A recommender system can be naturally represented by a user-object bipartite network, where a link indicates that a user has collected an object. Recently, research on the information backbone has attracted researchers' interest; the backbone is a sub-network with fewer nodes and links that nevertheless carries most of the relevant information. With the backbone, a system can generate satisfactory recommendations while saving much computing resource. In this paper, we propose an enhanced topology-aware method to extract the information backbone in the bipartite network, based mainly on the information of neighboring users and objects. Our backbone extraction method enables recommender systems to achieve more than 90% of the accuracy of top-L recommendation while consuming only 20% of the links. The experimental results show that our method outperforms alternative backbone extraction methods. Moreover, the structure of the information backbone is studied in detail. Finally, we highlight that the information backbone is one of the most important properties of the bipartite network, with which one can significantly improve the efficiency of a recommender system.
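One way to read "topology-aware, based on the information of neighboring users and objects" is to score each link by how strongly the user's other objects are co-collected with the link's object, then keep only the top-scoring fraction of links. The sketch below is a simplified interpretation under that assumption, not the authors' exact algorithm:

```python
from collections import defaultdict

# Toy user-object bipartite network: a link means a user collected an object.
links = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "b")]
user_objs = defaultdict(set)
obj_users = defaultdict(set)
for u, o in links:
    user_objs[u].add(o)
    obj_users[o].add(u)

def link_score(u, o):
    """Salience of link (u, o): average co-collection overlap between o
    and the user's other objects (a neighbourhood-based heuristic)."""
    others = user_objs[u] - {o}
    if not others:
        return 1.0                     # a user's only link is always kept
    overlap = sum(len(obj_users[o] & obj_users[p]) for p in others)
    return overlap / len(others)

ranked = sorted(links, key=lambda e: link_score(*e), reverse=True)
backbone = ranked[: int(0.6 * len(ranked))]   # keep the top fraction of links
print(backbone)
```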
ECG Identification System Using Neural Network with Global and Local Features
ERIC Educational Resources Information Center
Tseng, Kuo-Kun; Lee, Dachao; Chen, Charles
2016-01-01
This paper proposes a human identification system based on extracted electrocardiogram (ECG) signals. Two hierarchical classification structures, based on a global shape feature and a local statistical feature, are used to classify ECG signals. The global shape feature represents the outline information of ECG signals and the local statistical feature extracts the…
Real time system design of motor imagery brain-computer interface based on multi band CSP and SVM
NASA Astrophysics Data System (ADS)
Zhao, Li; Li, Xiaoqin; Bian, Yan
2018-04-01
Motor imagery (MI) is an effective method to promote the recovery of limbs in patients after stroke, and an online MI brain-computer interface (BCI) system applying MI can enhance patients' participation and accelerate their recovery. The traditional method processes the electroencephalogram (EEG) induced by MI with the common spatial pattern (CSP) algorithm, which extracts information from a single frequency band. In order to further improve the classification accuracy of the system, information from two characteristic frequency bands is extracted. The effectiveness of the proposed feature extraction method is verified by offline analysis of competition data and by analysis of the online system.
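CSP itself reduces to a generalized eigendecomposition of the two classes' average covariance matrices; per-band filter sets are computed separately and their log-variance features concatenated before the SVM. A minimal NumPy/SciPy sketch of that computation (trial shapes and the number of filter pairs are assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Common spatial patterns from two classes of band-pass filtered EEG
    trials, each trial shaped (channels, samples)."""
    def cov(X):
        C = X @ X.T
        return C / np.trace(C)
    Ca = np.mean([cov(t) for t in trials_a], axis=0)
    Cb = np.mean([cov(t) for t in trials_b], axis=0)
    # Generalized eigenproblem Ca w = lambda (Ca + Cb) w: the extreme
    # eigenvectors maximize variance for one class, minimize it for the other.
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return vecs[:, picks].T                 # one spatial filter per row

def csp_features(trial, W):
    """Log-variance features after spatial filtering; features from several
    frequency bands would be concatenated before the SVM classifier."""
    Z = W @ trial
    var = Z.var(axis=1)
    return np.log(var / var.sum())
```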
An introduction to the Marshall information retrieval and display system
NASA Technical Reports Server (NTRS)
1974-01-01
An on-line, terminal-oriented data storage and retrieval system is presented which allows a user to extract and process information from stored data bases. The use of on-line terminals for extracting and displaying data from the data bases provides a fast and responsive method for obtaining needed information. The system consists of general-purpose computer programs that together provide its overall capabilities. The system can process any number of data files via a Dictionary (one for each file) which describes each data format to the system. New files may be added to the system at any time, and reprogramming is not required. Illustrations of the system are shown, and sample inquiries and responses are given.
The Site Program demonstration of CF Systems' organics extraction technology was conducted to obtain specific operating and cost information that could be used in evaluating the potential applicability of the technology to Superfund sites. The demonstration was conducted concurr...
New Method for Knowledge Management Focused on Communication Pattern in Product Development
NASA Astrophysics Data System (ADS)
Noguchi, Takashi; Shiba, Hajime
In the field of manufacturing, the importance of utilizing knowledge and know-how has been growing. Against this background, new methods are needed to efficiently accumulate and extract effective knowledge and know-how. To facilitate the extraction of the knowledge and know-how needed by engineers, we first defined business process information, which includes schedule/progress information, document data, information about communication among the parties concerned, and the correspondences among these three types of information. Based on our definitions, we proposed an IT system (FlexPIM: Flexible and collaborative Process Information Management) to register and accumulate business process information with the least effort. In order to efficiently extract effective information from huge volumes of accumulated business process information, focusing attention on “actions” and communication patterns, we propose a new extraction method using communication patterns. The validity of this method has been verified for several communication patterns.
System for selecting relevant information for decision support.
Kalina, Jan; Seidl, Libor; Zvára, Karel; Grünfeldová, Hana; Slovák, Dalibor; Zvárová, Jana
2013-01-01
We implemented a prototype of a decision support system called SIR, which takes the form of a web-based classification service for diagnostic decision support. The system has the ability to select the most relevant variables and to learn a classification rule that is guaranteed to be suitable also for high-dimensional measurements. The classification system can be useful for clinicians in primary care to support their decision-making tasks with relevant information extracted from any available clinical study. The implemented prototype was tested on a sample of patients in a cardiological study and performs information extraction from a high-dimensional dataset containing both clinical and gene expression data.
Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications
NASA Technical Reports Server (NTRS)
Edwards, David E.; Haimes, Robert
1999-01-01
An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.
Liu, Zengjian; Tang, Buzhou; Wang, Xiaolong; Chen, Qingcai; Li, Haodi; Bu, Junzhao; Jiang, Jingzhi; Deng, Qiwen; Zhu, Suisong
2016-01-01
Time is an important aspect of information and is very useful for information utilization. The goal of this study was to analyze the challenges of temporal expression (TE) extraction and normalization in Chinese clinical notes by assessing the performance of a rule-based system we developed on a manually annotated corpus (including 1,778 clinical notes of 281 hospitalized patients). To simplify system development, we divided TEs into three categories: direct, indirect, and uncertain TEs, and designed different rules for each category. Evaluation on the independent test set shows that our system achieves an F-score of 93.40% on TE extraction and an accuracy of 92.58% on TE normalization under the "exact-match" criterion. Compared with HeidelTime for Chinese newswire text, our system performs much better, indicating that it is necessary to develop a specific TE extraction and normalization system for Chinese clinical notes because of the domain difference.
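A "direct" temporal expression such as an absolute date can be handled by a single rule pairing a recognition pattern with a normalization template. A minimal sketch in Python (one invented rule; a real system needs many more patterns plus context handling for indirect and uncertain TEs):

```python
import re

# Illustrative rule for direct Chinese temporal expressions such as
# 2015年3月21日 (21 March 2015).
DATE_RX = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")

def extract_and_normalize(text):
    """Return (surface form, ISO 8601 normalization) pairs."""
    results = []
    for m in DATE_RX.finditer(text):
        y, mo, d = m.groups()
        results.append((m.group(0), f"{y}-{int(mo):02d}-{int(d):02d}"))
    return results

print(extract_and_normalize("患者于2015年3月21日入院。"))
# [('2015年3月21日', '2015-03-21')]
```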
2010-01-01
Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
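Phase 2's candidate detection can be approximated with a single regular expression over nucleotide codes (including IUPAC ambiguity codes) of primer-like length; the paper's finite state machines play this role with more context-sensitivity. The length bounds here are illustrative assumptions:

```python
import re

# Runs of nucleotide codes (IUPAC alphabet) of primer-like length.
PRIMER_RX = re.compile(r"\b[ACGTURYSWKMBDHVN]{15,35}\b")

def find_candidate_primers(paragraph):
    return PRIMER_RX.findall(paragraph.upper())

text = "The forward primer AGCTTGCATGCCTGCAGGTC was used for amplification."
print(find_candidate_primers(text))   # ['AGCTTGCATGCCTGCAGGTC']
```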
Meystre, Stéphane M; Thibault, Julien; Shen, Shuying; Hurdle, John F; South, Brett R
2010-01-01
OBJECTIVE To describe a new medication information extraction system, Textractor, developed for the 'i2b2 medication extraction challenge'. The development, functionalities, and official evaluation of the system are detailed. Textractor is based on the Apache Unstructured Information Management Architecture (UIMA) framework, and uses methods that are a hybrid between machine learning and pattern matching. Two modules in the system are based on machine learning algorithms, while other modules use regular expressions, rules, and dictionaries, and one module embeds MetaMap Transfer. The official evaluation was based on a reference standard of 251 discharge summaries annotated by all teams participating in the challenge. The metrics used were recall, precision, and the F1-measure. They were calculated with exact and inexact matches, and were averaged at the level of systems and documents. The reference metric for this challenge, the system-level overall F1-measure, reached about 77% for exact matches, with a recall of 72% and a precision of 83%. Performance was best for route information (F1-measure about 86%), and was good for dosage and frequency information, with F1-measures of about 82-85%. Results were not as good for durations, with F1-measures of 36-39%, and for reasons, with F1-measures of 24-27%. The official evaluation of Textractor for the i2b2 medication extraction challenge demonstrated satisfactory performance. This system was among the 10 best performing systems in this challenge.
Using automatically extracted information from mammography reports for decision-support
Bozkurt, Selen; Gimenez, Francisco; Burnside, Elizabeth S.; Gulkesen, Kemal H.; Rubin, Daniel L.
2016-01-01
Objective To evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision-support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. Materials and methods We built a system that uses an NLP information extraction system (which extracts BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict diagnosis of breast cancer from radiology text reports and evaluated it with a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we compared the difference between using perfect (“reference standard”) structured inputs to the DSS (“RS-DSS”) vs NLP-derived inputs (“NLP-DSS”) on the output of the DSS using the concordance correlation coefficient. We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. Results The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004 ± 0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS to predict the correct BI-RADS final assessment category was 97.58%. Conclusion The accuracy of the information extracted from mammography reports using the NLP system was sufficient to provide accurate DSS results. We believe our system could ultimately reduce the variation in practice in mammography related to assessment of malignant lesions and improve management decisions. PMID:27388877
Zheng, Shuai; Jabbour, Salma K; O'Reilly, Shannon E; Lu, James J; Dong, Lihua; Ding, Lijuan; Xiao, Ying; Yue, Ning; Wang, Fusheng; Zou, Wei
2018-02-01
In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis to develop evidence-based radiation therapy paradigms. These data are available mainly in forms of narrative texts or table formats with heterogeneous vocabularies. Manual extraction of the related information from these data can be time consuming and labor intensive, which is not ideal for large studies. The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non-small cell lung cancer (NSCLC). We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. The adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data, so as to generate structured output. In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall more than 93% and 83%, respectively. For toxicity information extractions, which are important to study patient posttreatment side effects and quality of life, the precision and recall achieved 95.7% and 94.5% respectively. The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies. ©Shuai Zheng, Salma K Jabbour, Shannon E O'Reilly, James J Lu, Lihua Dong, Lijuan Ding, Ying Xiao, Ning Yue, Fusheng Wang, Wei Zou. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.02.2018.
Electronic processing of informed consents in a global pharmaceutical company environment.
Vishnyakova, Dina; Gobeill, Julien; Oezdemir-Zaech, Fatma; Kreim, Olivier; Vachon, Therese; Clade, Thierry; Haenning, Xavier; Mikhailov, Dmitri; Ruch, Patrick
2014-01-01
We present an electronic capture tool to process informed consents, which must be recorded when running a clinical trial. This tool aims at extracting information expressing the duration of the consent given by the patient to authorize the exploitation of biomarker-related information collected during clinical trials. The system integrates a language detection module (LDM) to route a document to the appropriate information extraction module (IEM). The IEM is based on language-specific sets of linguistic rules for the identification of relevant textual facts. The achieved accuracy of both the LDM and IEM is 99%. The architecture of the system is described in detail.
Smart Information Management in Health Big Data.
Muteba A, Eustache
2017-01-01
The smart information management system (SIMS) is concerned with the organization of anonymous patient records in a big data store and their extraction in order to provide needful real-time intelligence. The purpose of the present study is to highlight the design and implementation of the smart information management system. We emphasize, on the one hand, the organization of big data in flat files simulating a NoSQL database and, on the other hand, the extraction of information based on a lookup table and a cache mechanism. In health big data, the SIMS aims at the identification of new therapies and approaches to delivering care.
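The flat-file-plus-lookup-table idea can be sketched as follows: an index maps each record key to its byte offset in a line-delimited JSON file, and a small in-memory cache keeps hot records. The file layout and field names are invented for illustration:

```python
import json
from functools import lru_cache

def build_lookup(path):
    """Index a line-delimited JSON file: record id -> byte offset."""
    offsets, pos = {}, 0
    with open(path, "rb") as f:
        for line in f:
            offsets[json.loads(line)["id"]] = pos
            pos += len(line)
    return offsets

class Store:
    """Flat-file store simulating a NoSQL collection of patient records."""
    def __init__(self, path):
        self.path = path
        self.lookup = build_lookup(path)

    @lru_cache(maxsize=1024)               # cache mechanism for hot records
    def get(self, record_id):
        with open(self.path, "rb") as f:
            f.seek(self.lookup[record_id])
            return json.loads(f.readline())
```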
aCGH-MAS: Analysis of aCGH by means of Multiagent System
Benito, Rocío; Bajo, Javier; Rodríguez, Ana Eugenia; Abáigar, María
2015-01-01
There are currently different techniques, such as CGH arrays, to study genetic variations in patients. CGH arrays analyze gains and losses in different regions in the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information corresponding to mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases and it would be of interest to incorporate information of different sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations and visualization techniques for the interpretation of the final results and to extract relevant information from different sources of information by applying a CBR system. PMID:25874203
Design and application of PDF model for extracting
NASA Astrophysics Data System (ADS)
Xiong, Lei
2013-07-01
To reduce the submission workflow of an editorial department system from two steps to one, this paper proposes transplanting PDF information extraction technology from the PDF reader into the IEEE Xplore contribution system and combining it with batch uploading, enabling editors to upload about 1 GB of PDF files in a single batch. The computer then automatically extracts the title, authors, addresses, e-mail, abstract, and keywords of each paper for later retrieval, saving the editorial department considerable labor, material, and financial resources.
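Extracting a PDF's bibliographic fields programmatically might look like the following sketch using the PyPDF2 3.x API; the document information dictionary rarely carries the abstract and keywords, so those usually have to be parsed out of the first page's text:

```python
from PyPDF2 import PdfReader

def extract_submission_fields(path):
    """Pull basic metadata plus first-page text for later field extraction."""
    reader = PdfReader(path)
    info = reader.metadata                     # may be None
    first_page_text = reader.pages[0].extract_text() or ""
    return {
        "title": info.title if info else None,
        "author": info.author if info else None,
        # Abstract, affiliation, e-mail, and keywords would be located by
        # pattern matching over the extracted text.
        "first_page_text": first_page_text[:500],
    }
```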
Development of an Information System for Diploma Works Management
ERIC Educational Resources Information Center
Georgieva-Trifonova, Tsvetanka
2011-01-01
In this paper, a client/server information system for the management of data and its extraction from a database containing information for diploma works of students is proposed. The developed system provides users the possibility of accessing information about different characteristics of the diploma works, according to their specific interests.…
Music information retrieval in compressed audio files: a survey
NASA Astrophysics Data System (ADS)
Zampoglou, Markos; Malamos, Athanasios G.
2014-07-01
In this paper, we present an organized survey of the existing literature on music information retrieval systems in which descriptor features are extracted directly from the compressed audio files, without prior decompression to pulse-code modulation format. Avoiding the decompression step and utilizing the readily available compressed-domain information can significantly lighten the computational cost of a music information retrieval system, allowing application to large-scale music databases. We identify a number of systems relying on compressed-domain information and form a systematic classification of the features they extract, the retrieval tasks they tackle, and the degree to which they achieve an actual increase in overall speed, as well as any resulting loss in accuracy. Finally, we discuss recent developments in the field, and the potential research directions they open toward ultra-fast, scalable systems.
Semi-automatic building extraction in informal settlements from high-resolution satellite imagery
NASA Astrophysics Data System (ADS)
Mayunga, Selassie David
The extraction of man-made features from digital remotely sensed images is considered an important step underpinning management of human settlements in any country. Man-made features, and buildings in particular, are required for a variety of applications such as urban planning, creation of geographic information system (GIS) databases, and urban city models. Traditional man-made feature extraction methods are very expensive in terms of equipment, are labour intensive, need well-trained personnel, and cannot cope with changing environments, particularly in dense urban settlement areas. This research presents an approach for extracting buildings in dense informal settlement areas using high-resolution satellite imagery. The proposed system uses a novel strategy of extracting a building by measuring a single point at its approximate centre. The fine measurement of the building outline is then effected using a modified snake model. The original snake model on which this framework is based incorporates an external constraint energy term tailored to preserving the convergence properties of the snake model; applying it to unstructured objects would negatively affect their actual shapes. The external constraint energy term was therefore removed from the original snake model formulation, giving the model the ability to cope with the high variability of building shapes in informal settlement areas. The proposed building extraction system was tested on two areas with different situations. The first area was Tungi in Dar Es Salaam, Tanzania, where three sites were tested. This area is characterized by informal settlements, which are formed illegally within the city boundaries. The second area was Oromocto in New Brunswick, Canada, where two sites were tested. The Oromocto area is mostly flat and the buildings are constructed using similar materials. Qualitative and quantitative measures were employed to evaluate the accuracy of the results as well as the performance of the system, based on visual inspection and on comparison of the measured coordinates to reference data, respectively. In the course of this process, a mean area coverage of 98% was achieved for the Dar Es Salaam test sites, which globally indicated that the extracted building polygons were close to the ground truth data. Furthermore, the proposed system reduced the time to extract a single building by 32%. Although the extracted building polygons are within the perimeter of the ground truth data, some of them were visually somewhat distorted. This implies that an interactive post-editing process is necessary for cartographic representation.
Raja, Kalpana; Natarajan, Jeyakumar
2018-07-01
Extraction of protein phosphorylation information from biomedical literature has gained much attention because of its importance in numerous biological processes. In this study, we propose a text mining methodology consisting of two phases, NLP parsing and SVM classification, to extract phosphorylation information from the literature. First, using NLP parsing, we divide the data into three base-forms depending on the biomedical entities related to phosphorylation, and further classify them into ten sub-forms based on their distribution with the phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify the extracted singles/pairs/triplets using a set of features applicable to each sub-form. The performance of our methodology was evaluated on three corpora, namely PLC, iProLink, and the hPP corpus. We obtained promising results of >85% F-score on the ten sub-forms of the training datasets in cross-validation tests. Our system achieved overall F-scores of 93.0% on iProLink and 96.3% on the hPP corpus test datasets. Furthermore, our proposed system achieved the best performance on cross-corpus evaluation and outperformed the existing system with a recall of 90.1%. The performance analysis of our system on three corpora reveals that it extracts protein phosphorylation information efficiently from both non-organism-specific general datasets such as PLC and iProLink, and a human-specific dataset such as the hPP corpus. Copyright © 2018 Elsevier B.V. All rights reserved.
Zheng, Shuai; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A
2017-01-01
Background Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to incorporate user feedback for improving the extraction algorithm in real time. Objective Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. Methods A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Results Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, an outpatient clinic letter, and an inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. Conclusions IDEAL-X adopts a unique online machine learning–based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. PMID:28487265
Extracting heading and temporal range from optic flow: Human performance issues
NASA Technical Reports Server (NTRS)
Kaiser, Mary K.; Perrone, John A.; Stone, Leland; Banks, Martin S.; Crowell, James A.
1993-01-01
Pilots are able to extract information about their vehicle motion and environmental structure from dynamic transformations in the out-the-window scene. In this presentation, we focus on the information in the optic flow which specifies vehicle heading and distance to objects in the environment, scaled to a temporal metric. In particular, we are concerned with modeling how the human operators extract the necessary information, and what factors impact their ability to utilize the critical information. In general, the psychophysical data suggest that the human visual system is fairly robust to degradations in the visual display, e.g., reduced contrast and resolution or restricted field of view. However, extraneous motion flow, i.e., introduced by sensor rotation, greatly compromises human performance. The implications of these models and data for enhanced/synthetic vision systems are discussed.
KneeTex: an ontology-driven system for information extraction from MRI reports.
Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate
2015-01-01
In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F-measure of 97.81 %, the values of which are in line with human-like performance. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong
2016-01-01
Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning is to choose the most informative documents for the supervised learning in order to reduce the amount of required manual annotations. Previous works of active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. They also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition. We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.
The LifeWatch approach to the exploration of distributed species information
Fuentes, Daniel; Fiore, Nicola
2014-01-01
This paper introduces a new method of automatically extracting, integrating and presenting information regarding species from the most relevant online taxonomic resources. First, the information is extracted and joined using data wrappers and integration solutions. Then, an analytical tool is used to provide a visual representation of the data. The information is then integrated into a user friendly content management system. The proposal has been implemented using data from the Global Biodiversity Information Facility (GBIF), the Catalogue of Life (CoL), the World Register of Marine Species (WoRMS), the Integrated Taxonomic Information System (ITIS) and the Global Names Index (GNI). The approach improves data quality, avoiding taxonomic and nomenclature errors whilst increasing the availability and accessibility of the information. PMID:25589865
Automatic Extraction of Metadata from Scientific Publications for CRIS Systems
ERIC Educational Resources Information Center
Kovacevic, Aleksandar; Ivanovic, Dragan; Milosavljevic, Branko; Konjovic, Zora; Surla, Dusan
2011-01-01
Purpose: The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS). Design/methodology/approach: The system is based on machine learning and performs automatic extraction…
Automation for System Safety Analysis
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Fleming, Land; Throop, David; Thronesbery, Carroll; Flores, Joshua; Bennett, Ted; Wennberg, Paul
2009-01-01
This presentation describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
Maximum work extraction and implementation costs for nonequilibrium Maxwell's demons.
Sandberg, Henrik; Delvenne, Jean-Charles; Newton, Nigel J; Mitter, Sanjoy K
2014-10-01
We determine the maximum amount of work extractable in finite time by a demon performing continuous measurements on a quadratic Hamiltonian system subjected to thermal fluctuations, in terms of the information extracted from the system. The maximum work demon is found to apply a high-gain continuous feedback involving a Kalman-Bucy estimate of the system state, and operates in nonequilibrium. A simple and concrete electrical implementation of the feedback protocol is proposed, which allows for analytic expressions of the flows of energy, entropy, and information inside the demon. This lets us show that any implementation of the demon must necessarily include an external power source, which we prove both from classical thermodynamics arguments and from a version of Landauer's memory erasure argument extended to nonequilibrium linear systems.
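The abstract's central claim fits the standard information-thermodynamic bound: over a cycle at bath temperature T, the work a measurement-feedback demon can extract is limited by the information I (in nats) gained about the system. A hedged statement of that bound, which the paper refines for finite-time continuous measurements:

```latex
\[
  W_{\mathrm{ext}} \;\le\; k_{B} T \, I
\]
```

Recovering this work is not free overall: erasing the accumulated measurement record costs at least $k_B T$ per nat by Landauer's argument, which is consistent with the paper's conclusion that any implementation of the demon needs an external power source.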
Natural language processing and visualization in the molecular imaging domain.
Tulipano, P Karina; Tao, Ying; Millar, William S; Zanzonico, Pat; Kolbert, Katherine; Xu, Hua; Yu, Hong; Chen, Lifeng; Lussier, Yves A; Friedman, Carol
2007-06-01
Molecular imaging is at the crossroads of genomic sciences and medical imaging. Information within the molecular imaging literature could be used to link to genomic and imaging information resources and to organize and index images in a way that is potentially useful to researchers. A number of natural language processing (NLP) systems are available to automatically extract information from genomic literature. One existing NLP system, known as BioMedLEE, automatically extracts biological information consisting of biomolecular substances and phenotypic data. This paper focuses on the adaptation, evaluation, and application of BioMedLEE to the molecular imaging domain. In order to adapt BioMedLEE for this domain, we extend an existing molecular imaging terminology and incorporate it into BioMedLEE. BioMedLEE's performance is assessed with a formal evaluation study. The system's performance, measured as recall and precision, is 0.74 (95% CI: [0.70-0.76]) and 0.70 (95% CI: [0.63-0.76]), respectively. We adapt a Java viewer known as PGviewer for the simultaneous visualization of images with NLP-extracted information.
Automatic Molar Extraction from Dental Panoramic Radiographs for Forensic Personal Identification
NASA Astrophysics Data System (ADS)
Samopa, Febriliyan; Asano, Akira; Taguchi, Akira
Measurement of an individual molar provides rich information for forensic personal identification. We propose a computer-based system for extracting an individual molar from dental panoramic radiographs. A molar is obtained by extracting the region-of-interest, separating the maxilla and mandible, and extracting the boundaries between teeth. The proposed system is almost fully automatic; all the user has to do is click three points on the boundary between the maxilla and the mandible.
Concept recognition for extracting protein interaction relations from biomedical text
Baumgartner, William A; Lu, Zhiyong; Johnson, Helen L; Caporaso, J Gregory; Paquette, Jesse; Lindemann, Anna; White, Elizabeth K; Medvedeva, Olga; Cohen, K Bretonnel; Hunter, Lawrence
2008-01-01
Background: Reliable information extraction applications have been a long-sought goal of the biomedical text mining community, a goal that, if reached, would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well-defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing. Results: Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist. Conclusion: Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the internet. PMID:18834500
Information extraction during simultaneous motion processing.
Rideaux, Reuben; Edwards, Mark
2014-02-01
When confronted with multiple moving objects, the visual system can process them in two stages: an initial stage in which a limited number of signals are processed in parallel (i.e. simultaneously) followed by a sequential stage. We previously demonstrated that during the simultaneous stage, observers could discriminate between presentations containing up to 5 vs. 6 spatially localized motion signals (Edwards & Rideaux, 2013). Here we investigate what information is actually extracted during the simultaneous stage and whether the simultaneous limit varies with the detail of information extracted. This was achieved by measuring the ability of observers to extract varied information from low detail, i.e. the number of signals presented, to high detail, i.e. the actual directions present and the direction of a specific element, during the simultaneous stage. The results indicate that the resolution of simultaneous processing varies as a function of the information which is extracted, i.e. as the information extraction becomes more detailed, from the number of moving elements to the direction of a specific element, the capacity to process multiple signals is reduced. Thus, when assigning a capacity to simultaneous motion processing, this must be qualified by designating the degree of information extraction. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Kerkri, E; Quantin, C; Yetongnon, K; Allaert, F A; Dusserre, L
1999-01-01
In this paper, we present an application of EPIDWARE, a medical data warehousing architecture, to our epidemiological follow-up project. The aim of this project is to extract and regroup information from various information systems for epidemiological studies. We give a description of the requirements of the epidemiological follow-up project, such as the anonymity of medical data and the data file linkage procedure. We introduce the concept of Data Warehousing Architecture. The particularities of data extraction and transformation are presented and discussed.
Efficient feature extraction from wide-area motion imagery by MapReduce in Hadoop
NASA Astrophysics Data System (ADS)
Cheng, Erkang; Ma, Liya; Blaisse, Adam; Blasch, Erik; Sheaff, Carolyn; Chen, Genshe; Wu, Jie; Ling, Haibin
2014-06-01
Wide-Area Motion Imagery (WAMI) feature extraction is important for applications such as target tracking, traffic management and accident discovery. With the increasing amount of WAMI collections and feature extraction from the data, a scalable framework is needed to handle the large amount of information. Cloud computing is one of the approaches recently applied to large-scale or big data processing. In this paper, MapReduce in Hadoop is investigated for large scale feature extraction tasks for WAMI. Specifically, a large dataset of WAMI images is divided into several splits. Each split has a small subset of WAMI images. The feature extractions of WAMI images in each split are distributed to slave nodes in the Hadoop system. Feature extraction of each image is performed individually in the assigned slave node. Finally, the feature extraction results are sent to the Hadoop File System (HDFS) to aggregate the feature information over the collected imagery. Experiments on feature extraction with and without MapReduce are conducted to illustrate the effectiveness of our proposed Cloud-Enabled WAMI Exploitation (CAWE) approach.
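As a hedged illustration of the split-and-distribute pattern described above, the following pure-Python Hadoop Streaming mapper processes one WAMI image path per input line; the extract_features stub and the path-per-line input format are assumptions, not details from the paper.

#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper: reads one WAMI image path per line
# from stdin and emits path<TAB>features; each slave node runs this on its
# split, and the framework aggregates the emitted records on HDFS.
import json
import sys

def extract_features(image_path):
    # Placeholder for the real WAMI feature extractor (not specified here).
    return {"path": image_path, "num_features": 0}

for line in sys.stdin:
    path = line.strip()
    if path:
        sys.stdout.write(path + "\t" + json.dumps(extract_features(path)) + "\n")

Such a script would typically be launched with the standard hadoop-streaming jar, passing it as the -mapper and using an identity reducer.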
Question analysis for Indonesian comparative question
NASA Astrophysics Data System (ADS)
Saelan, A.; Purwarianti, A.; Widyantoro, D. H.
2017-01-01
Information seeking is one of today's basic human needs. Comparing things using a search engine surely takes more time than searching for a single thing. In this paper, we analyzed comparative questions for a comparative question answering system. A comparative question is a question that compares two or more entities. We grouped comparative questions into 5 types: selection between mentioned entities, selection between unmentioned entities, selection between any entity, comparison, and yes or no question. Then we extracted 4 types of information from comparative questions: entity, aspect, comparison, and constraint. We built classifiers for the classification task and the information extraction task. The features used for the classification task are bag of words, whereas for information extraction we used the word's lexical form, the lexical forms of the two previous and following words, and the previous label as features. We tried 2 scenarios: classification first and extraction first. For classification first, we used the classification result as a feature for extraction. Conversely, for extraction first, we used the extraction results as features for classification. We found that the result is better if we do extraction first before classification. For the extraction task, classification using SMO gave the best result (88.78%), while for classification, it is better to use naïve Bayes (82.35%).
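A minimal sketch of the "extraction first" scenario described above, assuming scikit-learn and a stub extractor; the example questions, label set, and extract_labels function are invented for illustration.

# Extraction-first sketch: predicted extraction labels are appended to the
# bag-of-words features before question-type classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
import scipy.sparse as sp
import numpy as np

questions = ["which is cheaper, phone A or phone B",
             "is laptop A better than laptop B"]
types = ["selection_mentioned", "yes_no"]
LABELS = ["entity", "aspect", "comparison", "constraint"]

def extract_labels(question):
    # Placeholder for the lexical-feature extraction step.
    return ["entity", "aspect"]

vec = CountVectorizer()
X_bow = vec.fit_transform(questions)
# One binary column per possible extraction label.
X_ext = np.array([[1 if l in extract_labels(q) else 0 for l in LABELS]
                  for q in questions])
X = sp.hstack([X_bow, sp.csr_matrix(X_ext)])
clf = MultinomialNB().fit(X, types)
print(clf.predict(X))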
Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text
NASA Astrophysics Data System (ADS)
Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.
2015-12-01
We describe our work on building a web-browser based document reader with built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g., PDF, Word, PPT, text) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g., Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org
Information sciences experiment system
NASA Technical Reports Server (NTRS)
Katzberg, Stephen J.; Murray, Nicholas D.; Benz, Harry F.; Bowker, David E.; Hendricks, Herbert D.
1990-01-01
The rapid expansion of remote sensing capability over the last two decades will take another major leap forward with the advent of the Earth Observing System (Eos). An approach is presented that will permit experiments and demonstrations in onboard information extraction. The approach is a non-intrusive, eavesdropping mode in which a small amount of spacecraft real estate is allocated to an onboard computation resource. How such an approach allows the evaluation of advanced technology in the space environment, advanced techniques in information extraction for both Earth science and information science studies, direct to user data products, and real-time response to events, all without affecting other on-board instrumentation is discussed.
PDF text classification to leverage information extraction from publication reports.
Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha
2016-06-01
Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task, however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges to the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, and compared it with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier that used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
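A minimal sketch of the multi-pass sieve idea: ordered, high-precision checks are applied in turn, the first sieve that fires assigns the category, and BODYTEXT is the fallback. The rules below are invented stand-ins for those in the paper.

# Multi-pass sieve text classifier sketch with assumed, simplified rules.
import re

SIEVES = [
    ("METADATA", lambda t: bool(re.search(r"doi:|©|\bwww\.", t, re.I))),
    ("ABSTRACT", lambda t: t.lower().lstrip().startswith("abstract")),
    ("TITLE", lambda t: len(t.split()) < 20 and t.istitle()),
    ("SEMISTRUCTURE", lambda t: t.count("\t") > 2 or ":" in t[:15]),
]

def classify(text):
    for category, matches in SIEVES:
        if matches(text):
            return category
    return "BODYTEXT"  # default when no high-precision sieve fires

print(classify("Abstract Data extraction from study reports is ..."))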
Macedo, Alessandra A; Pessotti, Hugo C; Almansa, Luciana F; Felipe, Joaquim C; Kimura, Edna T
2016-07-01
Several systems for medical-image processing support the extraction of image attributes, but they do not capture some of the information that characterizes images. For example, morphometry can be applied to find new information about the visual content of an image. The extension of information may result in knowledge. Subsequently, results of mappings can be applied to recognize exam patterns, thus improving the accuracy of image retrieval and allowing a better interpretation of exam results. Although successfully applied in breast lesion images, the morphometric approach is still poorly explored in thyroid lesions due to the high subjectivity of thyroid examinations. This paper presents a theoretical-practical study, considering Computer Aided Diagnosis (CAD) and Morphometry, to reduce the semantic discontinuity between medical image features and human interpretation of image content. The proposed method aggregates the content of microscopic images characterized by morphometric information and other image attributes extracted by traditional object extraction algorithms. This method carries out segmentation, feature extraction, image labeling and classification. Morphometric analysis was included as an object extraction method in order to verify the improvement of its accuracy for automatic classification of microscopic images. To validate this proposal and verify the utility of morphometric information to characterize thyroid images, a CAD system was created to classify real thyroid image-exams into Papillary Cancer, Goiter and Non-Cancer. Results showed that morphometric information can improve the accuracy and precision of image retrieval and the interpretation of results in computer-aided diagnosis. For example, in the scenario where all the extractors are combined with the morphometric information, the CAD system had its best performance (70% of precision in Papillary cases). The results signaled a positive use of morphometric information from images to reduce semantic discontinuity between human interpretation and image characterization. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
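A rough sketch of the morphometric feature-extraction step, assuming scikit-image and Otsu thresholding for segmentation; the paper's own segmentation and feature set are more elaborate.

# Morphometric descriptors from a segmented microscopy image; these features
# would be combined with other extractors' attributes for classification.
import numpy as np
from skimage import filters, measure

def morphometric_features(gray_image):
    mask = gray_image > filters.threshold_otsu(gray_image)  # assumed step
    labels = measure.label(mask)
    feats = []
    for region in measure.regionprops(labels):
        feats.append({
            "area": region.area,
            "perimeter": region.perimeter,
            "eccentricity": region.eccentricity,
            "solidity": region.solidity,
        })
    return feats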
Modelling and representation issues in automated feature extraction from aerial and satellite images
NASA Astrophysics Data System (ADS)
Sowmya, Arcot; Trinder, John
New digital systems for the processing of photogrammetric and remote sensing images have led to new approaches to information extraction for mapping and Geographic Information System (GIS) applications, with the expectation that data can become more readily available at a lower cost and with greater currency. Demands for mapping and GIS data are increasing as well for environmental assessment and monitoring. Hence, researchers from the fields of photogrammetry and remote sensing, as well as computer vision and artificial intelligence, are bringing together their particular skills for automating these tasks of information extraction. The paper will review some of the approaches used in knowledge representation and modelling for machine vision, and give examples of their applications in research for image understanding of aerial and satellite imagery.
Atmospheric and Oceanographic Information Processing System (AOIPS) system description
NASA Technical Reports Server (NTRS)
Bracken, P. A.; Dalton, J. T.; Billingsley, J. B.; Quann, J. J.
1977-01-01
The development of hardware and software for an interactive, minicomputer based processing and display system for atmospheric and oceanographic information extraction and image data analysis is described. The major applications of the system are discussed as well as enhancements planned for the future.
Zheng, Shuai; Lu, James J; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A; Wang, Fusheng
2017-05-09
Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. The clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, an outpatient clinic letter, and an inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. ©Shuai Zheng, James J Lu, Nima Ghasemzadeh, Salim S Hayek, Arshed A Quyyumi, Fusheng Wang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 09.05.2017.
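A minimal sketch of the online-learning loop described above, using scikit-learn's partial_fit as a stand-in learner; the vectorizer choice, the value types, and the document fields are assumptions rather than IDEAL-X internals.

# One document at a time: show the current model's prediction, record the
# user's correction as feedback, and update the model immediately.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)
clf = SGDClassifier(loss="log_loss")
CLASSES = ["ejection_fraction", "stenosis", "other"]  # assumed value types

def review_documents(stream):
    for doc in stream:
        X = vec.transform([doc["text"]])
        if hasattr(clf, "coef_"):                  # model exists after 1st update
            doc["prediction"] = clf.predict(X)[0]  # shown to the reviewer
        label = doc["user_corrected_label"]        # feedback from the reviewer
        clf.partial_fit(X, [label], classes=CLASSES)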
MedXN: an open source medication extraction and normalization tool for clinical text
Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang
2014-01-01
Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
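A toy version of the decompose-then-recombine step: a dictionary lookup finds the drug name and regular expressions pick up attributes, which can then be reassembled in RxNorm-like order. The dictionary and patterns are illustrative; MedXN itself uses RxNorm lookup tables and UIMA annotators.

# Regex/dictionary medication decomposition sketch (invented patterns).
import re

DRUG_NAMES = {"lisinopril", "metformin", "aspirin"}

ATTRS = {
    "strength": re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g)\b", re.I),
    "frequency": re.compile(r"\b(daily|bid|tid|qid)\b", re.I),
    "form": re.compile(r"\b(tablet|capsule|solution)\b", re.I),
}

def extract_medication(text):
    tokens = text.lower().split()
    name = next((t for t in tokens if t in DRUG_NAMES), None)
    if name is None:
        return None
    med = {"name": name}
    for attr, pattern in ATTRS.items():
        m = pattern.search(text)
        if m:
            med[attr] = m.group(0).lower()
    return med  # the normalized string would then be matched to an RxCUI

print(extract_medication("Lisinopril 10 mg tablet daily"))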
Towards an intelligent framework for multimodal affective data analysis.
Poria, Soujanya; Cambria, Erik; Hussain, Amir; Huang, Guang-Bin
2015-03-01
An increasingly large amount of multimodal content is posted on social media websites such as YouTube and Facebook every day. In order to cope with the growth of such multimodal data, there is an urgent need to develop an intelligent multi-modal analysis framework that can effectively extract information from multiple modalities. In this paper, we propose a novel multimodal information extraction agent, which infers and aggregates the semantic and affective information associated with user-generated multimodal data in contexts such as e-learning, e-health, automatic video content tagging and human-computer interaction. In particular, the developed intelligent agent adopts an ensemble feature extraction approach by exploiting the joint use of tri-modal (text, audio and video) features to enhance the multimodal information extraction process. In preliminary experiments using the eNTERFACE dataset, our proposed multi-modal system is shown to achieve an accuracy of 87.95%, outperforming the best state-of-the-art system by more than 10%, or in relative terms, a 56% reduction in error rate. Copyright © 2014 Elsevier Ltd. All rights reserved.
Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang
2015-06-06
Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on the sentence level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. Discourse level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against the sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating discourse level analysis significantly improved the performance of extracting the protein-mutation-disease association. Future work includes the extension of MutD for full text articles.
PASTE: patient-centered SMS text tagging in a medication management system.
Stenner, Shane P; Johnson, Kevin B; Denny, Joshua C
2012-01-01
To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages.
NASA Astrophysics Data System (ADS)
Zhu, Zhen; Vana, Sudha; Bhattacharya, Sumit; Uijt de Haag, Maarten
2009-05-01
This paper discusses the integration of Forward-looking Infrared (FLIR) and traffic information from, for example, the Automatic Dependent Surveillance - Broadcast (ADS-B) or the Traffic Information Service-Broadcast (TIS-B). The goal of this integration method is to obtain an improved state estimate of a moving obstacle within the Field-of-View of the FLIR with added integrity. The focus of the paper will be on the approach phase of the flight. The paper will address methods to extract moving objects from the FLIR imagery and geo-reference these objects using outputs of both the onboard Global Positioning System (GPS) and the Inertial Navigation System (INS). The proposed extraction method uses a priori airport information and terrain databases. Furthermore, state information from the traffic information sources will be extracted and integrated with the state estimates from the FLIR. Finally, a method will be addressed that performs a consistency check between both sources of traffic information. The methods discussed in this paper will be evaluated using flight test data collected with a Gulfstream V in Reno, NV (GVSITE) and simulated ADS-B.
Truly work-like work extraction via a single-shot analysis.
Aberg, Johan
2013-01-01
The work content of non-equilibrium systems in relation to a heat bath is often analysed in terms of expectation values of an underlying random work variable. However, when optimizing the expectation value of the extracted work, the resulting extraction process is subject to intrinsic fluctuations, uniquely determined by the Hamiltonian and the initial distribution of the system. These fluctuations can be of the same order as the expected work content per se, in which case the extracted energy is unpredictable, thus intuitively more heat-like than work-like. This raises the question of the 'truly' work-like energy that can be extracted. Here we consider an alternative that corresponds to an essentially fluctuation-free extraction. We show that this quantity can be expressed in terms of a one-shot relative entropy measure introduced in information theory. This suggests that the relations between information theory and statistical mechanics, as illustrated by concepts like Maxwell's demon, Szilard engines and Landauer's principle, extends to the single-shot regime.
Novel method of extracting motion from natural movies.
Suzuki, Wataru; Ichinohe, Noritaka; Tani, Toshiki; Hayami, Taku; Miyakawa, Naohisa; Watanabe, Satoshi; Takeichi, Hiroshige
2017-11-01
The visual system in primates can be segregated into motion and shape pathways. Interaction occurs at multiple stages along these pathways. Processing of shape-from-motion and biological motion is considered to be a higher-order integration process involving motion and shape information. However, relatively limited types of stimuli have been used in previous studies on these integration processes. We propose a new algorithm to extract object motion information from natural movies and to move random dots in accordance with the information. The object motion information is extracted by estimating the dynamics of local normal vectors of the image intensity projected onto the x-y plane of the movie. An electrophysiological experiment on two adult common marmoset monkeys (Callithrix jacchus) showed that the natural and random dot movies generated with this new algorithm yielded comparable neural responses in the middle temporal visual area. In principle, this algorithm provided random dot motion stimuli containing shape information for arbitrary natural movies. This new method is expected to expand the neurophysiological and psychophysical experimental protocols to elucidate the integration processing of motion and shape information in biological systems. The novel algorithm proposed here was effective in extracting object motion information from natural movies and provided new motion stimuli to investigate higher-order motion information processing. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
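One plausible reading of the normal-vector computation described above, sketched with numpy; the paper's dynamics estimator is certainly more elaborate than this simple frame difference, so treat this as an assumption-laden illustration only.

# Local normal vectors of image intensity approximated by the normalized
# spatial gradient field; their frame-to-frame change stands in for the
# object-motion signal that would drive the random dots.
import numpy as np

def normal_vector_field(frame):
    gy, gx = np.gradient(frame.astype(float))  # spatial intensity gradients
    norm = np.hypot(gx, gy) + 1e-8
    return gx / norm, gy / norm  # unit normals projected onto the x-y plane

def motion_signal(prev_frame, next_frame):
    px, py = normal_vector_field(prev_frame)
    nx, ny = normal_vector_field(next_frame)
    return nx - px, ny - py  # temporal change of the normal field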
2018-01-18
processing. Specifically, the method described herein uses wgrib2 commands along with a Python script or program to produce tabular text files that in... It makes use of software that is readily available and can be implemented on many computer systems combined with relatively modest additional... example), extracts appropriate information, and lists the extracted information in a readable tabular form. The Python script used here is described in...
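A hedged sketch of the wgrib2-plus-Python pattern this (truncated) abstract describes; the file names, the TMP filter, and the exact column layout of wgrib2's -csv output are assumptions.

# Dump GRIB2 records to CSV with wgrib2, then filter into a readable table.
import csv
import subprocess

subprocess.run(["wgrib2", "forecast.grb2", "-csv", "records.csv"], check=True)

with open("records.csv", newline="") as f:
    for row in csv.reader(f):
        # Assumed layout: ref time, valid time, variable, level, lon, lat, value.
        time, _, var, level, lon, lat, value = row[:7]
        if var == "TMP":  # keep only temperature records, for example
            print(f"{time} {level:>12} {lon:>8} {lat:>8} {value:>10}")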
Automated extraction and semantic analysis of mutation impacts from the biomedical literature
2012-01-01
Background Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing the mutational information and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. Results We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguation of genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data and end-user and developer documentation is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner. Conclusion We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions. PMID:22759648
HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records.
Aggarwal, Anshul; Garhwal, Sunita; Kumar, Ajay
2018-04-01
One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data is unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. The HEDEA system is working, covering a large set of formats, to extract and analyse health information. This tool can be used to generate analysis report and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes.
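A hedged sketch of the regular-expression approach, with an invented record layout and named capture groups; HEDEA's actual patterns cover many report formats in the Indian healthcare setting.

# Regex-driven field extraction from a semi-structured lab record (invented).
import re

RECORD = "Patient: R. Sharma | Test: HbA1c | Result: 6.4 % | Date: 12/03/2018"

FIELDS = re.compile(
    r"Patient:\s*(?P<patient>[^|]+)\|\s*Test:\s*(?P<test>[^|]+)"
    r"\|\s*Result:\s*(?P<value>[\d.]+)\s*(?P<unit>\S+)", re.I)

m = FIELDS.search(RECORD)
if m:
    row = {k: v.strip() for k, v in m.groupdict().items()}
    print(row)  # structured row ready for the central database / analysis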
Doppler extraction with a digital VCO
NASA Technical Reports Server (NTRS)
Starner, E. R.; Nossen, E. J.
1977-01-01
Digitally controlled oscillator in phase-locked loop may be useful for data communications systems, or may be modified to serve as information extraction component of microwave or optical system for collision avoidance or automatic braking. Instrument is frequency-synthesizing device with output specified precisely by digital number programmed into frequency register.
Event-based text mining for biology and functional genomics
Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.
2015-01-01
The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365
Recognition techniques for extracting information from semistructured documents
NASA Astrophysics Data System (ADS)
Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna
2000-12-01
Archives of optical documents are increasingly widely employed, with demand driven also by new norms sanctioning the legal value of digital documents, provided they are stored on physically unalterable supports. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those for magnetic memories. The remaining bottleneck in these systems is the indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine-supported to a large degree, with evident advantages both in the organization of the work and in extracting information, providing data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. This information, in our prototype application, is distributed among the database fields of sender, addressee, subject, date, and body of the document.
FEX: A Knowledge-Based System For Planimetric Feature Extraction
NASA Astrophysics Data System (ADS)
Zelek, John S.
1988-10-01
Topographical planimetric features include natural surfaces (rivers, lakes) and man-made surfaces (roads, railways, bridges). In conventional planimetric feature extraction, a photointerpreter manually interprets and extracts features from imagery on a stereoplotter. Visual planimetric feature extraction is a very labour intensive operation. The advantages of automating feature extraction include: time and labour savings; accuracy improvements; and planimetric data consistency. FEX (Feature EXtraction) combines techniques from image processing, remote sensing and artificial intelligence for automatic feature extraction. The feature extraction process co-ordinates the information and knowledge in a hierarchical data structure. The system simulates the reasoning of a photointerpreter in determining the planimetric features. Present efforts have concentrated on the extraction of road-like features in SPOT imagery. Keywords: Remote Sensing, Artificial Intelligence (AI), SPOT, image understanding, knowledge base, apars.
Online particle detection with Neural Networks based on topological calorimetry information
NASA Astrophysics Data System (ADS)
Ciodaro, T.; Deva, D.; de Seixas, J. M.; Damazio, D.
2012-06-01
This paper presents the latest results from the Ringer algorithm, which is based on artificial neural networks for the electron identification at the online filtering system of the ATLAS particle detector, in the context of the LHC experiment at CERN. The algorithm performs topological feature extraction using the ATLAS calorimetry information (energy measurements). The extracted information is presented to a neural network classifier. Studies showed that the Ringer algorithm achieves high detection efficiency, while keeping the false alarm rate low. Optimizations, guided by detailed analysis, reduced the algorithm execution time by 59%. Also, the total memory necessary to store the Ringer algorithm information represents less than 6.2 percent of the total filtering system amount.
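A small sketch of ring-style topological feature extraction, assuming a 2D energy grid and square rings around the hottest cell; the grid size and ring count are assumptions, and the actual detector builds rings per calorimeter layer.

# Concentric ring sums of cell energies around the hottest cell; the compact
# ring vector is the input presented to the neural network classifier.
import numpy as np

def ring_sums(energy, n_rings=8):
    cy, cx = np.unravel_index(np.argmax(energy), energy.shape)
    yy, xx = np.indices(energy.shape)
    r = np.maximum(np.abs(yy - cy), np.abs(xx - cx))  # square "rings"
    return np.array([energy[r == i].sum() for i in range(n_rings)])

energy = np.random.rand(16, 16)  # stand-in for one calorimeter layer
print(ring_sums(energy))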
Extracting semantically enriched events from biomedical literature.
Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia
2012-05-23
Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Cáceres, Jesús; Somolinos, Roberto; Pascual, Mario; Martínez, Ignacio; Salvador, Carlos H; Monteagudo, José Luis
2013-01-01
The objective of this paper is to introduce a new language called ccML, designed to provide convenient pragmatic information to applications using the ISO/EN13606 reference model (RM), such as electronic health record (EHR) extracts editors. EHR extracts are presently built using the syntactic and semantic information provided in the RM and constrained by archetypes. The ccML extra information enables automated editing of the medico-legal context information, which accounts for over 70% of the total in an extract, without modifying the RM information. ccML is defined using a W3C XML schema file. Valid ccML files complement the RM with additional pragmatics information. The ccML language grammar is defined using formal language theory as a single-type tree grammar. The new language is tested using an EHR extracts editor application as a proof-of-concept system. Seven ccML PVCodes (predefined value codes) are introduced in this grammar to cope with different realistic EHR editing situations. These seven PVCodes have different interpretation strategies, from direct look-up in the ccML file itself, to more complex searches in archetypes or system precomputation. The possibility of declaring generic types in ccML gives rise to ambiguity during interpretation. The criterion used to overcome ambiguity is that specificity should prevail over generality. The opposite would make the individual specific element declarations useless. A new mark-up language ccML is introduced that opens up the possibility of providing applications using the ISO/EN13606 RM with the necessary pragmatics information to be practical and realistic.
NASA Astrophysics Data System (ADS)
Shi, Liehang; Ling, Tonghui; Zhang, Jianguo
2016-03-01
Radiologists currently use a variety of terminologies and standards in most hospitals in China, and often multiple terminologies are in use for different sections within a single department. In this presentation, we introduce a medical semantic comprehension system (MedSCS) to extract semantic information about clinical findings and conclusions from free-text radiology reports so that the reports can be classified correctly based on medical term indexing standards such as RadLex or SNOMED-CT. Our system (MedSCS) is based on both rule-based methods and statistics-based methods, which improves the performance and the scalability of MedSCS. In order to evaluate the overall performance of the system and measure the accuracy of the outcomes, we developed computational methods to calculate the precision rate, recall rate, F-score and exact confidence interval.
Dillahunt-Aspillaga, Christina; Finch, Dezon; Massengale, Jill; Kretzmer, Tracy; Luther, Stephen L.; McCart, James A.
2014-01-01
Objective The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). Design Retrospective cohort study using data from selected progress notes stored in the EHR. Setting Post-deployment Rehabilitation and Evaluation Program (PREP), an in-patient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. Participants Service members and Veterans with TBI who participated in the PREP program (N = 60). Main Outcome Measures Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. Results Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals along with information about cognitive, physical, and behavioral symptoms that may affect return-to-work were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. Conclusions Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort. PMID:25541956
Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela
2015-05-01
Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
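A minimal sketch of a CRF tagger with an embedding-cluster feature in the spirit of the approach described above, assuming the sklearn-crfsuite package; cluster_of is a placeholder for lookup in k-means clusters of pretrained word embeddings, and the tiny training example is invented.

# CRF concept extractor: token features include a word-cluster id, modeling
# semantic similarity between words as in the paper's novel feature type.
import sklearn_crfsuite

def cluster_of(word):
    # Placeholder for the embedding-cluster lookup (stable toy hash here).
    return str(sum(map(ord, word)) % 150)

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "suffix3": w[-3:],
        "cluster": cluster_of(w),
        "prev": sent[i - 1].lower() if i else "<s>",
    }

X = [[token_features(s, i) for i in range(len(s))]
     for s in [["felt", "dizzy", "after", "taking", "it"]]]
y = [["O", "B-ADR", "O", "O", "O"]]  # BIO tags for ADR mentions

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))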
Development of a full-text information retrieval system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keizo Oyama; Akira Miyazawa; Atsuhiro Takasu; Kouji Shibano
The authors have executed a project to realize a full-text information retrieval system. The system is designed to deal with a document database comprising full text of a large number of documents such as academic papers. The document structures are utilized in searching and extracting appropriate information. The concept of structure handling and the configuration of the system are described in this paper.
Semantic World Modelling and Data Management in a 4D Forest Simulation and Information System
NASA Astrophysics Data System (ADS)
Roßmann, J.; Hoppen, M.; Bücken, A.
2013-08-01
Various types of 3D simulation applications benefit from realistic forest models. They range from flight simulators for entertainment to harvester simulators for training and tree growth simulations for research and planning. Our 4D forest simulation and information system integrates the necessary methods for data extraction, modelling and management. Using modern methods of semantic world modelling, tree data can efficiently be extracted from remote sensing data. The derived forest models contain position, height, crown volume, type and diameter of each tree. This data is modelled using GML-based data models to assure compatibility and exchangeability. A flexible approach for database synchronization is used to manage the data and provide caching, persistence, a central communication hub for change distribution, and a versioning mechanism. Combining various simulation techniques and data versioning, the 4D forest simulation and information system can provide applications with "both directions" of the fourth dimension. Our paper outlines the current state, new developments, and integration of tree extraction, data modelling, and data management. It also shows several applications realized with the system.
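Schematically, each extracted tree could be represented by a record like the following minimal Python sketch; the field names are assumptions based on the attributes listed above, while the actual system stores them in GML-based data models:

from dataclasses import dataclass

@dataclass
class TreeRecord:
    x: float              # position (easting), metres
    y: float              # position (northing), metres
    height_m: float
    crown_volume_m3: float
    species: str          # tree type
    diameter_cm: float    # trunk diameter

tree = TreeRecord(x=352100.5, y=5645220.1, height_m=24.3,
                  crown_volume_m3=310.0, species="Picea abies", diameter_cm=41.2)
print(tree)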
Extracting the Textual and Temporal Structure of Supercomputing Logs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, S; Singh, I; Chandra, A
2009-05-26
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable source of information about their operational status and health. However, their massive size, complexity, and lack of standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
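A toy illustration of the syntactic-clustering step (not the paper's algorithm): variable fields such as numbers and hex IDs are masked, and messages that share the resulting template fall into the same group.

import re
from collections import defaultdict

def template(msg: str) -> str:
    """Mask variable fields so structurally identical messages collapse."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", msg)
    return re.sub(r"\d+", "<NUM>", msg)

logs = [
    "node 17 failed with code 0x1f",
    "node 42 failed with code 0x2a",
    "fan speed 1200 rpm on unit 3",
]
groups = defaultdict(list)
for line in logs:
    groups[template(line)].append(line)

for tpl, members in groups.items():
    print(tpl, "->", len(members), "messages")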
Peng, Yifan; Torii, Manabu; Wu, Cathy H; Vijay-Shanker, K
2014-08-23
Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from large numbers of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. A relation extraction system achieving high performance is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging syntactic variations possible in a language and automatically generating extraction patterns in a systematic manner, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task. A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top-performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively). We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that, without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations. In this paper, we present a novel framework for fast development of relation extraction systems. The framework requires only a list of triggers as input, and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help with the design of hand-crafted patterns. We demonstrate how our framework is used to develop a system which achieves state-of-the-art performance on a public benchmark corpus.
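A hedged sketch of the pattern-generation idea: starting from a trigger word ("binds"), surface patterns for common syntactic variants are generated automatically and matched against (simplified) sentences. The actual system works over parse trees; plain regexes are used here only to keep the example short, and all names are illustrative.

import re

ENTITY = r"([A-Za-z0-9-]+)"

def patterns_for(active: str, passive: str, noun: str):
    """Generate surface patterns for active, passive and nominalized variants."""
    return [
        re.compile(rf"{ENTITY} {active} {ENTITY}"),               # "A binds B"
        re.compile(rf"{ENTITY} is {passive} by {ENTITY}"),        # "B is bound by A"
        re.compile(rf"{noun} of {ENTITY} (?:to|with) {ENTITY}"),  # "binding of A to B"
    ]

sentence = "binding of RAD51 to BRCA2"
for pat in patterns_for("binds", "bound", "binding"):
    m = pat.search(sentence)
    if m:
        print("relation:", m.group(1), "-", m.group(2))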
PASTE: patient-centered SMS text tagging in a medication management system
Johnson, Kevin B; Denny, Joshua C
2011-01-01
Objective To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Design Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The objective of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage them in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. Measurements A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Results Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Conclusion Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages. PMID:21984605
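In its simplest form, tagging a patient SMS can be illustrated with a lexicon-plus-regex pass like the sketch below (PASTE itself uses NLP methods, custom lexicons and existing knowledge sources; the lexicon entries and tag names here are invented for the example).

import re

MED_LEXICON = {"tylenol", "ibuprofen", "amoxicillin"}   # toy medication lexicon
ACTION_TERMS = {"took", "missed", "skipped"}            # toy action lexicon

def tag_message(msg: str):
    tags = []
    for tok in re.findall(r"[a-z]+|\d+ ?mg", msg.lower()):
        if tok in MED_LEXICON:
            tags.append((tok, "MED"))
        elif tok in ACTION_TERMS:
            tags.append((tok, "ACTION"))
        elif tok.endswith("mg"):
            tags.append((tok, "DOSE"))
    return tags

print(tag_message("Took Tylenol 500 mg after lunch"))
# [('took', 'ACTION'), ('tylenol', 'MED'), ('500 mg', 'DOSE')]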
Extracting the Information Backbone in Online System
Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng
2013-01-01
Information overload is a serious problem in modern society, and many solutions, such as recommender systems, have been proposed to filter out irrelevant information. In the literature, researchers have mainly been dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms, while overlooking the influence of the topology of the online user-object bipartite networks. In this paper, we find that some of the information provided by the bipartite networks is not only redundant but also misleading. Exploiting this “less can be more” feature, we design algorithms that improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From a practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both its effectiveness and efficiency. PMID:23690946
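A toy version of the time-aware link removal strategy (one of the two strategies the paper combines; the data and keep-fraction are invented): the oldest fraction of user-object links is dropped before training, keeping a smaller "backbone" network.

def extract_backbone(links, keep_fraction=0.8):
    """links: list of (user, object, timestamp); returns the retained links."""
    links = sorted(links, key=lambda l: l[2])   # oldest first
    n_drop = int(len(links) * (1 - keep_fraction))
    return links[n_drop:]                       # keep the most recent links

links = [("u1", "o1", 1), ("u1", "o2", 5), ("u2", "o1", 3),
         ("u2", "o3", 9), ("u3", "o2", 7)]
print(extract_backbone(links, keep_fraction=0.6))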
Li, Lishuang; Zhang, Panpan; Zheng, Tianfu; Zhang, Hongying; Jiang, Zhenchao; Huang, Degen
2014-01-01
Protein-Protein Interaction (PPI) extraction is an important task in biomedical information extraction. Presently, many machine learning methods for PPI extraction have achieved promising results, but performance is still not satisfactory. One reason is that semantic resources have been largely ignored. In this paper, we propose a multiple-kernel learning-based approach to extract PPIs, combining a feature-based kernel, a tree kernel and a semantic kernel. In particular, we extend the shortest path-enclosed tree kernel (SPT) with a dynamic extension strategy to retrieve richer syntactic information. Our semantic kernel calculates the protein-protein pair similarity and the context similarity based on two semantic resources: WordNet and Medical Subject Headings (MeSH). We evaluate our method with a Support Vector Machine (SVM) and achieve an F-score of 69.40% and an AUC of 92.00%, showing that our method outperforms most state-of-the-art systems by integrating semantic information.
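Multiple-kernel learning in its simplest form can be sketched as below: individual kernel matrices (here two random stand-ins for the feature-based and semantic kernels) are combined as a weighted sum and fed to an SVM with a precomputed kernel. The combination weights are assumptions for illustration, not the paper's learned values.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # stand-in instance representations
y = np.repeat([0, 1], 10)             # interaction / no-interaction labels

K_feature = X @ X.T                   # linear kernel on surface features
K_semantic = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))  # RBF stand-in

K = 0.6 * K_feature + 0.4 * K_semantic   # fixed combination weights (assumed)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:3]))             # predict takes K(test, train) rows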
DOT National Transportation Integrated Search
2011-06-01
This report describes an accuracy assessment of extracted features derived from three subsets of Quickbird pan-sharpened high resolution satellite image for the area of the Port of Los Angeles, CA. Visual Learning Systems Feature Analyst and D...
Qi, Xiubin; Crooke, Emma; Ross, Andrew; Bastow, Trevor P; Stalvies, Charlotte
2011-09-21
This paper presents a system and method developed to identify a source oil's characteristic properties by testing the oil's dissolved components in water. Through close examination of the oil dissolution process in water, we hypothesise that when oil is in contact with water, the resulting oil-water extract, a complex hydrocarbon mixture, carries the signature property information of the parent oil. If the dominating differences in composition between such extracts of different oils can be identified, this information could guide the selection of various sensors capable of capturing such chemical variations. When used as an array, such a sensor system can be used to determine parent oil information from the oil-water extract. To test this hypothesis, water extracts of 22 oils were prepared and selected dominant hydrocarbons were analyzed with Gas Chromatography-Mass Spectrometry (GC-MS); the subsequent Principal Component Analysis (PCA) indicates that the major difference between the extract solutions is the relative concentration of the volatile mono-aromatics versus the fluorescent polyaromatics. An integrated sensor array system composed of 3 volatile hydrocarbon sensors and 2 polyaromatic hydrocarbon sensors was built accordingly to capture the major and subtle differences between these extracts. It was tested by exposure to a total of 110 water extract solutions diluted from the 22 extracts. The sensor response data collected from the testing were processed with two multivariate analysis tools to reveal information retained in the response patterns of the arrayed sensors: by conducting PCA, we were able to demonstrate the ability to qualitatively identify and distinguish different oil samples from their sensor array response patterns. When a supervised PCA, Linear Discriminant Analysis (LDA), was applied, quantitative classification could also be achieved: the multivariate model generated from the LDA correctly classified the type of the oil samples in 89.7% of cases. By grouping the samples by viscosity and density, we were able to reveal the correlation between the oil extracts' sensor array responses and the feature properties of their original oils. The equipment and method developed in this study have promising potential to be readily applied in field studies and marine surveys for oil exploration or oil spill monitoring.
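An illustrative pipeline for this kind of sensor-array analysis, under stated assumptions (random data shaped like the study, with 5 sensors per sample and 3 assumed oil classes; not the authors' data or code): PCA for unsupervised inspection, then LDA for supervised classification of oil type.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n_samples, n_sensors = 110, 5                 # 3 volatile + 2 polyaromatic sensors
X = rng.normal(size=(n_samples, n_sensors))   # stand-in sensor responses
oil_type = rng.integers(0, 3, size=n_samples) # assumed 3 oil classes

scores = PCA(n_components=2).fit_transform(X) # qualitative separation in 2D
lda = LinearDiscriminantAnalysis().fit(X, oil_type)
print("training accuracy:", lda.score(X, oil_type))  # cf. the 89.7% reported above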
Medical-Information-Management System
NASA Technical Reports Server (NTRS)
Alterescu, Sidney; Friedman, Carl A.; Frankowski, James W.
1989-01-01
Medical Information Management System (MIMS) computer program interactive, general-purpose software system for storage and retrieval of information. Offers immediate assistance where manipulation of large data bases required. User quickly and efficiently extracts, displays, and analyzes data. Used in management of medical data and handling all aspects of data related to care of patients. Other applications include management of data on occupational safety in public and private sectors, handling judicial information, systemizing purchasing and procurement systems, and analyses of cost structures of organizations. Written in Microsoft FORTRAN 77.
System for definition of the central-chest vasculature
NASA Astrophysics Data System (ADS)
Taeprasartsit, Pinyo; Higgins, William E.
2009-02-01
Accurate definition of the central-chest vasculature from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. For instance, the aorta and pulmonary artery help in automatic definition of the Mountain lymph-node stations for lung-cancer staging. This work presents a system for defining major vascular structures in the central chest. The system provides automatic methods for extracting the aorta and pulmonary artery and semi-automatic methods for extracting the other major central chest arteries/veins, such as the superior vena cava and azygos vein. Automatic aorta and pulmonary artery extraction are performed by model fitting and selection. The system also extracts certain vascular structure information to validate outputs. A semi-automatic method extracts vasculature by finding the medial axes between provided important sites. Results of the system are applied to lymph-node station definition and guidance of bronchoscopic biopsy.
Information extraction for enhanced access to disease outbreak reports.
Grishman, Ralph; Huttunen, Silja; Yangarber, Roman
2002-08-01
Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction.
Krallinger, Martin; Rodriguez-Penagos, Carlos; Tendulkar, Ashish; Valencia, Alfonso
2009-07-01
There is an increasing interest in using literature mining techniques to complement information extracted from annotation databases or generated by bioinformatics applications. Here we present PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to systematically access information useful for analyzing genetic, cellular and molecular aspects of the plant model organism Arabidopsis thaliana. Our system facilitates a more efficient retrieval of information relevant to heterogeneous biological topics, from implications in biological relationships at the level of protein interactions and gene regulation, to sub-cellular locations of gene products and associations to cellular and developmental processes, i.e. cell cycle, flowering, root, leaf and seed development. Beyond single entities, predefined pairs of entities can also be provided as queries, for which literature-derived relations together with textual evidence are returned. PLAN2L does not require registration and is freely accessible at http://zope.bioinfo.cnio.es/plan2l.
Ye, Jay J
2016-01-01
Different methods have been described for data extraction from pathology reports, with varying degrees of success. Here, a technique for directly extracting data from a relational database is described. Our department uses synoptic reports modified from College of American Pathologists (CAP) Cancer Protocol Templates to report most of our cancer diagnoses. Choosing the melanoma of skin synoptic report as an example, the R scripting language extended with the RODBC package was used to query the pathology information system database. Reports containing the melanoma of skin synoptic report from the past four and a half years were retrieved, and individual data elements were extracted. Using the retrieved case list, the database was queried a second time to retrieve/extract the lymph node staging information in the subsequent reports from the same patients. In total, 426 synoptic reports corresponding to unique lesions of melanoma of skin were retrieved, and data elements of interest were extracted into an R data frame. The distribution of Breslow depth of melanomas grouped by year is used as an example of intra-report data extraction and analysis. When new pN staging information was present in the subsequent reports, 82% (77/94) was precisely retrieved (pN0, pN1, pN2 and pN3). An additional 15% (14/94) was retrieved with some ambiguity (positive, or knowing there was an update). The specificity was 100% for both. The relationship between Breslow depth and lymph node status was graphed as an example of lesion-specific multi-report data extraction and analysis. R extended with the RODBC package is a simple and versatile approach well suited to the above tasks. The success or failure of the retrieval and extraction depended largely on whether the reports were formatted and whether the contents of the elements were consistently phrased. This approach can be easily modified and adopted for other pathology information systems that use a relational database for data management.
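A minimal analogue of the reported workflow, using only Python's standard library (the authors used R with RODBC; the table, column names and report wording below are hypothetical): query the reports table, then pull the Breslow depth out of each synoptic report with a regex.

import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (case_id TEXT, body TEXT)")
conn.execute("INSERT INTO reports VALUES ('S21-001', 'Breslow depth: 1.2 mm ...')")
conn.execute("INSERT INTO reports VALUES ('S21-002', 'Breslow depth: 0.4 mm ...')")

depth_re = re.compile(r"Breslow depth:\s*([\d.]+)\s*mm")
for case_id, body in conn.execute("SELECT case_id, body FROM reports"):
    m = depth_re.search(body)
    if m:
        print(case_id, float(m.group(1)))   # extracted data element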
Motamed, Cyrus; Bourgain, Jean Louis
2016-06-01
Anaesthesia Information Management Systems (AIMS) generate large amounts of data, which might be useful for quality assurance programs. This study was designed to highlight the multiple contributions of our AIMS system in extracting quality indicators over a period of 10 years. The study was conducted from 2002 to 2011. Two methods were used to extract anaesthesia indicators: manual extraction of individual files for monitoring neuromuscular relaxation, and structured query language (SQL) extraction for the other indicators, which were postoperative nausea and vomiting (PONV), pain and sedation scores, pain-related medications, and postoperative hypothermia. For each indicator, a program of information/meetings and adaptation/suggestions for operating room and PACU personnel was initiated to improve quality assurance, while data were extracted each year. The study included 77,573 patients. The mean overall completeness of data for the initial years ranged from 55 to 85% and was indicator-dependent; completeness then improved to 95% for the last 5 years. The incidence of neuromuscular monitoring was initially 67% and then increased to 95% (P<0.05). The rate of pharmacological reversal remained around 53% throughout the study. Regarding SQL data, an improvement in severe postoperative pain and PONV scores was observed throughout the study, while mild postoperative hypothermia remained a challenge, despite efforts for improvement. The AIMS system permitted the follow-up of certain indicators through manual sampling, and of many more via SQL extraction, in a sustained and non-time-consuming way across years. However, it requires competent and especially dedicated resources to handle the database. Copyright © 2016 Société française d'anesthésie et de réanimation (Sfar). Published by Elsevier Masson SAS. All rights reserved.
Automatic extraction of pavement markings on streets from point cloud data of mobile LiDAR
NASA Astrophysics Data System (ADS)
Gao, Yang; Zhong, Ruofei; Tang, Tao; Wang, Liuzhao; Liu, Xianlin
2017-08-01
Pavement markings provide an important foundation for keeping road users safe. Accurate and comprehensive information about pavement markings assists road regulators and is useful in developing driverless technology. Mobile light detection and ranging (LiDAR) systems offer new opportunities to collect and process accurate pavement-marking information. Mobile LiDAR systems can directly obtain the three-dimensional (3D) coordinates of an object, capturing the spatial data and intensity of 3D objects in a fast and efficient way. The RGB attribute information of data points can be obtained from the panoramic camera in the system. In this paper, we present a novel method to automatically extract pavement markings using multiple attribute information of the laser scanning point cloud from mobile LiDAR data. The method utilizes a differential grayscale of RGB color, laser pulse reflection intensity, and the differential intensity to identify and extract pavement markings. We utilized point cloud density to remove noise and used morphological operations to eliminate errors. In application, we tested our method on different sections of roads in Beijing, China, and Buffalo, NY, USA. The results indicated that both correctness (p) and completeness (r) were higher than 90%. The method of this research can be applied to extract pavement markings from the huge point clouds produced by mobile LiDAR.
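Two steps of such a pipeline, simplified into a runnable sketch (thresholds and array sizes are placeholders, not the paper's values): thresholding the laser intensity channel to flag candidate marking points, then a morphological opening on the rasterized mask to suppress small noise.

import numpy as np
from scipy.ndimage import binary_opening

rng = np.random.default_rng(0)
intensity = rng.uniform(0, 255, size=(100, 100))   # rasterized intensity image
candidates = intensity > 200                       # bright returns from paint

# Morphological opening removes isolated noise pixels from the mask.
cleaned = binary_opening(candidates, structure=np.ones((3, 3)))
print("candidate pixels:", candidates.sum(), "-> after opening:", cleaned.sum())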
Daemonic ergotropy: enhanced work extraction from quantum correlations
NASA Astrophysics Data System (ADS)
Francica, Gianluca; Goold, John; Plastina, Francesco; Paternostro, Mauro
2017-03-01
We investigate how the presence of quantum correlations can influence work extraction in closed quantum systems, establishing a new link between the field of quantum non-equilibrium thermodynamics and the one of quantum information theory. We consider a bipartite quantum system and we show that it is possible to optimize the process of work extraction, thanks to the correlations between the two parts of the system, by using an appropriate feedback protocol based on the concept of ergotropy. We prove that the maximum gain in the extracted work is related to the existence of quantum correlations between the two parts, quantified by either quantum discord or, for pure states, entanglement. We then illustrate our general findings on a simple physical situation consisting of a qubit system.
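For reference, the quantity at the core of this entry is the ergotropy, the maximum work extractable from a state by a unitary. A hedged transcription of the standard definition, with the measurement-conditioned ("daemonic") version written in assumed notation rather than quoted from the paper:

\mathcal{W}(\rho) = \operatorname{Tr}(\rho H) - \min_{U}\operatorname{Tr}\!\left(U \rho U^{\dagger} H\right),
\qquad
\mathcal{W}_{\mathrm{daemonic}} = \sum_{a} p_a \, \mathcal{W}(\rho_a) \;\ge\; \mathcal{W}(\rho),

where rho_a is the conditional state of the system given measurement outcome a on the other part, obtained with probability p_a; the gain over W(rho) is what the correlations pay for.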
Establishment of Application Guidance for OTC non-Kampo Crude Drug Extract Products in Japan
Somekawa, Layla; Maegawa, Hikoichiro; Tsukada, Shinsuke; Nakamura, Takatoshi
2017-01-01
Currently, there are no standardized regulatory systems for herbal medicinal products worldwide. Communication and sharing of knowledge between different regulatory systems will lead to mutual understanding and might help identify topics which deserve further discussion in the establishment of common standards. Regulatory information on traditional herbal medicinal products in Japan has been updated by the establishment of Application Guidance for over-the-counter non-Kampo Crude Drug Extract Products. We report here on the updated regulatory information in the new Application Guidance. The guidance indicates methods for comparing a crude drug extract formulation with the standard decoction, the criteria for application, and the key points to consider for each criterion. Establishment of the guidance contributes to improvements in public health. We hope that this regulatory information about traditional herbal medicinal products in Japan will contribute to tackling the challenging task of regulating traditional herbal products worldwide. PMID:28894633
Technical design and system implementation of region-line primitive association framework
NASA Astrophysics Data System (ADS)
Wang, Min; Xing, Jinjin; Wang, Jie; Lv, Guonian
2017-08-01
Apart from regions, image edge lines are an important information source, and they deserve more attention in object-based image analysis (OBIA) than they currently receive. In the region-line primitive association framework (RLPAF), we promote straight-edge lines to line primitives to achieve more powerful OBIA. Along with regions, straight lines become basic units for subsequent extraction and analysis of OBIA features. This study develops a new software system called remote-sensing knowledge finder (RSFinder) to implement RLPAF for engineering application purposes. This paper introduces the extended technical framework, a comprehensively designed feature set, the key technology, and the software implementation. To our knowledge, RSFinder is the world's first OBIA system based on two types of primitives, namely regions and lines. It is fundamentally different from other well-known region-only OBIA systems, such as eCognition and the ENVI feature extraction module. This paper provides an important reference for the development of similarly structured OBIA systems and line-involved remote sensing information extraction algorithms.
A study on building data warehouse of hospital information system.
Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo
2011-08-01
Existing hospital information systems with simple statistical functions cannot meet current management needs. Hospital resources are typically held under separate ownership by individual hospitals, which complicates tasks such as the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system. The method can also be employed for a distributed-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: a datacenter layer, a system-function layer, and a user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information, from the establishment of the data theme, to the design of a data model, to the establishment of the data warehouse. Online analytical processing tools support user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and to summarize clinical and hospital information for decision making. This paper proposes the use of a data warehouse for a hospital information system, specifically covering the selection of the data theme, data modeling, and warehouse construction. The processing of patient information is given as an example to demonstrate the usefulness of this method for hospital information management. Data warehouse technology is an evolving technology, and the extraction of decision support information by data mining and decision-making technology requires further research.
Optimal protocol for maximum work extraction in a feedback process with a time-varying potential
NASA Astrophysics Data System (ADS)
Kwon, Chulan
2017-12-01
The nonequilibrium nature of information thermodynamics is characterized by the inequality, or non-negativity, of the total entropy change of the system, memory, and reservoir. The mutual information change plays a crucial role in this inequality, in particular when work is extracted and the paradox of Maxwell's demon arises. We consider a Brownian information engine in which the protocol of the harmonic potential is initially chosen according to the measurement outcome and then varies in time. We confirm the inequality for the total entropy change by calculating in detail the entropic terms, including the mutual information change. We rigorously find the optimal values of the time-dependent protocol for maximum extraction of work, both for finite-time and quasi-static processes.
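Schematically, the bound such an optimal protocol attains is the generalized second law for feedback processes, in its standard Sagawa-Ueda form (notation assumed here, not taken from the paper):

\langle W_{\mathrm{ext}} \rangle \;\le\; -\Delta F + k_{B} T \, I,

where Delta F is the free energy change of the system and I is the mutual information acquired by the measurement; without measurement (I = 0) this reduces to the ordinary second law.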
Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease
Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.
1998-01-01
The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this markup process is time-consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Cáceres, Jesús; Somolinos, Roberto; Pascual, Mario; Martínez, Ignacio; Salvador, Carlos H; Monteagudo, José Luis
2013-01-01
Objective The objective of this paper is to introduce a new language called ccML, designed to provide convenient pragmatic information to applications using the ISO/EN13606 reference model (RM), such as electronic health record (EHR) extract editors. EHR extracts are presently built using the syntactic and semantic information provided in the RM and constrained by archetypes. The extra ccML information enables automated editing of the medico-legal context information, which accounts for over 70% of an extract, without modifying the RM information. Materials and Methods ccML is defined using a W3C XML schema file. Valid ccML files complement the RM with additional pragmatic information. The ccML language grammar is defined using formal language theory as a single-type tree grammar. The new language is tested using an EHR extract editor application as a proof-of-concept system. Results Seven ccML PVCodes (predefined value codes) are introduced in this grammar to cope with different realistic EHR editing situations. These seven PVCodes have different interpretation strategies, from direct look-up in the ccML file itself to more complex searches in archetypes or system precomputation. Discussion The possibility of declaring generic types in ccML gives rise to ambiguity during interpretation. The criterion used to overcome ambiguity is that specificity should prevail over generality; the opposite would make the individual specific element declarations useless. Conclusion A new markup language, ccML, is introduced that opens up the possibility of providing applications using the ISO/EN13606 RM with the pragmatic information necessary to be practical and realistic. PMID:23019241
INFOL for the CDC 6400 Information Storage and Retrieval System. Reference Manual.
ERIC Educational Resources Information Center
Mittman, B.; And Others
INFOL for the CDC 6400 is a rewrite in FORTRAN IV of the CDC 3600/3800 INFOL (Information Oriented Language), a generalized information storage and retrieval system developed by the Control Data Corporation for the CDC 3600/3800 computer. With INFOL, selected pieces of information are extracted from a file and presented to the user quickly and…
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.
Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua
2016-07-01
Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
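As a toy illustration of gene-alteration pair extraction from eligibility-criteria text (the published system combines text-processing techniques with manual review; this single regex only covers common "GENE V600E"-style protein changes and is an assumption for the example):

import re

# Gene symbol followed by an optional "p." prefix and a protein change code.
pair_re = re.compile(r"\b([A-Z][A-Z0-9]{1,9})\s+(?:p\.)?([A-Z]\d+[A-Z])\b")

text = "Patients with BRAF V600E or EGFR p.T790M mutations are eligible."
print(pair_re.findall(text))   # [('BRAF', 'V600E'), ('EGFR', 'T790M')]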
BioRAT: extracting biological information from full-length papers.
Corney, David P A; Buxton, Bernard F; Langdon, William B; Jones, David T
2004-11-22
Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of the data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). Full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE. We show, first, that BioRAT performs as well as existing systems when applied to abstracts, and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.
Analogy between gambling and measurement-based work extraction
NASA Astrophysics Data System (ADS)
Vinkler, Dror A.; Permuter, Haim H.; Merhav, Neri
2016-04-01
In information theory, one area of interest is gambling, where mutual information characterizes the maximal gain in wealth growth rate due to knowledge of side information; the betting strategy that achieves this maximum is named the Kelly strategy. In the field of physics, it was recently shown that mutual information can characterize the maximal amount of work that can be extracted from a single heat bath using measurement-based control protocols, i.e. using ‘information engines’. However, to the best of our knowledge, no relation between gambling and information engines has been presented before. In this paper, we briefly review the two concepts and then demonstrate an analogy between gambling, where bits are converted into wealth, and information engines, where bits representing measurements are converted into energy. From this analogy follows an extension of gambling to the continuous-valued case, which is shown to be useful for investments in currency exchange rates or in the stock market using options. Moreover, the analogy enables us to use well-known methods and results from one field to solve problems in the other. We present three such cases: maximum work extraction when the probability distributions governing the system and measurements are unknown, work extraction when some energy is lost in each cycle, e.g. due to friction, and an analysis of systems with memory. In all three cases, the analogy enables us to use known results in order to obtain new ones.
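The quantitative hinge of the analogy can be stated compactly, using standard results written in assumed notation rather than quoted from the paper: for a horse race X with side information Y, the increase in the optimal (Kelly) wealth doubling rate, and the corresponding bound on the work extracted per cycle by a measurement-based engine in contact with a bath at temperature T, are

\Delta W_{\mathrm{gambling}} = I(X;Y),
\qquad
\langle W_{\mathrm{engine}} \rangle \;\le\; k_{B} T \, I(X;Y),

with I(X;Y) in bits in the first expression (the classical Cover-Thomas result) and in nats in the second; in both cases mutual information prices the value of the side information.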
WRIS: a resource information system for wildland management
Robert M. Russell; David A. Sharpnack; Elliot Amidon
1975-01-01
WRIS (Wildland Resource Information System) is a computer system for processing, storing, retrieving, updating, and displaying geographic data. The polygon, representing a land area boundary, forms the building block of WRIS. Polygons form a map. Maps are digitized manually or by automatic scanning. Computer programs can extract and produce polygon maps and can overlay...
Aviation obstacle auto-extraction using remote sensing information
NASA Astrophysics Data System (ADS)
Zimmer, N.; Lugsch, W.; Ravenscroft, D.; Schiefele, J.
2008-10-01
An obstacle, in the aviation context, may be any natural or man-made, fixed or movable object, permanent or temporary. Currently, the most common way to detect relevant aviation obstacles from an aircraft or helicopter for navigation purposes and collision avoidance is the use of merged infrared and synthetic representations of obstacle data. Several algorithms have been established that utilize synthetic and infrared images to generate obstacle information. However, such a system can be error-prone and may not be able to consistently determine the current environment. This situation can be avoided when the system knows the true position of the obstacle. The quality characteristics of the obstacle data strongly depend on the quality of the source data, such as maps and official publications. In some countries, such as newly industrializing and developing countries, obstacle information of sufficient quality and quantity is not available. The aviation world has two specifications, RTCA DO-276A and ICAO ANNEX 15 Ch. 10, which describe the requirements for aviation obstacle data. Meeting these requirements is essential for compliance with the specifications and for supporting systems based on them, e.g. 3D obstacle warning systems, where accurate coordinates based on WGS-84 are a necessity. Existing, and soon to exist, high-quality aerial and satellite remote sensing data make it feasible to consider automated aviation obstacle data origination. This paper describes the feasibility of auto-extracting aviation obstacles from remote sensing data, considering the limitations of imaging and extraction technologies. Quality parameters and the achievable resolution of auto-extracted obstacle data are discussed and presented.
Feasibility of an anticipatory noncontact precrash restraint actuation system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kercel, S.W.; Dress, W.B.
1995-12-31
The problem of providing an electronic warning of an impending crash to a precrash restraint system a fraction of a second before physical contact differs from more widely explored problems, such as providing several seconds of crash warning to a driver. One approach to precrash restraint sensing is to apply anticipatory system theory. This consists of nested simplified models of the system to be controlled and of the system's environment. It requires sensory information to describe the "current state" of the system and the environment. The models use the sensory data to make a faster-than-real-time prediction about the near future. Anticipation theory is well founded but rarely used. A major problem is to extract real-time current-state information from inexpensive sensors. Providing current-state information to the nested models is the weakest element of the system. Therefore, sensors and real-time processing of sensor signals command the most attention in an assessment of system feasibility. This paper describes problem definition, potential "showstoppers," and ways to overcome them. It includes experiments showing that inexpensive radar is a practical sensing element. It considers fast and inexpensive algorithms to extract information from sensor data.
Fast Coding of Orientation in Primary Visual Cortex
Shriki, Oren; Kohn, Adam; Shamir, Maoz
2012-01-01
Understanding how populations of neurons encode sensory information is a major goal of systems neuroscience. Attempts to answer this question have focused on responses measured over several hundred milliseconds, a duration much longer than that frequently used by animals to make decisions about the environment. How reliably sensory information is encoded on briefer time scales, and how best to extract this information, is unknown. Although it has been proposed that neuronal response latency provides a major cue for fast decisions in the visual system, this hypothesis has not been tested systematically and in a quantitative manner. Here we use a simple ‘race to threshold’ readout mechanism to quantify the information content of spike time latency of primary visual (V1) cortical cells to stimulus orientation. We find that many V1 cells show pronounced tuning of their spike latency to stimulus orientation and that almost as much information can be extracted from spike latencies as from firing rates measured over much longer durations. To extract this information, stimulus onset must be estimated accurately. We show that the responses of cells with weak tuning of spike latency can provide a reliable onset detector. We find that spike latency information can be pooled from a large neuronal population, provided that the decision threshold is scaled linearly with the population size, yielding a processing time of the order of a few tens of milliseconds. Our results provide a novel mechanism for extracting information from neuronal populations over the very brief time scales in which behavioral judgments must sometimes be made. PMID:22719237
ERIC Educational Resources Information Center
Zhang, Rui
2013-01-01
The widespread adoption of Electronic Health Record (EHR) has resulted in rapid text proliferation within clinical care. Clinicians' use of copying and pasting functions in EHR systems further compounds this by creating a large amount of redundant clinical information in clinical documents. A mixture of redundant information (especially outdated…
Evaluating Health Information Systems Using Ontologies.
Eivazzadeh, Shahryar; Anderberg, Peter; Larsson, Tobias C; Fricker, Samuel A; Berglund, Johan
2016-06-16
There are several frameworks that attempt to address the challenges of evaluation of health information systems by offering models, methods, and guidelines about what to evaluate, how to evaluate, and how to report the evaluation results. Model-based evaluation frameworks usually suggest universally applicable evaluation aspects but do not consider case-specific aspects. On the other hand, evaluation frameworks that are case specific, by eliciting user requirements, limit their output to the evaluation aspects suggested by the users in the early phases of system development. In addition, these case-specific approaches extract different sets of evaluation aspects from each case, making it challenging to collectively compare, unify, or aggregate the evaluation of a set of heterogeneous health information systems. The aim of this paper is to find a method capable of suggesting evaluation aspects for a set of one or more health information systems, whether similar or heterogeneous, by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework. On the basis of the available literature in semantic networks and ontologies, a method (called Unified eValuation using Ontology; UVON) was developed that can organize, unify, and aggregate the quality attributes of several health information systems into a tree-style ontology structure. The method was extended to integrate its generated ontology with the evaluation aspects suggested by model-based evaluation frameworks. An approach was developed to extract evaluation aspects from the ontology that also considers evaluation case practicalities such as the maximum number of evaluation aspects to be measured or their required degree of specificity. The method was applied and tested in Future Internet Social and Technological Alignment Research (FI-STAR), a project of 7 cloud-based eHealth applications that were developed and deployed across European Union countries. The relevance of the evaluation aspects created by the UVON method for the FI-STAR project was validated by the corresponding stakeholders of each case. These evaluation aspects were extracted from a UVON-generated ontology structure that reflects both the internally declared required quality attributes in the 7 eHealth applications of the FI-STAR project and the evaluation aspects recommended by the Model for ASsessment of Telemedicine applications (MAST) evaluation framework. The extracted evaluation aspects were used to create questionnaires (for the corresponding patients and health professionals) to evaluate each individual case and the whole of the FI-STAR project. The UVON method can provide a relevant set of evaluation aspects for a heterogeneous set of health information systems by organizing, unifying, and aggregating the quality attributes through ontological structures. Those quality attributes can be either suggested by evaluation models or elicited from the stakeholders of those systems in the form of system requirements. The method continues to be systematic, context sensitive, and relevant across a heterogeneous set of health information systems.
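The core aggregation step can be pictured with a toy sketch (the category paths below are invented examples, not UVON's actual attribute set): quality attributes from several systems are inserted into one tree by their category path, so overlapping attributes aggregate at shared nodes.

from collections import defaultdict

def make_tree():
    """Recursive defaultdict: every missing key becomes a new subtree."""
    return defaultdict(make_tree)

tree = make_tree()
attributes = [
    ("usability", "learnability"),     # elicited from system A
    ("usability", "satisfaction"),     # elicited from system B
    ("security", "confidentiality"),   # suggested by an external framework
]
for path in attributes:
    node = tree
    for part in path:
        node = node[part]              # shared prefixes merge automatically

def show(node, depth=0):
    for name, child in node.items():
        print("  " * depth + name)
        show(child, depth + 1)

show(tree)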
Automated Extraction of Substance Use Information from Clinical Texts.
Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei; Arsoniadis, Elliot; Carter, Elizabeth W; Lindemann, Elizabeth; Sarkar, Indra Neil; Melton, Genevieve B
2015-01-01
Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use) as key risk factors for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status) based on Stanford Typed Dependencies. The developed NLP system leveraged linguistic resources and domain knowledge from a multi-site social history study, Propbank and the MiPACQ corpus. The system attained F-scores of 89.8, 84.6 and 89.4 respectively for alcohol, drug, and nicotine use statement detection, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems can achieve good performance when augmented with linguistic resources and domain knowledge when applied to a wide breadth of substance use free text clinical notes.
[Study on infrared spectrum change of Ganoderma lucidum and its extracts].
Chen, Zao-Xin; Xu, Yong-Qun; Chen, Xiao-Kang; Huang, Dong-Lan; Lu, Wen-Guan
2013-05-01
Infrared spectra were determined for four substances: original Ganoderma lucidum, and its water extract, 95% ethanol extract, and petroleum ether extract. The infrared spectrum was found to carry systematic chemical information, basically reflecting the distribution of each component of the analyte. Ganoderma lucidum and its extracts can be distinguished according to the ratios of the absorption peak areas at 3 416-3 279, 1 541 and 723 cm(-1) to those at 2 935-2 852 cm(-1). A method of calculating the information entropy of a sample set using Euclidean distance is proposed, and the relationship between the information entropy and the amount of chemical information carried by the sample set is discussed; we conclude that the sample set of original Ganoderma lucidum carries the most abundant chemical information. In hierarchical cluster analysis of the four sample sets, the infrared spectrum set of original Ganoderma lucidum gives the best clustering of Ganoderma atrum, cyan Ganoderma, Ganoderma multiplicatum and Ganoderma lucidum. The results show that the infrared spectrum carries chemical information about the material structure and relates closely to the chemical composition of the system. The higher the information entropy, the richer the chemical information and the greater the benefit for pattern recognition. This study provides guidance for the construction of sample sets in pattern recognition.
NASA Astrophysics Data System (ADS)
Lee, Wendy
The advent of multisensory display systems, such as virtual and augmented reality, has fostered a new relationship between humans and space. Not only can these systems mimic real-world environments, they have the ability to create a new space typology made solely of data. In these spaces, two-dimensional information is displayed in three dimensions, requiring human senses to be used to understand virtual, attention-based elements. Studies in the field of big data have predominantly focused on visual representations and extractions of information, with little focus on sound. The goal of this research is to evaluate the most efficient methods of perceptually extracting visual data using auditory stimuli in immersive environments. Using Rensselaer's CRAIVE-Lab, a virtual reality space with 360-degree panoramic visuals and an array of 128 loudspeakers, participants were asked questions based on complex visual displays accompanied by a variety of auditory cues ranging from sine tones to camera shutter sounds. Analysis of the speed and accuracy of participant responses revealed that auditory cues that were more favorable for localization and were positively perceived were best for data extraction and could help create more user-friendly systems in the future.
Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach.
Ratkovic, Zorana; Golik, Wiktoria; Warnier, Pierre
2012-06-26
Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data.
Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn
2009-01-01
The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242
Multispectral system analysis through modeling and simulation
NASA Technical Reports Server (NTRS)
Malila, W. A.; Gleason, J. M.; Cicone, R. C.
1977-01-01
The design and development of multispectral remote sensor systems and associated information extraction techniques should be optimized under the physical and economic constraints encountered and yet be effective over a wide range of scene and environmental conditions. Direct measurement of the full range of conditions to be encountered can be difficult, time consuming, and costly. Simulation of multispectral data by modeling scene, atmosphere, sensor, and data classifier characteristics is set forth as a viable alternative, particularly when coupled with limited sets of empirical measurements. A multispectral system modeling capability is described. Use of the model is illustrated for several applications - interpretation of remotely sensed data from agricultural and forest scenes, evaluating atmospheric effects in Landsat data, examining system design and operational configuration, and development of information extraction techniques.
Evaluating Health Information Systems Using Ontologies
Anderberg, Peter; Larsson, Tobias C; Fricker, Samuel A; Berglund, Johan
2016-01-01
Background There are several frameworks that attempt to address the challenges of evaluation of health information systems by offering models, methods, and guidelines about what to evaluate, how to evaluate, and how to report the evaluation results. Model-based evaluation frameworks usually suggest universally applicable evaluation aspects but do not consider case-specific aspects. On the other hand, evaluation frameworks that are case specific, by eliciting user requirements, limit their output to the evaluation aspects suggested by the users in the early phases of system development. In addition, these case-specific approaches extract different sets of evaluation aspects from each case, making it challenging to collectively compare, unify, or aggregate the evaluation of a set of heterogeneous health information systems. Objectives The aim of this paper is to find a method capable of suggesting evaluation aspects for a set of one or more health information systems—whether similar or heterogeneous—by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework. Methods On the basis of the available literature in semantic networks and ontologies, a method (called Unified eValuation using Ontology; UVON) was developed that can organize, unify, and aggregate the quality attributes of several health information systems into a tree-style ontology structure. The method was extended to integrate its generated ontology with the evaluation aspects suggested by model-based evaluation frameworks. An approach was developed to extract evaluation aspects from the ontology that also considers evaluation case practicalities such as the maximum number of evaluation aspects to be measured or their required degree of specificity. The method was applied and tested in Future Internet Social and Technological Alignment Research (FI-STAR), a project of 7 cloud-based eHealth applications that were developed and deployed across European Union countries. Results The relevance of the evaluation aspects created by the UVON method for the FI-STAR project was validated by the corresponding stakeholders of each case. These evaluation aspects were extracted from a UVON-generated ontology structure that reflects both the internally declared required quality attributes in the 7 eHealth applications of the FI-STAR project and the evaluation aspects recommended by the Model for ASsessment of Telemedicine applications (MAST) evaluation framework. The extracted evaluation aspects were used to create questionnaires (for the corresponding patients and health professionals) to evaluate each individual case and the whole of the FI-STAR project. Conclusions The UVON method can provide a relevant set of evaluation aspects for a heterogeneous set of health information systems by organizing, unifying, and aggregating the quality attributes through ontological structures. Those quality attributes can be either suggested by evaluation models or elicited from the stakeholders of those systems in the form of system requirements. The method continues to be systematic, context sensitive, and relevant across a heterogeneous set of health information systems. PMID:27311735
An Experimental Study of an Ultra-Mobile Vehicle for Off-Road Transportation.
1983-07-01
Image processing algorithms are discussed: the ultimate goal of a vision system is to understand the content of a scene and to extract useful information from it. Four existing robot-vision systems are reviewed, including the General Motors CONSIGHT system, the UNIVISION system, and the Westinghouse system.
Developing Data Systems To Support the Analysis and Development of Large-Scale, On-Line Assessment.
ERIC Educational Resources Information Center
Yu, Chong Ho
Today many data warehousing systems are data rich, but information poor. Extracting useful information from an ocean of data to support administrative, policy, and instructional decisions becomes a major challenge to both database designers and measurement specialists. This paper focuses on the development of a data processing system that…
AAlAbdulsalam, Abdulrahman K.; Garvin, Jennifer H.; Redd, Andrew; Carter, Marjorie E.; Sweeny, Carol; Meystre, Stephane M.
2018-01-01
Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). PMID:29888032
NASA Astrophysics Data System (ADS)
Soran, Maria Loredana; Cobzac, Simona Codruta; Varodi, Codruta; Lung, Ildiko; Surducan, Emanoil; Surducan, Vasile
2009-08-01
Three different techniques (maceration, sonication and extraction in a microwave field) were used for the extraction of essential oils from Ocimum basilicum L. The extracts were analyzed by TLC/HPTLC and fingerprint information was obtained. GC-FID was used to characterize the extraction efficiency and to identify the terpenic bioactive compounds. The most efficient extraction technique was maceration, followed by microwave and ultrasound extraction. The best extraction solvent system was ethyl ether + ethanol (1:1, v/v). The main compounds identified in Ocimum basilicum L. extracts were: α- and β-pinene (mixture), limonene, citronellol, and geraniol.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petrie, G.M.; Perry, E.M.; Kirkham, R.R.
1997-09-01
This report describes the work performed at the Pacific Northwest National Laboratory (PNNL) for the U.S. Department of Energy's Office of Nonproliferation and National Security, Office of Research and Development (NN-20). The work supports the NN-20 Broad Area Search and Analysis, a program initiated by NN-20 to improve the detection and classification of undeclared weapons facilities. Ongoing PNNL research activities are described in three main components: image collection, information processing, and change analysis. The Multispectral Airborne Imaging System, which was developed to collect georeferenced imagery in the visible through infrared regions of the spectrum and flown on a light aircraft platform, will supply current land use conditions. The image information extraction software (dynamic clustering and end-member extraction) uses imagery, like the multispectral data collected by the PNNL multispectral system, to efficiently generate landcover information. The advanced change detection uses a priori (benchmark) information, current landcover conditions, and user-supplied rules to rank suspect areas by probable risk of undeclared facilities or proliferation activities. These components, both separately and combined, provide important tools for improving the detection of undeclared facilities.
Data base management system configuration specification. [computer storage devices
NASA Technical Reports Server (NTRS)
Neiers, J. W.
1979-01-01
The functional requirements and the configuration of the data base management system are described. Techniques and technology which will enable more efficient and timely transfer of useful data from the sensor to the user, extraction of information by the user, and exchange of information among the users are demonstrated.
Validity of association rules extracted by healthcare-data-mining.
Takeuchi, Hiroshi; Kodama, Naoki
2014-01-01
A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study verified that the rules extracted from the daily time-series data stored over a half-year by volunteer users of this system are valid.
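As a toy illustration of the kind of rule validation described above, the support and confidence of a candidate lifestyle rule can be computed over daily records; the field names and the example rule below are invented for illustration, not taken from the paper:

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of 'antecedent -> consequent' over daily records."""
    n_ant = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_ant if n_ant else 0.0
    return support, confidence

# Hypothetical daily lifestyle/health records from one user.
days = [{"steps": 9000, "sleep_h": 7.5, "good_condition": True},
        {"steps": 2000, "sleep_h": 5.0, "good_condition": False},
        {"steps": 8000, "sleep_h": 7.0, "good_condition": True}]
print(rule_stats(days, lambda r: r["steps"] > 7000, lambda r: r["good_condition"]))
```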
Pathak, Jyotishman; Bailey, Kent R; Beebe, Calvin E; Bethard, Steven; Carrell, David S; Chen, Pei J; Dligach, Dmitriy; Endle, Cory M; Hart, Lacey A; Haug, Peter J; Huff, Stanley M; Kaggal, Vinod C; Li, Dingcheng; Liu, Hongfang; Marchant, Kyle; Masanz, James; Miller, Timothy; Oniki, Thomas A; Palmer, Martha; Peterson, Kevin J; Rea, Susan; Savova, Guergana K; Stancl, Craig R; Sohn, Sunghwan; Solbrig, Harold R; Suesse, Dale B; Tao, Cui; Taylor, David P; Westberg, Les; Wu, Stephen; Zhuo, Ning; Chute, Christopher G
2013-01-01
Research objective To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction. Materials and methods Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems—Mayo Clinic and Intermountain Healthcare—were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine. Results Using CEMs and open-source natural language processing and terminology services engines—namely, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2)—we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria. Conclusions End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts. PMID:24190931
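The quality measure in this case study is, at its core, a cohort filter over normalized records. A minimal sketch of the denominator/numerator logic, assuming already-normalized patient records with an age, a diabetes flag, and the latest LDL-C value (the actual QDM artifacts and CEM bindings are far richer):

```python
def diabetes_ldl_measure(patients):
    """Counts for: diabetic patients aged 18-75 whose latest LDL-C was < 100 mg/dL."""
    denominator = [p for p in patients
                   if 18 <= p["age"] <= 75 and p["has_diabetes"]]
    numerator = [p for p in denominator if p["latest_ldl_mg_dl"] < 100]
    return len(denominator), len(numerator)

# Hypothetical normalized cohort; patient 3 falls outside the age window.
cohort = [{"age": 60, "has_diabetes": True, "latest_ldl_mg_dl": 88},
          {"age": 45, "has_diabetes": True, "latest_ldl_mg_dl": 130},
          {"age": 80, "has_diabetes": True, "latest_ldl_mg_dl": 90}]
print(diabetes_ldl_measure(cohort))  # (2, 1)
```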
3D local feature BKD to extract road information from mobile laser scanning point clouds
NASA Astrophysics Data System (ADS)
Yang, Bisheng; Liu, Yuan; Dong, Zhen; Liang, Fuxun; Li, Bijun; Peng, Xiangyang
2017-08-01
Extracting road information from point clouds obtained through mobile laser scanning (MLS) is essential for autonomous vehicle navigation, and has hence garnered a growing amount of research interest in recent years. However, the performance of such systems is seriously affected due to varying point density and noise. This paper proposes a novel three-dimensional (3D) local feature called the binary kernel descriptor (BKD) to extract road information from MLS point clouds. The BKD consists of Gaussian kernel density estimation and binarization components to encode the shape and intensity information of the 3D point clouds that are fed to a random forest classifier to extract curbs and markings on the road. These are then used to derive road information, such as the number of lanes, the lane width, and intersections. In experiments, the precision and recall of the proposed feature for the detection of curbs and road markings on an urban dataset and a highway dataset were as high as 90%, thus showing that the BKD is accurate and robust against varying point density and noise.
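A loose sketch of the pipeline the abstract outlines: kernel density features computed over a local point neighborhood, binarized into a descriptor, and fed to a random forest. The grid size, bandwidth, intensity handling, and thresholding below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def binary_kernel_descriptor(neighbors, grid=4, bandwidth=0.3):
    """Encode a 3D point neighborhood (N x 4: x, y, z, intensity) as a binary vector."""
    xyz = neighbors[:, :3] - neighbors[:, :3].mean(axis=0)
    # Gaussian kernel density estimated on a coarse voxel grid of centers.
    lin = np.linspace(-1.0, 1.0, grid)
    gx, gy, gz = np.meshgrid(lin, lin, lin, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    d2 = ((centers[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)
    # Crude stand-in for the intensity channel: mean neighborhood intensity.
    intensity = np.full(len(centers), neighbors[:, 3].mean())
    feats = np.concatenate([density, intensity])
    return (feats > np.median(feats)).astype(np.uint8)  # binarization step

# Illustrative training on random neighborhoods labeled curb (1) vs. not (0).
rng = np.random.default_rng(1)
X = np.array([binary_kernel_descriptor(rng.normal(size=(50, 4))) for _ in range(40)])
y = rng.integers(0, 2, size=40)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```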
Design and Development of the Terrain Information Extraction System
1990-09-04
...system successfully demonstrated relief measurement and orthophoto production, automated feature extraction has remained "the major problem of today's"... the hierarchical relaxation correlation method developed by Helava Associates, Inc. and digital orthophoto production. To achieve this high accuracy... image memory transfer rates will be achieved by using data blocks or "image tiles." Further, an image fringe loading module will be implemented which...
A Discriminant Distance Based Composite Vector Selection Method for Odor Classification
Choi, Sang-Il; Jeong, Gu-Min
2014-01-01
We present a composite vector selection method for an effective electronic nose system that performs well even in noisy environments. Each composite vector generated from an electronic nose data sample is evaluated by computing the discriminant distance. By quantitatively measuring the amount of discriminative information in each composite vector, composite vectors containing informative variables can be distinguished, and the final composite features for odor classification are extracted using the selected composite vectors. Using only the informative composite vectors, instead of all of the generated composite vectors, also helps to extract better composite features. Experimental results with different volatile organic compound data show that the proposed system has good classification performance even in a noisy environment compared to other methods. PMID:24747735
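A minimal sketch of composite-vector selection by a discriminant distance, here approximated as a Fisher-style ratio of between-class to within-class scatter; the paper's exact distance definition is assumed, not reproduced:

```python
import numpy as np

def discriminant_distance(x, y):
    """Fisher-style score for one composite vector: between- over within-class scatter."""
    x0, x1 = x[y == 0], x[y == 1]
    between = (x0.mean() - x1.mean()) ** 2
    within = x0.var() + x1.var()
    return between / within if within else 0.0

def select_composites(X, y, k):
    """Keep the k composite vectors (columns) with the most discriminative information."""
    scores = np.array([discriminant_distance(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 10))
y = rng.integers(0, 2, 60)
X[y == 1, 3] += 2.0                  # make column 3 informative for class 1
print(select_composites(X, y, k=3))  # column 3 should rank first
```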
NASA Astrophysics Data System (ADS)
Rüther, Heinz; Martine, Hagai M.; Mtalo, E. G.
This paper presents a novel approach to semiautomatic building extraction in informal settlement areas from aerial photographs. The proposed approach uses a strategy of delineating buildings by optimising their approximate building contour position. Approximate building contours are derived automatically by locating elevation blobs in digital surface models. Building extraction is then effected by means of the snakes algorithm and the dynamic programming optimisation technique. With dynamic programming, the building contour optimisation problem is realized through a discrete multistage process and solved by the "time-delayed" algorithm, as developed in this work. The proposed building extraction approach is a semiautomatic process, with user-controlled operations linking fully automated subprocesses. Inputs into the proposed building extraction system are ortho-images and digital surface models, the latter being generated through image matching techniques. Buildings are modeled as "lumps" or elevation blobs in digital surface models, which are derived by altimetric thresholding of digital surface models. Initial windows for building extraction are provided by projecting the elevation blobs centre points onto an ortho-image. In the next step, approximate building contours are extracted from the ortho-image by region growing constrained by edges. Approximate building contours thus derived are inputs into the dynamic programming optimisation process in which final building contours are established. The proposed system is tested on two study areas: Marconi Beam in Cape Town, South Africa, and Manzese in Dar es Salaam, Tanzania. Sixty percent of buildings in the study areas have been extracted and verified and it is concluded that the proposed approach contributes meaningfully to the extraction of buildings in moderately complex and crowded informal settlement areas.
A Risk Assessment System with Automatic Extraction of Event Types
NASA Astrophysics Data System (ADS)
Capet, Philippe; Delavallade, Thomas; Nakamura, Takuya; Sandor, Agnes; Tarsitano, Cedric; Voyatzi, Stavroula
In this article we describe the joint effort of experts in linguistics, information extraction and risk assessment to integrate EventSpotter, an automatic event extraction engine, into ADAC, an automated early warning system. By detecting weak signals of emerging risks as early as possible, ADAC provides a dynamic synthetic picture of situations involving risk. The ADAC system calculates risk on the basis of fuzzy logic rules operated on a template graph whose leaves are event types. EventSpotter is based on a general purpose natural language dependency parser, XIP, enhanced with domain-specific lexical resources (Lexicon-Grammar). Its role is to automatically feed the leaves with input data.
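A toy illustration of the kind of computation described: event-type leaves carry detection confidences in [0, 1], and fuzzy AND/OR rules propagate them up a template graph to a risk score. The template and scores below are invented; ADAC's actual rule base is not given in the abstract:

```python
# Fuzzy aggregation over a small template tree: leaves are event types,
# internal nodes combine children with fuzzy AND (min) or OR (max).
def risk(node, leaf_scores):
    if isinstance(node, str):                 # leaf: an event type
        return leaf_scores.get(node, 0.0)
    op, children = node
    values = [risk(c, leaf_scores) for c in children]
    return min(values) if op == "AND" else max(values)

# Hypothetical template: unrest risk = (protest AND strike) OR shortage.
template = ("OR", [("AND", ["protest_reported", "strike_called"]), "food_shortage"])
scores = {"protest_reported": 0.8, "strike_called": 0.6, "food_shortage": 0.2}
print(risk(template, scores))  # -> 0.6
```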
The ISES: A non-intrusive medium for in-space experiments in on-board information extraction
NASA Technical Reports Server (NTRS)
Murray, Nicholas D.; Katzberg, Stephen J.; Nealy, Mike
1990-01-01
The Information Science Experiment System (ISES) represents a new approach in applying advanced systems technology and techniques to on-board information extraction in the space environment. Basically, what is proposed is a 'black box' attached to the spacecraft data bus or local area network. To the spacecraft the 'black box' appears to be just another payload requiring power, heat rejection, interfaces, adding weight, and requiring time on the data management and communication system. In reality, the 'black box' is a programmable computational resource which eavesdrops on the data network, taking and producing selectable, real-time science data back on the network. This paper will present a brief overview of the ISES concept and will discuss issues related to applying the ISES to the polar platform and Space Station Freedom. Critical to the operation of ISES is the viability of a payload-like interface to the spacecraft data bus or local area network. Study results that address this question will be reviewed vis-a-vis the polar platform and the core space station. Also, initial results of processing science and other requirements for onboard, real-time information extraction will be presented with particular emphasis on the polar platform. Opportunities for a broader range of applications on the core space station will also be discussed.
Shimazaki, Kei-ichi; Kushida, Tatsuya
2010-06-01
Lactoferrin is a multi-functional metal-binding glycoprotein that exhibits many biological functions of interest to many researchers from the fields of clinical medicine, dentistry, pharmacology, veterinary medicine, nutrition and milk science. To date, a number of academic reports concerning the biological activities of lactoferrin have been published and are easily accessible through public data repositories. However, as the literature is expanding daily, this presents challenges in understanding the larger picture of lactoferrin function and mechanisms. In order to overcome the "analysis paralysis" associated with lactoferrin information, we attempted to apply a text mining method to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo). This information extraction system uses natural language processing and text mining technology. This system analyzes the sentences and titles from abstracts stored in the PubMed database, and can automatically extract binary relations that consist of interactions between genes/proteins, chemicals and diseases/functions. We expect that such information visualization analysis will be useful in determining novel relationships among a multitude of lactoferrin functions and mechanisms. We have demonstrated the utilization of this method to find pathways of lactoferrin participation in neovascularization, Helicobacter pylori attack on gastric mucosa, atopic dermatitis and lipid metabolism.
Photon extraction and conversion for scalable ion-trap quantum computing
NASA Astrophysics Data System (ADS)
Clark, Susan; Benito, Francisco; McGuinness, Hayden; Stick, Daniel
2014-03-01
Trapped ions represent one of the most mature and promising systems for quantum information processing. They have high-fidelity one- and two-qubit gates, long coherence times, and their qubit states can be reliably prepared and detected. Taking advantage of these inherent qualities in a system with many ions requires a means of entangling spatially separated ion qubits. One architecture achieves this entanglement through the use of emitted photons to distribute quantum information - a favorable strategy if photon extraction can be made efficient and reliable. Here I present results for photon extraction from an ion in a cavity formed by integrated optics on a surface trap, as well as results in frequency converting extracted photons for long distance transmission or interfering with photons from other types of optically active qubits. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U. S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
ERIC Educational Resources Information Center
Chowdhury, Gobinda G.
2003-01-01
Discusses issues related to natural language processing, including theoretical developments; natural language understanding; tools and techniques; natural language text processing systems; abstracting; information extraction; information retrieval; interfaces; software; Internet, Web, and digital library applications; machine translation for…
Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S
2016-10-01
Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe). The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata as compared to existing NLP pipelines such as MetaMap.
Automated extraction of radiation dose information for CT examinations.
Cook, Tessa S; Zimmerman, Stefan; Maidment, Andrew D A; Kim, Woojin; Boonn, William W
2010-11-01
Exposure to radiation as a result of medical imaging is currently in the spotlight, receiving attention from Congress as well as the lay press. Although scanner manufacturers are moving toward including effective dose information in the Digital Imaging and Communications in Medicine headers of imaging studies, there is a vast repository of retrospective CT data at every imaging center that stores dose information in an image-based dose sheet. As such, it is difficult for imaging centers to participate in the ACR's Dose Index Registry. The authors have designed an automated extraction system to query their PACS archive and parse CT examinations to extract the dose information stored in each dose sheet. First, an open-source optical character recognition program processes each dose sheet and converts the information to American Standard Code for Information Interchange (ASCII) text. Each text file is parsed, and radiation dose information is extracted and stored in a database which can be queried using an existing pathology and radiology enterprise search tool. Using this automated extraction pipeline, it is possible to perform dose analysis on the >800,000 CT examinations in the PACS archive and generate dose reports for all of these patients. It is also possible to more effectively educate technologists, radiologists, and referring physicians about exposure to radiation from CT by generating report cards for interpreted and performed studies. The automated extraction pipeline enables compliance with the ACR's reporting guidelines and greater awareness of radiation dose to patients, thus resulting in improved patient care and management. Copyright © 2010 American College of Radiology. Published by Elsevier Inc. All rights reserved.
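A minimal sketch of the parsing stage of such a pipeline, assuming the optical character recognition step has already produced ASCII text with fields like 'CTDIvol (mGy): 12.3'; actual dose-sheet layouts and field names vary by scanner vendor:

```python
import re

# Hypothetical field patterns for an OCR'd CT dose sheet.
FIELDS = {
    "ctdi_vol_mgy": re.compile(r"CTDIvol\s*\(mGy\)\s*[:=]?\s*([\d.]+)", re.I),
    "dlp_mgy_cm":   re.compile(r"DLP\s*\(mGy[- ]?cm\)\s*[:=]?\s*([\d.]+)", re.I),
}

def parse_dose_sheet(ocr_text):
    """Extract numeric dose values from OCR'd dose-sheet text into a flat record."""
    record = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(ocr_text)
        if match:
            record[name] = float(match.group(1))
    return record  # ready to insert into a queryable database

sample = "Exam: CT ABDOMEN\nCTDIvol (mGy): 12.3\nDLP (mGy-cm): 450.1\n"
print(parse_dose_sheet(sample))  # {'ctdi_vol_mgy': 12.3, 'dlp_mgy_cm': 450.1}
```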
Botsis, Taxiarchis; Foster, Matthew; Arya, Nina; Kreimeyer, Kory; Pandey, Abhishek; Arya, Deepa
2017-04-26
To evaluate the feasibility of automated dose and adverse event information retrieval in supporting the identification of safety patterns. We extracted all rabbit Anti-Thymocyte Globulin (rATG) reports submitted to the United States Food and Drug Administration Adverse Event Reporting System (FAERS) from the product's initial licensure in April 16, 1984 through February 8, 2016. We processed the narratives using the Medication Extraction (MedEx) and the Event-based Text-mining of Health Electronic Records (ETHER) systems and retrieved the appropriate medication, clinical, and temporal information. When necessary, the extracted information was manually curated. This process resulted in a high quality dataset that was analyzed with the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA) to explore the association of rATG dosing with post-transplant lymphoproliferative disorder (PTLD). Although manual curation was necessary to improve the data quality, MedEx and ETHER supported the extraction of the appropriate information. We created a final dataset of 1,380 cases with complete information for rATG dosing and date of administration. Analysis in PANACEA found that PTLD was associated with cumulative doses of rATG >8 mg/kg, even in periods where most of the submissions to FAERS reported low doses of rATG. We demonstrated the feasibility of investigating a dose-related safety pattern for a particular product in FAERS using a set of automated tools.
Development of Mobile Mapping System for 3D Road Asset Inventory.
Sairam, Nivedita; Nagarajan, Sudhagar; Ornitz, Scott
2016-03-12
Asset Management is an important component of an infrastructure project. A significant cost is involved in maintaining and updating the asset information. Data collection is the most time-consuming task in the development of an asset management system. In order to reduce the time and cost involved in data collection, this paper proposes a low cost Mobile Mapping System using an equipped laser scanner and cameras. First, the feasibility of low cost sensors for 3D asset inventory is discussed by deriving appropriate sensor models. Then, through calibration procedures, respective alignments of the laser scanner, cameras, Inertial Measurement Unit and GPS (Global Positioning System) antenna are determined. The efficiency of this Mobile Mapping System is evaluated by mounting it on a truck and a golf cart. By using derived sensor models, geo-referenced images and 3D point clouds are derived. After validating the quality of the derived data, the paper provides a framework to extract road assets both automatically and manually using techniques implementing RANSAC plane fitting and edge extraction algorithms. Then the scope of such extraction techniques along with a sample GIS (Geographic Information System) database structure for unified 3D asset inventory are discussed. PMID:26985897
Jonnagaddala, Jitendra; Liaw, Siaw-Teng; Ray, Pradeep; Kumar, Manish; Dai, Hong-Jie; Hsu, Chien-Yeh
2015-01-01
Heart disease is the leading cause of death worldwide. Therefore, assessing the risk of its occurrence is a crucial step in predicting serious cardiac events. Identifying heart disease risk factors and tracking their progression is a preliminary step in heart disease risk assessment. A large number of studies have reported the use of risk factor data collected prospectively. Electronic health record systems are a great resource of the required risk factor data. Unfortunately, most of the valuable information on risk factor data is buried in the form of unstructured clinical notes in electronic health records. In this study, we present an information extraction system to extract related information on heart disease risk factors from unstructured clinical notes using a hybrid approach. The hybrid approach employs both machine learning and rule-based clinical text mining techniques. The developed system achieved an overall microaveraged F-score of 0.8302.
NASA Earth Resources Survey Symposium. Volume 1-B: Geology, Information Systems and Services
NASA Technical Reports Server (NTRS)
1975-01-01
A symposium was conducted on the practical applications of earth resources survey technology including utilization and results of data from programs involving LANDSAT, the Skylab earth resources experiment package, and aircraft. Topics discussed include geological structure, landform surveys, energy and extractive resources, and information systems and services.
DesAutels, Spencer J; Fox, Zachary E; Giuse, Dario A; Williams, Annette M; Kou, Qing-Hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia
2016-01-01
Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems.
NASA Astrophysics Data System (ADS)
Skersys, Tomas; Butleris, Rimantas; Kapocius, Kestutis
2013-10-01
Approaches for the analysis and specification of business vocabularies and rules are very relevant topics in both the Business Process Management and Information Systems Development disciplines. However, in the common practice of Information Systems Development, business modeling activities are still mostly empirical in nature. In this paper, basic aspects of an approach for the semi-automated extraction of business vocabularies from business process models are presented. The approach is based on the novel business modeling-level OMG standards "Business Process Model and Notation" (BPMN) and "Semantics of Business Vocabulary and Business Rules" (SBVR), thus contributing to OMG's vision of Model-Driven Architecture (MDA) and to model-driven development in general.
NASA Technical Reports Server (NTRS)
Groom, N. J.
1979-01-01
The rim inertial measuring system (RIMS) is introduced, and an approach for extracting angular rate and linear acceleration information from a RIMS unit is presented and discussed. The RIMS consists of one or more small annular momentum control devices (AMCDs), mounted in a strapped-down configuration, which are used to measure angular rates and linear accelerations of a moving vehicle. An AMCD consists of a spinning rim, a set of noncontacting magnetic bearings for supporting the rim, and a noncontacting electromagnetic spin motor. The approach for extracting angular rate and linear acceleration information is presented for a single spacecraft-mounted RIMS unit.
Rapid automatic keyword extraction for information retrieval and analysis
Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]
2012-03-06
Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
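The abstract specifies the scoring scheme closely enough to sketch it. A minimal version, assuming a small illustrative stop-word list and simple punctuation delimiters (the patented system's delimiter sets and scoring options may differ):

```python
import re
from collections import defaultdict

# Illustrative stop-word list; real deployments use a much larger one.
STOP_WORDS = {"the", "of", "a", "an", "and", "or", "for", "in", "on", "to", "is", "are"}

def rake_keywords(text, top_fraction=0.3):
    """Score candidate keywords by summed word co-occurrence statistics."""
    # Split into candidate phrases at punctuation delimiters and stop words.
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Word score = degree / frequency; degree counts co-occurring phrase words.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    # Keyword score = sum of member word scores; keep the highest-scoring portion.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[: max(1, int(len(ranked) * top_fraction))]

print(rake_keywords("Methods and systems for rapid automatic keyword extraction "
                    "for information retrieval and analysis of text documents."))
```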
Procedure for extraction of disparate data from maps into computerized data bases
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1979-01-01
A procedure is presented for extracting disparate sources of data from geographic maps and for the conversion of these data into a suitable format for processing on a computer-oriented information system. Several graphic digitizing considerations are included and related to the NASA Earth Resources Laboratory's Digitizer System. Current operating procedures for the Digitizer System are given in a simplified and logical manner. The report serves as a guide to those organizations interested in converting map-based data by using a comparable map digitizing system.
Optimal extraction of quasar Lyman limit absorption systems from the IUE archive
NASA Technical Reports Server (NTRS)
Tytler, David
1992-01-01
The IUE archive contains a wealth of information on Lyman limit absorption systems (LLS) in quasar spectra. QSO spectra from the IUE data base were optimally extracted, coadded, and analyzed to yield a homogeneous sample of LLS at low redshifts. This sample comprises 36 LLS, twice the number in previously analyzed low-redshift samples. These systems are ideal for determining the origin, redshift evolution, ionization, velocity dispersions, and metal abundances of absorption systems. Two of them are also excellent targets for determining the primordial deuterium-to-hydrogen ratio.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Zhaoqing; Wang, Taiping; Copping, Andrea E.
Understanding and providing proactive information on the potential for tidal energy projects to cause changes to the physical system and to key water quality constituents in tidal waters is a necessary and cost-effective means to avoid costly regulatory involvement and late-stage surprises in the permitting process. This paper presents a modeling study for evaluating tidal energy extraction and its potential impacts on the marine environment at a real-world site - Tacoma Narrows of Puget Sound, Washington State, USA. An unstructured-grid coastal ocean model, fitted with a module that simulates tidal energy devices, was applied to simulate the tidal energy extracted by different turbine array configurations and the potential effects of the extraction at local and system-wide scales in Tacoma Narrows and South Puget Sound. Model results demonstrated the advantage of an unstructured-grid model for simulating the far-field effects of tidal energy extraction in a large model domain, as well as assessing the near-field effect using a fine grid resolution near the tidal turbines. The outcome shows that a realistic near-term deployment scenario extracts a very small fraction of the total tidal energy in the system and that system-wide environmental effects are not likely; however, near-field effects on the flow field and bed shear stress in the area of the tidal turbine farm are more likely. Model results also indicate that, from a practical standpoint, hydrodynamic or water quality effects are not likely to be the limiting factor for development of large commercial-scale tidal farms. Results indicate that very high numbers of turbines are required to significantly alter the tidal system; limitations on marine space or other environmental concerns are likely to be reached before reaching these deployment levels. These findings show that important information obtained from numerical modeling can be used to inform regulatory and policy processes for tidal energy development.
Botsis, T; Woo, E J; Ball, R
2013-01-01
We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS), we used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority.
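The vector space strategy mentioned above amounts to scoring each report's extracted features against a case-definition term vector, for example by cosine similarity. A minimal sketch with invented terms; the BC criteria and VaeTM's feature set are far more detailed:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical case-definition vector for GBS and one report's extracted features.
gbs_definition = Counter({"weakness": 2, "areflexia": 2, "ascending": 1, "csf_protein": 1})
report_terms = Counter({"weakness": 1, "areflexia": 1, "fever": 1})
print(cosine(report_terms, gbs_definition))  # similarity used to rank/classify reports
```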
The tool extracts deep phenotypic information from the clinical narrative at the document, episode, and patient levels. The final output is a FHIR-compliant patient-level phenotypic summary which can be consumed by research warehouses or by the DeepPhe native visualization tool.
NASA Technical Reports Server (NTRS)
1975-01-01
The use of information from space systems in the operation of extractive industries, particularly in exploration for mineral and fuel resources was reviewed. Conclusions and recommendations reported are based on the fundamental premise that survival of modern industrial society requires a continuing secure flow of resources for energy, construction and manufacturing, and for use as plant foods.
Plant-uptake of uranium: Hydroponic and soil system studies
Ramaswami, A.; Carr, P.; Burkhardt, M.
2001-01-01
Limited information is available on screening and selection of terrestrial plants for uptake and translocation of uranium from soil. This article evaluates the removal of uranium from water and soil by selected plants, comparing plant performance in hydroponic systems with that in two soil systems (a sandy-loam soil and an organic-rich soil). Plants selected for this study were Sunflower (Helianthus giganteus), Spring Vetch (Vicia sativa), Hairy Vetch (Vicia villosa), Juniper (Juniperus monosperma), Indian Mustard (Brassica juncea), and Bush Bean (Phaseolus nanus). Plant performance was evaluated both in terms of the percent uranium extracted from the three systems, as well as the biological absorption coefficient (BAC) that normalized uranium uptake to plant biomass. Study results indicate that uranium extraction efficiency decreased sharply across hydroponic, sandy and organic soil systems, indicating that soil organic matter sequestered uranium, rendering it largely unavailable for plant uptake. These results indicate that site-specific soils must be used to screen plants for uranium extraction capability; plant behavior in hydroponic systems does not correlate well with that in soil systems. One plant species, Juniper, exhibited consistent uranium extraction efficiencies and BACs in both sandy and organic soils, suggesting unique uranium extraction capabilities.
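The two performance measures reported reduce to simple ratios. A sketch, assuming the biological absorption coefficient is the plant-tissue concentration divided by the substrate concentration (a common formulation; the paper's exact normalization may differ):

```python
def percent_extracted(mass_removed_mg, mass_initial_mg):
    """Percent of uranium removed from the water or soil system."""
    return 100.0 * mass_removed_mg / mass_initial_mg

def biological_absorption_coefficient(plant_conc_mg_per_kg, substrate_conc_mg_per_kg):
    """BAC: uptake normalized to plant biomass via tissue/substrate concentrations."""
    return plant_conc_mg_per_kg / substrate_conc_mg_per_kg

# Hypothetical numbers for a sunflower grown in a sandy-loam soil.
print(percent_extracted(1.2, 20.0))                   # 6.0 (%)
print(biological_absorption_coefficient(15.0, 50.0))  # 0.3
```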
Zhou, Li; Plasek, Joseph M; Mahoney, Lisa M; Karipineni, Neelima; Chang, Frank; Yan, Xuemin; Chang, Fenny; Dimaggio, Dana; Goldman, Debora S.; Rocha, Roberto A.
2011-01-01
Clinical information is often coded using different terminologies, and therefore is not interoperable. Our goal is to develop a general natural language processing (NLP) system, called the Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular, pipeline approach flowing from a preprocessor, semantic tagger, terminology mapper, context analyzer, and parser to structure inputted clinical notes. Evaluators manually reviewed MTERMS output against 30 free-text and 10 structured outpatient clinical notes. MTERMS achieved an overall F-measure of 90.6 and 94.0 for free-text and structured notes, respectively, for medication and temporal information. The local medication terminology had 83.0% coverage, compared to RxNorm's 98.0% coverage, for free-text notes. 61.6% of the mappings between the terminologies are exact matches. Capture of duration was significantly improved (91.7% vs. 52.5%) over systems in the third i2b2 challenge. PMID:22195230
MIMS - MEDICAL INFORMATION MANAGEMENT SYSTEM
NASA Technical Reports Server (NTRS)
Frankowski, J. W.
1994-01-01
MIMS, the Medical Information Management System, is an interactive, general-purpose information storage and retrieval system. It was first designed to be used in medical data management, and can be used to handle all aspects of data related to patient care. Other areas of application for MIMS include: managing occupational safety data in the public and private sectors; handling judicial information where speed and accuracy are high priorities; systemizing purchasing and procurement systems; and analyzing organizational cost structures. Because of its free format design, MIMS can offer immediate assistance where manipulation of large data bases is required. File structures, data categories, field lengths and formats, including alphabetic and/or numeric, are all user defined. The user can quickly and efficiently extract, display, and analyze the data. Three means of extracting data are provided: certain short items of information, such as social security numbers, can be used to uniquely identify each record for quick access; records can be selected which match conditions defined by the user; and specific categories of data can be selected. Data may be displayed and analyzed in several ways which include: generating tabular information assembled from comparison of all the records on the system; generating statistical information on numeric data such as means, standard deviations and standard errors; and displaying formatted listings of output data. The MIMS program is written in Microsoft FORTRAN-77. It was designed to operate on IBM Personal Computers and compatibles running under PC or MS DOS 2.00 or higher. MIMS was developed in 1987.
Zhou, Li; Friedman, Carol; Parsons, Simon; Hripcsak, George
2005-01-01
Exploring temporal information in narrative Electronic Medical Records (EMRs) is essential and challenging. We propose an architecture for an integrated approach to process temporal information in clinical narrative reports. The goal is to initiate and build a foundation that supports applications which assist healthcare practice and research by including the ability to determine the time of clinical events (e.g., past vs. present). Key components include: (1) a temporal constraint structure for temporal expressions and the development of an associated tagger; (2) a Natural Language Processing (NLP) system for encoding and extracting medical events and associating them with formalized temporal data; (3) a post-processor, with a knowledge-based subsystem to help discover implicit information, that resolves temporal expressions and deals with issues such as granularity and vagueness; and (4) a reasoning mechanism which models clinical reports as Simple Temporal Problems (STPs). PMID:16779164
Jenke, Dennis; Castner, James; Egert, Thomas; Feinberg, Tom; Hendricker, Alan; Houston, Christopher; Hunt, Desmond G; Lynch, Michael; Shaw, Arthur; Nicholas, Kumudini; Norwood, Daniel L; Paskiet, Diane; Ruberto, Michael; Smith, Edward J; Holcomb, Frank
2013-01-01
Polymeric and elastomeric materials are commonly encountered in medical devices and packaging systems used to manufacture, store, deliver, and/or administer drug products. Characterizing extractables from such materials is a necessary step in establishing their suitability for use in these applications. In this study, five individual materials representative of polymers and elastomers commonly used in packaging systems and devices were extracted under conditions and with solvents that are relevant to parenteral and ophthalmic drug products (PODPs). Extraction methods included elevated temperature sealed vessel extraction, sonication, refluxing, and Soxhlet extraction. Extraction solvents included a low-pH (pH = 2.5) salt mixture, a high-pH (pH = 9.5) phosphate buffer, a 1/1 isopropanol/water mixture, isopropanol, and hexane. The resulting extracts were chemically characterized via spectroscopic and chromatographic means to establish the metal/trace element and organic extractables profiles. Additionally, the test articles themselves were tested for volatile organic substances. The results of this testing established the extractables profiles of the test articles, which are reported herein. Trends in the extractables, and their estimated concentrations, as a function of the extraction and testing methodologies are considered in the context of the use of the test article in medical applications and with respect to establishing best demonstrated practices for extractables profiling of materials used in PODP-related packaging systems and devices. Plastic and rubber materials are commonly encountered in medical devices and packaging/delivery systems for drug products. Characterizing the extractables from these materials is an important part of determining that they are suitable for use. In this study, five materials representative of plastics and rubbers used in packaging and medical devices were extracted by several means, and the extracts were analytically characterized to establish each material's profile of extracted organic compounds and trace element/metals. This information was utilized to make generalizations about the appropriateness of the test methods and the appropriate use of the test materials.
Remote Sensing Information Sciences Research Group, year four
NASA Technical Reports Server (NTRS)
Estes, John E.; Smith, Terence; Star, Jeffrey L.
1987-01-01
The needs of the remote sensing research and application community which will be served by the Earth Observing System (EOS) and space station, including associated polar and co-orbiting platforms are examined. Research conducted was used to extend and expand existing remote sensing research activities in the areas of georeferenced information systems, machine assisted information extraction from image data, artificial intelligence, and vegetation analysis and modeling. Projects are discussed in detail.
NASA Technical Reports Server (NTRS)
1974-01-01
User models, defined as any explicit process or procedure used to transform information extracted from remotely sensed data into a form useful as a resource management information input, are discussed. The role of the user models as information, technological, and operations interfaces between the TERSSE and the resource managers is emphasized. It is recommended that guidelines and management strategies be developed for a systems approach to user model development.
Kinetics from Replica Exchange Molecular Dynamics Simulations.
Stelzl, Lukas S; Hummer, Gerhard
2017-08-08
Transitions between metastable states govern many fundamental processes in physics, chemistry and biology, from nucleation events in phase transitions to the folding of proteins. The free energy surfaces underlying these processes can be obtained from simulations using enhanced sampling methods. However, their altered dynamics makes kinetic and mechanistic information difficult or impossible to extract. Here, we show that, with replica exchange molecular dynamics (REMD), one can not only sample equilibrium properties but also extract kinetic information. For systems that strictly obey first-order kinetics, the procedure to extract rates is rigorous. For actual molecular systems whose long-time dynamics are captured by kinetic rate models, accurate rate coefficients can be determined from the statistics of the transitions between the metastable states at each replica temperature. We demonstrate the practical applicability of the procedure by constructing master equation (Markov state) models of peptide and RNA folding from REMD simulations.
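The transition-counting step lends itself to a compact illustration. The sketch below is a generic Markov-state-style rate estimator, not the authors' exact procedure; the two-state trajectory, lag time, and variable names are hypothetical.

```python
import numpy as np

def rate_coefficients(states, dt):
    """Estimate first-order rate coefficients k[i, j] from a discretized
    state trajectory by counting transitions between metastable states.
    Assumes the long-time dynamics are captured by a kinetic rate model."""
    n = int(states.max()) + 1
    counts = np.zeros((n, n))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    dwell = counts.sum(axis=1) * dt   # approximate time spent in each state
    k = counts / dwell[:, None]       # transitions per unit time
    np.fill_diagonal(k, 0.0)
    return k

# Toy two-state trajectory (0 = unfolded, 1 = folded), dt in ns.
traj = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 0])
print(rate_coefficients(traj, dt=0.1))
```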
NASA Technical Reports Server (NTRS)
Estes, John E.; Smith, Terence; Star, Jeffrey L.
1987-01-01
Information Sciences Research Group (ISRG) research continues to focus on improving the type, quantity, and quality of information which can be derived from remotely sensed data. Particular focus is on the needs of the remote sensing research and application science community which will be served by the Earth Observing System (EOS) and Space Station, including associated polar and co-orbiting platforms. Research in the areas of georeferenced information systems, machine-assisted information extraction from image data, artificial intelligence, and both natural and cultural vegetation analysis and modeling will be expanded.
NASA Technical Reports Server (NTRS)
Billingsley, F.
1982-01-01
Concerns are expressed about the data handling aspects of system design and about enabling technology for data handling and data analysis. The status, contributing factors, critical issues, and recommendations for investigation are listed for data handling, rectification and registration, and information extraction. Potential support for individual principal investigators' research tasks, systematic data system design, and system operation is outlined. The need for an airborne spectrometer-class instrument for fundamental research in high spectral and spatial resolution is indicated. Geographic information system formatting and labelling techniques, very large scale integration, and methods for providing multitype data sets must also be developed.
Medem, Anna V; Seidling, Hanna M; Eichler, Hans-Georg; Kaltschmidt, Jens; Metzner, Michael; Hubert, Carina M; Czock, David; Haefeli, Walter E
2017-05-01
Electronic clinical decision support systems (CDSS) require drug information that can be processed by computers. The goal of this project was to determine and evaluate a compilation of variables that comprehensively capture the information contained in the summary of product characteristic (SmPC) and unequivocally describe the drug, its dosage options, and clinical pharmacokinetics. An expert panel defined and structured a set of variables and drafted a guideline to extract and enter information on dosage and clinical pharmacokinetics from textual SmPCs as published by the European Medicines Agency (EMA). The set of variables was iteratively revised and evaluated by data extraction and variable allocation of roughly 7% of all centrally approved drugs. The information contained in the SmPC was allocated to three information clusters consisting of 260 variables. The cluster "drug characterization" specifies the nature of the drug. The cluster "dosage" provides information on approved drug dosages and defines corresponding specific conditions. The cluster "clinical pharmacokinetics" includes pharmacokinetic parameters of relevance for dosing in clinical practice. A first evaluation demonstrated that, despite the complexity of the current free text SmPCs, dosage and pharmacokinetic information can be reliably extracted from the SmPCs and comprehensively described by a limited set of variables. By proposing a compilation of variables well describing drug dosage and clinical pharmacokinetics, the project represents a step forward towards the development of a comprehensive database system serving as information source for sophisticated CDSS.
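The three-cluster organization can be pictured as a simple typed record. The dataclasses below are an illustrative sketch only: the actual compilation comprises 260 variables, and every field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DrugCharacterization:
    """Cluster 1: specifies the nature of the drug."""
    active_substance: str
    dosage_form: str
    route_of_administration: str

@dataclass
class Dosage:
    """Cluster 2: an approved dosage and its applicable conditions."""
    indication: str
    dose_amount_mg: float
    doses_per_day: int
    population: str = "adults"

@dataclass
class ClinicalPharmacokinetics:
    """Cluster 3: PK parameters relevant for dosing in practice."""
    half_life_h: Optional[float] = None
    renal_excretion_fraction: Optional[float] = None

@dataclass
class SmPCRecord:
    characterization: DrugCharacterization
    dosages: List[Dosage] = field(default_factory=list)
    pharmacokinetics: Optional[ClinicalPharmacokinetics] = None
```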
Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G
2010-01-01
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture (UIMA) framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy = 0.949; tokenizer accuracy = 0.949; part-of-speech tagger accuracy = 0.936; shallow parser F-score = 0.924; named entity recognizer and system-level evaluation F-score = 0.715 for exact spans and 0.824 for overlapping spans; accuracy for concept mapping, negation, and status attributes = 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853
Generating disease-pertinent treatment vocabularies from MEDLINE citations.
Wang, Liqin; Del Fiol, Guilherme; Bray, Bruce E; Haug, Peter J
2017-01-01
Healthcare communities have identified a significant need for disease-specific information. Disease-specific ontologies are useful in assisting the retrieval of disease-relevant information from various sources. However, building these ontologies is labor intensive. Our goal is to develop a system for the automated generation of disease-pertinent concepts from a popular knowledge resource to support the building of disease-specific ontologies. A pipeline system was developed with an initial focus on generating disease-specific treatment vocabularies. It comprised components for disease-specific citation retrieval, predication extraction, treatment predication extraction, treatment concept extraction, and relevance ranking. A semantic schema was developed to support the extraction of treatment predications and concepts. Four ranking approaches (i.e., occurrence, interest, degree centrality, and weighted degree centrality) were proposed to measure the relevance of treatment concepts to the disease of interest. We measured the performance of the four ranking approaches in terms of mean precision at the top 100 concepts for five diseases, as well as precision-recall curves against two reference vocabularies. The performance of the system was also compared to two baseline approaches. The pipeline system achieved a mean precision of 0.80 for the top 100 concepts with ranking by interest. There were no significant differences among the four ranking approaches (p=0.53). However, the pipeline-based system performed significantly better than the two baselines. The pipeline system can be useful for the automated generation of disease-relevant treatment concepts from the biomedical literature. Copyright © 2016 Elsevier Inc. All rights reserved.
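Of the four ranking approaches, degree centrality is the easiest to sketch. The fragment below ranks treatment concepts by degree centrality in a small predication graph using networkx; the predications and concept names are invented for illustration.

```python
import networkx as nx

# Hypothetical (treatment concept, disease) predications, e.g. extracted
# from "TREATS" relations in retrieved citations.
predications = [
    ("aspirin", "myocardial infarction"),
    ("heparin", "myocardial infarction"),
    ("aspirin", "stroke"),
    ("statin", "myocardial infarction"),
    ("statin", "stroke"),
]

g = nx.Graph()
g.add_edges_from(predications)

# Rank treatment concepts by degree centrality in the predication graph.
centrality = nx.degree_centrality(g)
treatments = {t for t, _ in predications}
ranked = sorted(treatments, key=lambda t: centrality[t], reverse=True)
print(ranked)  # concepts connected to more diseases rank higher
```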
NASA Astrophysics Data System (ADS)
Proux, Denys; Segond, Frédérique; Gerbier, Solweig; Metzger, Marie Hélène
Hospital Acquired Infections (HAI) are a real burden for doctors and risk surveillance experts. The impact on patients' health and the related healthcare cost is very significant and a major concern even for rich countries. Furthermore, the data required to evaluate the threat is generally not available to experts, which prevents fast reaction. However, recent advances in computational intelligence techniques, such as information extraction, risk pattern detection in documents, and decision support systems, now make it possible to address this problem.
Full-field optical coherence tomography used for security and document identity
NASA Astrophysics Data System (ADS)
Chang, Shoude; Mao, Youxin; Sherif, Sherif; Flueraru, Costel
2006-09-01
Optical coherence tomography (OCT) is an emerging technology for high-resolution cross-sectional imaging of 3D structures. In past years, OCT systems have been used mainly for medical, especially ophthalmological, diagnostics. Because an OCT system can explore the internal features of an object, we apply the OCT technology to directly retrieve 2D information pre-stored in a multiple-layer information carrier. The standard depth resolution of an OCT system is at the micrometer level. If a 20 mm by 20 mm sampling area with a 1024 x 1024 CCD array is used in an OCT system with 10 μm depth resolution, an information carrier with a volume of 20 mm x 20 mm x 2 mm could contain 200 megapixels of images. Because of its tiny size and large information volume, the information carrier, together with its OCT retrieving system, has potential applications in document security and object identification. In addition, since the information carrier can be made of low-scattering transparent material, the signal-to-noise ratio is improved dramatically; as a consequence, the specific hardware and complicated software can also be greatly simplified. Owing to the absence of scanning along the X-Y axes, full-field OCT could be the simplest and most economical imaging system for extracting information from such a multilayer information carrier. In this paper, the design and implementation of a full-field OCT system are described and the related algorithms are introduced. In our experiments, a four-layer information carrier is used, which contains four layers of image patterns: two text images and two fingerprint images. The extracted tomography images of each layer are also provided.
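The stated 200-megapixel capacity follows directly from the quoted geometry, assuming the 10 μm depth resolution sets the layer thickness:

```latex
\frac{2\,\mathrm{mm}}{10\,\mu\mathrm{m}} = 200\ \text{layers},
\qquad
200 \times (1024 \times 1024)\ \text{pixels}
\approx 2.1 \times 10^{8}\ \text{pixels}
\approx 200\ \text{megapixels}.
```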
NASA Astrophysics Data System (ADS)
Li, Da; Cheung, Chifai; Zhao, Xing; Ren, Mingjun; Zhang, Juan; Zhou, Liqiu
2016-10-01
Autostereoscopy-based three-dimensional (3D) digital reconstruction has been widely applied in the fields of medical science, entertainment, design, industrial manufacture, and precision measurement, among many other areas. The 3D digital model of a target can be reconstructed from the series of two-dimensional (2D) information acquired by an autostereoscopic system, which consists of multiple lenses and provides information about the target from multiple angles. This paper presents a generalized and precise autostereoscopic 3D digital reconstruction method based on Direct Extraction of Disparity Information (DEDI), which can be applied to any autostereoscopic system and provides accurate 3D reconstruction results through an error-elimination process based on statistical analysis. The feasibility of the DEDI method has been successfully verified through a series of optical 3D digital reconstruction experiments on different autostereoscopic systems; the method efficiently performs direct full 3D digital model construction through a tomography-like operation on every depth plane, excluding defocused information. With the focused information processed by the DEDI method, the 3D digital model of the target can be directly and precisely formed along the axial direction with the depth information.
NASA Astrophysics Data System (ADS)
Arozarena, A.; Villa, G.; Valcárcel, N.; Pérez, B.
2016-06-01
Remote sensing satellites, together with aerial and terrestrial platforms (mobile and fixed), nowadays produce huge amounts of data from a wide variety of sensors. These datasets serve as the main data sources for the extraction of Geospatial Reference Information (GRI), which constitutes the "skeleton" of any Spatial Data Infrastructure (SDI). Since very different situations exist around the world in terms of geographic information production and management, the generation of global GRI datasets is extremely challenging. Remotely sensed data, thanks to its wide availability, can provide fundamental sources for the production and management systems of different countries. After several automatic and semiautomatic processes that incorporate ancillary data, the extracted geospatial information is ready to become part of the GRI databases. To optimize these data flows for the production of high-quality geospatial information and to promote its use in addressing global challenges, several initiatives at national, continental, and global levels have been put in place, such as the European INSPIRE initiative and the Copernicus Programme, and global initiatives such as the Group on Earth Observation/Global Earth Observation System of Systems (GEO/GEOSS) and United Nations Global Geospatial Information Management (UN-GGIM). These workflows are established mainly by public organizations, with appropriate institutional arrangements at the national, regional, or global level. Other initiatives, such as Volunteered Geographic Information (VGI), may in turn contribute to keeping the GRI databases updated. Remotely sensed data hence becomes one of the main pillars underpinning the establishment of a global SDI, as those datasets will be used by public agencies and institutions as well as by volunteers to extract the required spatial information that will in turn feed the GRI databases. This paper provides an example of how institutional arrangements and cooperative production systems can be set up at any territorial level to exploit remotely sensed data intensively, taking full advantage of its potential.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-19
... crash and inspection records. Data extract from the FMCSA Motor Carrier Management Information System... Transaction Records: Pursuant to GRS 24, ``Information Technology Operations and Management Records,'' Item 6... information that is created and used by the Department's Pre-Employment Screening program to provide...
Activities of the Remote Sensing Information Sciences Research Group
NASA Technical Reports Server (NTRS)
Estes, J. E.; Botkin, D.; Peuquet, D.; Smith, T.; Star, J. L. (Principal Investigator)
1984-01-01
Topics in the analysis and processing of remotely sensed data in the areas of vegetation analysis and modelling, georeferenced information systems, machine-assisted information extraction from image data, and artificial intelligence are investigated. Discussions of supporting field data and specific applications of the proposed technologies are also included.
Road and Roadside Feature Extraction Using Imagery and LIDAR Data for Transportation Operation
NASA Astrophysics Data System (ADS)
Ural, S.; Shan, J.; Romero, M. A.; Tarko, A.
2015-03-01
Transportation agencies require up-to-date, reliable, and feasibly acquired information on road geometry and on features in proximity to the roads as input for evaluating and prioritizing new or improvement road projects. The information needed for a robust evaluation of road projects includes road centerline, width, and extent, together with the average grade, cross-sections, and obstructions near the travelled way. Remote sensing offers a large collection of data and well-established tools for acquiring this information and extracting the aforementioned road features at various levels and scopes. Even with many remote sensing data sources and methods available for road extraction, transportation operation requires more than centerlines. Acquiring information that is spatially coherent at the operational level for the entire road system is challenging and requires multiple data sources to be integrated. In the presented study, we established a framework that used data from multiple sources, including one-foot-resolution color infrared orthophotos, airborne LiDAR point clouds, and an existing spatially non-accurate ancillary road network. We were able to extract 90.25% of a total of 23.6 miles of road network together with estimated road width, average grade along the road, and cross-sections at specified intervals. We also extracted buildings and vegetation within a predetermined proximity to the extracted road extent; 90.6% of 107 existing buildings were correctly identified, with a 31% false detection rate.
Sieve-based relation extraction of gene regulatory networks from biological literature.
Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko
2015-01-01
Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains.
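The skip-mention transformation can be illustrated in a few lines. The sketch below is one plausible reading of the construction (keep every (k+1)-th mention at each offset) rather than the authors' exact implementation; the mention strings are invented.

```python
def skip_mention_sequences(mentions, max_skip=2):
    """Turn a sequence of mentions into skip-mention sequences so that a
    first-order linear-chain model can relate distant mentions.
    A skip-k sequence keeps every (k+1)-th mention, for each offset."""
    sequences = []
    for k in range(max_skip + 1):
        step = k + 1
        for offset in range(step):
            seq = mentions[offset::step]
            if len(seq) > 1:
                sequences.append((k, seq))
    return sequences

# Hypothetical mention sequence from one sentence.
mentions = ["sigK", "inhibits", "spoIIID", "activates", "gerE"]
for k, seq in skip_mention_sequences(mentions):
    print(k, seq)
```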
Image processing and analysis using neural networks for optometry area
NASA Astrophysics Data System (ADS)
Netto, Antonio V.; Ferreira de Oliveira, Maria C.
2002-11-01
In this work we describe the framework of a functional system for processing and analyzing images of the human eye acquired by the Hartmann-Shack (HS) technique, in order to extract information for formulating a diagnosis of eye refractive errors (astigmatism, hypermetropia, and myopia). The analysis is carried out using an artificial intelligence system based on neural nets, fuzzy logic, and classifier combination. The major goal is to establish the basis of a new technology to effectively measure ocular refractive errors, based on methods alternative to those adopted in current patented systems. Moreover, analysis of images acquired with the Hartmann-Shack technique may enable the extraction of additional information about the health of the eye under examination from the same image used to detect refractive errors.
User's guide to the SEPHIS computer code for calculating the Thorex solvent extraction system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Watson, S.B.; Rainey, R.H.
1979-05-01
The SEPHIS computer program was developed to simulate the countercurrent solvent extraction process. The code has now been adapted to model the Acid Thorex flow sheet. This report represents a practical user's guide to SEPHIS - Thorex containing a program description, user information, program listing, and sample input and output.
Scene text recognition in mobile applications by character descriptor and structure configuration.
Yi, Chucai; Tian, Yingli
2014-07-01
Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and varied background interference. This paper proposes a method of scene text recognition from detected text regions. For text detection, our previously proposed algorithms are applied to obtain text regions from a scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure at each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction in smart mobile devices. An Android-based demo system was developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides some insight into algorithm design and performance improvement for scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme for text recognition is comparable with the best existing methods.
Speech information retrieval: a review
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hafen, Ryan P.; Henry, Michael J.
Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted with the tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new to the field to quickly gain a high-level understanding and to provide direction for further study.
Development of Availability and Sustainability Spares Optimization Models for Aircraft Reparables
2013-09-01
the integrated SAP® Enterprise Resource Planning (ERP) information system of the RSAF. A more in-depth review of OPUS10 capabilities will be provided... Dynamic Multi-Echelon Technique for Recoverable Item Control; EBO: Expected Backorder; EOQ: Economic Order Quantity; ERP: Enterprise Resource... particular, the propulsion sub-system was expanded to include SSRUs. Spares information is extracted from the RSAF ERP system and includes: 22
Knowledge Acquisition of Generic Queries for Information Retrieval
Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.
2002-01-01
Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.
On the Limitations of Breakthrough Curve Analysis in Fixed-Bed Adsorption
NASA Technical Reports Server (NTRS)
Knox, James C.; Ebner, Armin D.; LeVan, M. Douglas; Coker, Robert F.; Ritter, James A.
2016-01-01
This work examined in detail the a priori prediction of the axial dispersion coefficient from available correlations, versus obtaining it, together with mass transfer information, from experimental breakthrough data, and the consequences that may arise when doing so using a 1-D axially dispersed plug flow model and its associated Danckwerts outlet boundary condition. These consequences mainly concerned the potential for erroneous extraction of the axial dispersion coefficient and/or the LDF mass transfer coefficient from experimental data, especially when non-plug-flow conditions prevailed in the bed. Two adsorbent/adsorbate cases were considered, i.e., carbon dioxide and water vapor in zeolite 5A, because both experimentally exhibited significant non-plug-flow behavior, and the water-zeolite 5A system exhibited unusual concentration front sharpening that destroyed the expected constant pattern behavior (CPB) when modeled with the 1-D axially dispersed plug flow model. Overall, this work showed that it is possible to extract accurate mass transfer and dispersion information from experimental breakthrough curves using a 1-D axially dispersed plug flow model when they are measured both inside and outside the bed. To ensure the extracted information was accurate, the inside-the-bed breakthrough curves and their derivatives from the model were plotted to confirm whether or not the adsorbate/adsorbent system was exhibiting CPB or any concentration front sharpening near the bed exit. Even when concentration front sharpening was occurring in the water-zeolite 5A system, it was still possible to use the experimental inside- and outside-the-bed breakthrough curves to extract fundamental mass transfer and dispersion information from the 1-D axially dispersed plug flow model, based on the systematic methodology developed in this work.
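For reference, a common textbook form of the 1-D axially dispersed plug flow model with a linear driving force (LDF) rate expression, which may differ in detail from the exact formulation used in this work, is:

```latex
\varepsilon\,\frac{\partial c}{\partial t}
 + \varepsilon v\,\frac{\partial c}{\partial z}
 + (1-\varepsilon)\,\frac{\partial \bar{q}}{\partial t}
 = \varepsilon D_L\,\frac{\partial^2 c}{\partial z^2},
\qquad
\frac{\partial \bar{q}}{\partial t} = k_{\mathrm{LDF}}\,\bigl(q^{*} - \bar{q}\bigr),
```

with the Danckwerts boundary conditions \(v\,c_{\mathrm{in}} = v\,c - D_L\,\partial c/\partial z\) at the inlet and \(\partial c/\partial z = 0\) at the outlet \(z = L\), where \(c\) is the fluid-phase concentration, \(\bar{q}\) the average adsorbed-phase loading, \(q^{*}\) the equilibrium loading, \(\varepsilon\) the bed voidage, \(v\) the interstitial velocity, and \(D_L\) the axial dispersion coefficient.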
Developing a disease outbreak event corpus.
Conway, Mike; Kawazoe, Ai; Chanlekha, Hutchatai; Collier, Nigel
2010-09-28
In recent years, there has been a growth in work on the use of information extraction technologies for tracking disease outbreaks from online news texts, yet publicly available evaluation standards (and associated resources) for this new area of research have been noticeably lacking. This study seeks to create a "gold standard" data set against which to test how accurately disease outbreak information extraction systems can identify the semantics of disease outbreak events. Additionally, we hope that the provision of an annotation scheme (and associated corpus) to the community will encourage open evaluation in this new and growing application area. We developed an annotation scheme for identifying infectious disease outbreak events in news texts. An event--in the context of our annotation scheme--consists minimally of geographical (eg, country and province) and disease name information. However, the scheme also allows for the rich encoding of other domain salient concepts (eg, international travel, species, and food contamination). The work resulted in a 200-document corpus of event-annotated disease outbreak reports that can be used to evaluate the accuracy of event detection algorithms (in this case, for the BioCaster biosurveillance online news information extraction system). In the 200 documents, 394 distinct events were identified (mean 1.97 events per document, range 0-25 events per document). We also provide a download script and graphical user interface (GUI)-based event browsing software to facilitate corpus exploration. In summary, we present an annotation scheme and corpus that can be used in the evaluation of disease outbreak event extraction algorithms. The annotation scheme and corpus were designed both with the particular evaluation requirements of the BioCaster system in mind as well as the wider need for further evaluation resources in this growing research area.
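A minimally annotated event under this scheme might be encoded as follows; the field names and values are hypothetical and are not the corpus's actual tag set.

```python
# Country and disease name are the minimally required slots; the remaining
# fields are optional domain-salient concepts of the kind the scheme allows.
event = {
    "disease": "avian influenza",
    "country": "Viet Nam",
    "province": "Ha Tay",
    "species": "poultry",
    "international_travel": False,
    "food_contamination": False,
}
```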
A biomedical information system for retrieval and manipulation of NHANES data.
Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A
2013-01-01
The retrieval and manipulation of data from large public databases like the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system that handles all processes of data extraction and cleaning and then joins different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried, and extracted efficiently. The current pilot implementation of the system provides access to a limited portion of the NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from the CDU archive, which currently holds about 53 databases.
Using decision-tree classifier systems to extract knowledge from databases
NASA Technical Reports Server (NTRS)
St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.
1990-01-01
One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.
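A modern equivalent of this workflow is easy to sketch with scikit-learn: induce a decision tree from database records and render it as if-then rules. The iris data below merely stands in for a diagnostic database, and the 1990 study predates this library.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a decision-tree classifier on the records of an existing database.
data = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(data.data, data.target)

# export_text renders the induced tree as human-readable if-then rules,
# i.e. knowledge extracted from the database.
print(export_text(clf, feature_names=list(data.feature_names)))
```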
A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories.
Yang, Wei; Ai, Tinghua; Lu, Wei
2018-04-19
Crowdsourced trajectory data is an important source for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourced vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and to interpolate the optimized segments adaptively so that there are enough tracking points. Second, the DT and the Voronoi diagram are constructed within the interpolated tracking lines to calculate road boundary descriptors from the area of each Voronoi cell and the length of each triangle edge; the road boundary detection model is then established by integrating the boundary descriptors and trajectory movement features (e.g., direction) through the DT. Third, the boundary detection model is used to detect the road boundary from the DT constructed from the trajectory lines, and a region-growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting road boundaries from low-frequency GPS traces, multiple types of road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information proved to be of higher quality.
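The two boundary descriptors can be computed directly with scipy.spatial. The sketch below uses random points in place of real filtered GPS traces and omits the detection model and region growing; all names and the density interpretation at the end are assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay, Voronoi

# Hypothetical tracking points (already filtered and interpolated).
pts = np.random.default_rng(0).uniform(0, 100, size=(200, 2))

tri = Delaunay(pts)
vor = Voronoi(pts)

# Descriptor 1: Voronoi cell area per tracking point (bounded cells only).
areas = {}
for p, r in enumerate(vor.point_region):
    region = vor.regions[r]
    if region and -1 not in region:                # skip unbounded cells
        areas[p] = ConvexHull(vor.vertices[region]).volume  # 2-D: area

# Descriptor 2: Delaunay triangle edge lengths.
edges = set()
for simplex in tri.simplices:
    for i in range(3):
        a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
        edges.add((a, b))
lengths = {e: np.linalg.norm(pts[e[0]] - pts[e[1]]) for e in edges}

# Large cells / long edges indicate low point density, i.e. candidate
# locations beyond the road boundary.
```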
NASA Technical Reports Server (NTRS)
Estes, J. E.; Smith, T.; Star, J. L.
1986-01-01
Research continues to focus on improving the type, quantity, and quality of information which can be derived from remotely sensed data. The focus is on remote sensing and applications for the Earth Observing System (Eos) and Space Station, including associated polar and co-orbiting platforms. The remote sensing research activities are being expanded, integrated, and extended into the areas of global science, georeferenced information systems, machine-assisted information extraction from image data, and artificial intelligence. The accomplishments in these areas are examined.
A tutorial on information retrieval: basic terms and concepts
Zhou, Wei; Smalheiser, Neil R; Yu, Clement
2006-01-01
This informal tutorial is intended for investigators and students who would like to understand the workings of information retrieval systems, including the most frequently used search engines: PubMed and Google. Having a basic knowledge of the terms and concepts of information retrieval should improve the efficiency and productivity of searches. As well, this knowledge is needed in order to follow current research efforts in biomedical information retrieval and text mining that are developing new systems not only for finding documents on a given topic, but extracting and integrating knowledge across documents. PMID:16722601
Multichannel Doppler Processing for an Experimental Low-Angle Tracking System
1990-05-01
estimation techniques at sea. Because of clutter and noise, it is necessary to use a number of different processing algorithms to extract the required information. Consequently, the ELAT radar system is composed of multiple... corresponding to RF frequencies, f1 and f2. For mode 3, the ambiguities occur at vb1 = 15.186 knots and vb2 = 16.96 knots. The sea clutter, with a spectrum...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiskoot, R.J.J.
Accurate and reliable sampling systems are imperative when confirming natural gas's commercial value. Buyers and sellers need accurate hydrocarbon-composition information to conduct fair sale transactions. Because poor sample extraction, preparation, or analysis can invalidate the sale, more attention should be directed toward improving representative sampling. All sampling components must be considered, i.e., gas types, line pressure and temperature, equipment maintenance and service needs, etc. The paper discusses gas sampling, design considerations (location, probe type, extraction devices, controller, and receivers), operating requirements, and system integration.
Elayavilli, Ravikumar Komandur; Liu, Hongfang
2016-01-01
Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is a significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain is an impediment to normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining into a formal representation that may help in constructing an ontology for ion channel events, using a rule-based approach. We developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), with the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers the potential to facilitate the integration of text mining into an ontological workflow, a novel aspect of this study. This work is a case study in which we created a platform that provides formal interaction between ontology development and text mining. We achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in an ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
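The flavor of the rule-based quantitative pass can be conveyed with a single pattern. The regex, units list, and sentence below are illustrative assumptions, not the system's actual rules.

```python
import re

# A minimal pass for quantitative assertions in electrophysiology text:
# match a signed number followed by a common electrophysiology unit.
QUANTITY = re.compile(r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>mV|ms|pA|nS|mM)")

sentence = ("The channel activated at -40 mV with a time constant "
            "of 12.5 ms.")
for m in QUANTITY.finditer(sentence):
    print(m.group("value"), m.group("unit"))
```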
Model of experts for decision support in the diagnosis of leukemia patients.
Corchado, Juan M; De Paz, Juan F; Rodríguez, Sara; Bajo, Javier
2009-07-01
Recent advances in the field of biomedicine, specifically in the field of genomics, have led to an increase in the information available for conducting expression analysis. Expression analysis is a technique used in transcriptomics, a branch of genomics that deals with the study of messenger ribonucleic acid (mRNA) and the extraction of information contained in the genes. This increase in information is reflected in the exon arrays, which require the use of new techniques in order to extract the information. The purpose of this study is to provide a tool based on a mixture of experts model that allows the analysis of the information contained in the exon arrays, from which automatic classifications for decision support in diagnoses of leukemia patients can be made. The proposed model integrates several cooperative algorithms characterized by their efficiency in data processing, filtering, classification, and knowledge extraction. The Cancer Institute of the University of Salamanca is making an effort to develop tools to automate the evaluation of data and to facilitate the analysis of information. This proposal is a step forward in this direction and the first step toward the development of a mixture of experts tool that integrates different cognitive and statistical approaches to deal with the analysis of exon arrays. The mixture of experts model presented within this work provides great capacities for learning and adaptation to the characteristics of the problem under consideration, using novel algorithms in each of the stages of the analysis process that can be easily configured and combined, and provides results that notably improve those provided by the existing methods for exon array analysis. The material used consists of data from exon arrays provided by the Cancer Institute that contain samples from leukemia patients. The methodology used consists of a system based on a mixture of experts. Each one of the experts incorporates novel artificial intelligence techniques that improve the process of carrying out various tasks such as pre-processing, filtering, classification, and extraction of knowledge. This article details the manner in which individual experts are combined so that together they generate a system capable of extracting knowledge, thus permitting patients to be classified in an automatic and efficient manner that is also comprehensible for medical personnel. The system has been tested in a real setting and has been used for classifying patients who suffer from different forms of leukemia at various stages. Personnel from the Cancer Institute supervised and participated throughout the testing period. Preliminary results are promising, notably improving the results obtained with previously used tools. The medical staff from the Cancer Institute considers the tools that have been developed to be positive and very useful in a supporting capacity for carrying out their daily tasks. Additionally, the mixture of experts supplies a tool for the extraction of the information necessary to explain the associations that have been made in simple terms. That is, it permits the extraction of knowledge for each classification made, generalized so that it can be used in subsequent classifications. This allows for a large amount of learning and adaptation within the proposed system.
García-Remesal, Miguel; Maojo, Victor; Crespo, José
2010-01-01
In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. On this set, we achieved 87.76% precision and 97.70% recall. This method can facilitate different research tasks, including text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
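The preliminary recognizer stage can be approximated by a regular expression (a regex is itself a finite state machine). The length threshold and example text below are assumptions, not the paper's actual parameters.

```python
import re

# Candidate DNA/RNA sequences: runs of nucleotide codes, possibly broken
# by line-wrap whitespace, of at least 20 bases (an assumed threshold).
CANDIDATE = re.compile(r"[ACGTUacgtu][ACGTUacgtu\s]{18,}[ACGTUacgtu]")

text = "As shown in Fig. 2, ATGGCGTACGTTAGCCGTAGCTAGGCT encodes the peptide."
for match in CANDIDATE.finditer(text):
    seq = re.sub(r"\s+", "", match.group())  # strip line-wrap whitespace
    if len(seq) >= 20:
        print(seq)  # pass the candidate on to the knowledge-based filter
```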
Ad Hoc Information Extraction for Clinical Data Warehouses.
Dietrich, Georg; Krebs, Jonathan; Fette, Georg; Ertl, Maximilian; Kaspar, Mathias; Störk, Stefan; Puppe, Frank
2018-05-01
Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data such as discharge letters. These can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW. The goal of our work is to provide an ad hoc IE service that allows users to query text data in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g., how many patients exist with "heart failure", including textual synonyms, or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations such as heart echo. Three subtasks are addressed: (1) to recognize and exclude negations and their scopes, (2) to extract concepts, i.e., Boolean values, and (3) to extract numerical values. We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g., context-sensitive queries and regex queries, and an extraction mode for computing the frequencies of Boolean and numerical values. Evaluations on chest X-ray reports and discharge letters showed high F1-scores for the three subtasks: detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99; and of numerical values in chest X-ray reports and discharge letters also around 0.99, with the exception of the concept age. The advantages of ad hoc IE over standard IE are the low development effort (just entering the concept with its variants), the promptness of the results, and the adaptability by users to their particular questions; disadvantages are usually lower accuracy and confidence. This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing time and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts which include affirmed findings. Our approach combines negation detection and the extraction of concepts, but the extraction takes place not during preprocessing but at runtime. This provides ad hoc, dynamic, interactive, and adjustable information extraction of arbitrary concepts, and even their values, on the fly at runtime. We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision, based on a pipeline that detects and removes negations and their scope in clinical texts. Schattauer GmbH.
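A drastically simplified NegEx-style pass illustrates subtask (1). English trigger terms stand in for the paper's German ones, and the fixed scope width is an assumption.

```python
# A negation trigger opens a scope extending a few tokens to the right
# unless a terminator closes it early; tokens inside the scope are negated.
TRIGGERS = {"no", "without", "denies"}
TERMINATORS = {"but", "except"}
SCOPE = 5  # assumed scope width in tokens

def negated_indices(tokens):
    negated = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in TRIGGERS:
            for j in range(i + 1, min(i + 1 + SCOPE, len(tokens))):
                if tokens[j].lower() in TERMINATORS:
                    break
                negated.add(j)
    return negated

tokens = "patient denies chest pain but reports dyspnea".split()
neg = negated_indices(tokens)
print([(t, i in neg) for i, t in enumerate(tokens)])
# "chest" and "pain" are negated; "dyspnea" is affirmed.
```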
2007-03-01
partners for their mutual benefit. Unfortunately, based on government reports, FEMA did not have adequate control of its supply chain information... is one attractor. "Edge of chaos" systems have two to eight attractors, and chaotic systems have many attractors; some are called strange attractors... investigates whether chaos theory, part of complexity science, can extract information from Katrina contracting data to help managers make better logistics...
KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process
NASA Technical Reports Server (NTRS)
Gettig, Gary A.
1988-01-01
Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.
Information Extraction for System-Software Safety Analysis: Calendar Year 2007 Year-End Report
NASA Technical Reports Server (NTRS)
Malin, Jane T.
2008-01-01
This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis on the models to identify possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations; 4) perform discrete-time-based simulation on the models to investigate scenarios where these paths may play a role in failures and mishaps; and 5) identify resulting candidate scenarios for software integration testing. This paper describes new challenges in a NASA abort system case, and enhancements made to develop the integrated tool set.
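Task 3, finding paths from hazard sources to vulnerable entities, maps naturally onto simple-path enumeration over a directed graph. The networkx sketch below uses an invented toy model, not the report's actual architecture or tooling.

```python
import networkx as nx

# Toy architecture model: nodes are components/functions, edges are
# system-software interactions; all names are hypothetical.
g = nx.DiGraph()
g.add_edges_from([
    ("flight_software", "pyro_valve"),
    ("flight_software", "abort_controller"),
    ("pyro_valve", "fuel_line"),
    ("fuel_line", "engine"),
    ("abort_controller", "engine"),
])

# Enumerate possible propagation paths from a hazard source to a
# vulnerable function.
for path in nx.all_simple_paths(g, "flight_software", "engine"):
    print(" -> ".join(path))
```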
[Technologies for Complex Intelligent Clinical Data Analysis].
Baranov, A A; Namazova-Baranova, L S; Smirnov, I V; Devyatkin, D A; Shelmanov, A O; Vishneva, E A; Antonova, E V; Smirnov, V I
2016-01-01
The paper presents a system for the intelligent analysis of clinical information. The authors describe methods implemented in the system for clinical information retrieval, intelligent diagnostics of chronic diseases, assessing the importance of patient features, and detecting hidden dependencies between features. Results of the experimental evaluation of these methods are also presented. Healthcare facilities generate a large flow of both structured and unstructured data which contain important information about patients. Test results are usually retained as structured data, but some data are retained in the form of natural-language texts (medical history, the results of physical examination, and the results of other examinations, such as ultrasound, ECG, or X-ray studies). Many tasks arising in clinical practice can be automated by applying methods for the intelligent analysis of the accumulated structured and unstructured data, leading to improvements in healthcare quality. The aim of the work is the creation of a complex system for intelligent data analysis in a multi-disciplinary pediatric center. The authors propose methods for information extraction from clinical texts in Russian, carried out on the basis of deep linguistic analysis. The methods retrieve terms for diseases, symptoms, areas of the body, and drugs, and can recognize additional attributes such as "negation" (indicating that the disease is absent), "no patient" (indicating that the disease refers to a family member rather than to the patient), "severity of illness", "disease course", and "body region to which the disease refers". The authors use a set of hand-crafted templates and various techniques based on machine learning to retrieve information using a medical thesaurus. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine learning method for the classification of patients with similar nosologies and a method for determining the most informative patient features are also proposed. The authors processed anonymized health records from the pediatric center to evaluate the proposed methods. The results show the applicability of the information extracted from the texts to solving practical problems. The records of patients with allergic, glomerular, and rheumatic diseases were used for the experimental assessment of the automatic diagnostic method. The authors also determined the most appropriate machine learning methods for the classification of patients for each group of diseases, as well as the most informative disease signs. It was found that using additional information extracted from clinical texts, together with structured data, helps to improve the quality of diagnosis of chronic diseases. The authors also obtained typical combinations of disease signs. The proposed methods have been implemented in the intelligent data processing system of a multidisciplinary pediatric center. The experimental results show the ability of the system to improve the quality of pediatric healthcare.
Strong Similarity Measures for Ordered Sets of Documents in Information Retrieval.
ERIC Educational Resources Information Center
Egghe, L.; Michel, Christine
2002-01-01
Presents a general method to construct ordered similarity measures in information retrieval based on classical similarity measures for ordinary sets. Describes a test of some of these measures in an information retrieval system that extracted ranked document sets and discusses the practical usability of the ordered similarity measures. (Author/LRW)
SAMS--a systems architecture for developing intelligent health information systems.
Yılmaz, Özgün; Erdur, Rıza Cenk; Türksever, Mustafa
2013-12-01
In this paper, SAMS, a novel health information system architecture for developing intelligent health information systems, is proposed, and some strategies for developing such systems are discussed. Systems fulfilling this architecture will be able to store electronic health records of patients using OWL ontologies, share patient records among different hospitals, and provide physicians with expertise to assist them in making decisions. The system is intelligent because it is rule-based, makes use of rule-based reasoning, and has the ability to learn and evolve. The learning capability is provided by extracting rules from decisions previously given by physicians and then adding the extracted rules to the system. The proposed system is novel and original in all of these aspects. As a case study, a system conforming to the SAMS architecture was implemented for use by dentists in the dental domain. The use of the developed system is described with a scenario. For evaluation, the developed dental information system will be used and tried by a group of dentists. The development of this system demonstrates the applicability of the SAMS architecture. By getting decision support from a system derived from this architecture, the cognitive gap between experienced and inexperienced physicians can be compensated. Thus, patient satisfaction can be achieved, inexperienced physicians are supported in decision making, and personnel can improve their knowledge. A physician can diagnose a case he/she has never diagnosed before using this system. With the help of this system, it will be possible to store general domain knowledge, and the personnel's need for medical guideline documents will be reduced.
Yanagida, Akio; Yamakawa, Yutaka; Noji, Ryoko; Oda, Ako; Shindo, Heisaburo; Ito, Yoichiro; Shibusawa, Yoichi
2007-06-01
High-speed counter-current chromatography (HSCCC) using the three-phase solvent system n-hexane-methyl acetate-acetonitrile-water at a volume ratio of 4:4:3:4 was applied to the comprehensive separation of secondary metabolites in several natural product extracts. A wide variety of secondary metabolites in each natural product was effectively extracted with the three-phase solvent system, and the filtered extract was directly submitted to the HSCCC separation using the same three-phase system. In the HSCCC profiles of crude natural drugs listed in the Japanese Pharmacopoeia, several physiologically active compounds were clearly separated from other components in the extracts. The HSCCC profiles of several tea products, each manufactured by a different process, clearly showed their compositional difference in main compounds such as catechins, caffeine, and pigments. These HSCCC profiles also provide useful information about hydrophobic diversity of whole components present in each natural product.
DesAutels, Spencer J.; Fox, Zachary E.; Giuse, Dario A.; Williams, Annette M.; Kou, Qing-hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia
2016-01-01
Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems. PMID:28269846
Park, Yu Rang; Kim, Ju Han
2006-01-01
Standardized management of data elements (DEs) for Case Report Forms (CRFs) is crucial in a Clinical Trials Information System (CTIS). Traditional CTISs utilize organization-specific definitions and storage methods for DEs and CRFs. We developed a metadata-based DE management system for clinical trials, the Clinical and Histopathological Metadata Registry (CHMR), using the international standard for metadata registries (ISO 11179) for the management of cancer clinical trials information. CHMR was evaluated in cancer clinical trials with 1625 DEs extracted from the College of American Pathologists Cancer Protocols for 20 major cancers. PMID:17238675
Biomedical question answering using semantic relations.
Hristovski, Dimitar; Dinevski, Dejan; Kastrin, Andrej; Rindflesch, Thomas C
2015-01-16
The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature. We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application, called SemBT (available at http://sembt.mf.uni-lj.si ). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%). In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.
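Since the paper describes the QA step as a search over a relational database of extracted relations, a minimal sketch of that idea can be written with Python's sqlite3; the table layout (subject, predicate, object, sentence) is an assumption for illustration, not the actual SemBT schema:

```python
import sqlite3

# Toy relation store; SemRep-style triples plus the source sentence.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE relations
                (subject TEXT, predicate TEXT, object TEXT, sentence TEXT)""")
conn.execute("INSERT INTO relations VALUES (?,?,?,?)",
             ("aspirin", "TREATS", "headache",
              "Aspirin is commonly used to treat headache."))

def answer(subject: str, predicate: str):
    """Answer 'What does <subject> <predicate>?' as a relational query."""
    cur = conn.execute(
        "SELECT object, sentence FROM relations "
        "WHERE subject = ? AND predicate = ?", (subject, predicate))
    return cur.fetchall()

print(answer("aspirin", "TREATS"))
```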
Staccini, Pascal; Joubert, Michel; Quaranta, Jean-François; Fieschi, Marius
2005-03-01
Today, the economic and regulatory environment, involving activity-based and prospective payment systems, healthcare quality and risk analysis, traceability of the acts performed and evaluation of care practices, accounts for the current interest in clinical and hospital information systems. The structured gathering of information relative to users' needs and system requirements is fundamental when installing such systems. This stage takes time and is generally misconstrued by caregivers and is of limited efficacy to analysts. We used a modelling technique designed for manufacturing processes (IDEF0/SADT). We enhanced the basic model of an activity with descriptors extracted from the Ishikawa cause-and-effect diagram (methods, men, materials, machines, and environment). We proposed an object data model of a process and its components, and programmed a web-based tool in an object-oriented environment. This tool makes it possible to extract the data dictionary of a given process from the description of its elements and to locate documents (procedures, recommendations, instructions) according to each activity or role. Aimed at structuring needs and storing information provided by directly involved teams regarding the workings of an institution (or at least part of it), the process-mapping approach has an important contribution to make in the analysis of clinical information systems.
Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry
NASA Astrophysics Data System (ADS)
Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.
2015-09-01
Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy to use, fast, and constitutes a very useful tool for individualized dosimetry. One future goal is to incorporate remote access to a PACS server.
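The study's tool is a Matlab GUI; as a rough Python analogue of its batch metadata step, the following sketch (assuming the pydicom package and a directory of .dcm slices) pulls the same header fields from every slice in one pass, writing a CSV rather than an Excel file:

```python
import csv
from pathlib import Path

import pydicom  # assumes the pydicom package is installed

# Example header fields; a dosimetry workflow would pick its own list.
FIELDS = ["SOPInstanceUID", "KVP", "SliceThickness", "ExposureTime"]

def dump_metadata(dicom_dir: str, out_csv: str) -> None:
    """Extract selected DICOM header fields from each slice into a CSV."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(FIELDS)
        for path in sorted(Path(dicom_dir).glob("*.dcm")):
            # stop_before_pixels skips bulk pixel data for speed.
            ds = pydicom.dcmread(path, stop_before_pixels=True)
            writer.writerow([ds.get(f, "") for f in FIELDS])

# dump_metadata("series_0/", "metadata.csv")
```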
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heller, Forrest D.; Casella, Amanda J.; Lumetta, Gregg J.
A Lewis cell was designed and constructed for investigating solvent extraction systems by spectrophotometrically monitoring both the organic and aqueous phases in real time. This new Lewis cell was tested and shown to perform well compared to other previously reported Lewis cell designs. The advantage of the new design is that the spectroscopic measurement allows determination of not only metal ion concentrations, but also information regarding chemical speciation—information not available with previous Lewis cell designs. For convenience, the new Lewis cell design was dubbed COSMOFLEX (COntinuous Spectroscopic MOnitoring of Forrest’s Liquid-liquid EXtraction cell).
A prototype system to support evidence-based practice.
Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E; Clayton, Jennifer; Thoma, George R
2008-11-06
Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients' problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically and proactively delivered evidence and its usefulness in development of care plans.
A Geographic Data Gathering System for Image Geolocalization Refining
NASA Astrophysics Data System (ADS)
Semaan, B.; Servières, M.; Moreau, G.; Chebaro, B.
2017-09-01
Image geolocalization has become an important research field during the last decade. This field is divided into two main sections. The first is image geolocalization proper, used to find out which country, region, or city an image belongs to. The second is refining image localization for uses that require more accuracy, such as augmented reality and three-dimensional environment reconstruction from images. In this paper we present a processing chain that gathers geographic data from several sources in order to deliver a geolocalization of an image that is better than its GPS tag, together with precise camera pose parameters. To do so, we use multiple types of data. Some of this information is visible in the image and is extracted using image processing; other data can be extracted from image file headers or from related information on online image sharing platforms. Extracted information elements will not be expressive enough if they remain disconnected. We show that grouping these information elements helps find the best geolocalization of the image.
The application of machine vision in fire protection system
NASA Astrophysics Data System (ADS)
Rong, Jiang
2018-04-01
Based on previous research, this paper applies wavelet theory to fire detection: the monitored scene is captured through a video system, and the key information needed by the fire protection system is computed from the video. Flame color features and smoke features are extracted algorithmically and processed as the corresponding characteristic information. The alarm system sets a corresponding alarm threshold; when this threshold is exceeded, the system raises an alarm. Combining flame color features and smoke features in this way improves both the accuracy and the efficiency of the fire judgment. Experiments show that the scheme is feasible.
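The paper does not publish its parameters; as a hedged sketch of the flame-colour part of such a detector, the following OpenCV fragment thresholds a frame in HSV space and raises an alarm when the flame-coloured area exceeds a limit (the HSV bounds and the area threshold are hypothetical tuning values, not the paper's):

```python
import cv2
import numpy as np

# Hypothetical flame-colour bounds: reddish-orange hues, high saturation.
LOWER = np.array([0, 120, 180])
UPPER = np.array([35, 255, 255])
ALARM_AREA = 500  # minimum flame-coloured pixel count before alarming

def frame_alarms(frame: np.ndarray) -> bool:
    """Return True when the flame-coloured area exceeds the threshold."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    return int(cv2.countNonZero(mask)) > ALARM_AREA
```

A full detector along the paper's lines would combine this colour cue with a smoke-texture feature before thresholding.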
Yang, Zhi; Wu, Youqian; Wu, Shihua
2016-01-29
Despite substantial developments in extraction and separation techniques, the isolation of natural products from natural resources is still a challenging task. In this work, an efficient strategy for the extraction and isolation of multi-component natural products has been successfully developed by combining systematic two-phase liquid-liquid extraction and (13)C NMR pattern recognition with a subsequent conical counter-current chromatography separation. A small-scale crude sample was first distributed into 9 systematic hexane-ethyl acetate-methanol-water (HEMWat) two-phase solvent systems to determine the optimum extraction solvents and the partition coefficients of the prominent components. Then, the optimized solvent systems were used in succession to enrich the hydrophilic and lipophilic components from the large-scale crude sample. Finally, the enriched component samples were further purified by a new conical counter-current chromatography (CCC). Owing to the use of (13)C NMR pattern recognition, the kinds and structures of the major components in the solvent extracts could be predicted. Therefore, the method could simultaneously collect the partition coefficients and the structural information of components in the selected two-phase solvents. As an example, a cytotoxic extract of podophyllotoxins and flavonoids from Dysosma versipellis (Hance) was selected. After the systematic HEMWat solvent extraction and (13)C NMR pattern recognition analyses, the crude extract of D. versipellis was first degreased with the upper phase of the HEMWat system (9:1:9:1, v/v), and then distributed in the two phases of the HEMWat system (2:8:2:8, v/v) to obtain the hydrophilic lower-phase extract and the lipophilic upper-phase extract, respectively. These extracts were further separated by conical CCC with the HEMWat systems (1:9:1:9 and 4:6:4:6, v/v). In total, 17 cytotoxic compounds were isolated and identified. Overall, the results suggest that the strategy is very efficient for the systematic extraction and isolation of biologically active components from complex biomaterials. Copyright © 2016 Elsevier B.V. All rights reserved.
Beyond Information Retrieval: Ways To Provide Content in Context.
ERIC Educational Resources Information Center
Wiley, Deborah Lynne
1998-01-01
Provides an overview of information retrieval from mainframe systems to Web search engines; discusses collaborative filtering, data extraction, data visualization, agent technology, pattern recognition, classification and clustering, and virtual communities. Argues that rather than huge data-storage centers and proprietary software, we need…
MedEx: a medication information extraction system for clinical narratives
Stenner, Shane P; Doan, Son; Johnson, Kevin B; Waitman, Lemuel R; Denny, Joshua C
2010-01-01
Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for clinical research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free-text. As such, they are not accessible to other computerized applications that rely on coded data. We describe a new natural language processing system (MedEx), which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries. An evaluation using a data set of 50 discharge summaries showed it performed well on identifying not only drug names (F-measure 93.2%), but also signature information, such as strength, route, and frequency, with F-measures of 94.5%, 93.9%, and 96.0% respectively. We then applied MedEx unchanged to outpatient clinic visit notes. It performed similarly with F-measures over 90% on a set of 25 clinic visit notes. PMID:20064797
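MedEx itself is considerably richer than a single pattern; purely as an illustration of the kind of signature information it targets, a toy regular expression can pull drug name, strength, route, and frequency from simple sig lines:

```python
import re

# Not MedEx: a toy pattern in its spirit, covering drug name, strength,
# route, and frequency for simple medication sig lines.
SIG = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*(?:mg|mcg|g))\s+"
    r"(?P<route>po|iv|im|sc)\s+"
    r"(?P<freq>daily|bid|tid|qid|q\d+h)", re.IGNORECASE)

note = "Continue lisinopril 10 mg po daily and metformin 500 mg po bid."
for m in SIG.finditer(note):
    print(m.groupdict())
# {'drug': 'lisinopril', 'strength': '10 mg', 'route': 'po', 'freq': 'daily'}
# {'drug': 'metformin', 'strength': '500 mg', 'route': 'po', 'freq': 'bid'}
```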
Mapping female bodily features of attractiveness
Bovet, Jeanne; Lao, Junpeng; Bartholomée, Océane; Caldara, Roberto; Raymond, Michel
2016-01-01
“Beauty is bought by judgment of the eye” (Shakespeare, Love’s Labour’s Lost), but the bodily features governing this critical biological choice are still debated. Eye movement studies have demonstrated that males sample coarse body regions expanding from the face, the breasts and the midriff, while making female attractiveness judgements with natural vision. However, the visual system ubiquitously extracts diagnostic extra-foveal information in natural conditions, thus the visual information actually used by men is still unknown. We thus used a parametric gaze-contingent design while males rated attractiveness of female front- and back-view bodies. Males used extra-foveal information when available. Critically, when bodily features were only visible through restricted apertures, fixations strongly shifted to the hips, to potentially extract hip-width and curvature, then the breast and face. Our hierarchical mapping suggests that the visual system primarily uses hip information to compute the waist-to-hip ratio and the body mass index, the crucial factors in determining sexual attractiveness and mate selection. PMID:26791105
NASA Astrophysics Data System (ADS)
Brook, A.; Cristofani, E.; Vandewal, M.; Matheis, C.; Jonuscheit, J.; Beigang, R.
2012-05-01
The present study proposes a fully integrated, semi-automatic, and near real-time image processing methodology developed for Frequency-Modulated Continuous-Wave (FMCW) THz images with center frequencies around 100 GHz and 300 GHz. The quality control of multi-layered aeronautics composite materials and structures using Non-Destructive Testing is the main focus of this work. Image processing is applied to the 3-D images to extract useful information. The data are processed by extracting areas of interest. The detected areas are subjected to image analysis for more detailed investigation managed by a spatial model. Finally, the post-processing stage examines and evaluates the spatial accuracy of the extracted information.
Gstruct: a system for extracting schemas from GML documents
NASA Astrophysics Data System (ADS)
Chen, Hui; Zhu, Fubao; Guan, Jihong; Zhou, Shuigeng
2008-10-01
Geography Markup Language (GML) becomes the de facto standard for geographic information representation on the internet. GML schema provides a way to define the structure, content, and semantic of GML documents. It contains useful structural information of GML documents and plays an important role in storing, querying and analyzing GML data. However, GML schema is not mandatory, and it is common that a GML document contains no schema. In this paper, we present Gstruct, a tool for GML schema extraction. Gstruct finds the features in the input GML documents, identifies geometry datatypes as well as simple datatypes, then integrates all these features and eliminates improper components to output the optimal schema. Experiments demonstrate that Gstruct is effective in extracting semantically meaningful schemas from GML documents.
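Gstruct's datatype inference and schema optimization are beyond a short example, but the core idea of inferring structure from instance documents can be sketched with Python's standard XML parser, recording which child elements occur under each element tag:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified structure inference: map each element tag to the set of
# child tags observed beneath it. Gstruct's geometry/simple datatype
# identification and schema optimization are not reproduced here.
def infer_structure(gml_path: str) -> dict:
    structure = defaultdict(set)
    for _, elem in ET.iterparse(gml_path):  # 'end' events: children ready
        for child in elem:
            structure[elem.tag].add(child.tag)
    return {tag: sorted(children) for tag, children in structure.items()}

# for tag, children in infer_structure("cities.gml").items():
#     print(tag, "->", children)
```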
ERIC Educational Resources Information Center
Scott-Bracey, Pamela
2011-01-01
The purpose of this study was to explore the alignment of soft skills sought by current business IS entry-level employers in electronic job postings, with the integration of soft skills in undergraduate business information systems (IS) syllabi of public four-year universities in Texas. One hundred fifty job postings were extracted from two major…
Medical document anonymization with a semantic lexicon.
Ruch, P.; Baud, R. H.; Rassinoux, A. M.; Bouillon, P.; Robert, G.
2000-01-01
We present an original system for locating and removing personally-identifying information in patient records. In this experiment, anonymization is seen as a particular case of knowledge extraction. We use natural language processing tools provided by the MEDTAG framework: a semantic lexicon specialized in medicine, and a toolkit for word-sense and morpho-syntactic tagging. The system finds 98-99% of all personally-identifying information. PMID:11079980
NASA Astrophysics Data System (ADS)
Ye, Fa-wang; Liu, De-chang
2008-12-01
Practices in sandstone-type uranium exploration in China in recent years indicate that uranium mineralization alteration information is of great importance for selecting a new uranium target or prospecting in the periphery of a known uranium ore district. Taking the BASHIBULAKE uranium ore district as a case study, this paper presents the technical approach and methods for extracting the information on alteration reduced by oil and gas in the district using ASTER data. First, the regional geological setting and the status of studies of the BASHIBULAKE uranium ore district are briefly introduced. Then, the spectral characteristics of altered and unaltered sandstone in the district are analyzed in depth. Based on this spectral analysis, two technical approaches to extracting the remote sensing reduced-alteration information are proposed, and a spectral un-mixing method is introduced to process the ASTER data to extract the reduced-alteration information in the district. From the enhanced images, three remote sensing anomaly zones are discovered, and their geological and prospecting significance is further confirmed by taking advantage of the multiple SWIR bands of the ASTER data. Finally, the distribution and intensity of the reduced-alteration information in the Cretaceous system and its relationship with the genesis of the uranium deposit are discussed, and specific suggestions for uranium prospecting in the periphery of the BASHIBULAKE ore district are proposed.
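The abstract names spectral un-mixing as the processing step; a minimal sketch of linear spectral unmixing for one pixel, using non-negative least squares, is shown below (the endmember spectra are made-up numbers, not library spectra of altered or unaltered sandstone):

```python
import numpy as np
from scipy.optimize import nnls

# Columns: hypothetical endmember spectra (altered sandstone, background)
# over four bands; real work would use field or library spectra.
endmembers = np.array([
    [0.21, 0.45],
    [0.19, 0.43],
    [0.30, 0.41],
    [0.33, 0.38],
], dtype=float)

pixel = np.array([0.27, 0.25, 0.33, 0.34])  # observed reflectances

# Solve pixel ~= endmembers @ abundances with abundances >= 0.
abundances, residual = nnls(endmembers, pixel)
print("abundances:", abundances, "residual:", residual)
```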
NASA Technical Reports Server (NTRS)
Haste, Deepak; Azam, Mohammad; Ghoshal, Sudipto; Monte, James
2012-01-01
Health management (HM) in any engineering system requires adequate understanding of the system's functioning; a sufficient amount of monitored data; the capability to extract, analyze, and collate information; and the capability to combine understanding and information for HM-related estimation and decision-making. Rotorcraft systems are, in general, highly complex. Obtaining adequate understanding of the functioning of such systems is quite difficult because of the proprietary (restricted access) nature of their designs and dynamic models. Development of an EIM (exact inverse map) solution for rotorcraft requires a process that can overcome the abovementioned difficulties and maximally utilize monitored information for HM facilitation via advanced analytic techniques. The goal was to develop a versatile HM solution for rotorcraft to facilitate Condition Based Maintenance Plus (CBM+) capabilities. The effort was geared towards developing analytic and reasoning techniques, and proving the ability to embed the required capabilities on a rotorcraft platform, paving the way for implementing the solution on an aircraft-level system for consolidation and reporting. The solution for rotorcraft can be used offboard or embedded directly onto a rotorcraft system. The envisioned solution utilizes available monitored and archived data for real-time fault detection and identification, failure precursor identification, offline fault detection and diagnostics, health condition forecasting, optimal guided troubleshooting, and maintenance decision support. A variant of the onboard version is a self-contained hardware and software (HW+SW) package that can be embedded on rotorcraft systems. The HM solution comprises components that gather/ingest data and information, perform information/feature extraction, analyze information in conjunction with the dependency/diagnostic model of the target system, facilitate optimal guided troubleshooting, and offer decision support for optimal maintenance.
NASA Astrophysics Data System (ADS)
Chan, Yi-Tung; Wang, Shuenn-Jyi; Tsai, Chung-Hsien
2017-09-01
Public safety is a matter of national security and people's livelihoods. In recent years, intelligent video-surveillance systems have become important active-protection systems. A surveillance system that provides early detection and threat assessment could protect people from crowd-related disasters and ensure public safety. Image processing is commonly used to extract features, e.g., people, from a surveillance video. However, little research has been conducted on the relationship between foreground detection and feature extraction. Most current video-surveillance research has been developed for restricted environments, in which the extracted features are limited by having information from a single foreground; they do not effectively represent the diversity of crowd behavior. This paper presents a general framework based on extracting ensemble features from the foreground of a surveillance video to analyze a crowd. The proposed method can flexibly integrate different foreground-detection technologies to adapt to various monitored environments. Furthermore, the extractable representative features depend on the heterogeneous foreground data. Finally, a classification algorithm is applied to these features to automatically model crowd behavior and distinguish an abnormal event from normal patterns. The experimental results demonstrate that the proposed method's performance is both comparable to that of state-of-the-art methods and satisfies the requirements of real-time applications.
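The framework deliberately leaves the foreground detector pluggable; as one concrete instantiation, the sketch below uses OpenCV's MOG2 background subtractor and computes a single trivial ensemble feature, with the classification stage reduced to a hypothetical threshold:

```python
import cv2

# MOG2 stands in for the pluggable foreground detector; the feature
# and classifier stages here are deliberately minimal stubs.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                detectShadows=True)

def foreground_area_ratio(frame) -> float:
    """One trivial ensemble feature: fraction of non-shadow foreground."""
    mask = subtractor.apply(frame)          # 255 = foreground, 127 = shadow
    return float((mask == 255).sum()) / mask.size

# cap = cv2.VideoCapture("crowd.mp4")
# ok, frame = cap.read()
# while ok:
#     if foreground_area_ratio(frame) > 0.4:  # hypothetical anomaly threshold
#         print("possible crowd anomaly")
#     ok, frame = cap.read()
```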
Educational System Efficiency Improvement Using Knowledge Discovery in Databases
ERIC Educational Resources Information Center
Lukaš, Mirko; Leškovic, Darko
2007-01-01
This study describes one possible way of using ICT in an education system. We treat the educational system like a business company and develop an appropriate model for clustering the student population. Modern educational systems are forced to extract the most necessary and purposeful information from a large amount of available data. Clustering…
Botsis, T.; Woo, E. J.; Ball, R.
2013-01-01
Background We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. Objective To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). Methods We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. Results MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to its lack of extraction of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. Conclusion For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority. PMID:23650490
Refraction-based X-ray Computed Tomography for Biomedical Purpose Using Dark Field Imaging Method
NASA Astrophysics Data System (ADS)
Sunaguchi, Naoki; Yuasa, Tetsuya; Huo, Qingkai; Ichihara, Shu; Ando, Masami
We have proposed a tomographic x-ray imaging system using DFI (dark field imaging) optics along with a data-processing method to extract information on refraction from the measured intensities, and a reconstruction algorithm to reconstruct a refractive-index field from the projections generated from the extracted refraction information. The DFI imaging system consists of a tandem optical system of Bragg- and Laue-case crystals, a positioning device system for a sample, and two CCD (charge coupled device) cameras. Then, we developed a software code to simulate the data-acquisition, data-processing, and reconstruction methods to investigate the feasibility of the proposed methods. Finally, in order to demonstrate its efficacy, we imaged a sample with DCIS (ductal carcinoma in situ) excised from a breast cancer patient using a system constructed at the vertical wiggler beamline BL-14C in KEK-PF. Its CT images depicted a variety of fine histological structures, such as milk ducts, duct walls, secretions, adipose and fibrous tissue. They correlate well with histological sections.
The Information System at CeSAM
NASA Astrophysics Data System (ADS)
Agneray, F.; Gimenez, S.; Moreau, C.; Roehlly, Y.
2012-09-01
Modern large observational programmes produce large amounts of data from various origins and need high-level quality control, fast data access via easy-to-use graphical interfaces, and the possibility to cross-correlate information coming from different observations. The Centre de donnéeS Astrophysique de Marseille (CeSAM) offers web access to VO-compliant information systems for the data of different projects (VVDS, HeDAM, EXODAT, HST-COSMOS,…), including ancillary data obtained outside Laboratoire d'Astrophysique de Marseille (LAM) control. The CeSAM information systems provide catalogue downloads and additional services to search, extract, and display imaging and spectroscopic data through multi-criteria and Cone Search interfaces.
Text mining and its potential applications in systems biology.
Ananiadou, Sophia; Kell, Douglas B; Tsujii, Jun-ichi
2006-12-01
With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.
Gao, Yingbin; Kong, Xiangyu; Zhang, Huihui; Hou, Li'an
2017-05-01
Minor components (MCs) play an important role in signal processing and data analysis, so developing MC extraction algorithms is valuable work. Based on the concepts of a weighted subspace and optimization theory, a weighted information criterion is proposed for searching for the optimum solution of a linear neural network. This information criterion exhibits a unique global minimum, attained if and only if the state matrix is composed of the desired MCs of the autocorrelation matrix of the input signal. By using a gradient ascent method and a recursive least squares (RLS) method, two algorithms are developed for extracting multiple MCs. The global convergence of the proposed algorithms is also analyzed by the Lyapunov method. The proposed algorithms can extract multiple MCs in parallel and have an advantage in dealing with high-dimensional matrices. Since the weighting matrix does not require an accurate value, it facilitates the system design of the proposed algorithms for practical applications. The speed and computational advantages of the proposed algorithms are verified through simulations. Copyright © 2017 Elsevier Ltd. All rights reserved.
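The authors' weighted-criterion updates are not reproduced here; as a baseline illustration of minor component extraction, the sketch below runs plain gradient descent on the Rayleigh quotient of a sample autocorrelation matrix, which converges (up to sign) to the smallest-eigenvalue eigenvector for a small enough step size:

```python
import numpy as np

# Synthetic 3-channel signal with very different per-channel powers,
# so the minor component is clearly the last coordinate direction.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 3)) @ np.diag([3.0, 1.5, 0.2])
R = X.T @ X / len(X)  # sample autocorrelation matrix

w = rng.standard_normal(3)
w /= np.linalg.norm(w)
eta = 0.05
for _ in range(2000):
    # Rayleigh-quotient gradient on the unit sphere; descending it
    # drives w towards the eigenvector of the smallest eigenvalue.
    grad = R @ w - (w @ R @ w) * w
    w -= eta * grad
    w /= np.linalg.norm(w)

print("estimated minor component:", w)
print("smallest eigenvector:", np.linalg.eigh(R)[1][:, 0])  # up to sign
```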
GDRMS: a system for automatic extraction of the disease-centre relation
NASA Astrophysics Data System (ADS)
Yang, Ronggen; Zhang, Yue; Gong, Lejun
2012-01-01
With the rapid increase of biomedical literature, the deluge of new articles is leading to information overload. Extracting the available knowledge from the huge amount of biomedical literature has become a major challenge. GDRMS is developed as a tool that extracts disease-gene and gene-gene relationships from the biomedical literature using text mining technology. It is a rule-based system which also provides disease-centre network visualization, constructs a disease-gene database, and provides a gene engine for understanding the function of genes. The main focus of GDRMS is to offer the research community a valuable opportunity to explore the relationships between diseases and genes in the etiology of disease.
Program review presentation to Level 1, Interagency Coordination Committee
NASA Technical Reports Server (NTRS)
1982-01-01
Progress in the development of crop inventory technology is reported. Specific topics include the results of a thematic mapper analysis, variable selection studies/early season estimator improvements, the agricultural information system simulator, large unit proportion estimation, and development of common features for multi-satellite information extraction.
NASA Astrophysics Data System (ADS)
Susanto, Arif; Mulyono, Nur Budi
2018-02-01
The revision of the environmental management system standard to its latest version, ISO 14001:2015, may change the data and information needs for decision making and for achieving objectives within the scope of the organization. Information management is the organization's responsibility, ensuring effectiveness and efficiency from the creation, storage, and processing of information to its distribution, in support of operations and effective decision making in environmental performance management. The objective of this research was to set up an information management program and to adopt technology as the supporting component of the program, as carried out by the PTFI Concentrating Division, so that it would be in line with the organization's environmental management objectives under the ISO 14001:2015 standard. Materials and methods covered the technical aspects of information management, i.e., web-based application development using usage-centered design. The results of this research showed that the use of Single Sign On made it easier for users to interact with the environmental management system. The web-based application was developed by creating an entity relationship diagram (ERD), derived from the relational database schemas of several environmental performance databases in the Concentrating Division, and by information extraction focusing on attributes, keys, and the determination of constraints.
Populating the Semantic Web by Macro-reading Internet Text
NASA Astrophysics Data System (ADS)
Mitchell, Tom M.; Betteridge, Justin; Carlson, Andrew; Hruschka, Estevam; Wang, Richard
A key question regarding the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text present on the web. We also describe preliminary results demonstrating that machine learning algorithms can learn to extract tens of thousands of facts to populate a diverse ontology, with imperfect but reasonably good accuracy.
Garvin, Jennifer H; DuVall, Scott L; South, Brett R; Bray, Bruce E; Bolton, Daniel; Heavirland, Julia; Pickard, Steve; Heidenreich, Paul; Shen, Shuying; Weir, Charlene; Samore, Matthew; Goldstein, Mary K
2012-01-01
Left ventricular ejection fraction (EF) is a key component of heart failure quality measures used within the Department of Veterans Affairs (VA). Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting, and to validate the accuracy of the system using a comparison reference standard developed through human review. This project was a Translational Use Case Project within the VA Consortium for Healthcare Informatics. We created a set of regular expressions and rules to capture the EF using a random sample of 765 echocardiograms from seven VA medical centers. The documents were randomly assigned to two sets: a set of 275 used for training and a second set of 490 used for testing and validation. To establish the reference standard, two independent reviewers annotated all documents in both sets; a third reviewer adjudicated disagreements. System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9% (95% CI 87.7% to 90.0%), a positive predictive value of 95% (95% CI 94.2% to 95.9%), and an F measure of 91.9% (95% CI 91.2% to 92.7%). An EF value of <40% can be accurately identified in VA echocardiogram reports. An automated information extraction system can be used to accurately extract EF for quality measurement.
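The VA system's actual rule set is far richer and was validated against annotated reports; the fragment below is only a hedged illustration of how a regular expression plus a document-level rule can flag EF mentions below 40%:

```python
import re

# Illustrative only; a production rule set covers many more phrasings.
EF_PATTERN = re.compile(
    r"\b(?:ejection fraction|LVEF|EF)\b\s*(?:is|of|=|:)?\s*"
    r"(?P<low>\d{1,2})\s*(?:-|to)?\s*(?P<high>\d{1,2})?\s*%",
    re.IGNORECASE)

def ef_below_40(report: str) -> bool:
    """Document-level classification: any EF mention below 40%."""
    for m in EF_PATTERN.finditer(report):
        value = int(m.group("high") or m.group("low"))  # range upper bound
        if value < 40:
            return True
    return False

print(ef_below_40("The left ventricular ejection fraction is 30-35%."))  # True
print(ef_below_40("Normal LV systolic function, EF of 60%."))            # False
```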
Method and system for analyzing and classifying electronic information
McGaffey, Robert W.; Bell, Michael Allen; Kortman, Peter J.; Wilson, Charles H.
2003-04-29
A data analysis and classification system that reads electronic information, analyzes it according to a user-defined set of logical rules, and returns a classification result. The data analysis and classification system may accept any form of computer-readable electronic information. The system creates a hash table wherein each entry contains a concept corresponding to a word or phrase that the system has previously encountered. The system creates an object model based on the user-defined logical associations, used for reviewing each concept contained in the electronic information in order to determine whether the electronic information is classified. The data analysis and classification system extracts each concept in turn from the electronic information, locates it in the hash table, and propagates it through the object model. In the event that the system cannot find an electronic information token in the hash table, that token is added to a missing-terms list. If any rule is satisfied during propagation of the concept through the object model, the electronic information is classified.
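As a condensed, hypothetical sketch of the patented flow, the fragment below looks tokens up in a concept hash table, collects unknown tokens in a missing-terms list, and fires user-defined rules over the accumulated concepts (plain predicates stand in for the patent's object model):

```python
# Toy concept table and rule set; both are invented for illustration.
concepts = {"invoice": "FINANCE", "payroll": "FINANCE", "routing": "NETWORK"}
rules = {"financial-document": lambda seen: "FINANCE" in seen}

def classify(text: str):
    seen, missing = set(), []
    for token in text.lower().split():
        if token in concepts:
            seen.add(concepts[token])    # propagate concept
        else:
            missing.append(token)        # missing-terms list
    labels = [name for name, rule in rules.items() if rule(seen)]
    return labels, missing

labels, missing = classify("quarterly payroll invoice summary")
print(labels)   # ['financial-document']
print(missing)  # ['quarterly', 'summary']
```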
Observing a quantum Maxwell demon at work
NASA Astrophysics Data System (ADS)
Cottet, Nathanaël; Jezouin, Sébastien; Bretheau, Landry; Campagne-Ibarcq, Philippe; Ficheux, Quentin; Anders, Janet; Auffèves, Alexia; Azouit, Rémi; Rouchon, Pierre; Huard, Benjamin
2017-07-01
In apparent contradiction to the laws of thermodynamics, Maxwell’s demon is able to cyclically extract work from a system in contact with a thermal bath, exploiting the information about its microstate. The resolution of this paradox required the insight that an intimate relationship exists between information and thermodynamics. Here, we realize a Maxwell demon experiment that tracks the state of each constituent in both the classical and quantum regimes. The demon is a microwave cavity that encodes quantum information about a superconducting qubit and converts information into work by powering up a propagating microwave pulse by stimulated emission. Thanks to the high level of control of superconducting circuits, we directly measure the extracted work and quantify the entropy remaining in the demon’s memory. This experiment provides an enlightening illustration of the interplay of thermodynamics with quantum information.
NASA Astrophysics Data System (ADS)
Mouloudakis, K.; Kominis, I. K.
2017-02-01
Radical-ion-pair reactions, central for understanding the avian magnetic compass and spin transport in photosynthetic reaction centers, were recently shown to be a fruitful paradigm of the new synthesis of quantum information science with biological processes. We show here that the master equation so far constituting the theoretical foundation of spin chemistry violates fundamental bounds for the entropy of quantum systems, in particular the Ozawa bound. In contrast, a recently developed theory based on quantum measurements, quantum coherence measures, and quantum retrodiction, thus exemplifying the paradigm of quantum biology, satisfies the Ozawa bound as well as the Lanford-Robinson bound on information extraction. By considering Groenewold's information, the quantum information extracted during the reaction, we reproduce the known and unravel other magnetic-field effects not conveyed by reaction yields.
NASA Astrophysics Data System (ADS)
Zhang, George Z.; Myers, Kyle J.; Park, Subok
2013-03-01
Digital breast tomosynthesis (DBT) has shown promise for improving the detection of breast cancer, but it has not yet been fully optimized due to a large space of system parameters to explore. A task-based statistical approach [1] is a rigorous method for evaluating and optimizing this promising imaging technique with the use of optimal observers such as the Hotelling observer (HO). However, the high data dimensionality found in DBT has been the bottleneck for the use of a task-based approach in DBT evaluation. To reduce data dimensionality while extracting salient information for performing a given task, efficient channels have to be used for the HO. In the past few years, 2D Laguerre-Gauss (LG) channels, which are a complete basis for stationary backgrounds and rotationally symmetric signals, have been utilized for DBT evaluation [2, 3]. But since background and signal statistics from DBT data are neither stationary nor rotationally symmetric, LG channels may not be efficient in providing reliable performance trends as a function of system parameters. Recently, partial least squares (PLS) has been shown to generate efficient channels for the Hotelling observer in detection tasks involving random backgrounds and signals [4]. In this study, we investigate the use of PLS as a method for extracting salient information from DBT in order to better evaluate such systems.
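On the assumption that PLS weights can serve as channels, the toy computation below trains scikit-learn's PLSRegression on labelled random images and evaluates a channelized Hotelling observer SNR in the resulting channel space; the image model is synthetic and far simpler than DBT data:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
n, side = 400, 16
xx, yy = np.meshgrid(np.arange(side), np.arange(side))
signal = np.exp(-((xx - 8) ** 2 + (yy - 8) ** 2) / 8.0).ravel()

# Signal-absent and signal-present images on white-noise backgrounds.
backgrounds = rng.standard_normal((2 * n, side * side))
labels = np.repeat([0, 1], n)
images = backgrounds + 0.5 * np.outer(labels, signal)

# PLS weights used as channels (an assumption following the cited idea).
pls = PLSRegression(n_components=8).fit(images, labels.astype(float))
channels = pls.x_weights_          # shape (256, 8)
v = images @ channels              # channelized data

# Channelized Hotelling observer in the 8-dimensional channel space.
mu0, mu1 = v[labels == 0].mean(0), v[labels == 1].mean(0)
S = 0.5 * (np.cov(v[labels == 0].T) + np.cov(v[labels == 1].T))
w = np.linalg.solve(S, mu1 - mu0)
snr = (w @ (mu1 - mu0)) / np.sqrt(w @ S @ w)
print("channelized HO SNR:", snr)
```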
Text Information Extraction System (TIES) | Informatics Technology for Cancer Research (ITCR)
TIES is a service based software system for acquiring, deidentifying, and processing clinical text reports using natural language processing, and also for querying, sharing and using this data to foster tissue and image based research, within and between institutions.
41 CFR 105-60.103-1 - Availability of records.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Regulations System (Continued) GENERAL SERVICES ADMINISTRATION Regional Offices-General Services Administration 60-PUBLIC AVAILABILITY OF AGENCY RECORDS AND INFORMATIONAL MATERIALS 60.1-General Provisions § 105... perform minor reprogramming to extract records from a database or system when doing so will not...
2012-01-01
Background In recent years, biological event extraction has emerged as a key natural language processing task, aiming to address the information overload problem in accessing the molecular biology literature. The BioNLP shared task competitions have contributed to this recent interest considerably. The first competition (BioNLP'09) focused on extracting biological events from Medline abstracts from a narrow domain, while the theme of the latest competition (BioNLP-ST'11) was generalization and a wider range of text types, event types, and subject domains were considered. We view event extraction as a building block in larger discourse interpretation and propose a two-phase, linguistically-grounded, rule-based methodology. In the first phase, a general, underspecified semantic interpretation is composed from syntactic dependency relations in a bottom-up manner. The notion of embedding underpins this phase and it is informed by a trigger dictionary and argument identification rules. Coreference resolution is also performed at this step, allowing extraction of inter-sentential relations. The second phase is concerned with constraining the resulting semantic interpretation by shared task specifications. We evaluated our general methodology on core biological event extraction and speculation/negation tasks in three main tracks of BioNLP-ST'11 (GENIA, EPI, and ID). Results We achieved competitive results in the GENIA and ID tracks, while our results in the EPI track leave room for improvement. One notable feature of our system is that its performance across abstracts and article bodies is stable. Coreference resolution results in a minor improvement in system performance. Due to our interest in discourse-level elements, such as speculation/negation and coreference, we provide a more detailed analysis of our system performance in these subtasks. Conclusions The results demonstrate the viability of a robust, linguistically-oriented methodology, which clearly distinguishes general semantic interpretation from shared task specific aspects, for biological event extraction. Our error analysis pinpoints some shortcomings, which we plan to address in future work within our incremental system development methodology. PMID:22759461
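As a minimal echo of the first phase, the sketch below applies a trigger dictionary and one argument-identification rule to ready-made dependency edges; the edges are hard-coded (a real system would obtain them from a parser), and the trigger and dependency lists are invented for the example:

```python
# Hypothetical trigger dictionary and theme-argument dependency labels.
TRIGGERS = {"phosphorylates": "Phosphorylation", "expression": "Gene_expression"}
THEME_DEPS = {"dobj", "nmod:of"}

# (head, dependency, dependent) edges for: "MAPK phosphorylates STAT3."
edges = [("phosphorylates", "nsubj", "MAPK"),
         ("phosphorylates", "dobj", "STAT3")]

events = []
for head, dep, dependent in edges:
    # Bottom-up rule: a trigger word plus a theme-bearing dependency
    # yields an underspecified event frame.
    if head in TRIGGERS and dep in THEME_DEPS:
        events.append({"type": TRIGGERS[head], "trigger": head,
                       "theme": dependent})
print(events)
# [{'type': 'Phosphorylation', 'trigger': 'phosphorylates', 'theme': 'STAT3'}]
```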
NASA Astrophysics Data System (ADS)
Zhukovskiy, Y.; Korolev, N.; Koteleva, N.
2018-05-01
This article is devoted to expanding the possibilities of assessing the technical state of asynchronous electric drives from their current consumption, as well as increasing the information capacity of diagnostic methods, under conditions of limited access to equipment and incomplete information. The method of spectral analysis of the electric drive current can be supplemented by an analysis of the components of the current Park's vector. The evolution of the hodograph at the moment of appearance and during the development of defects was studied using the example of current asymmetry in the phases of an induction motor. The result of the study is a set of new diagnostic parameters for the asynchronous electric drive. During the research, it was shown that the proposed diagnostic parameters allow determining the type and level of a defect. At the same time, there is no need to stop the equipment and take it out of service for repair. Modern digital control and monitoring systems can use the proposed parameters, based on the stator current of an electrical machine, to improve the accuracy and reliability of obtaining diagnostic patterns and predicting their changes in order to improve equipment maintenance systems. This approach can also be used in systems and objects where there are significant parasitic vibrations and unsteady loads. The extraction of useful information can be carried out in electric drive systems whose structure includes a power electric converter.
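The Park's vector components can be formed directly from the three stator phase currents; the sketch below builds a synthetic asymmetric three-phase signal and uses the hodograph's ellipticity as a simple diagnostic parameter (the signal values and the ellipticity measure are illustrative, not the article's parameters):

```python
import numpy as np

# Synthetic 50 Hz three-phase currents with an amplitude asymmetry in
# phase a; a healthy machine would have equal amplitudes.
t = np.linspace(0, 0.1, 2000)
w = 2 * np.pi * 50
ia = 10.0 * np.cos(w * t)
ib = 9.0 * np.cos(w * t - 2 * np.pi / 3)
ic = 9.0 * np.cos(w * t + 2 * np.pi / 3)

# Standard Park's vector components from the phase currents.
i_d = np.sqrt(2 / 3) * ia - ib / np.sqrt(6) - ic / np.sqrt(6)
i_q = (ib - ic) / np.sqrt(2)

# Hodograph ellipticity as a toy diagnostic parameter: 1.0 for a
# healthy machine (circle), > 1.0 under phase asymmetry (ellipse).
r = np.hypot(i_d, i_q)
print("ellipticity:", r.max() / r.min())
```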
NASA Astrophysics Data System (ADS)
Jun, An Won
2006-01-01
We implement a first practical holographic security system using electrical biometrics that combines optical encryption and digital holographic memory technologies. The optical information for identification includes a facial picture, a name, and a fingerprint, which are spatially multiplexed by a random phase mask used as a decryption key. For decryption in our biometric security system, a bit-error-detection method is used that compares the digital bits of a live fingerprint with those of the fingerprint information extracted from the hologram.
Remote sensing information sciences research group
NASA Technical Reports Server (NTRS)
Estes, John E.; Smith, Terence; Star, Jeffrey L.
1988-01-01
Research conducted under this grant was used to extend and expand existing remote sensing activities at the University of California, Santa Barbara in the areas of georeferenced information systems, machine-assisted information extraction from image data and large spatial databases, artificial intelligence, and vegetation analysis and modeling. The research thrusts during the past year are summarized. The projects are discussed in some detail.
Using Process Redesign and Information Technology to Improve Procurement
1994-04-01
contractor. Many large-volume contractors have automated order processing tied to accounting, manufacturing, and shipping subsystems. Currently, the contractor must receive the mailed order, analyze it, extract pertinent information, and enter that information into the automated order processing system. Almost all orders for small purchases are unilateral documents that do not require acceptance or acknowledgment by the contractor. For
Jenke, Dennis; Carlson, Tage
2014-01-01
Demonstrating suitability for intended use is necessary to register packaging, delivery/administration, or manufacturing systems for pharmaceutical products. During their use, such systems may interact with the pharmaceutical product, potentially adding extraneous entities to those products. These extraneous entities, termed leachables, have the potential to affect the product's performance and/or safety. To establish the potential safety impact, drug products and their packaging, delivery, or manufacturing systems are tested for leachables or extractables, respectively. This generally involves testing a sample (either the extract or the drug product) by a means that produces a test method response and then correlating the test method response with the identity and concentration of the entity causing the response. Oftentimes, analytical tests produce responses that cannot readily establish the associated entity's identity. Entities associated with un-interpretable responses are termed unknowns. Scientifically justifiable thresholds are used to establish those individual unknowns that represent an acceptable patient safety risk and thus which do not require further identification and, conversely, those unknowns whose potential safety impact require that they be identified. Such thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article documents toxicological information for over 540 extractables identified in laboratory testing of polymeric materials used in pharmaceutical applications. Relevant toxicological endpoints, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others were collated for these extractables or their structurally similar surrogates and were systematically assessed to produce a risk index, which represents a daily intake value for life-long intravenous administration. This systematic approach uses four uncertainty factors, each assigned a factor of 10, which consider the quality and relevance of the data, differences in route of administration, non-human species to human extrapolations, and inter-individual variation among humans. In addition to the risk index values, all extractables and most of their surrogates were classified for structural safety alerts using Cramer rules and for mutagenicity alerts using an in silico approach (Benigni/Bossa rule base for mutagenicity via Toxtree). Lastly, in vitro mutagenicity data (Ames Salmonella typhimurium and Mouse Lymphoma tests) were collected from available databases (Chemical Carcinogenesis Research Information and Carcinogenic Potency Database). The frequency distributions of the resulting data were established; in general, risk index values were normally distributed around a band ranging from 5 to 20 mg/day. The risk index associated with the 95% level of the cumulative distribution plot was approximately 0.1 mg/day. Thirteen extractables in the dataset had individual risk index values less than 0.1 mg/day, although four of these had additional risk indices, based on multiple different toxicological endpoints, above 0.1 mg/day. Additionally, approximately 50% of the extractables were classified in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (no basis to assume safety). Lastly, roughly 20% of the extractables triggered either an in vitro or in silico alert for mutagenicity.
When Cramer classifications and the mutagenicity alerts were compared to the risk indices, extractables with safety alerts generally had lower risk index values, although the differences between the risk index distributions of extractables with and without alerts were small and subtle. Leachables from packaging systems, manufacturing systems, or delivery devices can accumulate in drug products and potentially affect the drug product. Although drug products can be analyzed for leachables (and material extracts can be analyzed for extractables), not all leachables or extractables can be fully identified. Safety thresholds can be used to establish whether the unidentified substances can be deemed to be safe or whether additional analytical efforts need to be made to secure the identities. These thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article contains safety data for over 500 extractables that were identified in laboratory characterizations of polymers used in pharmaceutical applications. The safety data consist of structural toxicity classifications of the extractables as well as calculated risk indices, where the risk indices were obtained by subjecting toxicological safety data, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others, to a systematic evaluation process using appropriate uncertainty factors. Thus, the risk index values represent daily exposures for the lifetime intravenous administration of drugs. The frequency distributions of the risk indices and Cramer classifications were examined. The risk index values were normally distributed around a range of 5 to 20 mg/day, and the risk index associated with the 95% level of the cumulative frequency plot was 0.1 mg/day. Approximately 50% of the extractables were in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (high risk of toxicity). Approximately 20% of the extractables produced an in vitro or in silico mutagenicity alert. In general, the distribution of risk index values was not strongly correlated with either the extractables' Cramer classification or their mutagenicity alerts. However, extractables with either in vitro or in silico alerts were somewhat more likely to have low risk index values. © PDA, Inc. 2014.
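To make the arithmetic of the threshold derivation concrete, the following minimal Python sketch divides a toxicological endpoint by the four stacked 10-fold uncertainty factors described above. The 70 kg body-weight scaling and the example NOAEL are illustrative assumptions, not figures from the article.

    def risk_index_mg_per_day(endpoint_mg_per_kg_day, body_weight_kg=70.0,
                              n_uncertainty_factors=4):
        """Divide an endpoint (e.g., a NOAEL) by stacked 10-fold uncertainty
        factors to obtain a daily-intake risk index in mg/day.

        The 70 kg body weight used to scale mg/kg/day to mg/day is an
        illustrative assumption; the article does not state its scaling.
        """
        composite_uf = 10.0 ** n_uncertainty_factors  # 10 x 10 x 10 x 10 = 10,000
        return endpoint_mg_per_kg_day * body_weight_kg / composite_uf

    # A hypothetical NOAEL of 1.5 mg/kg/day yields 1.5 * 70 / 10000 = 0.0105 mg/day,
    # which would fall below the 0.1 mg/day threshold discussed above.
    print(risk_index_mg_per_day(1.5))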
Hong, Keehoon; Hong, Jisoo; Jung, Jae-Hyun; Park, Jae-Hyeung; Lee, Byoungho
2010-05-24
We propose a new method for rectifying geometrical distortion in the elemental image set and extracting accurate lens lattice lines by projective image transformation. The distortion in the acquired elemental image set is found by a Hough transform algorithm. With this initial estimate of the distortion, the acquired elemental image set is rectified automatically, without prior knowledge of the characteristics of the pickup system, by a stratified image transformation procedure. Computer-generated elemental image sets with deliberately introduced distortion are used to verify the proposed rectification method. Experimentally captured elemental image sets are optically reconstructed before and after rectification by the proposed method. The experimental results support the validity of the proposed method, with high accuracy of image rectification and lattice extraction.
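As a rough illustration of the Hough-transform step, the sketch below (Python with OpenCV; the threshold and edge parameters are assumptions, not the paper's values) detects dominant straight lines in an elemental image, which approximate the lens lattice.

    import cv2
    import numpy as np

    def detect_lattice_lines(path, vote_threshold=200):
        # Edge map followed by the standard Hough transform; the strongest
        # (rho, theta) peaks approximate the lens lattice lines.
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLines(edges, 1, np.pi / 180.0, vote_threshold)
        return [] if lines is None else [tuple(line[0]) for line in lines]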
Inertial navigation sensor integrated obstacle detection system
NASA Technical Reports Server (NTRS)
Bhanu, Bir (Inventor); Roberts, Barry A. (Inventor)
1992-01-01
A system that incorporates inertial sensor information into optical flow computations to detect obstacles and to provide alternative navigational paths free from obstacles. The system is a maximally passive obstacle detection system that makes selective use of an active sensor; the active detection typically utilizes a laser. The passive sensor suite includes binocular stereo, motion stereo, and variable fields-of-view. Optical flow computations involve extraction, derotation, and matching of interest points from sequential frames of imagery for range interpolation of the sensed scene, which in turn provides obstacle information for purposes of safe navigation.
Wang, Zhifei; Xie, Yanming; Wang, Yongyan
2011-10-01
Computerized extraction of information from the Chinese medicine literature is more convenient than hand searching: it simplifies the search process and improves accuracy. Among the many computerized extraction methods now in use, regular expressions are particularly efficient for extracting useful information in research. This article focuses on applying regular expressions to information extraction from the Chinese medicine literature. Two practical examples are reported, using regular expressions to extract "case number (non-terminology)" and "efficacy rate (subgroups for related information identification)", which illustrate how information can be extracted from the Chinese medicine literature by this method.
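As an illustration of the technique (the article's actual expressions are not reproduced here), a minimal Python sketch with hypothetical patterns for the two reported examples:

    import re

    # Hypothetical patterns: "<digits>例" captures a case number and
    # "<digits>%" an efficacy rate; real literature needs more variants.
    CASE_NUMBER = re.compile(r'(\d+)\s*例')
    EFFICACY_RATE = re.compile(r'(\d+(?:\.\d+)?)\s*%')

    sentence = "共纳入120例患者，总有效率为91.7%。"  # "120 cases enrolled, efficacy rate 91.7%"
    print(CASE_NUMBER.findall(sentence))    # ['120']
    print(EFFICACY_RATE.findall(sentence))  # ['91.7']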
Nagy, Paul G; Warnock, Max J; Daly, Mark; Toland, Christopher; Meenan, Christopher D; Mezrich, Reuben S
2009-11-01
Radiology departments today are faced with many challenges to improve operational efficiency, performance, and quality. Many organizations rely on antiquated, paper-based methods to review their historical performance and understand their operations. With increased workloads, geographically dispersed image acquisition and reading sites, and rapidly changing technologies, this approach is increasingly untenable. A Web-based dashboard was constructed to automate the extraction, processing, and display of indicators and thereby provide useful and current data for twice-monthly departmental operational meetings. The feasibility of extracting specific metrics from clinical information systems was evaluated as part of a longer-term effort to build a radiology business intelligence architecture. Operational data were extracted from clinical information systems and stored in a centralized data warehouse. Higher-level analytics were performed on the centralized data, a process that generated indicators in a dynamic Web-based graphical environment that proved valuable in discussion and root cause analysis. Results aggregated over a 24-month period since implementation suggest that this operational business intelligence reporting system has provided significant data for driving more effective management decisions to improve productivity, performance, and quality of service in the department.
No oral-cavity-only discrimination of purely olfactory odorants.
Stephenson, Dejaimenay; Halpern, Bruce P
2009-02-01
The purely olfactory odorants coumarin, octanoic acid, phenylethyl alcohol, and vanillin had been found to be consistently identified when presented retronasally but could not be identified when presented oral-cavity only (OCO). However, OCO discrimination of these odorants was not tested. Consequently, it remained possible that the oral cavity trigeminal system might provide sufficient information to differentiate these purely olfactory odorants. To evaluate this, 20 participants attempted to discriminate vapor-phase coumarin, octanoic acid, phenylethyl alcohol, and vanillin and, as a control, the trigeminal stimulus peppermint extract, from their glycerin solvent, all presented OCO. None of the purely olfactory odorants could be discriminated OCO, but, as expected, peppermint extract was consistently discriminated. This inability to discriminate clarifies and expands the previous report of lack of OCO identification of purely olfactory odorants. Taken together with prior data, these results suggest that the oral cavity trigeminal system is fully unresponsive to these odorants in vapor phase and that coumarin, octanoic acid, phenylethyl alcohol, and vanillin are indeed purely olfactory stimuli. The OCO discrimination of peppermint extract demonstrated that the absence of discrimination for the purely olfactory odorants was odorant dependent and confirmed that the oral cavity trigeminal system will provide differential response information to some vapor-phase stimuli.
NASA Astrophysics Data System (ADS)
Zakaria, Nursyahda; Zulkifli, Razauden Mohamed; Akhir, Fazrena Nadia Md; Basar, Norazah
2014-03-01
Grape cultivation has become a fast-growing agricultural sector in Malaysia, producing between 0.62 kg and 2.03 kg of waste per vinestock. This study aims to generate useful information on the antioxidative properties as well as the polyphenolic composition of grapevine waste. Stems and leaves of Vitis vinifera cultivated in Perlis, Malaysia were extracted using methanol, ethyl acetate, and petroleum ether. The ethyl acetate stem extract exhibited the highest total phenolic content, while in the DPPH assay the methanolic stem extract showed the highest antioxidant activity. This result indicates that the total phenolic content of the extracts may not contribute directly to the antioxidant activities. Thin-layer chromatograms of all crude extracts exhibited good separation under a petroleum ether-ethyl acetate (2:3) solvent system, resulting in the detection of resveratrol in the ethyl acetate stem crude extract.
Layout-aware text extraction from full-text PDF of scientific articles.
Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc
2012-05-28
The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) classifying text blocks into rhetorical categories using a rule-based method, and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with precision = 0.96, recall = 0.89, and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in stage 1. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.
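The block-detection stage can be approximated with off-the-shelf tools. The sketch below uses PyMuPDF (an assumed dependency, not part of LA-PDFText) to pull contiguous text blocks with their page coordinates, the raw material on which rule-based classification would run.

    import fitz  # PyMuPDF

    def contiguous_text_blocks(pdf_path):
        # Return each block's bounding box and text, page by page; rules
        # classifying blocks into rhetorical categories would run on these.
        blocks = []
        with fitz.open(pdf_path) as doc:
            for page_number, page in enumerate(doc):
                for x0, y0, x1, y1, text, *_ in page.get_text("blocks"):
                    if text.strip():
                        blocks.append({"page": page_number,
                                       "bbox": (x0, y0, x1, y1),
                                       "text": text.strip()})
        return blocks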
Chen, Yili; Fu, Jixiang; Chu, Dawei; Li, Rongmao; Xie, Yaoqin
2017-11-27
A retinal prosthesis is designed to help the blind obtain some sight. It consists of an external part and an internal part. The external part is made up of a camera, an image processor, and an RF transmitter. The internal part is made up of an RF receiver, an implant chip, and microelectrodes. Currently, the number of microelectrodes is in the hundreds, and the mechanism by which an electrode stimulates the optic nerve is not yet understood. A simple hypothesis is that the pixels in an image correspond to the electrodes. The images captured by the camera should therefore be processed by suitable strategies to correspond to the stimulation from the electrodes, which raises the question of how to obtain the important information from the captured image. Here, we use a region of interest (ROI) extraction algorithm to retain the important information and to remove the redundant information. This paper explains the principles and functions of the ROI approach in detail. Because the system must run in real time, ROI extraction must be fast, so we simplified the ROI algorithm and used it in the external image-processing digital signal processing (DSP) system of the retinal prosthesis. The results show that our image-processing strategies are suitable for a real-time retinal prosthesis and can eliminate redundant information while providing useful information for expression in a low-resolution image.
Web-Based Knowledge Exchange through Social Links in the Workplace
ERIC Educational Resources Information Center
Filipowski, Tomasz; Kazienko, Przemyslaw; Brodka, Piotr; Kajdanowicz, Tomasz
2012-01-01
Knowledge exchange between employees is an essential feature of modern commercial organisations in a competitive market. Based on the data gathered by various information technology (IT) systems, social links can be extracted and exploited in knowledge exchange systems of a new kind. Users of such a system ask their queries and the system…
Storing Data and Video on One Tape
NASA Technical Reports Server (NTRS)
Nixon, J. H.; Cater, J. P.
1985-01-01
Microprocessor-based system originally developed for anthropometric research merges digital data with video images for storage on video cassette recorder. Combined signals later retrieved and displayed simultaneously on television monitor. System also extracts digital portion of stored information and transfers it to solid-state memory.
ExaCT: automatic extraction of clinical trial characteristics from journal publications
2010-01-01
Background Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). Methods ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. Results We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. Conclusions Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols). PMID:20920176
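A minimal sketch of the two-stage idea (statistical sentence ranking, then a 'weak' extraction rule), using scikit-learn and a made-up sample-size rule; train_sentences and train_labels are assumed to exist, and nothing here is ExaCT's actual code.

    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Stage 1: a statistical classifier ranks sentences by the probability
    # that they describe the target characteristic (here, sample size).
    ranker = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                           LogisticRegression(max_iter=1000))
    ranker.fit(train_sentences, train_labels)  # assumed labeled data

    def extract_sample_size(article_sentences, top_k=5):
        probs = ranker.predict_proba(article_sentences)[:, 1]
        candidates = sorted(zip(probs, article_sentences), reverse=True)[:top_k]
        # Stage 2: a weak rule pulls the answer fragment from the candidates.
        for _, sentence in candidates:
            match = re.search(r'\b(\d{2,5})\s+(?:patients|participants|subjects)\b',
                              sentence, re.IGNORECASE)
            if match:
                return match.group(1)
        return None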
Domain-independent information extraction in unstructured text
DOE Office of Scientific and Technical Information (OSTI.GOV)
Irwin, N.H.
Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal, as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.
Bacillus subtilis Lipid Extract, A Branched-Chain Fatty Acid Model Membrane
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nickels, Jonathan D.; Chatterjee, Sneha; Mostofian, Barmak
Lipid extracts are an excellent choice of model biomembrane; however at present, there are no commercially available lipid extracts or computational models that mimic microbial membranes containing the branched-chain fatty acids found in many pathogenic and industrially relevant bacteria. Here, we advance the extract of Bacillus subtilis as a standard model for these diverse systems, providing a detailed experimental description and equilibrated atomistic bilayer model included as Supporting Information to this Letter and at (http://cmb.ornl.gov/members/cheng). The development and validation of this model represents an advance that enables more realistic simulations and experiments on bacterial membranes and reconstituted bacterial membrane proteins.
Automated extraction of chemical structure information from digital raster images
Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro
2009-01-01
Background To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results This paper aims to provide critical reviews for these systems and also report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be run independently in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles. PMID:19196483
Clinical records anonymisation and text extraction (CRATE): an open-source software system.
Cardinal, Rudolf N
2017-04-26
Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
Batch, Yamen; Yusof, Maryati Mohd; Noah, Shahrul Azman
2013-02-27
Medical blogs have emerged as new media, extending health-related information sharing to a wider range of medical audiences, including health professionals and patients. However, extraction of quality health-related information from medical blogs is challenging, primarily because these blogs lack systematic methods to organize their posts. Medical blogs can be categorized according to their author into (1) physician-written blogs, (2) nurse-written blogs, and (3) patient-written blogs. This study focuses on how to organize physician-written blog posts that discuss disease-related issues and how to extract quality information from these posts. The goal of this study was to create and implement a prototype for a Web-based system, called ICDTag, based on a hybrid taxonomy-folksonomy approach that combines taxonomy classification schemes with user-generated tags to organize physician-written blog posts and extract information from these posts. First, the design specifications for the Web-based system were identified. This system included two modules: (1) a blogging module that was implemented as one or more blogs, and (2) an aggregator module that aggregated posts from different blogs into an aggregator website. We then developed a prototype for this system in which the blogging module included two blogs, the cardiology blog and the gastroenterology blog. To analyze the usage patterns of the prototype, we conducted an experiment with data provided by cardiologists and gastroenterologists. Next, we conducted two evaluation types: (1) an evaluation of the ICDTag blog, in which the browsing functionalities of the blogging module were evaluated from the end-user's perspective using an online questionnaire, and (2) an evaluation of information quality, in which the quality of the content on the aggregator website was assessed from the perspective of medical experts using an emailed questionnaire. Participants in this experiment included 23 cardiologists and 24 gastroenterologists. Positive evaluations of the main functions and the organization of information on the ICDTag blogs were given by 18 of the participants via the online questionnaire. These results supported our hypothesis that the use of a taxonomy-folksonomy structure has significant potential to improve the organization of information in physician-written blogs. The quality of the content on the aggregator website was assessed by 3 cardiology experts and 3 gastroenterology experts via an email questionnaire. The results of this questionnaire demonstrated that the experts considered the aggregated tags and categories semantically related to the posts' content. This study demonstrated that applying the hybrid taxonomy-folksonomy approach to physician-written blogs that discuss disease-related issues has valuable potential to make these blogs a more organized and systematic medium and supports the extraction of quality information from their posts. Thus, it is worthwhile to develop more mature systems that make use of the hybrid approach to organize posts in physician-written blogs.
Digital disaster evaluation and its application to 2015 Ms 8.1 Nepal Earthquake
NASA Astrophysics Data System (ADS)
WANG, Xiaoqing; LV, Jinxia; DING, Xiang; DOU, Aixia
2016-11-01
The purpose of this article is to explore techniques for extracting and evaluating disaster information from digital remote sensing (RS) images in an internet environment, aided by social and geographic information. The solution combines several methods: a fast post-disaster assessment system automatically estimates the disaster area and grade; multi-phase satellite and airborne high-resolution digital RS images provide the basis for extracting damaged areas or spots, assisted by fast positioning of potentially seriously damaged targets according to geographic, administrative, population, building, and other information in the estimated disaster region; 2D digital map or 3D digital earth systems provide platforms for cooperative interpretation of damage information in the internet environment; and the spatial distribution of damage index or intensity, casualties, or economic losses is then estimated, which is very useful for decision-making in emergency rescue and disaster relief, resettlement, and reconstruction. As an example of this solution, the spatial seismic damage distribution of the 2015 Ms 8.1 Nepal earthquake is evaluated using high-resolution digital RS images, auxiliary geographic information, and ground survey. The results are compared with statistical disaster information from ground-truth field surveys and show good consistency.
NASA Astrophysics Data System (ADS)
Vimal, S.; Tarboton, D. G.; Band, L. E.; Duncan, J. M.; Lovette, J. P.; Corzo, G.; Miles, B.
2015-12-01
Prioritizing river restoration requires information on river geometry. In many states in the US, detailed river geometry has been collected for floodplain mapping and is available in Flood Risk Information Systems (FRIS). In particular, North Carolina has, for its 100 counties, developed a database of numerous HEC-RAS models which are available through its Flood Risk Information System (FRIS). These models, which include over 260 variables, were developed and updated by numerous contractors. They contain detailed surveyed or LiDAR-derived cross-sections and modeled flood extents for different extreme event return periods. In this work, data from over 4700 HEC-RAS models were integrated and upscaled to utilize detailed cross-section information and 100-year modeled flood extent information to enable river restoration prioritization for the entire state of North Carolina. We developed procedures to extract geomorphic properties such as entrenchment ratio and incision ratio from these models. Entrenchment ratio quantifies the vertical containment of rivers, and thereby their vulnerability to flooding; incision ratio quantifies depth per unit width. A map of entrenchment ratio for the whole state was derived by linking these model results to a geodatabase. A ranking of highly entrenched counties, enabling prioritization for flood allowance and mitigation, was obtained. The results were shared through HydroShare, and web maps were developed for their visualization using the Google Maps Engine API.
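For readers unfamiliar with the two geomorphic properties, a small sketch of one common formulation (a Rosgen-style entrenchment ratio; the incision ratio follows the abstract's depth-per-unit-width description, and neither is necessarily the authors' exact procedure):

    def entrenchment_ratio(flood_prone_width_m, bankfull_width_m):
        # Rosgen-style definition: width of the flood-prone area (often taken
        # at twice maximum bankfull depth) divided by bankfull width; lower
        # values indicate a more vertically contained, flood-vulnerable river.
        return flood_prone_width_m / bankfull_width_m

    def incision_ratio(depth_m, bankfull_width_m):
        # Depth per unit width, following the abstract's description.
        return depth_m / bankfull_width_m

    print(entrenchment_ratio(45.0, 30.0))  # 1.5: a moderately entrenched reach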
Challenges in Managing Information Extraction
ERIC Educational Resources Information Center
Shen, Warren H.
2009-01-01
This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…
Toxics Release Inventory Chemical Hazard Information Profiles (TRI-CHIP) Dataset
The Toxics Release Inventory (TRI) Chemical Hazard Information Profiles (TRI-CHIP) dataset contains hazard information about the chemicals reported in TRI. Users can use this XML-format dataset to create their own databases and hazard analyses of TRI chemicals. The hazard information is compiled from a series of authoritative sources including the Integrated Risk Information System (IRIS). The dataset is provided as a downloadable .zip file that when extracted provides XML files and schemas for the hazard information tables.
Liu, Bo; Wu, Huayi; Wang, Yandong; Liu, Wenming
2015-01-01
Main road features extracted from remotely sensed imagery play an important role in many civilian and military applications, such as updating Geographic Information System (GIS) databases, urban structure analysis, spatial data matching, and road navigation. Current methods for road feature extraction from high-resolution imagery are typically based on threshold value segmentation. It is difficult, however, to completely separate road features from the background. We present a new method for extracting main roads from high-resolution grayscale imagery based on directional mathematical morphology and prior knowledge obtained from the Volunteered Geographic Information found in OpenStreetMap. The two salient steps in this strategy are: (1) using directional mathematical morphology to enhance the contrast between roads and non-roads; (2) using OpenStreetMap roads as prior knowledge to segment the remotely sensed imagery. Experiments were conducted on two ZiYuan-3 images and one QuickBird high-resolution grayscale image to compare our proposed method to other commonly used techniques for road feature extraction. The results demonstrated the validity and better performance of the proposed method for urban main road feature extraction. PMID:26397832
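A minimal sketch of the directional-morphology step, assuming bright roads on a darker background (Python with OpenCV and NumPy; kernel length and angle count are illustrative, not the paper's parameters):

    import cv2
    import numpy as np

    def directional_road_enhance(gray, length=21, n_angles=8):
        # Morphological opening with thin line-shaped structuring elements at
        # several orientations; keeping the per-pixel maximum preserves
        # elongated road-like structures while suppressing compact clutter.
        best = np.zeros_like(gray)
        for angle in np.linspace(0.0, 180.0, n_angles, endpoint=False):
            kernel = np.zeros((length, length), np.uint8)
            cv2.line(kernel, (0, length // 2), (length - 1, length // 2), 1, 1)
            rotation = cv2.getRotationMatrix2D(((length - 1) / 2.0,
                                                (length - 1) / 2.0), angle, 1.0)
            kernel = cv2.warpAffine(kernel, rotation, (length, length))
            best = np.maximum(best, cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel))
        return best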
The architecture of the management system of complex steganographic information
NASA Astrophysics Data System (ADS)
Evsutin, O. O.; Meshcheryakov, R. V.; Kozlova, A. S.; Solovyev, T. M.
2017-01-01
The aim of the study is to create a wide area information system that allows one to control the processes of generation, embedding, extraction, and detection of steganographic information. In this paper, the following problems are considered: the definition of the system scope and the development of its architecture. For the algorithmic support of the system, classic methods of steganography are used to embed information. Methods of mathematical statistics and computational intelligence are used to identify the embedded information. The main result of the paper is the development of the architecture of the management system of complex steganographic information. The suggested architecture utilizes cloud technology in order to provide service via a web service over the Internet. It is designed to process streams of multimedia data arriving from many sources of different types. The information system, built in accordance with the proposed architecture, will be used in the following areas: hidden transfer of documents protected by medical secrecy in telemedicine systems; copyright protection of online content in public networks; and prevention of information leakage caused by insiders.
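The abstract mentions classic embedding methods; the sketch below shows one such classic method, least-significant-bit (LSB) embedding, purely as an illustration and not necessarily the algorithm this system uses.

    import numpy as np

    def embed_lsb(carrier, payload_bits):
        # Overwrite the least significant bit of each carrier byte with one
        # payload bit; the change is visually imperceptible in natural images.
        flat = carrier.flatten()  # flatten() copies, so carrier is untouched
        if payload_bits.size > flat.size:
            raise ValueError("payload too large for carrier")
        flat[:payload_bits.size] = (flat[:payload_bits.size] & 0xFE) | payload_bits
        return flat.reshape(carrier.shape)

    def extract_lsb(stego, n_bits):
        return stego.flatten()[:n_bits] & 1

    carrier = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
    assert (extract_lsb(embed_lsb(carrier, bits), bits.size) == bits).all()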
Extraction of endoscopic images for biomedical figure classification
NASA Astrophysics Data System (ADS)
Xue, Zhiyun; You, Daekeun; Chachra, Suchet; Antani, Sameer; Long, L. R.; Demner-Fushman, Dina; Thoma, George R.
2015-03-01
Modality filtering is an important feature in biomedical image searching systems and may significantly improve the retrieval performance of the system. This paper presents a new method for extracting endoscopic image figures from photograph images in biomedical literature, which are found to have highly diverse content and large variability in appearance. Our proposed method consists of three main stages: tissue image extraction, endoscopic image candidate extraction, and ophthalmic image filtering. For tissue image extraction we use image patch level clustering and MRF relabeling to detect images containing skin/tissue regions. Next, we find candidate endoscopic images by exploiting the round shape characteristics that commonly appear in these images. However, this step needs to compensate for images where endoscopic regions are not entirely round. In the third step we filter out the ophthalmic images which have shape characteristics very similar to the endoscopic images. We do this by using text information, specifically, anatomy terms, extracted from the figure caption. We tested and evaluated our method on a dataset of 115,370 photograph figures, and achieved promising precision and recall rates of 87% and 84%, respectively.
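The round-shape cue used in the second stage can be illustrated with a circle Hough transform (parameters below are assumptions; the paper's own detector also compensates for regions that are not entirely round):

    import cv2
    import numpy as np

    def round_region_candidates(gray):
        blurred = cv2.medianBlur(gray, 5)
        circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.5,
                                   minDist=80, param1=100, param2=60,
                                   minRadius=40, maxRadius=400)
        # Each candidate is (center_x, center_y, radius) in pixels.
        return [] if circles is None else np.round(circles[0]).astype(int)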
Gas chromatography-mass spectrometry of biofluids and extracts.
Emwas, Abdul-Hamid M; Al-Talla, Zeyad A; Yang, Yang; Kharbatia, Najeh M
2015-01-01
Gas chromatography-mass spectrometry (GC-MS) has been widely used in metabonomics analyses of biofluid samples. Biofluids provide a wealth of information about the metabolism of the whole body and from multiple regions of the body that can be used to study general health status and organ function. Blood serum and blood plasma, for example, can provide a comprehensive picture of the whole body, while urine can be used to monitor the function of the kidneys, and cerebrospinal fluid (CSF) provides information about the status of the brain and central nervous system (CNS). Different methods have been developed for the extraction of metabolites from biofluids, ranging from solvent extraction and acid treatment to heat denaturation and filtration. These methods vary widely in terms of efficiency of protein removal and in the number of metabolites extracted. Consequently, for all biofluid-based metabonomics studies, it is vital to optimize and standardize all steps of sample preparation, including the initial extraction of metabolites. In this chapter, recommendations are made for the optimum experimental conditions for biofluid samples in GC-MS, with a particular focus on blood serum and plasma samples.
Enhancing biomedical text summarization using semantic relation extraction.
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) we extract semantic relations in each sentence using the semantic knowledge representation tool SemRep; 2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation; 3) for relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval-based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance, and our results are better than those of the MEAD system, a well-known tool for text summarization.
Patterson, Olga V; Freiberg, Matthew S; Skanderson, Melissa; J Fodeh, Samah; Brandt, Cynthia A; DuVall, Scott L
2017-06-12
In order to investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built on a scalable framework, making it available for processing large datasets. The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877, with precision of 0.936, 0.982, and 0.969, for each dataset respectively, averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types. This system illustrates the feasibility and effectiveness of large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing.
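Measurement-value pairs of the kind described can be illustrated with a single hypothetical pattern (the published system uses a curated dictionary, rules, and semantic disambiguation rather than this one regex):

    import re

    MEASUREMENT = re.compile(
        r'(?P<name>LVEF|ejection fraction)\D{0,15}?'
        r'(?P<low>\d{1,2})(?:\s*-\s*(?P<high>\d{1,2}))?\s*%',
        re.IGNORECASE)

    note = "Normal LV size. Ejection fraction is 50-55%."
    match = MEASUREMENT.search(note)
    print(match.group('name'), match.group('low'), match.group('high'))
    # -> Ejection fraction 50 55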
D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D
2011-01-01
Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks. With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation is available for download.
Impact of a surgical site infection (SSI) surveillance program in orthopedics and traumatology.
Mabit, C; Marcheix, P S; Mounier, M; Dijoux, P; Pestourie, N; Bonnevialle, P; Bonnomet, F
2012-10-01
Surveillance of surgical site infections (SSI) is a priority. One of the fundamental principles for the surveillance of SSI is based on receiving effective field feedback (retro-information). The aim of this study was to report the results of a program of SSI surveillance and validate the hypothesis that there is a correlation between creating an SSI surveillance program and a reduction in SSI. The protocol was based on the weekly collection of surveillance data obtained directly from the different information systems in different departments. A delay of 3 months was established before extraction and analysis of data and information from the surgical teams. The indices used were the NNIS index, developed by the American National Nosocomial Infections Surveillance system, and the avoided hospital-days index (Journées d'hospitalisation évitées, JHE). Since the end of 2009, 7156 surgical procedures were evaluated (rate of inclusion 97.3%), and 84 SSI were registered, with a significant decrease in the infection rate over time from 1.86% to 0.66%. A total of 418 days of hospitalization have been saved since the beginning of the surveillance system. Our surveillance system has three strong points: follow-up is continuous, specifically adapted to orthopedics and traumatology, and nearly exhaustive. The extraction of data directly from hospital information systems effectively improves the collection of data on surgical procedures. The implementation of an SSI surveillance protocol reduces SSI. Level III. Prospective study. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
ECO: A Framework for Entity Co-Occurrence Exploration with Faceted Navigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halliday, K. D.
2010-08-20
Even as highly structured databases and semantic knowledge bases become more prevalent, a substantial amount of human knowledge is reported as written prose. Typical textual reports, such as news articles, contain information about entities (people, organizations, and locations) and their relationships. Automatically extracting such relationships from large text corpora is a key component of corporate and government knowledge bases. The primary goal of the ECO project is to develop a scalable framework for extracting and presenting these relationships for exploration using an easily navigable faceted user interface. ECO uses entity co-occurrence relationships to identify related entities. The system aggregates and indexes information on each entity pair, allowing the user to rapidly discover and mine relational information.
An approach to 3D model fusion in GIS systems and its application in a future ECDIS
NASA Astrophysics Data System (ADS)
Liu, Tao; Zhao, Depeng; Pan, Mingyang
2016-04-01
Three-dimensional (3D) computer graphics technology is widely used in various areas and causes profound changes. As an information carrier, 3D models are becoming increasingly important. The use of 3D models greatly helps to improve the cartographic expression and design. 3D models are more visually efficient, quicker and easier to understand and they can express more detailed geographical information. However, it is hard to efficiently and precisely fuse 3D models in local systems. The purpose of this study is to propose an automatic and precise approach to fuse 3D models in geographic information systems (GIS). It is the basic premise for subsequent uses of 3D models in local systems, such as attribute searching, spatial analysis, and so on. The basic steps of our research are: (1) pose adjustment by principal component analysis (PCA); (2) silhouette extraction by simple mesh silhouette extraction and silhouette merger; (3) size adjustment; (4) position matching. Finally, we implement the above methods in our system Automotive Intelligent Chart (AIC) 3D Electronic Chart Display and Information Systems (ECDIS). The fusion approach we propose is a common method and each calculation step is carefully designed. This approach solves the problem of cross-platform model fusion. 3D models can be from any source. They may be stored in the local cache or retrieved from Internet, or may be manually created by different tools or automatically generated by different programs. The system can be any kind of 3D GIS system.
Foreign Language Analysis and Recognition (FLARE) Progress
2015-02-01
Report AFRL-RH-WP-TR-2015-0007. Copies may be obtained from the Defense Technical Information Center (DTIC) (http://www.dtic.mil). Subject terms: automatic speech recognition (ASR), information retrieval (IR). …to the Haystack Multilingual Multimedia Information Extraction and Retrieval (MMIER) system that was initially developed under a prior work unit
Intelligent Semantic Query of Notices to Airmen (NOTAMs)
2006-07-01
definition of the airspace is constantly changing, new vocabulary is added and old words retired on a monthly basis, and the information specifying this is...NOTAMs are notices containing information on the conditions, or changes to, aeronautical facilities, services, procedures, or hazards, which are...develop a new parsing system, employing and extending ideas developed by the information-extraction community, rather than on classical computational
NASA Astrophysics Data System (ADS)
Lv, Zheng; Sui, Haigang; Zhang, Xilin; Huang, Xianfeng
2007-11-01
As one of the most important geospatial objects and military establishments, an airport is always a key target in the fields of transportation and military affairs. Therefore, automatic recognition and extraction of airports from remote sensing images is very important and urgent for civil aviation updating and military applications. In this paper, a new multi-source data fusion approach to automatic airport information extraction, updating, and 3D modeling is addressed. Corresponding key technologies are discussed in detail, including feature extraction of airport information based on a modified Otsu algorithm, automatic change detection based on a new parallel-lines-based buffer detection algorithm, 3D modeling based on a gradual elimination of non-building points algorithm, 3D change detection between the old airport model and LIDAR data, and import of typical CAD models. Finally, based on these technologies, we develop a prototype system, and the results show our method can achieve good effects.
Recommendation System Based On Association Rules For Distributed E-Learning Management Systems
NASA Astrophysics Data System (ADS)
Mihai, Gabroveanu
2015-09-01
Traditional Learning Management Systems are installed on a single server where learning materials and user data are kept. To increase performance, the Learning Management System can be installed on multiple servers; learning materials and user data can be distributed across these servers, producing a Distributed Learning Management System. This paper proposes the prototype of a recommendation system based on association rules for a Distributed Learning Management System. Information from LMS databases is analyzed using distributed data mining algorithms in order to extract association rules. The extracted rules are then used as inference rules to provide personalized recommendations. The quality of the provided recommendations is improved because the rules used to make the inferences are more accurate, since these rules aggregate knowledge from all e-Learning systems included in the Distributed Learning Management System.
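A minimal sketch of the rule-mining and inference steps under simple assumptions (pairwise rules only, mined centrally rather than with the distributed algorithms the paper describes):

    from collections import defaultdict
    from itertools import combinations

    def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
        # transactions: one set of completed learning resources per student.
        n = len(transactions)
        single, pair = defaultdict(int), defaultdict(int)
        for items in transactions:
            for a in items:
                single[a] += 1
            for a, b in combinations(sorted(items), 2):
                pair[(a, b)] += 1
        rules = []
        for (a, b), count in pair.items():
            if count / n < min_support:
                continue
            for lhs, rhs in ((a, b), (b, a)):
                confidence = count / single[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, rhs, confidence))
        return rules

    def recommend(completed, rules):
        # Fire every rule whose antecedent the student has already completed.
        return {rhs for lhs, rhs, _ in rules
                if lhs in completed and rhs not in completed}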
a Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information
NASA Astrophysics Data System (ADS)
Lian, Shizhong; Chen, Jiangping; Luo, Minghai
2016-06-01
Water information cannot be accurately extracted from TM images in which true information is lost because of blocking clouds and missing data stripes. Because water is continuously distributed under natural conditions, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different disturbances from clouds and missing data stripes are simulated. Water information is extracted from the simulated images using global histogram matching, local histogram matching, and the probability-based statistical method. Experiments show that a smaller areal error and higher boundary recall can be obtained using this method compared with the conventional methods.
An Expertise Recommender using Web Mining
NASA Technical Reports Server (NTRS)
Joshi, Anupam; Chandrasekaran, Purnima; ShuYang, Michelle; Ramakrishnan, Ramya
2001-01-01
This report explored techniques to mine the web pages of scientists to extract information regarding their expertise, build expertise chains and referral webs, and semi-automatically combine this information with directory information services to create a recommender system that permits query by expertise. The approach included experimenting with existing techniques reported in the research literature in the recent past, adapting them as needed. In addition, software tools were developed to capture and use this information.
Remote Video Monitor of Vehicles in Cooperative Information Platform
NASA Astrophysics Data System (ADS)
Qin, Guofeng; Wang, Xiaoguo; Wang, Li; Li, Yang; Li, Qiyan
Detection of vehicles plays an important role in the area of modern intelligent traffic management, and pattern recognition is a hot issue in the area of computer vision. An auto-recognition system in a cooperative information platform is studied. In the cooperative platform, 3G wireless networks, including GPS, GPRS (CDMA), Internet (Intranet), remote video monitoring, and M-DMB networks, are integrated. The remote video information can be taken from the terminals and sent to the cooperative platform, then detected by the auto-recognition system. The images are preprocessed and segmented, with feature extraction, template matching, and pattern recognition applied. The system identifies different models and gathers vehicular traffic statistics. Finally, the implementation of the system is introduced.
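The template-matching step can be illustrated with OpenCV's normalized cross-correlation (the platform's actual matcher and thresholds are not described in the abstract):

    import cv2

    def match_vehicle_template(frame_gray, template_gray, threshold=0.8):
        # Normalized cross-correlation; scores near 1.0 indicate a match.
        scores = cv2.matchTemplate(frame_gray, template_gray,
                                   cv2.TM_CCOEFF_NORMED)
        _, best_score, _, best_top_left = cv2.minMaxLoc(scores)
        return best_top_left if best_score >= threshold else None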
Ro, Chul-Un; Kim, HyeKyeong; Van Grieken, René
2004-03-01
An electron probe X-ray microanalysis (EPMA) technique, using an energy-dispersive X-ray detector with an ultrathin window, designated low-Z particle EPMA, has been developed. The low-Z particle EPMA allows the quantitative determination of concentrations of low-Z elements, such as C, N, and O, as well as chemical elements that can be analyzed by conventional energy-dispersive EPMA, in individual particles. Since a data set is usually composed of data for several thousands of particles in order to make environmentally meaningful observations of real atmospheric aerosol samples, the development of a method that fully extracts the chemical information contained in the low-Z particle EPMA data is important. An expert system that can rapidly and reliably perform chemical speciation from the low-Z particle EPMA data is presented. This expert system tries to mimic the logic used by experts and is implemented by applying the macroprogramming available in MS Excel software. Its feasibility is confirmed by applying the expert system to data for various types of standard particles and a real atmospheric aerosol sample. By applying the expert system, the time necessary for chemical speciation is greatly shortened, and detailed particle data can be saved and extracted later if more information is needed for further analysis.
An automatic system to detect and extract texts in medical images for de-identification
NASA Astrophysics Data System (ADS)
Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael
2010-03-01
Recently, there is an increasing need to share medical images for research purposes. In order to respect and preserve patient privacy, most medical images are de-identified, with protected health information (PHI) removed, before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for doctors to remove text from medical images. Many papers have been written about text detection and extraction algorithms; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end-users, it should be effective, accurate, and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering that text has strong contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computation optimization.
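A minimal sketch of the region-variance cue (window size and threshold are illustrative assumptions; the full system adds geometric constraints and level-set extraction):

    import numpy as np

    def text_candidate_mask(gray, window=16, variance_threshold=900.0):
        # Burned-in text has strong local contrast, so high-variance windows
        # are flagged as candidates; anatomy edges also fire, which is why
        # geometric post-processing is needed.
        height, width = gray.shape
        mask = np.zeros((height, width), dtype=bool)
        for y in range(0, height - window + 1, window):
            for x in range(0, width - window + 1, window):
                patch = gray[y:y + window, x:x + window].astype(np.float64)
                if patch.var() > variance_threshold:
                    mask[y:y + window, x:x + window] = True
        return mask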
An R-Shiny Based Phenology Analysis System and Case Study Using Digital Camera Dataset
NASA Astrophysics Data System (ADS)
Zhou, Y. K.
2018-05-01
Accurate extraction of vegetation phenology information plays an important role in exploring the effects of climate change on vegetation. Repeat photography from digital cameras is a rich and voluminous data source for phenological analysis, but processing and mining these data remain a major challenge: there is no single tool or universal solution for big-data processing and visualization in the field of phenology extraction. In this paper, we propose an R-Shiny-based web application for extracting and analyzing vegetation phenological parameters. Its main functions include visualization of phenological site distributions, ROI (region of interest) selection, vegetation index calculation and visualization, data filtering, growth-trajectory fitting, and phenology parameter extraction. As an example, the long-term observational photography from the Freemanwood site in 2013 is processed with this system. The results show that (1) the system is capable of analyzing large data sets using a distributed framework, and (2) combining multiple parameter-extraction and growth-curve-fitting methods can effectively extract the key phenology parameters, although different combinations yield discrepancies in particular study areas. Vegetation with a single growth peak is well fitted by the double-logistic model, while vegetation with multiple growth peaks is better fitted by the spline method.
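The double-logistic fitting named above can be sketched as follows (Python with SciPy rather than the paper's R-Shiny stack; the synthetic green-chromatic-coordinate series, parameter names, and initial guesses are assumptions):

```python
# Double-logistic fit of a vegetation-index time series to extract
# start/end-of-season phenology parameters.
import numpy as np
from scipy.optimize import curve_fit

def double_logistic(t, base, amp, k1, t_sos, k2, t_eos):
    rise = 1.0 / (1.0 + np.exp(-k1 * (t - t_sos)))   # green-up
    fall = 1.0 / (1.0 + np.exp(-k2 * (t - t_eos)))   # senescence
    return base + amp * (rise - fall)

rng = np.random.default_rng(0)
doy = np.arange(1, 366, 8, dtype=float)              # 8-day observation dates
truth = double_logistic(doy, 0.3, 0.4, 0.08, 130, 0.06, 270)
gcc = truth + rng.normal(0, 0.01, doy.size)          # noisy synthetic index

p0 = [0.3, 0.4, 0.1, 120, 0.1, 260]                  # rough initial guess
params, _ = curve_fit(double_logistic, doy, gcc, p0=p0, maxfev=10000)
print("start of season (DOY): %.1f, end of season (DOY): %.1f"
      % (params[3], params[5]))
```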
DuVall, Scott L; South, Brett R; Bray, Bruce E; Bolton, Daniel; Heavirland, Julia; Pickard, Steve; Heidenreich, Paul; Shen, Shuying; Weir, Charlene; Samore, Matthew; Goldstein, Mary K
2012-01-01
Objectives Left ventricular ejection fraction (EF) is a key component of heart failure quality measures used within the Department of Veterans Affairs (VA). Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting and to validate the accuracy of the system using a comparison reference standard developed through human review. This project was a Translational Use Case Project within the VA Consortium for Healthcare Informatics. Materials and methods We created a set of regular expressions and rules to capture the EF using a random sample of 765 echocardiograms from seven VA medical centers. The documents were randomly assigned to two sets: a set of 275 used for training and a second set of 490 used for testing and validation. To establish the reference standard, two independent reviewers annotated all documents in both sets; a third reviewer adjudicated disagreements. Results System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9% (95% CI 87.7% to 90.0%), a positive predictive value of 95% (95% CI 94.2% to 95.9%), and an F measure of 91.9% (95% CI 91.2% to 92.7%). Discussion An EF value of <40% can be accurately identified in VA echocardiogram reports. Conclusions An automated information extraction system can be used to accurately extract EF for quality measurement. PMID:22437073
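A minimal sketch of the kind of regular expression such a system builds on (Python; the pattern shown is illustrative, not the validated VA rule set):

```python
# Toy ejection-fraction extractor: capture "EF ... NN%" or "NN-MM%" ranges.
import re

EF_PATTERN = re.compile(
    r"(?:ejection\s+fraction|LVEF|EF)\s*(?:is|of|was|[:=])?\s*"
    r"(\d{1,2}(?:\.\d)?)\s*(?:-|to)?\s*(\d{1,2}(?:\.\d)?)?\s*%",
    re.IGNORECASE,
)

report = "Left ventricular ejection fraction is 35-40%."
m = EF_PATTERN.search(report)
if m:
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    ef = (low + high) / 2.0   # midpoint of a reported range (an assumption)
    print(f"EF = {ef:.1f}%  -> flagged (<40%)" if ef < 40 else f"EF = {ef:.1f}%")
```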
NEEDS - Information Adaptive System
NASA Technical Reports Server (NTRS)
Kelly, W. L.; Benz, H. F.; Meredith, B. D.
1980-01-01
The Information Adaptive System (IAS) is an element of the NASA End-to-End Data System (NEEDS) Phase II and is focused toward onboard image processing. The IAS is a data preprocessing system which is closely coupled to the sensor system. Some of the functions planned for the IAS include sensor response nonuniformity correction, geometric correction, data set selection, data formatting, packetization, and adaptive system control. The inclusion of these sensor data preprocessing functions onboard the spacecraft will significantly improve the extraction of information from the sensor data in a timely and cost effective manner, and provide the opportunity to design sensor systems which can be reconfigured in near real-time for optimum performance. The purpose of this paper is to present the preliminary design of the IAS and the plans for its development.
Integrated Computational System for Aerodynamic Steering and Visualization
NASA Technical Reports Server (NTRS)
Hesselink, Lambertus
1999-01-01
In February of 1994, an effort from the Fluid Dynamics and Information Sciences Divisions at NASA Ames Research Center with McDonnell Douglas Aerospace Company and Stanford University was initiated to develop, demonstrate, validate, and disseminate automated software for numerical aerodynamic simulation. The goal of the initiative was to develop a tri-discipline approach encompassing CFD, intelligent systems, and automated flow-feature recognition to improve the utility of CFD in the design cycle. This approach would then be represented through an intelligent computational system which could accept an engineer's definition of a problem and construct an optimal and reliable CFD solution. Stanford University's role focused on developing technologies that advance visualization capabilities for analysis of CFD data, extract specific flow features useful for the design process, and compare CFD data with experimental data. During the years 1995-1997, Stanford University focused on developing techniques in the area of tensor visualization and flow feature extraction. Software libraries were created enabling feature extraction and exploration of tensor fields. As a proof of concept, a prototype system called the Integrated Computational System (ICS) was developed to demonstrate the CFD design cycle. The current research effort focuses on finding a quantitative comparison of general vector fields based on topological features. Since the method relies on topological information, grid matching and vector alignment are not needed in the comparison; this is often a problem with many data-comparison techniques. In addition, since only topology-based information is stored and compared for each field, there is a significant compression of information that enables large databases to be quickly searched. This report will (1) briefly review the technologies developed during 1995-1997, (2) describe current technologies in the area of comparison techniques, (3) describe the theory of our new method researched during the grant year, (4) summarize a few of the results, and finally (5) discuss work within the last 6 months that is a direct extension of the grant.
Cho, Minwoo; Kim, Jee Hyun; Kong, Hyoun Joong; Hong, Kyoung Sup; Kim, Sungwan
2018-05-01
The colonoscopy adenoma detection rate depends largely on physician experience and skill, and overlooked colorectal adenomas could develop into cancer. This study assessed a system that detects polyps and summarizes meaningful information from colonoscopy videos. One hundred thirteen consecutive patients had colonoscopy videos prospectively recorded at the Seoul National University Hospital. Informative video frames were extracted using a MATLAB support vector machine (SVM) model and classified as bleeding, polypectomy, tool, residue, thin wrinkle, folded wrinkle, or common. Thin wrinkle, folded wrinkle, and common frames were reanalyzed using SVM for polyp detection. The SVM model was applied hierarchically for effective classification and optimization. The mean classification accuracy according to type was over 93%; sensitivity was over 87%. The mean sensitivity for polyp detection was 82.1%, and the positive predictive value (PPV) was 39.3%. Polyps detected using the system were larger (6.3 ± 6.4 vs. 4.9 ± 2.5 mm; P = 0.003) with a more pedunculated morphology (Yamada type III, 10.2 vs. 0%; P < 0.001; Yamada type IV, 2.8 vs. 0%; P < 0.001) than polyps missed by the system. There were no statistically significant differences in polyp distribution or histology between the groups. Informative frames and suspected polyps were presented on a timeline. This summary was evaluated using the system usability scale questionnaire; 89.3% of participants expressed positive opinions. We developed and verified a system to extract meaningful information from colonoscopy videos. Although further improvement and validation of the system is needed, the proposed system is useful for physicians and patients.
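The hierarchical use of SVMs can be sketched as follows (Python with scikit-learn; the random feature vectors, class encoding, and routing rule are stand-in assumptions, not the paper's trained models):

```python
# Hierarchical SVM sketch: stage 1 routes frames into coarse classes,
# stage 2 screens only wrinkle/common frames for polyps.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 32))             # frame feature vectors (placeholder)
frame_type = rng.integers(0, 7, size=600)  # 7 frame classes, as in the paper
has_polyp = rng.integers(0, 2, size=600)

stage1 = SVC(kernel="rbf").fit(X, frame_type)

# Stage 2 trains only on the classes that get re-analyzed for polyps
# (thin wrinkle=4, folded wrinkle=5, common=6 -- an assumed encoding).
mask = np.isin(frame_type, [4, 5, 6])
stage2 = SVC(kernel="rbf").fit(X[mask], has_polyp[mask])

for x, t in zip(X[:5], stage1.predict(X[:5])):
    if t in (4, 5, 6):
        print("frame type", t, "-> polyp?", bool(stage2.predict(x[None, :])[0]))
    else:
        print("frame type", t, "-> skipped")
```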
Layout-aware text extraction from full-text PDF of scientific articles
2012-01-01
Background The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) classifying text blocks into rhetorical categories using a rule-based method and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with precision = 0.96, recall = 0.89, and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. Conclusions LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/. PMID:22640904
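A toy version of rule-based rhetorical classification of text blocks (Python; the layout features and ordered rules below are illustrative, not LA-PDFText's actual rule set):

```python
# Classify PDF text blocks into rhetorical categories from layout features.
# Each block carries page number, font size, and text; rules fire in order.
def classify_block(block, body_font=10.0):
    text = block["text"].lower()
    if block["page"] == 1 and block["font"] > 1.5 * body_font:
        return "title"
    if text.startswith("abstract"):
        return "abstract"
    if text.startswith(("references", "bibliography")):
        return "references"
    if block["font"] > body_font and len(block["text"]) < 80:
        return "section-heading"
    return "body"

blocks = [
    {"page": 1, "font": 18.0, "text": "Layout-aware text extraction"},
    {"page": 1, "font": 10.0, "text": "Abstract Background The Portable..."},
    {"page": 8, "font": 10.0, "text": "References 1. Smith J. ..."},
]
print([classify_block(b) for b in blocks])  # ['title', 'abstract', 'references']
```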
Research of infrared laser based pavement imaging and crack detection
NASA Astrophysics Data System (ADS)
Hong, Hanyu; Wang, Shu; Zhang, Xiuhua; Jing, Genqiang
2013-08-01
Road crack detection is seriously affected by many factors in practical applications, such as shadows, road markings, oil stains, and high-frequency noise. Because of these factors, current crack detection methods cannot distinguish cracks in complex scenes. To solve this problem, a novel method based on infrared laser pavement imaging is proposed. First, a single-sensor laser pavement imaging system is adopted to obtain pavement images, with a high-power laser line projector used to suppress various shadows. Second, a crack extraction algorithm that intelligently merges multiple features is proposed to extract crack information. In this step, the non-negative feature and the contrast feature are used to extract the basic crack information, and circular projection based on a linearity feature is applied to enhance the crack area and eliminate noise. A series of experiments has been performed to test the proposed method, showing that the proposed automatic extraction method is effective.
Extracting Databases from Dark Data with DeepDive.
Zhang, Ce; Shin, Jaeho; Ré, Christopher; Cafarella, Michael; Niu, Feng
2016-01-01
DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data - scientific papers, Web classified ads, customer service notes, and so on - were instead in a relational database, it would give analysts a massive and valuable new set of "big data." DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. To date we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontology, law enforcement, and others. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive is enabled by an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle. This design is enabled by several core innovations around probabilistic training and inference.
Extraction of In-Medium Nucleon-Nucleon Amplitude From Experiment
NASA Technical Reports Server (NTRS)
Tripathi, R. K.; Cucinotta, Francis A.; Wilson, John W.
1998-01-01
The in-medium nucleon-nucleon amplitudes are extracted from the available proton-nucleus total reaction cross-section data. The retrieval of this information from experiment makes the estimates of reaction cross sections very reliable. Simple expressions are given for the in-medium nucleon-nucleon amplitudes for any system of colliding nuclei as a function of energy. Excellent agreement with experimental observations is demonstrated for ion-nucleus interactions.
Compressive Information Extraction: A Dynamical Systems Approach
2016-01-24
This effort addresses extracting information that is sparsely encoded in very large data streams, illustrated by target tracking in an urban canyon and by sample video frames showing contextually abnormal events. Formally, the problem of interest can be stated as establishing whether a noisy sequence is contextually abnormal; sparse relaxations with optimality guarantees can be obtained using tools from semi-algebraic geometry (Section 2.2, 'Detecting Contextually Abnormal Events').
Nguyen, Dat Tien; Hong, Hyung Gil; Kim, Ki Wan; Park, Kang Ryoung
2017-03-16
The human body contains identity information that can be used for the person recognition (verification/recognition) problem. In this paper, we propose a person recognition method using information extracted from body images. Our research is novel in the following three ways compared to previous studies. First, we use images of the human body for recognizing individuals. To overcome the limitation of previous studies on body-based person recognition, which use only visible light images, we use human body images captured by two different kinds of cameras: a visible light camera and a thermal camera. The use of the two kinds of body images helps us reduce the effects of noise, background, and variation in the appearance of the human body. Second, we apply a state-of-the-art method, the convolutional neural network (CNN), among the various available methods for image feature extraction, in order to overcome the limitations of traditional hand-designed image features. Finally, with the image features extracted from the body images, the recognition task is performed by measuring the distance between the input and enrolled samples. The experimental results show that the proposed method is efficient for enhancing recognition accuracy compared to systems that use only visible light or thermal images of the human body.
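The final matching step can be sketched as follows (Python with NumPy; the fused feature vectors stand in for real CNN outputs, and the gallery and noise level are invented):

```python
# Distance-based matching of fused visible + thermal body features.
# Features are random placeholders for CNN embeddings of body images.
import numpy as np

def fuse(visible_feat, thermal_feat):
    # Simple concatenation fusion of the two modalities.
    return np.concatenate([visible_feat, thermal_feat])

rng = np.random.default_rng(1)
gallery = {pid: fuse(rng.normal(size=128), rng.normal(size=128))
           for pid in ["A", "B", "C"]}               # enrolled persons
probe = gallery["B"] + rng.normal(scale=0.1, size=256)  # noisy re-capture of B

dists = {pid: float(np.linalg.norm(probe - feat)) for pid, feat in gallery.items()}
best = min(dists, key=dists.get)
print("match:", best, "distances:", {k: round(v, 2) for k, v in dists.items()})
```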
Investigation related to multispectral imaging systems
NASA Technical Reports Server (NTRS)
Nalepka, R. F.; Erickson, J. D.
1974-01-01
A summary of technical progress made during a five year research program directed toward the development of operational information systems based on multispectral sensing and the use of these systems in earth-resource survey applications is presented. Efforts were undertaken during this program to: (1) improve the basic understanding of the many facets of multispectral remote sensing, (2) develop methods for improving the accuracy of information generated by remote sensing systems, (3) improve the efficiency of data processing and information extraction techniques to enhance the cost-effectiveness of remote sensing systems, (4) investigate additional problems having potential remote sensing solutions, and (5) apply the existing and developing technology for specific users and document and transfer that technology to the remote sensing community.
Managing interoperability and complexity in health systems.
Bouamrane, M-M; Tao, C; Sarkar, I N
2015-01-01
In recent years, we have witnessed substantial progress in the use of clinical informatics systems to support clinicians during episodes of care, manage specialised domain knowledge, perform complex clinical data analysis and improve the management of health organisations' resources. However, the vision of fully integrated health information eco-systems, which provide relevant information and useful knowledge at the point-of-care, remains elusive. This journal Focus Theme reviews some of the enduring challenges of interoperability and complexity in clinical informatics systems. Furthermore, a range of approaches are proposed in order to address, harness and resolve some of the many remaining issues towards a greater integration of health information systems and extraction of useful or new knowledge from heterogeneous electronic data repositories.
Fusion of monocular cues to detect man-made structures in aerial imagery
NASA Technical Reports Server (NTRS)
Shufelt, Jefferey; Mckeown, David M.
1991-01-01
The extraction of buildings from aerial imagery is a complex problem for automated computer vision. It requires locating regions in a scene that possess properties distinguishing them as man-made objects as opposed to naturally occurring terrain features. It is reasonable to assume that no single detection method can correctly delineate or verify buildings in every scene. A cooperative-methods paradigm is useful in approaching the building extraction problem. Using this paradigm, each extraction technique provides information which can be added or assimilated into an overall interpretation of the scene. Thus, the main objective is to explore the development of a computer vision system that integrates the results of various scene analysis techniques into an accurate and robust interpretation of the underlying three-dimensional scene. The problem of building hypothesis fusion in aerial imagery is discussed. Building extraction techniques are briefly surveyed, including four building extraction, verification, and clustering systems. A method for fusing the symbolic data generated by these systems is described, and applied to monocular image and stereo image data sets. Evaluation methods for the fusion results are described, and the fusion results are analyzed using these methods.
Neutron Polarization Analysis for Biphasic Solvent Extraction Systems
Motokawa, Ryuhei; Endo, Hitoshi; Nagao, Michihiro; ...
2016-06-16
Here we performed neutron polarization analysis (NPA) of extracted organic phases containing complexes, comprised of Zr(NO3)4 and tri-n-butyl phosphate, which enabled decomposition of the intensity distribution of small-angle neutron scattering (SANS) into the coherent and incoherent scattering components. The coherent scattering intensity, containing structural information, and the incoherent scattering compete over a wide range of the magnitude of the scattering vector, q, specifically when q is larger than q* ≈ 1/Rg, where Rg is the radius of gyration of the scatterer. Therefore, it is important to determine the incoherent scattering intensity exactly to perform an accurate structural analysis from SANS data when Rg is small, as for the aforementioned extracted coordination species. Although NPA is the best method for evaluating the incoherent scattering component for accurately determining the coherent scattering in SANS, this method is not used frequently in SANS data analysis because it is technically challenging. In this study, we successfully demonstrated that experimental determination of the incoherent scattering using NPA is suitable for sample systems containing a small scatterer with a weak coherent scattering intensity, such as extracted complexes in biphasic solvent extraction systems.
Misra, Dharitri; Chen, Siyuan; Thoma, George R
2009-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost-effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.
Landsat surface reflectance quality assurance extraction (version 1.7)
Jones, J.W.; Starbuck, M.J.; Jenkerson, Calli B.
2013-01-01
The U.S. Geological Survey (USGS) Land Remote Sensing Program is developing an operational capability to produce Climate Data Records (CDRs) and Essential Climate Variables (ECVs) from the Landsat Archive to support a wide variety of science and resource management activities from regional to global scale. The USGS Earth Resources Observation and Science (EROS) Center is charged with prototyping systems and software to generate these high-level data products. Various USGS Geographic Science Centers are charged with particular ECV algorithm development and (or) selection as well as the evaluation and application demonstration of various USGS CDRs and ECVs. Because it is a foundation for many other ECVs, the first CDR in development is the Landsat Surface Reflectance Product (LSRP). The LSRP incorporates data quality information in a bit-packed structure that is not readily accessible without postprocessing services performed by the user. This document describes two general methods of LSRP quality-data extraction for use in image processing systems. Helpful hints for the installation and use of software originally developed for manipulation of Hierarchical Data Format (HDF) produced through the National Aeronautics and Space Administration (NASA) Earth Observing System are first provided for users who wish to extract quality data into separate HDF files. Next, steps follow to incorporate these extracted data into an image processing system. Finally, an alternative example is illustrated in which the data are extracted within a particular image processing system.
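Extracting flags from a bit-packed quality band generally looks like the following sketch (Python with NumPy; the bit layout shown is an assumed example, not the documented LSRP layout):

```python
# Unpack a bit-packed quality band: each pixel's integer encodes several
# one-bit flags at fixed positions.
import numpy as np

qa = np.array([[0b0000, 0b0010],
               [0b0110, 0b1000]], dtype=np.uint16)  # tiny example QA band

def extract_bit(band, bit):
    """Return a 0/1 mask for a single quality bit."""
    return (band >> bit) & 1

cloud = extract_bit(qa, 1)         # assumed: bit 1 = cloud
cloud_shadow = extract_bit(qa, 2)  # assumed: bit 2 = cloud shadow
print("cloud mask:\n", cloud)
print("shadow mask:\n", cloud_shadow)
```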
Feature extraction via KPCA for classification of gait patterns.
Wu, Jianning; Wang, Jue; Liu, Li
2007-06-01
Automated recognition of gait pattern change is important in medical diagnostics as well as in the early identification of at-risk gait in the elderly. We evaluated the use of Kernel-based Principal Component Analysis (KPCA) to extract more gait features (i.e., to obtain more significant amounts of information about human movement) and thus to improve the classification of gait patterns. 3D gait data of 24 young and 24 elderly participants were acquired using an OPTOTRAK 3020 motion analysis system during normal walking, and a total of 36 gait spatio-temporal and kinematic variables were extracted from the recorded data. KPCA was used first for nonlinear feature extraction, to then evaluate its effect on a subsequent classification in combination with learning algorithms such as support vector machines (SVMs). Cross-validation test results indicated that the proposed technique could allow spreading the information about the gait's kinematic structure into more nonlinear principal components, thus providing additional discriminatory information for the improvement of gait classification performance. The feature extraction ability of KPCA was affected only slightly by the choice of kernel function (polynomial or radial basis function). The combination of KPCA and SVM could identify young-elderly gait patterns with 91% accuracy, resulting in a markedly improved performance compared to the combination of PCA and SVM. These results suggest that nonlinear feature extraction by KPCA improves the classification of young-elderly gait patterns, and holds considerable potential for future applications in direct dimensionality reduction and interpretation of multiple gait signals.
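The evaluated pipeline can be sketched as follows (Python with scikit-learn; the synthetic stand-ins for the 36 gait variables, the kernel settings, and the component count are assumptions):

```python
# KPCA -> SVM pipeline for young-vs-elderly gait classification,
# demonstrated on synthetic features with an injected group difference.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 36))   # 24 young + 24 elderly, 36 gait variables
y = np.repeat([0, 1], 24)       # 0 = young, 1 = elderly
X[y == 1] += 0.5                # weak synthetic group difference

pipe = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf", gamma=0.05),
    SVC(kernel="linear"),
)
print("CV accuracy: %.2f" % cross_val_score(pipe, X, y, cv=6).mean())
```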
Aqueous biphasic systems in the separation of food colorants.
Santos, João H P M; Capela, Emanuel V; Boal-Palheiros, Isabel; Coutinho, João A P; Freire, Mara G; Ventura, Sónia P M
2018-04-25
Aqueous biphasic systems (ABS) composed of polypropylene glycol and carbohydrates, two benign substances, are proposed to separate two food colorants (E122 and E133). ABS are promising extraction platforms, particularly for biomolecules, due to their aqueous and mild nature (pH and temperature), reduced environmental impact, and low processing costs. Another major aspect, particularly useful in downstream processing, is the ability to "tune" the extraction and purification performance of these systems by a proper choice of the ABS components. In this work, our intention is to present ABS as an alternative, volatile-organic-solvent-free tool to separate two different biomolecules in a simple way, so simple that teachers can effectively adopt it in their classes to explain the concept of bioseparation processes. Informative documents and general information about the preparation of binodal curves and their use in the partitioning of biomolecules are provided in this work for teachers to use in their classes. The students use different carbohydrates to build ABS and then study the partitioning of two food dyes (of synthetic origin), thus evaluating their ability to separate the two food colorants. Through these experiments, the students become acquainted with ABS, learn how to determine solubility curves, and perform extraction procedures using food colorant additives; the same procedures can also be applied to the extraction of various (bio)molecules. © 2018 The International Union of Biochemistry and Molecular Biology.
NASA Technical Reports Server (NTRS)
Enslin, William R.; Ton, Jezching; Jain, Anil
1987-01-01
Landsat TM data were combined with land cover and planimetric data layers contained in the State of Michigan's geographic information system (GIS) to identify changes in forestlands, specifically new oil/gas wells. A GIS-guided feature-based classification method was developed. The regions extracted by the best image band/operator combination were studied using a set of rules based on the characteristics of the GIS oil/gas pads.
Automated endoscopic navigation and advisory system from medical image
NASA Astrophysics Data System (ADS)
Kwoh, Chee K.; Khan, Gul N.; Gillies, Duncan F.
1999-05-01
In this paper, we present a review of the research conducted by our group to design an automatic endoscope navigation and advisory system. The whole system can be viewed as a two-layer system. The first layer is at the signal level, which consists of the processing performed on a series of images to extract all the identifiable features. The information is purely dependent on what can be extracted from the 'raw' images. At the signal level, the first task is performed by detecting a single dominant feature, the lumen. A few methods of identifying the lumen are proposed. The first method uses contour extraction. Contours are extracted by edge detection, thresholding, and linking. This method requires images to be divided into overlapping squares (8 by 8 or 4 by 4) from which line segments are extracted using a Hough transform. Perceptual criteria such as proximity, connectivity, similarity in orientation, contrast, and edge pixel intensity are used to group both strong and weak edges. This approach is called perceptual grouping. The second method is based on region extraction using a split-and-merge approach on spatial-domain data. An n-level (for a 2^n by 2^n image) quadtree-based pyramid structure is constructed to find the most homogeneous large dark region, which in most cases corresponds to the lumen. The algorithm constructs the quadtree from the bottom (pixel) level upward, recursively, and computes the mean and variance of the image regions corresponding to quadtree nodes. On reaching the root, the largest uniform seed region, whose mean corresponds to a lumen, is selected and grown by merging with its neighboring regions. In addition to the use of two-dimensional information in the form of regions and contours, three-dimensional shape can provide additional information that will enhance the system's capabilities. Shape or depth information from an image is estimated by various methods. A technique particularly suitable for endoscopy is shape from shading, which is developed to obtain the relative depth of the colon surface in the image by assuming a point light source very close to the camera. If we assume the colon has a shape similar to a tube, then a reasonable approximation of the position of the center of the colon (lumen) will be a function of the direction in which the majority of the normal vectors of the shape are pointing. The second layer is the control layer, and at this level a decision model must be built for the endoscope navigation and advisory system. The system we built uses probabilistic networks that create a basic artificial-intelligence system for navigation in the colon. We constructed the probabilistic networks from correlated objective data using the maximum weighted spanning tree algorithm. In the construction of a probabilistic network, it is always assumed that variables starting from the same parent are conditionally independent. However, this may not hold and would give rise to incorrect inferences. In these cases, we propose the creation of a hidden node to modify the network topology, which in effect models the dependency of correlated variables, to solve the problem. The conditional probability matrices linking the hidden node to its neighbors are determined using a gradient descent method that minimizes the objective cost function. The error gradients can be treated as updating messages and can be propagated in any direction throughout any singly connected network to adjust the network parameters. With the above two-level approach, we have been able to build an automated endoscope navigation and advisory system successfully.
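A toy, top-down variant of the quadtree homogeneity test described above can be sketched in Python (note the paper builds its pyramid bottom-up; the image, variance threshold, and darkness test here are assumptions):

```python
# Recursively split an image until blocks are homogeneous, then pick the
# largest homogeneous dark block as the lumen seed region.
import numpy as np

def dark_blocks(img, x=0, y=0, var_thresh=20.0, min_size=4):
    """Return homogeneous blocks as (x, y, w, h, mean) tuples."""
    h, w = img.shape
    if img.var() <= var_thresh or min(h, w) <= min_size:
        return [(x, y, w, h, float(img.mean()))]
    hh, hw = h // 2, w // 2
    return (dark_blocks(img[:hh, :hw], x, y, var_thresh, min_size)
            + dark_blocks(img[:hh, hw:], x + hw, y, var_thresh, min_size)
            + dark_blocks(img[hh:, :hw], x, y + hh, var_thresh, min_size)
            + dark_blocks(img[hh:, hw:], x + hw, y + hh, var_thresh, min_size))

rng = np.random.default_rng(0)
img = np.full((64, 64), 200.0)
img[16:48, 16:48] = 30.0 + rng.normal(0, 2, (32, 32))  # dark "lumen" region
blocks = dark_blocks(img)
seed = max((b for b in blocks if b[4] < 80), key=lambda b: b[2] * b[3])
print("lumen seed block (x, y, w, h, mean):", seed)
```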
Information-reality complementarity in photonic weak measurements
NASA Astrophysics Data System (ADS)
Mancino, Luca; Sbroscia, Marco; Roccia, Emanuele; Gianani, Ilaria; Cimini, Valeria; Paternostro, Mauro; Barbieri, Marco
2018-06-01
The emergence of realistic properties is a key problem in understanding the quantum-to-classical transition. In this respect, measurements represent a way to interface quantum systems with the macroscopic world: these can be driven in the weak regime, where a reduced back-action can be imparted by choosing meter states able to extract different amounts of information. Here we explore the implications of such weak measurement for the variation of realistic properties of two-level quantum systems pre- and postmeasurement, and extend our investigations to the case of open systems implementing the measurements.
Large deviation analysis of a simple information engine
NASA Astrophysics Data System (ADS)
Maitland, Michael; Grosskinsky, Stefan; Harris, Rosemary J.
2015-11-01
Information thermodynamics provides a framework for studying the effect of feedback loops on entropy production. It has enabled the understanding of novel thermodynamic systems such as the information engine, which can be seen as a modern version of "Maxwell's Dæmon," whereby a feedback controller processes information gained by measurements in order to extract work. Here, we analyze a simple model of such an engine that uses feedback control based on measurements to obtain negative entropy production. We focus on the distribution and fluctuations of the information obtained by the feedback controller. Significantly, our model allows an analytic treatment for a two-state system with exact calculation of the large deviation rate function. These results suggest an approximate technique for larger systems, which is corroborated by simulation data.
Multiple-stage pure phase encoding with biometric information
NASA Astrophysics Data System (ADS)
Chen, Wen
2018-01-01
In recent years, many optical systems have been developed for securing information, and optical encryption/encoding has attracted more and more attention due to the marked advantages, such as parallel processing and multiple-dimensional characteristics. In this paper, an optical security method is presented based on pure phase encoding with biometric information. Biometric information (such as fingerprint) is employed as security keys rather than plaintext used in conventional optical security systems, and multiple-stage phase-encoding-based optical systems are designed for generating several phase-only masks with biometric information. Subsequently, the extracted phase-only masks are further used in an optical setup for encoding an input image (i.e., plaintext). Numerical simulations are conducted to illustrate the validity, and the results demonstrate that high flexibility and high security can be achieved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nilsson, Mikael
Advanced nuclear fuel cycles rely on successful chemical separation of various elements in the used fuel. Numerous solvent extraction (SX) processes have been developed for the recovery and purification of metal ions from this used material. However, the predictability of process operations has been challenged by the lack of a fundamental understanding of the chemical interactions in several of these separation systems. For example, gaps in the thermodynamic description of the mechanism and the complexes formed will make predictions very challenging. Recent studies of certain extraction systems under development and a number of more established SX processes have suggested that aggregate formation in the organic phase results in a transformation of its selectivity and efficiency. Aggregation phenomena have consistently been interfering in SX process development, and have, over the years, become synonymous with an undesirable effect that must be prevented. This multiyear, multicollaborative research effort was carried out to study solvation and self-organization in non-aqueous solutions at conditions promoting aggregation phenomena. Our approach to this challenging topic was to investigate extraction systems comprising more than one extraction reagent where synergy of the metal ion could be observed. These systems were probed for the existence of stable microemulsions in the organic phase, and a number of high-end characterization tools were employed to elucidate the role of the aggregates in metal ion extraction. The ultimate goal was to find connections between synergy of metal ion extraction and reverse micellar formation. Our main accomplishment for this project was the expansion of the understanding of metal ion complexation in the extraction system combining tributyl phosphate (TBP) and dibutyl phosphoric acid (HDBP). We have found that for this system no direct correlation exists for the metal ion extraction and the formation of aggregates, meaning that the metal ion is not solubilized in a reverse micelle core. Rather we have found solid evidence that the metal ions are extracted and coordinated by the organic ligands as suggested by classic SX theories. However, we have challenged the existence of mixed complexes that have been suggested to exist in this particular extraction system. Most importantly we have generated a wealth of information and trained students on important lab techniques and strengthened the collaboration between the DOE national laboratories and the US educational institutions involved in this work.
[A new tool for retrieving clinical data from various sources].
Nielsen, Erik Waage; Hovland, Anders; Strømsnes, Oddgeir
2006-02-23
A doctor's tool for extracting clinical data on groups of hospital patients from various sources into one file has been in demand. For this purpose we evaluated QlikView. Based on clinical information required by two cardiologists, an IT specialist with thorough knowledge of the hospital's data system (www.dips.no) spent 30 days assembling one QlikView file. Data were also assembled from a pre-hospital ambulance system. The 13 MB QlikView file held information on 12,430 patients admitted to the cardiac unit 26,287 times over the last 21 years, including 530,912 clinical laboratory analyses from these patients during the past five years. Some information required by the cardiologists was inaccessible due to lack of coding or data storage. Some databases could not export their data; others were encrypted by the software company. A major part of the required data could be extracted to QlikView. Searches ran fast in spite of the huge amount of data. QlikView could assemble clinical information for doctors from different data systems. Doctors from different hospitals could share and further refine empty QlikView files for their own use. Once the file is assembled, doctors can, on their own, search for answers to constantly changing clinical questions, also at odd hours.
NASA Astrophysics Data System (ADS)
Parinsi, M. T.; Palilingan, V. R.; Sukardi; Surjono, H. D.
2018-02-01
This paper aims to develop an information system for the labor market, specifically linking vocational school (SMK) graduates and industries. The application was developed using the Research and Development (R&D) method of Borg and Gall, conducted in North Sulawesi Province, Indonesia. The results are reliable and acceptable for graduating students. The Labor Market Information System (LMIS) can help industries find, in real time, graduates matched to company requirements. SMKs can also benefit by extracting information from the application to prepare their students for specific work in the industries. Future development will make the application available not only to SMK graduates.
Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S
2012-01-01
The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free text medical information such as seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and described the inherent complexities of the task.
Multidimensional biochemical information processing of dynamical patterns
NASA Astrophysics Data System (ADS)
Hasegawa, Yoshihiko
2018-02-01
Cells receive signaling molecules by receptors and relay information via sensory networks so that they can respond properly depending on the type of signal. Recent studies have shown that cells can extract multidimensional information from dynamical concentration patterns of signaling molecules. We herein study how biochemical systems can process multidimensional information embedded in dynamical patterns. We model the decoding networks by linear response functions, and optimize the functions with the calculus of variations to maximize the mutual information between patterns and output. We find that, when the noise intensity is lower, decoders with different linear response functions, i.e., distinct decoders, can extract much information. However, when the noise intensity is higher, distinct decoders do not provide the maximum amount of information. This indicates that, when transmitting information by dynamical patterns, embedding information in multiple patterns is not optimal when the noise intensity is very large. Furthermore, we explore the biochemical implementations of these decoders using control theory and demonstrate that these decoders can be implemented biochemically through the modification of cascade-type networks, which are prevalent in actual signaling pathways.
Variability extraction and modeling for product variants.
Linsbauer, Lukas; Lopez-Herrejon, Roberto Erick; Egyed, Alexander
2017-01-01
Fast-changing hardware and software technologies in addition to larger and more specialized customer bases demand software tailored to meet very diverse requirements. Software development approaches that aim at capturing this diversity on a single consolidated platform often require large upfront investments, e.g., time or budget. Alternatively, companies resort to developing one variant of a software product at a time by reusing as much as possible from already-existing product variants. However, identifying and extracting the parts to reuse is an error-prone and inefficient task compounded by the typically large number of product variants. Hence, more disciplined and systematic approaches are needed to cope with the complexity of developing and maintaining sets of product variants. Such approaches require detailed information about the product variants, the features they provide and their relations. In this paper, we present an approach to extract such variability information from product variants. It identifies traces from features and feature interactions to their implementation artifacts, and computes their dependencies. This work can be useful in many scenarios ranging from ad hoc development approaches such as clone-and-own to systematic reuse approaches such as software product lines. We applied our variability extraction approach to six case studies and provide a detailed evaluation. The results show that the extracted variability information is consistent with the variability in our six case study systems given by their variability models and available product variants.
Pattern Mining for Extraction of mentions of Adverse Drug Reactions from User Comments
Nikfarjam, Azadeh; Gonzalez, Graciela H.
2011-01-01
Rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs, their effectiveness, and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs on social network websites by mining a set of language patterns. The system applies association rule mining to a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We achieved a precision of 70.01%, recall of 66.32%, and F-measure of 67.96%. PMID:22195162
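The pattern-mining step can be illustrated with a minimal support count (Python; the comments and support threshold are invented, and real association rule mining would additionally derive rule confidence):

```python
# Count co-occurring token pairs across user comments; pairs meeting a
# minimum support become candidate language patterns around ADR mentions.
from collections import Counter
from itertools import combinations

comments = [
    "this drug gave me a terrible headache",
    "gave me really bad nausea",
    "this med gave me a headache again",
]
min_support = 2

itemsets = Counter()
for c in comments:
    tokens = set(c.split())
    for pair in combinations(sorted(tokens), 2):
        itemsets[pair] += 1

patterns = {p: n for p, n in itemsets.items() if n >= min_support}
print(patterns)  # e.g. ('gave', 'me') survives as a candidate pattern
```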
Average Orientation Is More Accessible through Object Boundaries than Surface Features
ERIC Educational Resources Information Center
Choo, Heeyoung; Levinthal, Brian R.; Franconeri, Steven L.
2012-01-01
In a glance, the visual system can provide a summary of some kinds of information about objects in a scene. We explore how summary information about "orientation" is extracted and find that some representations of orientation are privileged over others. Participants judged the average orientation of either a set of 6 bars or 6 circular…
NASA Technical Reports Server (NTRS)
Morgan, Timothy E.
1995-01-01
The objective of the Reusable Software System (RSS) is to provide NASA Langley Research Center and its contractor personnel with reusable software technology through the Internet. The RSS is easily accessible, provides extractable information, and offers the capability to submit information or data for the purpose of scientific research at NASA Langley Research Center within the Atmospheric Science Division.
Review of Extracting Information From the Social Web for Health Personalization
Karlsen, Randi; Bonander, Jason
2011-01-01
In recent years the Web has come into its own as a social platform where health consumers are actively creating and consuming Web content. Moreover, as the Web matures, consumers are gaining access to personalized applications adapted to their health needs and interests. The creation of personalized Web applications relies on extracted information about the users and the content to personalize. The Social Web itself provides many sources of information that can be used to extract information for personalization apart from traditional Web forms and questionnaires. This paper provides a review of different approaches for extracting information from the Social Web for health personalization. We reviewed research literature across different fields addressing the disclosure of health information in the Social Web, techniques to extract that information, and examples of personalized health applications. In addition, the paper includes a discussion of technical and socioethical challenges related to the extraction of information for health personalization. PMID:21278049
Real-time incident detection using social media data.
DOT National Transportation Integrated Search
2016-05-09
The effectiveness of traditional incident detection is often limited by sparse sensor coverage, and reporting incidents to emergency response systems is labor-intensive. This research project mines tweet texts to extract incident information on bot...
Legal Medicine Information System using CDISC ODM.
Kiuchi, Takahiro; Yoshida, Ken-ichi; Kotani, Hirokazu; Tamaki, Keiji; Nagai, Hisashi; Harada, Kazuki; Ishikawa, Hirono
2013-11-01
We have developed a new database system for forensic autopsies, called the Legal Medicine Information System, using the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM). This system comprises two subsystems, namely the Institutional Database System (IDS) located in each institute and containing personal information, and the Central Anonymous Database System (CADS) located in the University Hospital Medical Information Network Center containing only anonymous information. CDISC ODM is used as the data transfer protocol between the two subsystems. Using the IDS, forensic pathologists and other staff can register and search for institutional autopsy information, print death certificates, and extract data for statistical analysis. They can also submit anonymous autopsy information to the CADS semi-automatically. This reduces the burden of double data entry, the time-lag of central data collection, and anxiety regarding legal and ethical issues. Using the CADS, various studies on the causes of death can be conducted quickly and easily, and the results can be used to prevent similar accidents, diseases, and abuse. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Tsagaan, Baigalmaa; Abe, Keiichi; Goto, Masahiro; Yamamoto, Seiji; Terakawa, Susumu
2006-03-01
This paper presents a method for segmenting brain tissues from MR images, developed for our image-guided neurosurgery system, which is under development. Our goal is to segment brain tissues for creating a biomechanical model. The proposed segmentation method is based on 3-D region growing and outperforms conventional approaches through stepwise use of intensity similarities between voxels in conjunction with edge information. Since the intensity and the edge information are complementary to each other in region-based segmentation, we use them twice by performing a coarse-to-fine extraction. First, the edge information in an appropriate neighborhood of the voxel being considered is examined to constrain the region growing. The expanded region of the first extraction result is then used as the domain for the next step. Only the intensity and edge information of the current voxel are utilized in the final extraction. Before segmentation, the intensity parameters of the brain tissues, as well as the partial volume effect, are estimated using the expectation-maximization (EM) algorithm in order to provide an accurate data interpretation for the extraction. We tested the proposed method on T1-weighted MR images of the brain and evaluated the segmentation effectiveness by comparing the results with ground truth. Meshes generated from the segmented brain volume using mesh-generation software are also shown in this paper.
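A 2-D sketch of region growing constrained by intensity similarity and an edge map, the two cues the method combines (Python; the paper works in 3-D, and the seed, tolerance, and edge mask here are assumptions):

```python
# Breadth-first region growing: a neighbor joins the region only if it is
# not an edge voxel and its intensity is close to the seed intensity.
import numpy as np
from collections import deque

def grow_region(img, edges, seed, tol=10.0):
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    seed_mean = float(img[seed])
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not region[ny, nx]
                    and not edges[ny, nx]
                    and abs(float(img[ny, nx]) - seed_mean) < tol):
                region[ny, nx] = True
                queue.append((ny, nx))
    return region

img = np.full((32, 32), 100.0)
img[8:24, 8:24] = 60.0                    # "tissue" of interest
edges = np.zeros_like(img, dtype=bool)
edges[8, 8:24] = edges[23, 8:24] = True   # partial edge boundary
mask = grow_region(img, edges, seed=(15, 15))
print("region size:", int(mask.sum()))
```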
Benchmarking infrastructure for mutation text mining.
Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo
2014-02-25
Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
Semantic and syntactic interoperability in online processing of big Earth observation data.
Sudmanns, Martin; Tiede, Dirk; Lang, Stefan; Baraldi, Andrea
2018-01-01
The challenge of enabling syntactic and semantic interoperability for comprehensive and reproducible online processing of big Earth observation (EO) data is still unsolved. Supporting both types of interoperability is one of the requirements for efficiently extracting valuable information from the large amount of available multi-temporal gridded data sets. The proposed system wraps world models (semantic interoperability) into OGC Web Processing Services (syntactic interoperability) for semantic online analyses. World models describe spatio-temporal entities and their relationships in a formal way. The proposed system serves as an enabler for (1) technical interoperability, using a standardised interface to be used by all types of clients, and (2) allowing experts from different domains to develop complex analyses together as a collaborative effort. Users connect the world models online to the data, which are maintained in centralised storage as 3D spatio-temporal data cubes. The system also allows non-experts to extract valuable information from EO data because data management, low-level interactions, and specific software issues can be ignored. We discuss the concept of the proposed system, provide a technical implementation example, and describe three use cases for extracting changes from EO images, demonstrating the usability also for non-EO, gridded, multi-temporal data sets (CORINE land cover).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Veličković, Dušan; Chu, Rosalie K.; Carrell, Alyssa A.
One critical aspect of mass spectrometry imaging (MSI) is the need to confidently identify detected analytes. While orthogonal tandem MS (e.g., LC-MS2) experiments from sample extracts can assist in annotating ions, the spatial information about these molecules is lost. Accordingly, this could lead to misleading conclusions, especially in cases where isobaric species exhibit different distributions within a sample. In this Technical Note, we employed a multimodal imaging approach, using matrix-assisted laser desorption/ionization (MALDI)-MSI and liquid extraction surface analysis (LESA)-MS2, to confidently annotate and localize a broad range of metabolites involved in a tripartite symbiosis system of moss, cyanobacteria, and fungus. We found that the combination of these two imaging modalities generated very congruent ion images, providing the link between the highly accurate structural information offered by LESA and the high spatial resolution attainable by MALDI. These results demonstrate how this combined methodology could be very useful in differentiating metabolite routes in complex systems.
Feature extraction for document text using Latent Dirichlet Allocation
NASA Astrophysics Data System (ADS)
Prihatini, P. M.; Suryawan, I. K.; Mandia, IN
2018-01-01
Feature extraction is one of the stages in an information retrieval system, used to extract the unique feature values of a text document. Feature extraction can be done by several methods, one of which is Latent Dirichlet Allocation. However, research on text feature extraction using the Latent Dirichlet Allocation method is rarely found for Indonesian text. Therefore, in this research, text feature extraction is implemented for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling, and evaluation. The evaluation is done by comparing the Precision, Recall, and F-Measure values of Latent Dirichlet Allocation with those of Term Frequency-Inverse Document Frequency with K-Means, which is commonly used for feature extraction. The evaluation results show that the Precision, Recall, and F-Measure values of the Latent Dirichlet Allocation method are higher than those of the Term Frequency-Inverse Document Frequency with K-Means method. This shows that the Latent Dirichlet Allocation method is able to extract features and cluster Indonesian text better than the Term Frequency-Inverse Document Frequency with K-Means method.
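For readers unfamiliar with LDA-based features, the sketch below shows the general idea with scikit-learn rather than the authors' implementation: each document becomes a topic-proportion vector. The toy documents and the choice of two topics are arbitrary assumptions.

```python
# Minimal LDA feature extraction sketch (not the paper's code): documents
# are mapped to topic-distribution vectors usable for clustering/retrieval.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "pemilu suara politik partai jakarta",   # toy documents; real input would
    "pertandingan sepak bola skor gol",      # be pre-processed Indonesian text
    "politik pemilu partai suara",
]

bow = CountVectorizer().fit_transform(docs)        # term-count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
features = lda.fit_transform(bow)                  # rows: P(topic | document)
print(features.round(2))                           # topic proportions as features
```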
Optimality in Data Assimilation
NASA Astrophysics Data System (ADS)
Nearing, Grey; Yatheendradas, Soni
2016-04-01
It costs a lot more to develop and launch an earth-observing satellite than it does to build a data assimilation system. As such, we propose that it is important to understand the efficiency of our assimilation algorithms at extracting information from remote sensing retrievals. To address this, we propose that it is necessary to adopt a completely general definition of "optimality" that explicitly acknowledges all differences between the parametric constraints of our assimilation algorithm (e.g., Gaussianity, partial linearity, Markovian updates) and the true nature of the environmental system and observing system. In fact, it is not only possible, but incredibly straightforward, to measure the optimality (in this more general sense) of any data assimilation algorithm as applied to any intended model or natural system. We measure the information content of remote sensing data conditional on the fact that we are already running a model and then measure the actual information extracted by data assimilation. The ratio of the two is an efficiency metric, and optimality is defined as occurring when the data assimilation algorithm is perfectly efficient at extracting information from the retrievals. We measure the information content of the remote sensing data in a way that, unlike triple collocation, does not rely on any a priori presumed relationship (e.g., linear) between the retrieval and the ground truth but which, like triple collocation, is insensitive to the spatial mismatch between point-based measurements and grid-scale retrievals. This theory and method are therefore suitable for use with both dense and sparse validation networks. Additionally, the method we propose is *constructive* in the sense that it provides guidance on how to improve data assimilation systems. All data assimilation strategies can be reduced to approximations of Bayes' law, and we measure the fractions of total information loss that are due to individual assumptions or approximations in the prior (i.e., the model uncertainty distribution), and in the likelihood (i.e., the observation operator and observation uncertainty distribution). In this way, we can directly identify the parts of a data assimilation algorithm that contribute most to assimilation error in a way that (unlike traditional DA performance metrics) considers nonlinearity in the model and observation and non-optimality in the fit between filter assumptions and the real system. To reiterate, the method we propose is theoretically rigorous but also dead simple, and can be implemented in no more than a few hours by a competent programmer. We use this to show that careful applications of the Ensemble Kalman Filter use substantially less than half of the information contained in remote sensing soil moisture retrievals (LPRM, AMSR-E, SMOS, and SMOPS). We propose that this finding may explain some of the results from several recent large-scale experiments that show lower-than-expected value to assimilating soil moisture retrievals into land surface models forced by high-quality precipitation data. Our results have important implications for the SMAP mission because over half of the SMAP-affiliated "early adopters" plan to use the EnKF as their primary method for extracting information from SMAP retrievals.
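A toy rendering of the efficiency metric: estimate how much information the analysis carries about the truth relative to the raw retrieval. The histogram-based mutual information estimator and the synthetic Gaussian data below are stand-ins; the authors' estimator conditions on the model prior and handles point-to-grid mismatch, which this sketch does not.

```python
# Toy efficiency computation: I(analysis; truth) / I(retrieval; truth).
# Synthetic data and a plug-in histogram MI estimator, illustration only.
import numpy as np

def mutual_info(x, y, bins=20):
    """Plug-in mutual information estimate (nats) from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
truth = rng.normal(size=10_000)
retrieval = truth + rng.normal(scale=0.5, size=truth.size)           # noisy obs
analysis = 0.5 * retrieval + rng.normal(scale=0.4, size=truth.size)  # imperfect DA

efficiency = mutual_info(analysis, truth) / mutual_info(retrieval, truth)
print(f"fraction of retrieval information extracted: {efficiency:.2f}")
```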
DARHT Multi-intelligence Seismic and Acoustic Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.
The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein are obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected outside the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short-duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than is available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.
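The generic feature extraction component can be pictured as summary statistics computed per channel over a time window; the particular statistics and the window size below are illustrative assumptions, not the report's exact feature set.

```python
# Sketch of generic statistical feature extraction over nine data channels.
import numpy as np
from scipy import stats

def channel_features(window: np.ndarray) -> np.ndarray:
    """window: (n_samples, n_channels) array -> flat feature vector."""
    feats = [
        window.mean(axis=0),
        window.std(axis=0),
        stats.skew(window, axis=0),
        stats.kurtosis(window, axis=0),
        np.abs(window).max(axis=0),       # peak amplitude per channel
    ]
    return np.concatenate(feats)

rng = np.random.default_rng(1)
window = rng.normal(size=(4096, 9))    # 9 channels: 2 tri-axial seismic + 3 acoustic
print(channel_features(window).shape)  # (45,) = 5 statistics x 9 channels
```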
High-Resolution Remote Sensing Image Building Extraction Based on Markov Model
NASA Astrophysics Data System (ADS)
Zhao, W.; Yan, L.; Chang, Y.; Gong, L.
2018-04-01
With the increase of resolution, remote sensing images have the characteristics of increased information load, increased noise, and more complex feature geometry and texture information, which makes the extraction of building information more difficult. To solve this problem, this paper designs a high-resolution remote sensing image building extraction method based on a Markov model. This method introduces Contourlet domain map clustering and a Markov model, captures and enhances the contour and texture information of high-resolution remote sensing image features in multiple directions, and further designs a spectral feature index that can characterize "pseudo-buildings" in the building area. Through the multi-scale segmentation and extraction of image features, fine extraction from the building area down to the individual building is realized. Experiments show that this method can suppress the noise of high-resolution remote sensing images, reduce the interference of non-target ground texture information, and remove shadows, vegetation, and other pseudo-building information, achieving better precision, accuracy, and completeness in building extraction than traditional pixel-level image information extraction.
Automatic Requirements Specification Extraction from Natural Language (ARSENAL)
2014-10-01
Natural language is used for the communication of technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, and imprecise. Key concerns for an automatic requirements-extraction system include the accuracy of the natural language processing stage, the degree of automation, and robustness to noise.
Neural computing for numeric-to-symbolic conversion in control systems
NASA Technical Reports Server (NTRS)
Passino, Kevin M.; Sartori, Michael A.; Antsaklis, Panos J.
1989-01-01
A type of neural network, the multilayer perceptron, is used to classify numeric data and assign appropriate symbols to various classes. This numeric-to-symbolic conversion results in a type of information extraction, which is similar to what is called data reduction in pattern recognition. The use of the neural network as a numeric-to-symbolic converter is introduced, its application in autonomous control is discussed, and several applications are studied. The perceptron is used as a numeric-to-symbolic converter for a discrete-event system controller supervising a continuous variable dynamic system. It is also shown how the perceptron can implement fault trees, which provide useful information (alarms) in a biological system and information for failure diagnosis and control purposes in an aircraft example.
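A minimal modern rendering of this idea, using scikit-learn's multilayer perceptron in place of the paper's purpose-built networks; the synthetic sensor readings and the alarm rule are invented for illustration.

```python
# Numeric-to-symbolic conversion sketch: numeric sensor vectors in,
# discrete symbols out. Data and the "alarm" rule are synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(500, 3))                # e.g. pressure, temp, flow
y = np.where(X.max(axis=1) > 0.9, "alarm", "normal")    # symbolic class labels

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.95, 0.2, 0.1], [0.3, 0.4, 0.5]]))  # expected: alarm, normal
```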
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nilsson, Mikael
This 3-year project was a collaboration between the University of California Irvine (UC Irvine), Pacific Northwest National Laboratory (PNNL), Idaho National Laboratory (INL), and Argonne National Laboratory (ANL), with an international collaborator at Forschungszentrum Jülich (FZJ). The project was led from UC Irvine under the direction of Profs. Mikael Nilsson and Hung Nguyen. The leads at PNNL, INL, ANL, and FZJ were Dr. Liem Dang, Dr. Peter Zalupski, Dr. Nathaniel Hoyt, and Dr. Giuseppe Modolo, respectively. Involved in this project at UC Irvine were three full-time PhD graduate students, Tro Babikian, Ted Yoo, and Quynh Vo, and one MS student, Alba Font Bosch. The overall objective of this project was to study how the kinetics and thermodynamics of metal ion extraction can be described by molecular dynamics (MD) simulations and how the simulations can be validated by experimental data. Furthermore, the project included applied separation, testing the extraction systems in a single-stage annular centrifugal contactor and coupling the experimental data with computational fluid dynamics (CFD) simulations. Specific objectives of the proposed research were to: (1) study and establish a rigorous connection between MD simulations based on polarizable force fields and extraction thermodynamic and kinetic data; (2) compare and validate CFD simulations of extraction processes for An/Ln separation using different sizes (and types) of annular centrifugal contactors; and (3) provide a theoretical/simulation and experimental base for scale-up of batch-wise extraction to continuous contactors. We approached objectives 1 and 2 in parallel. For objective 1 we started by studying a well-established extraction system with a relatively simple extraction mechanism, namely tributyl phosphate (TBP). We found that well-optimized simulations can inform experiments, and new information on TBP behavior was presented in this project, as will be discussed below. The second objective proved a larger challenge and most of those efforts were devoted to experimental studies.
Jiang, Min; Chen, Yukun; Liu, Mei; Rosenbloom, S Trent; Mani, Subramani; Denny, Joshua C; Xu, Hua
2011-01-01
The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities-including medical problems, tests, and treatments, as well as their asserted status-from hospital discharge summaries written using natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge. The authors implemented a machine-learning-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set with 477 annotated notes. Standard measures including precision, recall, and F-measure were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across 477 annotated notes was considered as the primary metric in the challenge. Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and semantic information from existing natural-language-processing systems largely improved performance, although contributions from different types of features varied. The authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different than the first three systems) on the test data set in the challenge.
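The CRF component can be sketched with sklearn-crfsuite standing in for the authors' system; the two sentences, BIO labels, and feature template below are toy assumptions.

```python
# Toy CRF sequence labeler for clinical entities (BIO tags), illustration only.
import sklearn_crfsuite

def word2features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

sents = [["Patient", "denies", "chest", "pain"],
         ["A", "CT", "scan", "was", "ordered"]]
labels = [["O", "O", "B-problem", "I-problem"],
          ["O", "B-test", "I-test", "O", "O"]]

X = [[word2features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))  # should reproduce the training tags on this toy set
```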
Combining Feature Extraction Methods to Assist the Diagnosis of Alzheimer's Disease.
Segovia, F; Górriz, J M; Ramírez, J; Phillips, C
2016-01-01
Neuroimaging data such as (18)F-FDG PET are widely used to assist the diagnosis of Alzheimer's disease (AD). Looking for regions with hypoperfusion/hypometabolism, clinicians may predict or corroborate the diagnosis of the patients. Modern computer-aided diagnosis (CAD) systems based on the statistical analysis of whole neuroimages are more accurate than classical systems based on quantifying the uptake of some predefined regions of interest (ROIs). In addition, these new systems allow determining new ROIs and take advantage of the huge amount of information comprised in neuroimaging data. A major branch of modern CAD systems for AD is based on multivariate techniques, which analyse a neuroimage as a whole, considering not only the voxel intensities but also the relations among them. In order to deal with the vast dimensionality of the data, a number of feature extraction methods have been successfully applied. In this work, we propose a CAD system based on the combination of several feature extraction techniques. First, some commonly used feature extraction methods based on the analysis of variance (such as principal component analysis), on the factorization of the data (such as non-negative matrix factorization), and on classical magnitudes (such as Haralick features) were simultaneously applied to the original data. These feature sets were then combined by means of two different combination approaches: i) using a single classifier and a multiple kernel learning approach and ii) using an ensemble of classifiers and selecting the final decision by majority voting. The proposed approach was evaluated using a labelled neuroimaging database along with a cross-validation scheme. In conclusion, the proposed CAD system performed better than approaches using only one feature extraction technique. We also provide a fair comparison (using the same database) of the selected feature extraction methods.
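The second combination strategy, majority voting over classifiers trained on different feature sets, can be sketched as follows; the three synthetic feature matrices stand in for the PCA, NMF, and Haralick feature sets, and in real use the vote would be evaluated under cross-validation rather than on training data.

```python
# Majority-voting ensemble sketch: one classifier per feature set, votes combined.
import numpy as np
from scipy import stats
from sklearn.svm import SVC

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200)                           # AD vs control (toy)
feature_sets = [rng.normal(loc=y[:, None], size=(200, d))  # stand-ins for PCA,
                for d in (10, 15, 8)]                      # NMF, Haralick features

clfs = [SVC().fit(F, y) for F in feature_sets]             # one SVM per feature set
votes = np.stack([c.predict(F) for c, F in zip(clfs, feature_sets)])
majority = stats.mode(votes, axis=0, keepdims=False).mode  # per-sample vote
print("voted training accuracy:", (majority == y).mean())
```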
The Collaborative Lecture Annotation System (CLAS): A New TOOL for Distributed Learning
ERIC Educational Resources Information Center
Risko, E. F.; Foulsham, T.; Dawson, S.; Kingstone, A.
2013-01-01
In the context of a lecture, the capacity to readily recognize and synthesize key concepts is crucial for comprehension and overall educational performance. In this paper, we introduce a tool, the Collaborative Lecture Annotation System (CLAS), which has been developed to make the extraction of important information a more collaborative and…
NASA Technical Reports Server (NTRS)
1975-01-01
An on-line data storage and retrieval system which allows the user to extract and process information from stored data bases is described. The capabilities of the system are provided by a general purpose computer program containing several functional modules. The modules contained in MIRADS are briefly described along with user terminal operation procedures and MIRADS commands.
We've Got Plenty of Data, Now How Can We Use It?
ERIC Educational Resources Information Center
Weiler, Jeffrey K.; Mears, Robert L.
1999-01-01
To mine a large store of school data, a new technology (variously termed data warehousing, data marts, online analytical processing, and executive information systems) is emerging. Data warehousing helps school districts extract and restructure desired data from automated systems and create new databases designed to enhance analytical and…
Using a User-Interactive QA System for Personalized E-Learning
ERIC Educational Resources Information Center
Hu, Dawei; Chen, Wei; Zeng, Qingtian; Hao, Tianyong; Min, Feng; Wenyin, Liu
2008-01-01
A personalized e-learning framework based on a user-interactive question-answering (QA) system is proposed, in which a user-modeling approach is used to capture personal information of students and a personalized answer extraction algorithm is proposed for personalized automatic answering. In our approach, a topic ontology (or concept hierarchy)…
Washington State Community College Capital Master Plan, 1985-91. Management Summary.
ERIC Educational Resources Information Center
Washington State Board for Community Coll. Education, Olympia.
Designed for Washington State's decision makers and the trustees and administrators of the community college system, this report extracts information from the community college system's 1985-87 capital budget request and 1985-91 capital master plan to provide a brief description of the budget and plan. Introductory material discusses the…
Automatic Detection of Learning Styles for an E-Learning System
ERIC Educational Resources Information Center
Ozpolat, Ebru; Akar, Gozde B.
2009-01-01
A desirable characteristic for an e-learning system is to provide the learner the most appropriate information based on his requirements and preferences. This can be achieved by capturing and utilizing the learner model. Learner models can be extracted based on personality factors like learning styles, behavioral factors like user's browsing…
Mining and Analyzing Circulation and ILL Data for Informed Collection Development
ERIC Educational Resources Information Center
Link, Forrest E.; Tosaka, Yuji; Weng, Cathy
2015-01-01
The authors investigated quantitative methods of collection use analysis employing library data that are available in ILS and ILL systems to better understand library collection use and user needs. For the purpose of the study, the authors extracted circulation and ILL records from the library's systems using data-mining techniques. By comparing…
Learning of spatio-temporal codes in a coupled oscillator system.
Orosz, Gábor; Ashwin, Peter; Townley, Stuart
2009-07-01
In this paper, we consider a learning strategy that allows one to transmit information between two coupled phase oscillator systems (called teaching and learning systems) via frequency adaptation. The dynamics of these systems can be modeled with reference to a number of partially synchronized cluster states and transitions between them. Forcing the teaching system by steady but spatially nonhomogeneous inputs produces cyclic sequences of transitions between the cluster states, that is, information about inputs is encoded via a "winnerless competition" process into spatio-temporal codes. The large variety of codes can be learned by the learning system that adapts its frequencies to those of the teaching system. We visualize the dynamics using "weighted order parameters (WOPs)" that are analogous to "local field potentials" in neural systems. Since spatio-temporal coding is a mechanism that appears in olfactory systems, the developed learning rules may help to extract information from these neural ensembles.
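A toy sketch of the frequency-adaptation idea follows; the specific rule used here, nudging each learner frequency by the sine of its phase error to the corresponding teacher oscillator, is an illustrative assumption rather than the paper's exact learning rule.

```python
# Toy teacher/learner Kuramoto systems with frequency adaptation (assumed rule).
import numpy as np

rng = np.random.default_rng(4)
N, K, eps, dt = 5, 1.5, 0.1, 0.01
w_teach = rng.uniform(0.8, 1.2, N)          # teacher's natural frequencies
w_learn = rng.uniform(0.5, 1.5, N)          # learner starts mismatched
th_t = rng.uniform(0, 2 * np.pi, N)
th_l = rng.uniform(0, 2 * np.pi, N)

def kuramoto(th, w):
    # dtheta_i/dt = w_i + (K/N) * sum_j sin(theta_j - theta_i)
    return w + (K / N) * np.sin(th[None, :] - th[:, None]).sum(axis=1)

gap0 = np.abs(w_learn - w_teach).max()
for _ in range(100_000):
    th_t += dt * kuramoto(th_t, w_teach)
    th_l += dt * kuramoto(th_l, w_learn)
    w_learn += dt * eps * np.sin(th_t - th_l)  # adapt toward observed phases

print(gap0, "->", np.abs(w_learn - w_teach).max())  # frequency gap shrinks
```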
Audio feature extraction using probability distribution function
NASA Astrophysics Data System (ADS)
Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.
2015-05-01
Voice recognition has been one of the popular applications in the robotics field. It is also known to be used recently for biometric and multimedia information retrieval systems. This technology stems from successive research on audio feature extraction analysis. The Probability Distribution Function (PDF) is a statistical method which is usually used as one of the processing steps in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed which uses only the PDF as the feature extraction method itself for speech analysis purposes. Certain pre-processing techniques are performed prior to the proposed feature extraction. Subsequently, the PDF values for each frame of sampled voice signals, obtained from a number of individuals, are plotted. From the experimental results, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.
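The proposed feature can be sketched directly: frame the signal, then use a normalized amplitude histogram per frame as the feature vector. Frame length, bin count, and the synthetic sinusoid below are illustrative choices.

```python
# PDF-as-feature sketch: one normalized amplitude histogram per frame.
import numpy as np

def pdf_features(signal, frame_len=400, bins=32):
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    hists = [np.histogram(f, bins=bins, range=(-1.0, 1.0), density=True)[0]
             for f in frames]
    return np.array(hists)                      # shape (n_frames, bins)

t = np.linspace(0.0, 1.0, 8000)
voice = 0.6 * np.sin(2 * np.pi * 180 * t)       # toy stand-in for voiced speech
print(pdf_features(voice).shape)                # (20, 32): 20 frames, 32-bin PDFs
```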
BioNLP Shared Task--The Bacteria Track.
Bossy, Robert; Jourde, Julien; Manine, Alain-Pierre; Veber, Philippe; Alphonse, Erick; van de Guchte, Maarten; Bessières, Philippe; Nédellec, Claire
2012-06-26
We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task is aimed at extracting gene renaming and gene name synonymy in PubMed abstracts. The Bacteria Gene Interaction task is a gene/protein interaction extraction task from individual sentences. The interactions have been categorized into ten different sub-types, thus giving a detailed account of genetic regulation at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe the process of creation of the three corpora, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions. Three teams submitted to the Bacteria Gene Renaming task; the best team achieved an F-score of 87%. For the Bacteria Gene Interaction task, the only participant reached a global F-score of 77%, although the system's efficiency varies significantly from one sub-type to another. Three teams submitted to the Bacteria Biotopes task with very different approaches; the best team achieved an F-score of 45%. A detailed study of the participating systems' efficiency reveals the strengths and weaknesses of each participating system. The three tasks of the Bacteria Track offer participants a chance to address a wide range of issues in Information Extraction, including entity recognition, semantic typing, and coreference resolution. We found common trends in the most efficient systems: the systematic use of syntactic dependencies and machine learning. Nevertheless, the originality of the Bacteria Biotopes task encouraged the use of interesting novel methods and techniques, such as term compositionality and scopes wider than the sentence.
NEMO: Extraction and normalization of organization names from PubMed affiliations.
Jonnalagadda, Siddhartha Reddy; Topham, Philip
2010-10-04
Today, there are more than 18 million articles related to biomedical research indexed in MEDLINE, and information derived from them could be used effectively to save the great amount of time and resources spent by government agencies in understanding the scientific landscape, including key opinion leaders and centers of excellence. Associating biomedical articles with organization names could significantly benefit the pharmaceutical marketing industry, health care funding agencies, and public health officials, and be useful to other scientists in normalizing author names, automatically creating citations, indexing articles, and identifying potential resources or collaborators. The large amount of extracted information helps in disambiguating organization names using machine-learning algorithms. We propose NEMO, a system for extracting organization names from the affiliation field and normalizing them to a canonical organization name. Our parsing process involves multi-layered rule matching with multiple dictionaries. The system achieves more than 98% F-score in extracting organization names. Our normalization process involves clustering based on local sequence alignment metrics and local learning based on finding connected components. A high precision was also observed in normalization. NEMO is the missing link in associating each biomedical paper and its authors to an organization name in its canonical form and the geopolitical location of the organization. This research could potentially help in analyzing large social networks of organizations for landscaping a particular topic, improving the performance of author disambiguation, adding weak links in the co-author network of authors, augmenting NLM's MARS system for correcting errors in OCR output of the affiliation field, and automatically indexing PubMed citations with the normalized organization name and country. Our system, with a graphical user interface, is available for download along with this paper.
Enhancing Biomedical Text Summarization Using Semantic Relation Extraction
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization. PMID:21887336
Generation and confirmation of a (100 x 100)-dimensional entangled quantum system.
Krenn, Mario; Huber, Marcus; Fickler, Robert; Lapkiewicz, Radek; Ramelow, Sven; Zeilinger, Anton
2014-04-29
Entangled quantum systems have properties that have fundamentally overthrown the classical worldview. Increasing the complexity of entangled states by expanding their dimensionality allows the implementation of novel fundamental tests of nature, and moreover also enables genuinely new protocols for quantum information processing. Here we present the creation of a (100 × 100)-dimensional entangled quantum system, using spatial modes of photons. For its verification we develop a novel nonlinear criterion which infers entanglement dimensionality of a global state by using only information about its subspace correlations. This allows very practical experimental implementation as well as highly efficient extraction of entanglement dimensionality information. Applications in quantum cryptography and other protocols are very promising. PMID:24706902
Robot Command Interface Using an Audio-Visual Speech Recognition System
NASA Astrophysics Data System (ADS)
Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy
In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system using audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is processed using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
Geothermal reservoir simulation of hot sedimentary aquifer system using FEFLOW®
NASA Astrophysics Data System (ADS)
Nur Hidayat, Hardi; Gala Permana, Maximillian
2017-12-01
The study presents the simulation of a hot sedimentary aquifer for geothermal utilization. A hot sedimentary aquifer (HSA) is a conduction-dominated hydrothermal play type utilizing a deep aquifer heated by near-normal heat flow. One example of an HSA is the Bavarian Molasse Basin in southern Germany. This system typically uses doublet wells: an injection and a production well. The simulation was run for 3650 days of simulation time. The technical feasibility and performance are analysed with regard to the energy extracted by this concept. Several parameters are compared to determine the model performance. Parameters such as reservoir characteristics, temperature information, and well information are defined. Several assumptions are also made to simplify the simulation process. The main results of the simulation are the heat period budget, or total extracted heat energy, and the heat rate budget, or heat production rate. A qualitative sensitivity analysis is conducted using five parameters, each assigned lower- and higher-value scenarios.
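The two headline outputs follow from the standard doublet relation Q = m_dot * c_p * (T_production - T_injection); the quick check below uses invented numbers, not the model's parameters.

```python
# Back-of-envelope heat rate and heat period budgets for a doublet system.
m_dot = 70.0                   # production mass flow rate, kg/s (assumed)
c_p = 4200.0                   # specific heat of water, J/(kg K)
T_prod, T_inj = 120.0, 60.0    # production / injection temperatures, degC (assumed)

Q = m_dot * c_p * (T_prod - T_inj)  # heat rate budget, W (~17.6 MW_th here)
E = Q * 3650 * 24 * 3600            # heat period budget over 3650 days, J
print(f"heat rate: {Q / 1e6:.1f} MW_th, heat period budget: {E / 1e15:.2f} PJ")
```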
NASA Technical Reports Server (NTRS)
Almeida, Eduardo DeBrito
2012-01-01
This report discusses work completed over the summer at the Jet Propulsion Laboratory (JPL), California Institute of Technology. A system is presented to guide ground or aerial unmanned robots using computer vision. The system performs accurate camera calibration, camera pose refinement and surface extraction from images collected by a camera mounted on the vehicle. The application motivating the research is planetary exploration and the vehicles are typically rovers or unmanned aerial vehicles. The information extracted from imagery is used primarily for navigation, as robot location is the same as the camera location and the surfaces represent the terrain that rovers traverse. The processed information must be very accurate and acquired very fast in order to be useful in practice. The main challenge being addressed by this project is to achieve high estimation accuracy and high computation speed simultaneously, a difficult task due to many technical reasons.
Visual perception system and method for a humanoid robot
NASA Technical Reports Server (NTRS)
Chelian, Suhas E. (Inventor); Linn, Douglas Martin (Inventor); Wampler, II, Charles W. (Inventor); Bridgwater, Lyndon (Inventor); Wells, James W. (Inventor); Mc Kay, Neil David (Inventor)
2012-01-01
A robotic system includes a humanoid robot with robotic joints each moveable using an actuator(s), and a distributed controller for controlling the movement of each of the robotic joints. The controller includes a visual perception module (VPM) for visually identifying and tracking an object in the field of view of the robot under threshold lighting conditions. The VPM includes optical devices for collecting an image of the object, a positional extraction device, and a host machine having an algorithm for processing the image and positional information. The algorithm visually identifies and tracks the object, and automatically adapts an exposure time of the optical devices to prevent feature data loss of the image under the threshold lighting conditions. A method of identifying and tracking the object includes collecting the image, extracting positional information of the object, and automatically adapting the exposure time to thereby prevent feature data loss of the image.
Querying and Extracting Timeline Information from Road Traffic Sensor Data
Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen
2016-01-01
The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900
Mutual information, neural networks and the renormalization group
NASA Astrophysics Data System (ADS)
Koch-Janusz, Maciej; Ringel, Zohar
2018-06-01
Physical systems differing in their microscopic details often display strikingly similar behaviour when probed at macroscopic scales. Those universal properties, largely determining their physical characteristics, are revealed by the powerful renormalization group (RG) procedure, which systematically retains `slow' degrees of freedom and integrates out the rest. However, the important degrees of freedom may be difficult to identify. Here we demonstrate a machine-learning algorithm capable of identifying the relevant degrees of freedom and executing RG steps iteratively without any prior knowledge about the system. We introduce an artificial neural network based on a model-independent, information-theoretic characterization of a real-space RG procedure, which performs this task. We apply the algorithm to classical statistical physics problems in one and two dimensions. We demonstrate RG flow and extract the Ising critical exponent. Our results demonstrate that machine-learning techniques can extract abstract physical concepts and consequently become an integral part of theory- and model-building.
Automatic extraction of property norm-like data from large text corpora.
Kelly, Colin; Devereux, Barry; Korhonen, Anna
2014-01-01
Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
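The reweighting step can be illustrated with a minimal scoring function: a linear combination of log frequency and a statistical association metric (PMI here stands in for the paper's four metrics); the counts and weights are invented.

```python
# Toy triple reweighting: score = w1*log(freq) + w2*PMI(concept, feature).
import math

def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information between a concept and a feature term."""
    return math.log((pair_count * total) / (x_count * y_count))

def score(triple_freq, concept_freq, feature_freq, total, weights=(0.5, 0.5)):
    w_freq, w_pmi = weights
    return (w_freq * math.log(triple_freq)
            + w_pmi * pmi(triple_freq, concept_freq, feature_freq, total))

# e.g. "car require petrol" seen 120 times; "car" 5000 and "petrol" 800 times
print(score(triple_freq=120, concept_freq=5000, feature_freq=800, total=1_000_000))
```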
Induced lexico-syntactic patterns improve information extraction from online medical forums.
Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D
2014-01-01
To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
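One round of the pattern-induction loop can be sketched as follows. Plain word-window patterns and a tiny toy corpus stand in for the lexico-syntactic (dependency-based) patterns of the real system; note how the noisy extraction 'tests' shows why iterative scoring and filtering are needed.

```python
# Toy bootstrapping round: seed-labeled tokens induce context patterns,
# patterns kept if they mostly fire on seeds, survivors extract new terms.
from collections import Counter

seeds = {"headache", "nausea"}                 # toy SC seed dictionary
corpus = [
    "i have severe headache since monday",
    "i have severe nausea after eating",
    "i have severe dizziness in the morning",  # unseen symptom to recover
    "my doctor ordered severe tests",          # noise the pattern also fires on
]

fired, matched = Counter(), Counter()
for sent in corpus:
    toks = sent.split()
    for i in range(1, len(toks)):
        pattern = toks[i - 1]                  # context pattern: preceding word
        fired[pattern] += 1
        if toks[i] in seeds:
            matched[pattern] += 1

# Keep patterns that mostly fire on known seed terms, then apply them.
good = {p for p in fired if matched[p] / fired[p] >= 0.5}
new_terms = {
    toks[i]
    for sent in corpus
    for toks in [sent.split()]
    for i in range(1, len(toks))
    if toks[i - 1] in good and toks[i] not in seeds
}
print(new_terms)  # {'dizziness', 'tests'}: 'tests' is why re-scoring matters
```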
A rapid extraction of landslide disaster information research based on GF-1 image
NASA Astrophysics Data System (ADS)
Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na
2015-08-01
In recent years, landslide disasters have occurred frequently because of seismic activity. They bring great harm to people's lives and have attracted close attention from the state and extensive concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing images can effectively improve the accuracy of information extraction with their rich texture and geometry information. It is therefore feasible to extract information on earthquake-triggered landslides with serious surface damage and large scale. Taking Wenchuan county as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the estimation of scale parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution images and selecting spectral, texture, geometric, and landform features of the image, we establish extraction rules to extract landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.
NASA Astrophysics Data System (ADS)
McClanahan, James Patrick
Eddy Current Testing (ECT) is a Non-Destructive Examination (NDE) technique that is widely used in power generating plants (both nuclear and fossil) to test the integrity of heat exchanger (HX) and steam generator (SG) tubing. Specifically for this research, laboratory-generated, flawed tubing data were examined. The purpose of this dissertation is to develop and implement an automated method for the classification and advanced characterization of defects in HX and SG tubing. These two improvements enhanced the robustness of characterization compared to traditional bobbin-coil ECT data analysis methods. A more robust classification and characterization of the tube flaw in situ (while the SG is on-line but not when the plant is operating) should provide valuable information to the power industry. The following are the conclusions reached from this research. A feature extraction program acquiring relevant information from both the mixed absolute and differential data was successfully implemented. The continuous wavelet transform (CWT) was utilized to extract more information from the mixed, complex differential data. Image processing techniques, used to extract the information contained in the generated CWT, classified the data with a high success rate. The data were accurately classified, utilizing the compressed feature vector and a Bayes classification system. An estimation of the upper bound for the probability of error, using the Bhattacharyya distance, was successfully applied to the Bayesian classification. The classified data were separated according to flaw type (classification) to enhance characterization. The characterization routine used dedicated, flaw-type-specific artificial neural networks (ANNs) that made the characterization of the tube flaw more robust. The inclusion of outliers may help complete the feature space so that classification accuracy is increased. Given that the eddy current test signals appear very similar, there may not be sufficient information to make an extremely accurate (>95%) classification or an advanced characterization using this system. It is necessary to have a larger database for more accurate system learning.
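The error-bound step rests on the Bhattacharyya distance between class-conditional Gaussians, which bounds the Bayes error by sqrt(P1*P2)*exp(-D_B). The class statistics below are invented for illustration.

```python
# Bhattacharyya distance between Gaussian class models and the implied
# upper bound on Bayes misclassification error.
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    cov = (cov1 + cov2) / 2.0
    diff = mu2 - mu1
    term1 = diff @ np.linalg.solve(cov, diff) / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov)
                         / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])  # invented class stats
cov1, cov2 = np.eye(2), np.diag([1.5, 0.5])
d_b = bhattacharyya_gaussian(mu1, cov1, mu2, cov2)
p1 = p2 = 0.5                                          # equal class priors
print(f"D_B = {d_b:.3f}, Bayes error <= {np.sqrt(p1 * p2) * np.exp(-d_b):.3f}")
```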
NASA Astrophysics Data System (ADS)
Liu, Shuang; Liu, Fei; Hu, Shaohua; Yin, Zhenbiao
The major power information of the main transmission system in machine tools (MTSMT) during the machining process includes the effective output power (i.e., cutting power), the input power, the power loss of the mechanical transmission system, and the main motor power loss. This information is easy to obtain in the lab but difficult to evaluate during a manufacturing process. To solve this problem, a separation method is proposed here to extract the MTSMT power information during the machining process. In this method, the energy flow and the mathematical models of the major MTSMT power information during the machining process are set up first. Based on these mathematical models and basic data tables obtained from experiments, the above power information can be separated during machining simply by measuring the real-time total input power of the spindle motor. The operation procedure of this method is also given.
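A minimal sketch of the separation idea, assuming an idle-power base table measured offline and a simple proportional load-loss model (both are assumptions for illustration, not the paper's exact models):

```python
# Separation sketch: cutting power from one real-time measurement, given an
# idle-power base table and an assumed proportional load-loss model.
def cutting_power(p_input_w, p_idle_w, loss_coeff=0.08):
    """p_input_w: measured total spindle-motor input power while cutting (W).
    p_idle_w:    input power at the same spindle speed without cutting, from
                 the experimental base table (motor + transmission base loss).
    loss_coeff:  assumed additional load-dependent loss per watt of load."""
    p_load = p_input_w - p_idle_w        # power above the idle baseline
    return p_load / (1.0 + loss_coeff)   # strip the load-dependent losses

print(cutting_power(p_input_w=5200.0, p_idle_w=1400.0))  # ~3518.5 W cutting power
```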
Misra, Dharitri; Chen, Siyuan; Thoma, George R.
2010-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U. S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
NASA Astrophysics Data System (ADS)
Wang, Zhao; Yang, Shan; Wang, Shuguang; Shen, Yan
2017-10-01
The assessment of dynamic urban structure has long been hampered by the lack of timely and accurate spatial information, which has hindered measurements of structural continuity at the macroscale. Defense Meteorological Satellite Program Operational Linescan System (DMSP/OLS) nighttime light (NTL) data provide an ideal source for urban information detection, with a long time span, short time interval, and wide coverage. In this study, we extracted the physical boundaries of urban clusters from corrected NTL images and quantitatively analyzed the structure of the urban cluster system based on rank-size distribution, spatial metrics, and the Mann-Kendall trend test. Two levels of urban cluster systems in the Yangtze River Delta region (YRDR) were examined. We found that (1) in the entire YRDR, the urban cluster system showed a periodic process, with a significant trend toward even distribution before 2007 but an unequal growth pattern after 2007, and (2) at the metropolitan level, vast disparities exist among the four metropolitan areas in the fluctuations of the Pareto exponent, the speed of cluster expansion, and the dominance of the core cluster. The results suggest that the urban cluster information extracted from NTL data effectively reflects the evolving nature of regional urbanization, which in turn can aid the planning of cities and help achieve more sustainable regional development.
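The rank-size analysis reduces to a log-log regression of cluster size on rank; a sketch with synthetic cluster areas follows (the NTL-derived boundaries in the paper provide the real sizes).

```python
# Rank-size sketch: fit the Pareto (Zipf) exponent by log-log regression.
import numpy as np

areas = np.array([5400.0, 2100.0, 1300.0, 900.0, 610.0, 480.0, 350.0, 260.0])
sizes = np.sort(areas)[::-1]             # descending: rank 1 = largest cluster
ranks = np.arange(1, len(sizes) + 1)

# log(size) = log(C) - q*log(rank); q near 1 is Zipf-like, q < 1 suggests a
# more even distribution, q > 1 dominance by the largest clusters.
slope, _ = np.polyfit(np.log(ranks), np.log(sizes), 1)
print(f"Pareto exponent: {-slope:.2f}")
```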
Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction
ERIC Educational Resources Information Center
Sun, Chong
2012-01-01
More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…
NASA Technical Reports Server (NTRS)
Kelly, W. L.; Howle, W. M.; Meredith, B. D.
1980-01-01
The Information Adaptive System (IAS) is an element of the NASA End-to-End Data System (NEEDS) Phase II and is focused on onboard image processing. Since the IAS is a data preprocessing system which is closely coupled to the sensor system, it serves as a first step in providing a 'smart' imaging sensor. Some of the functions planned for the IAS include sensor response nonuniformity correction, geometric correction, data set selection, data formatting, packetization, and adaptive system control. The inclusion of these sensor data preprocessing functions onboard the spacecraft will significantly improve the extraction of information from the sensor data in a timely and cost-effective manner and provide the opportunity to design sensor systems which can be reconfigured in near real time for optimum performance. The purpose of this paper is to present the preliminary design of the IAS and the plans for its development.
Motion processing with two eyes in three dimensions.
Rokers, Bas; Czuba, Thaddeus B; Cormack, Lawrence K; Huk, Alexander C
2011-02-11
The movement of an object toward or away from the head is perhaps the most critical piece of information an organism can extract from its environment. Such 3D motion produces horizontally opposite motions on the two retinae. Little is known about how or where the visual system combines these two retinal motion signals, relative to the wealth of knowledge about the neural hierarchies involved in 2D motion processing and binocular vision. Canonical conceptions of primate visual processing assert that neurons early in the visual system combine monocular inputs into a single cyclopean stream (lacking eye-of-origin information) and extract 1D ("component") motions; later stages then extract 2D pattern motion from the cyclopean output of the earlier stage. Here, however, we show that 3D motion perception is in fact affected by the comparison of opposite 2D pattern motions between the two eyes. Three-dimensional motion sensitivity depends systematically on pattern motion direction when dichoptically viewing gratings and plaids, and a novel "dichoptic pseudoplaid" stimulus provides strong support for the use of interocular pattern motion differences by precluding potential contributions from conventional disparity-based mechanisms. These results imply the existence of eye-of-origin information in later stages of motion processing and therefore motivate the incorporation of such eye-specific pattern-motion signals in models of motion processing and binocular integration.
Extracting DEM from airborne X-band data based on PolInSAR
NASA Astrophysics Data System (ADS)
Hou, X. X.; Huang, G. M.; Zhao, Z.
2015-06-01
Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) is a new trend in SAR remote sensing technology that combines polarimetric multichannel information with interferometric information. It is of great significance for extracting DEMs in regions where DEM precision is low, such as vegetation-covered areas and areas with dense buildings. In this paper we describe our experiments with high-resolution X-band fully polarimetric SAR data acquired by a dual-baseline interferometric airborne SAR system over an area of Danling in southern China. The Pauli algorithm is used to generate the dual-polarimetric interferometry data; Singular Value Decomposition (SVD), Numerical Radius (NR), and Phase Diversity (PD) methods are used to generate the fully polarimetric interferometry data. We can then make use of the polarimetric interferometric information to extract DEMs through a processing chain of pre-filtering, image registration, image resampling, coherence optimization, multilook processing, flat-earth removal, interferogram filtering, phase unwrapping, parameter calibration, height derivation, and geo-coding. The processing system, named SARPlore, was developed in VC++ under the leadership of the Chinese Academy of Surveying and Mapping. Finally, comparing the optimized results with single-polarimetric interferometry, we observed that the optimization methods can reduce interferometric noise and phase unwrapping residuals and improve the precision of the DEM. The result of fully polarimetric interferometry is better than that of dual-polarimetric interferometry. Meanwhile, over different terrain, the result of fully polarimetric interferometry shows different degrees of improvement.
Composition of extracts of airborne grain dusts: lectins and lymphocyte mitogens.
Olenchock, S A; Lewis, D M; Mull, J C
1986-01-01
Airborne grain dusts are heterogeneous materials that can elicit acute and chronic respiratory pathophysiology in exposed workers. Previous characterizations of the dusts include the identification of viable microbial contaminants, mycotoxins, and endotoxins. We provide information on the lectin-like activity of grain dust extracts and its possible biological relevance. Hemagglutination of erythrocytes and immunochemical modulation by antibody to specific lectins showed the presence of these substances in extracts of airborne dusts from barley, corn, and rye. Proliferation of normal rat splenic lymphocytes in vitro provided evidence for direct biological effects on the cells of the immune system. These data expand the knowledge of the composition of grain dust extracts and suggest possible mechanisms that may contribute to respiratory disease in grain workers. PMID:3709474
Extracting Useful Semantic Information from Large Scale Corpora of Text
ERIC Educational Resources Information Center
Mendoza, Ray Padilla, Jr.
2012-01-01
Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…
NASA Technical Reports Server (NTRS)
Kempler, Steven; Lynnes, Christopher; Vollmer, Bruce; Alcott, Gary; Berrick, Stephen
2009-01-01
Increasingly sophisticated National Aeronautics and Space Administration (NASA) Earth science missions have driven their associated data and data management systems from providing simple point-to-point archiving and retrieval to performing user-responsive distributed multisensor information extraction. To fully maximize the use of remote-sensor-generated Earth science data, NASA recognized the need for data systems that provide data access and manipulation capabilities responsive to research brought forth by advancing scientific analysis and the need to maximize the use and usability of the data. The decision by NASA to purposely evolve the Earth Observing System Data and Information System (EOSDIS) at the Goddard Space Flight Center (GSFC) Earth Sciences (GES) Data and Information Services Center (DISC) and other information management facilities was timely and appropriate. The GES DISC evolution was focused on replacing the EOSDIS Core System (ECS) by reusing the in-house developed, disk-based Simple, Scalable, Script-based Science Product Archive (S4PA) data management system and migrating data to the disk archives. The transition was completed in December 2007.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, M.C.; Nunz, G.J.; Cremer, G.M.
1979-09-01
The potential of energy extracted from hot dry rock (HDR) was investigated as a commercially feasible alternate energy source. Run Segments 3 and 4 were completed in the prototype reservoir of the Phase I energy-extraction system at Fenton Hill, New Mexico. Results of these tests yielded significant data on the existing system, and this information will be applicable to future HDR systems. Plans and operations initiating a Phase II system are underway at the Fenton Hill site. This system, a deeper, hotter, commercial-size reservoir, is intended to demonstrate the longevity and economics of an HDR system. Major activity occurred in evaluation of the national resource potential and in characterizing possible future HDR geothermal sites. Work has begun in the institutional and industrial support area to assess the economics and promote commercial interest in HDR systems as an alternate energy source.
NASA Astrophysics Data System (ADS)
Oschepkova, Elena; Vasinskaya, Irina; Sockoluck, Irina
2017-11-01
In view of the changing educational paradigm (the adoption of the two-tier system of higher education: undergraduate and graduate programs), there is a need to use modern learning and information and communication technologies, putting learner-centered approaches into practice when training highly qualified specialists for enterprises that extract and process solid commercial minerals. Given unstable market demand and a changeable institutional environment on one side, and the necessity of balancing workload, supply conditions, and product quality as mining-and-geological parameters change on the other, mining enterprises have to introduce and develop integrated management of product, information, and logistic flows under a unified management system. One of the main limitations holding back this development at Russian mining enterprises is staff incompetence at all levels of logistics management. Under present-day conditions, these enterprises need highly qualified specialists who can conduct self-directed research and develop new, or improve existing, technologies for arranging, planning, and managing the technical operation and commercial exploitation of transport, transportation, and processing facilities based on logistics. A learner-centered approach and individualization of the learning process necessitate the design of an individual learning route (ILR), which can help students realize their professional potential according to the requirements placed on specialists by such enterprises.
Nguyen, Dat Tien; Hong, Hyung Gil; Kim, Ki Wan; Park, Kang Ryoung
2017-01-01
The human body contains identity information that can be used for the person recognition (verification/recognition) problem. In this paper, we propose a person recognition method using the information extracted from body images. Our research is novel in the following three ways compared to previous studies. First, we use images of the human body for recognizing individuals. To overcome the limitations of previous studies on body-based person recognition that use only visible light images for recognition, we use human body images captured by two different kinds of camera: a visible light camera and a thermal camera. The use of two different kinds of body image helps us to reduce the effects of noise, background, and variation in the appearance of a human body. Second, we apply a state-of-the-art method, the convolutional neural network (CNN), among various available methods for image feature extraction, in order to overcome the limitations of traditional hand-designed image feature extraction methods. Finally, with the image features extracted from the body images, the recognition task is performed by measuring the distance between the input and enrolled samples. The experimental results show that the proposed method is efficient for enhancing recognition accuracy compared to systems that use only visible light or thermal images of the human body. PMID:28300783
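As a concrete illustration of the final matching stage, here is a minimal sketch that assumes CNN feature vectors have already been extracted from the visible-light and thermal body images (any backbone could produce them); the min-max score normalization and the equal fusion weight are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def recognize(probe_vis, probe_th, gallery, w=0.5):
    """Match a probe against enrolled identities by fusing Euclidean
    distances computed separately in the visible and thermal feature
    spaces. gallery maps person_id -> (vis_feature, thermal_feature)."""
    ids = list(gallery)
    d_vis = np.array([np.linalg.norm(probe_vis - gallery[i][0]) for i in ids])
    d_th = np.array([np.linalg.norm(probe_th - gallery[i][1]) for i in ids])

    def minmax(d):                       # normalise scores across the gallery
        return (d - d.min()) / (d.max() - d.min() + 1e-12)

    fused = w * minmax(d_vis) + (1.0 - w) * minmax(d_th)
    return ids[int(fused.argmin())]      # identity with the smallest distance
```

Score-level fusion of this kind is one simple way to realize the paper's claim that combining the two modalities suppresses noise that affects either camera alone.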
Video monitoring of oxygen saturation during controlled episodes of acute hypoxia.
Addison, Paul S; Foo, David M H; Jacquel, Dominique; Borg, Ulf
2016-08-01
A method for extracting video photoplethysmographic information from an RGB video stream is tested on data acquired during a porcine model of acute hypoxia. Cardiac pulsatile information was extracted from the acquired signals and processed to determine a continuously reported oxygen saturation (SvidO2). A high degree of correlation was found to exist between the video and a reference from a pulse oximeter. The calculated mean bias and accuracy across all eight desaturation episodes were -0.03% (range: -0.21% to 0.24%) and 4.90% (range: 3.80% to 6.19%), respectively. The results support the hypothesis that oxygen saturation trending can be evaluated accurately from a video system during acute hypoxia.
[Application of hyper-spectral remote sensing technology in environmental protection].
Zhao, Shao-Hua; Zhang, Feng; Wang, Qiao; Yao, Yun-Jun; Wang, Zhong-Ting; You, Dai-An
2013-12-01
Hyper-spectral remote sensing (RS) technology has been widely used in environmental protection. The present work introduces its recent application in the RS monitoring of polluting gases, greenhouse gases, algal blooms, water quality of catchment water environments, safety of drinking water sources, biodiversity, vegetation classification, soil pollution, and so on. Finally, issues such as the scarcity of hyper-spectral satellites and the limitations of data processing and information extraction are discussed. Some proposals are also presented, including developing successors to the HJ-1 satellite with differential optical absorption spectroscopy, greenhouse gas spectroscopy, and hyper-spectral imagers; strengthening the study of hyper-spectral data processing and information extraction; and promoting the construction of an environmental application system.
Unsupervised user similarity mining in GSM sensor networks.
Shad, Shafqat Ali; Chen, Enhong
2013-01-01
Mobility data has attracted researchers for the past few years because of its rich context and spatiotemporal nature; this information can be used for potential applications like early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All the mentioned applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant-place extraction, prediction of a user's actual movement, and context awareness. However, significant-place extraction and movement prediction for mobility profile building are not trivial tasks. In this paper, we present a user similarity mining methodology based on an unsupervised clustering approach, building user mobility profiles from the semantic tagging information provided by users and basic GSM network architecture properties. As the mobility information is in low-level raw form, our proposed methodology successfully converts it to high-level meaningful information by using cell-ID location information rather than previously used location-capturing methods like GPS, infrared, and Wi-Fi for profile mining and user similarity mining. A sketch of the significant-place step follows.
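The sketch below is a deliberately simplistic stand-in for the significant-place step, assuming the cell-ID trace has already been aggregated into (cell_id, enter, exit) stay records; the dwell-time threshold and top-k selection are illustrative assumptions, not the paper's clustering procedure.

```python
from collections import Counter

def significant_places(cell_log, min_dwell_s=600, top_k=5):
    """cell_log: iterable of (cell_id, enter_ts, exit_ts) stay records from
    a GSM trace. Returns the cells with the largest total dwell time."""
    dwell = Counter()
    for cell_id, t_in, t_out in cell_log:
        stay = t_out - t_in
        if stay >= min_dwell_s:            # ignore brief pass-through cells
            dwell[cell_id] += stay
    return [cell for cell, _ in dwell.most_common(top_k)]

# e.g. significant_places([('A', 0, 3600), ('B', 3600, 3700), ('A', 9000, 12600)])
# -> ['A']   (cell B is a pass-through and is discarded)
```

Profiles built from such place lists can then be compared across users, which is the basis of the similarity mining described above.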
The Design of Case Products’ Shape Form Information Database Based on NURBS Surface
NASA Astrophysics Data System (ADS)
Liu, Xing; Liu, Guo-zhong; Xu, Nuo-qi; Zhang, Wei-she
2017-07-01
In order to improve the computer-aided design of product shapes, applying Non-Uniform Rational B-Splines (NURBS) curves and surfaces to the representation of the product shape helps designers to design products effectively. On the basis of typical product image contour extraction, and using Pro/Engineer (Pro/E) to extract the geometric features of a scanned mold, in order to structure an information database system of value points, control points, and knot vector parameter information, this paper puts forward a unified expression method that uses NURBS curves and surfaces to describe products' geometric shapes, with MATLAB used for simulation, when products have the same or similar function. A case study of an electric vehicle's front cover illustrates the access process for the geometric shape information of a case product. This method can not only greatly reduce the required capacity of the information database, but also improve the effectiveness of computer-aided geometric innovation modeling.
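To make the stored parameters concrete: a NURBS curve record of the kind described (control points, weights, and a knot vector) is enough to reconstruct any point on the curve. Below is a minimal Python sketch of rational B-spline evaluation via the Cox-de Boor recursion; it illustrates the general NURBS formula rather than the paper's MATLAB implementation, and the example knot vector and weights are assumptions.

```python
import numpy as np

def bspline_basis(i, p, u, U):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u).
    Uses the half-open convention, so u must stay below the final knot."""
    if p == 0:
        return 1.0 if U[i] <= u < U[i + 1] else 0.0
    left = right = 0.0
    if U[i + p] != U[i]:
        left = (u - U[i]) / (U[i + p] - U[i]) * bspline_basis(i, p - 1, u, U)
    if U[i + p + 1] != U[i + 1]:
        right = ((U[i + p + 1] - u) / (U[i + p + 1] - U[i + 1])
                 * bspline_basis(i + 1, p - 1, u, U))
    return left + right

def nurbs_point(u, ctrl, weights, U, p):
    """Evaluate C(u) = sum_i N_{i,p}(u) w_i P_i / sum_i N_{i,p}(u) w_i."""
    N = np.array([bspline_basis(i, p, u, U) for i in range(len(ctrl))])
    return (N * weights) @ ctrl / (N @ weights)

# Example record: a quadratic NURBS arc with three control points.
ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]])
weights = np.array([1.0, 0.7, 1.0])
U = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(nurbs_point(0.5, ctrl, weights, U, p=2))   # point at mid-parameter
```

Storing only (ctrl, weights, U, p) per curve is what lets a database of case products stay compact while still reproducing exact geometry.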
Multiplexed Sequence Encoding: A Framework for DNA Communication.
Zakeri, Bijan; Carr, Peter A; Lu, Timothy K
2016-01-01
Synthetic DNA has great propensity for efficiently and stably storing non-biological information. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication (data encoding, data transfer, and data extraction) and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA, while reducing the occurrence of DNA homopolymers to improve synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system, Multiplexed Sequence Encoding (MuSE), that conceals messages between multiple distinct DNA molecules, requiring a combination key to reveal messages. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA.
Geotherm: the U.S. geological survey geothermal information system
Bliss, J.D.; Rapport, A.
1983-01-01
GEOTHERM is a comprehensive system of public databases and software used to store, locate, and evaluate information on the geology, geochemistry, and hydrology of geothermal systems. Three main databases address the general characteristics of geothermal wells and fields, and the chemical properties of geothermal fluids; the last database is currently the most active. System tasks are divided into four areas: (1) data acquisition and entry, involving data entry via word processors and magnetic tape; (2) quality assurance, including the criteria and standards handbook and front-end data-screening programs; (3) operation, involving database backups and information extraction; and (4) user assistance, preparation of such items as application programs, and a quarterly newsletter. The principal task of GEOTHERM is to provide information and research support for the conduct of national geothermal-resource assessments. The principal users of GEOTHERM are those involved with the Geothermal Research Program of the U.S. Geological Survey. Information in the system is available to the public on request. © 1983.
NASA Astrophysics Data System (ADS)
Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab
2017-11-01
Palmprint recognition systems depend on feature extraction. A feature extraction method using higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image and their outputs are fused. The two techniques used in the fusion are the histogram of oriented gradients (HOG) and binarized statistical image features (BSIF). The fused features are reduced by principal-component-analysis-based feature selection and then evaluated using an extreme learning machine classifier. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, the Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.
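A minimal sketch of a comparable fusion pipeline is given below, assuming grayscale palmprint arrays. Since BSIF filters and extreme learning machines are not available in common library distributions, local binary patterns stand in for BSIF and a ridge classifier stands in for the ELM; the wavelet, HOG parameters, and PCA variance threshold are likewise illustrative assumptions.

```python
import numpy as np
import pywt
from skimage.feature import hog, local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline

def palm_features(img):
    """Fuse two descriptors computed on the approximation subband of a
    discrete wavelet transform of a grayscale palmprint image."""
    cA, _ = pywt.dwt2(img.astype(float), 'db1')      # DWT approximation band
    f_hog = hog(cA, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    lbp = local_binary_pattern(cA, P=8, R=1, method='uniform')
    f_lbp, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([f_hog, f_lbp])            # feature-level fusion

# X: one fused feature row per enrolled image; y: subject labels.
# PCA-based selection feeds a linear classifier standing in for the ELM.
model = make_pipeline(PCA(n_components=0.95), RidgeClassifier())
# model.fit(X, y); model.predict(palm_features(query)[None, :])
```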
Modeling for Visual Feature Extraction Using Spiking Neural Networks
NASA Astrophysics Data System (ADS)
Kimura, Ichiro; Kuroe, Yasuaki; Kotera, Hiromichi; Murata, Tomoya
This paper develops models for “visual feature extraction” in biological systems by using spiking neural networks (SNNs). The SNN is promising for developing such models because information is encoded and processed via spike trains, similar to biological neural networks. Two SNN architectures are proposed for modeling the directionally selective cell and the motion parallax cell in neuro-sensory systems, and they are trained so as to reproduce the actual biological responses of each cell. To validate the developed models, their representation ability is investigated and their visual feature extraction mechanisms are discussed from the neurophysiological viewpoint. It is expected that this study can be a first step toward developing a sensor system similar to biological systems and also a complementary approach to investigating the function of the brain.
Secure alignment of coordinate systems using quantum correlation
NASA Astrophysics Data System (ADS)
Rezazadeh, F.; Mani, A.; Karimipour, V.
2017-08-01
We show that two parties far apart can use shared entangled states and classical communication to align their coordinate systems with a very high fidelity. Moreover, compared with previous methods proposed for such a task, i.e., sending parallel or antiparallel pairs or groups of spin states, our method has the extra advantages of using single-qubit measurements and also being secure, so that third parties do not extract any information about the aligned coordinate system established between the two parties. The latter property is important in many other quantum information protocols in which measurements inevitably play a significant role.
NASA Technical Reports Server (NTRS)
Foyle, David C.; Kaiser, Mary K.; Johnson, Walter W.
1992-01-01
This paper reviews some of the sources of visual information that are available in the out-the-window scene and describes how these visual cues are important for routine pilotage and training, as well as the development of simulator visual systems and enhanced or synthetic vision systems for aircraft cockpits. It is shown how these visual cues may change or disappear under environmental or sensor conditions, and how the visual scene can be augmented by advanced displays to capitalize on the pilot's excellent ability to extract visual information from the visual scene.
PCA Tomography: how to extract information from data cubes
NASA Astrophysics Data System (ADS)
Steiner, J. E.; Menezes, R. B.; Ricci, T. V.; Oliveira, A. S.
2009-05-01
Astronomy has evolved almost exclusively through the use of spectroscopic and imaging techniques, operated separately. With the development of modern technologies, it is possible to obtain data cubes in which both techniques are combined simultaneously, producing images with spectral resolution. Extracting information from them can be quite complex, and hence the development of new methods of data analysis is desirable. We present a method of analysis of data cubes (data from single-field observations, containing two spatial and one spectral dimension) that uses Principal Component Analysis (PCA) to express the data in a form of reduced dimensionality, facilitating efficient information extraction from very large data sets. PCA transforms the system of correlated coordinates into a system of uncorrelated coordinates ordered by principal components of decreasing variance. The new coordinates are referred to as eigenvectors, and the projections of the data on to these coordinates produce images we will call tomograms. The association of the tomograms (images) with eigenvectors (spectra) is important for the interpretation of both. The eigenvectors are mutually orthogonal, and this information is fundamental for their handling and interpretation. When the data cube shows objects that present uncorrelated physical phenomena, the eigenvectors' orthogonality may be instrumental in separating and identifying them. By handling eigenvectors and tomograms, one can enhance features, extract noise, compress data, extract spectra, etc. We applied the method, for illustration purposes only, to the central region of the low-ionization nuclear emission region (LINER) galaxy NGC 4736, and demonstrate that it has a type 1 active nucleus, not known before. Furthermore, we show that it is displaced from the centre of its stellar bulge. Based on observations obtained at the Gemini Observatory, which is operated by the Association of Universities for Research in Astronomy, Inc., under a cooperative agreement with the National Science Foundation on behalf of the Gemini partnership: the National Science Foundation (United States), the Science and Technology Facilities Council (United Kingdom), the National Research Council (Canada), CONICYT (Chile), the Australian Research Council (Australia), Ministério da Ciência e Tecnologia (Brazil) and SECYT (Argentina).
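In practice the method reduces to an eigendecomposition of the wavelength-to-wavelength covariance matrix of the cube. Below is a minimal numpy sketch, assuming a calibrated cube arranged as (ny, nx, nlambda); the variable names and the mean-subtraction convention are illustrative.

```python
import numpy as np

def pca_tomography(cube):
    """PCA of a data cube with axes (ny, nx, nlambda).

    Returns eigenvalues (variances), eigenvectors (eigenspectra, one per
    column) and tomograms (the projection of every spaxel onto each
    eigenvector, reshaped back onto the sky plane)."""
    ny, nx, nl = cube.shape
    X = cube.reshape(ny * nx, nl)          # one row per spaxel
    X = X - X.mean(axis=0)                 # zero-mean each wavelength channel
    C = X.T @ X / (X.shape[0] - 1)         # wavelength covariance matrix
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]        # decreasing variance
    evals, evecs = evals[order], evecs[:, order]
    tomograms = (X @ evecs).reshape(ny, nx, nl)   # tomograms[:, :, k]
    return evals, evecs, tomograms
```

Inspecting tomograms[:, :, k] alongside evecs[:, k] gives exactly the paired image/spectrum view the abstract describes.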
Plans for wind energy system simulation
NASA Technical Reports Server (NTRS)
Dreier, M. E.
1978-01-01
A digital computer code and a special-purpose hybrid computer were introduced. The digital computer program, the Root Perturbation Method or RPM, is an implementation of the classic Floquet procedure which circumvents numerical problems associated with the extraction of Floquet roots. The hybrid computer, the Wind Energy System Time-domain simulator (WEST), yields real-time loads and deformation information essential to design and system stability investigations.
Extracting Databases from Dark Data with DeepDive
Zhang, Ce; Shin, Jaeho; Ré, Christopher; Cafarella, Michael; Niu, Feng
2016-01-01
DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data (scientific papers, Web classified ads, customer service notes, and so on) were instead in a relational database, it would give analysts a massive and valuable new set of “big data.” DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. To date we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontology, law enforcement, and others. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive is enabled by an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle. This design is enabled by several core innovations around probabilistic training and inference. PMID:28316365
Automatic Extraction of Road Markings from Mobile Laser-Point Cloud Using Intensity Data
NASA Astrophysics Data System (ADS)
Yao, L.; Chen, Q.; Qin, C.; Wu, H.; Zhang, S.
2018-04-01
With the development of intelligent transportation, high-precision road information has been widely applied in many fields. This paper proposes a concise and practical way to extract road marking information from point cloud data collected by a mobile mapping system (MMS). The method contains three steps. Firstly, the road surface is segmented through edge detection on scan lines. Then an intensity image is generated by inverse distance weighted (IDW) interpolation, and the road markings are extracted using adaptive threshold segmentation based on an integral image, without intensity calibration. Noise is reduced by removing small plaques of pixels from the binary image. Finally, the point cloud mapped from the binary image is clustered into marking objects according to Euclidean distance, and a series of algorithms including template matching and feature attribute filtering is used for the classification of linear markings, arrow markings, and guidelines. Processing the point cloud data collected by a RIEGL VUX-1 in the case area shows that the F-score of marking extraction is 0.83 and the average classification rate is 0.9.
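The integral-image adaptive threshold used in the second step can be sketched in a few lines of numpy. This is a generic Bradley-style local-mean threshold, under the assumption that markings are brighter than the surrounding road surface; the window size and the offset fraction t are illustrative, not the paper's tuned values.

```python
import numpy as np

def adaptive_threshold(img, win=31, t=0.15):
    """Binarize an IDW intensity image: a pixel is a marking candidate if
    it exceeds its local window mean by the fraction t."""
    img = img.astype(np.float64)
    h, w = img.shape
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)          # integral image
    r = win // 2
    y0 = np.clip(np.arange(h) - r, 0, h); y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w); x1 = np.clip(np.arange(w) + r + 1, 0, w)
    Y0, X0 = np.meshgrid(y0, x0, indexing='ij')
    Y1, X1 = np.meshgrid(y1, x1, indexing='ij')
    # Window sums from four corner lookups; the area shrinks near borders.
    s = ii[Y1, X1] - ii[Y0, X1] - ii[Y1, X0] + ii[Y0, X0]
    mean = s / ((Y1 - Y0) * (X1 - X0))
    return img > mean * (1.0 + t)
```

Because the integral image makes every window sum a constant-time operation, this step scales linearly with image size regardless of window width, which is what makes calibration-free thresholding practical on large MMS tiles.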
Yu, Hong; Agarwal, Shashank; Johnston, Mark; Cohen, Aaron
2009-01-06
Background Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension. Methods Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of text summaries with the text-summarization evaluation method the ROUGE score. Results Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39–68% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30% of the information. When the full-text article is available, figure comprehension increased to 86–97%; this indicates that researchers felt that only 3–14% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles. Conclusion We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond figure legend. Our work provides important guidance to the figure mining systems that extract information only from figure and figure legend. PMID:19126221
Automatic generation of Web mining environments
NASA Astrophysics Data System (ADS)
Cibelli, Maurizio; Costagliola, Gennaro
1999-02-01
The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
NASA Astrophysics Data System (ADS)
Lucciani, Roberto; Laneve, Giovanni; Jahjah, Munzer; Mito, Collins
2016-08-01
The crop growth stage represents essential information for the management of agricultural areas. In this study we investigate the feasibility of a tool based on remotely sensed satellite (Landsat 8) imagery that can automatically classify crop fields, and we examine how far resolution enhancement based on pan-sharpening techniques, together with phenological information extraction used to create decision rules for assigning a semantic class to an object, can effectively support the classification process. Moreover, we investigate the opportunity to extract vegetation health status information from remotely sensed assessment of the equivalent water thickness (EWT). Our case study is Kenya's Great Rift Valley, where a ground-truth campaign was conducted during August 2015 to collect GPS measurements of crop fields, leaf area index (LAI), and chlorophyll samples.
Supercritical fluid extraction of the non-polar organic compounds in meteorites
NASA Astrophysics Data System (ADS)
Sephton, M. A.; Pillinger, C. T.; Gilmour, I.
2001-01-01
The carbonaceous chondrite meteorites contain a variety of extraterrestrial organic molecules. These organic components provide a valuable insight into the formation and evolution of the solar system. Attempts at obtaining and interpreting this information source are hampered by the small sample sizes available for study and the interferences from terrestrial contamination. Supercritical fluid extraction represents an efficient and contamination-free means of isolating extraterrestrial molecules. Gas chromatography-mass spectrometry analyses of extracts from Orgueil and Cold Bokkeveld reveal a complex mixture of free non-polar organic molecules which include normal alkanes, isoprenoid alkanes, tetrahydronaphthalenes and aromatic hydrocarbons. These organic assemblages imply contributions from both terrestrial and extraterrestrial sources.
TRENCADIS--a WSRF grid MiddleWare for managing DICOM structured reporting objects.
Blanquer, Ignacio; Hernandez, Vicente; Segrelles, Damià
2006-01-01
The adoption of digital processing of medical data, especially in radiology, has led to the availability of millions of records (images and reports). However, this information is mainly used at the patient level and is organised according to administrative criteria, which makes the extraction of knowledge difficult. Moreover, legal constraints make the direct integration of information systems complex or even impossible. On the other side, the widespread adoption of the DICOM format has led to the inclusion of information other than just radiological images. The possibility of coding radiology reports in a structured form, adding semantic information about the data contained in the DICOM objects, eases the process of structuring images according to content. DICOM Structured Reporting (DICOM-SR) is a specification of tags and sections to code and integrate radiology reports, with seamless references to findings and regions of interest of the associated images, movies, waveforms, signals, etc. The work presented in this paper aims at developing a framework to efficiently and securely share medical images and radiology reports, as well as to provide high-throughput processing services. This system is based on an architecture previously developed in the framework of the TRENCADIS project, and uses other components such as the security system and the Grid processing service developed in previous activities. The work presented here introduces a semantic structuring and an ontology framework to organise medical images considering standard terminology and disease coding formats (SNOMED, ICD9, LOINC...).
NASA Astrophysics Data System (ADS)
Guo, H., II
2016-12-01
Spatial distribution information on mountainous-area settlement places is of great significance for earthquake emergency work because most of the key earthquake-hazardous areas of China are located in mountainous areas. Remote sensing has the advantages of large coverage and low cost, and it is an important way to obtain the spatial distribution of mountainous settlement places. At present, most studies apply object-oriented methods that fully consider geometric, spectral, and texture information to extract settlement-place information; in this article, semantic constraints are added on top of the object-oriented methods. The experimental data is a single scene from the domestic high-resolution satellite GF-1, with a resolution of 2 meters. The main processing consists of three steps: the first is pretreatment, including orthorectification and image fusion; the second is object-oriented information extraction, including image segmentation and information extraction; the last step is removing erroneous elements under semantic constraints. To formulate these semantic constraints, the distribution characteristics of mountainous settlement places must be analyzed and the spatial-logic relations between settlement places and other objects must be considered. The accuracy assessment shows that the extraction accuracy of the object-oriented method is 49% and rises to 86% after the use of semantic constraints. As can be seen from these figures, the extraction method under semantic constraints can effectively improve the accuracy of mountainous settlement-place extraction. The results show that it is feasible to extract mountainous settlement-place information from GF-1 imagery, so the article demonstrates a certain practicality in using domestic high-resolution optical remote sensing imagery for earthquake emergency preparedness.
CRL/NMSU and Brandeis: Description of the MucBruce System as Used for MUC-4
1992-01-01
developing a method for identifying articles of interest and extracting and storing specific kinds of information from large volumes of Japanese and... performance. Most of the information produced in our MUC templates is arrived at by probing the text which surrounds 'significant' words for the... strings with semantic information. The other two, the Relevant Template Filter and the Relevant Paragraph Filter, perform word frequency analysis to
Study of Development for RFID System to Hospital Environment.
Hong, Seung Kwon; Sung, Myung-Whun
2015-01-01
RFID/USN makes it possible to develop information systems that give anytime, anywhere access to Electronic Medical Records (EMR). The goal of the present study is to develop an RFID/USN-based information system for the hospital environment, covering two cases: first, where a tag cannot be recognized; second, where it can be recognized, allowing the patient's location to be tracked and the time of medical examination to be estimated. A retrospective analysis was performed on 235 RFID monitoring results from four ENT ambulatory clinics of Seoul National University Hospital, extracted by a reader program through monitoring of RFID tags (2006.11.16-2006.12.16). RFID detection by the sensing readers in this study was used to record the place and the time spent by patients during medical history taking and examination. Through RFID detection of the specific place and the time spent on medical examination, RFID/USN advances the development of information systems within the hospital EMR system.
NASA Astrophysics Data System (ADS)
Lancaster, N.; LeBlanc, D.; Bebis, G.; Nicolescu, M.
2015-12-01
Dune-field patterns are believed to behave as self-organizing systems, but what causes the patterns to form is still poorly understood. The most obvious (and in many cases the most significant) aspect of a dune system is the pattern of dune crest lines. Extracting meaningful features such as crest length, orientation, spacing, bifurcations, and merging of crests from image data can reveal important information about the specific dune-field morphological properties, development, and response to changes in boundary conditions, but manual methods are labor-intensive and time-consuming. We are developing the capability to recognize and characterize patterns of sand dunes on planetary surfaces. Our goal is to develop a robust methodology and the necessary algorithms for automated or semi-automated extraction of dune morphometric information from image data. Our main approach uses image processing methods to extract gradient information from satellite images of dune fields. Typically, the gradients have a dominant magnitude and orientation. In many cases, the images have two major dominant gradient orientations, for the sunny and shaded sides of the dunes. A histogram of the gradient orientations is used to determine the dominant orientation. A threshold is applied to the image based on gradient orientations which agree with the dominant orientation. The contours of the binary image can then be used to determine the dune crest lines, based on pixel intensity values. Once the crest lines have been extracted, the morphological properties can be computed. We have tested our approach on a variety of images of linear and crescentic (transverse) dunes and compared dune detection algorithms with manually digitized dune crest lines, achieving true-positive values of 0.57-0.99 and false-positive values of 0.30-0.67, indicating that our approach is generally robust.
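The gradient-orientation procedure described above translates almost directly into numpy. The sketch below is a minimal illustration for a single grayscale image; the bin width, angular tolerance, and magnitude percentile are illustrative parameters rather than the authors' tuned values.

```python
import numpy as np

def crest_candidate_mask(image, tol_deg=20.0):
    """Keep pixels whose gradient orientation agrees with the scene's
    dominant orientation (e.g. the sunlit dune slope)."""
    gy, gx = np.gradient(image.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    # Magnitude-weighted orientation histogram -> dominant orientation.
    hist, edges = np.histogram(ang, bins=72, range=(0.0, 360.0), weights=mag)
    dominant = edges[np.argmax(hist)] + 2.5       # bin centre, 5-degree bins
    diff = np.abs((ang - dominant + 180.0) % 360.0 - 180.0)  # circular diff
    # Threshold on agreement with the dominant orientation and on contrast;
    # contours of this mask are candidates for dune crest lines.
    return (diff < tol_deg) & (mag > np.percentile(mag, 75))
```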
Jenke, Dennis; Egert, Thomas; Hendricker, Alan; Castner, James; Feinberg, Tom; Houston, Christopher; Hunt, Desmond G; Lynch, Michael; Nicholas, Kumudini; Norwood, Daniel L; Paskiet, Diane; Ruberto, Michael; Smith, Edward J; Holcomb, Frank; Markovic, Ingrid
2017-01-01
A simulating leaching (migration) study was performed on a model container-closure system relevant to parenteral and ophthalmic drug products. This container-closure system consisted of a linear low-density polyethylene bottle (primary container), a polypropylene cap and an elastomeric cap liner (closure), an adhesive label (labeling), and a foil overpouch (secondary container). The bottles were filled with simulating solvents (aqueous salt/acid mixture at pH 2.5, aqueous buffer at pH 9.5, and 1/1 v/v isopropanol/water), a label was affixed to the filled and capped bottles, the filled bottles were placed into the foil overpouch, and the filled and pouched units were stored either upright or inverted for up to 6 months at 40 °C. After storage, the leaching solutions were tested for leached substances using multiple complementary analytical techniques to address volatile, semi-volatile, and non-volatile organic and inorganic extractables as potential leachables. The leaching data generated supported several conclusions, including that (1) the extractables (leachables) profile revealed by a simulating leaching study can qualitatively be correlated with compositional information for materials of construction, (2) the chemical nature of both the extracting medium and the individual extractables (leachables) can markedly affect the resulting profile, and (3) while direct contact between a drug product and a system's material of construction may exacerbate the leaching of substances from that material by the drug product, direct contact is not a prerequisite for migration and leaching to occur. LAY ABSTRACT: The migration of container-related extractables from a model pharmaceutical container-closure system and into simulated drug product solutions was studied, focusing on circumstances relevant to parenteral and ophthalmic drug products. The model system was constructed specifically to address the migration of extractables from labels applied to the outside of the primary container. The study demonstrated that (1) the extractables that do migrate can be correlated to the composition of the materials used to construct the container-closure systems, (2) the extent of migration is affected by the chemical nature of the simulating solutions and the extractables themselves, and (3) even though labels may not be in direct contact with a contained solution, label-related extractables can accumulate as leachables in those solutions. © PDA, Inc. 2017.
Yu, Zhan; Li, Yuanyang; Liu, Lisheng; Guo, Jin; Wang, Tingfeng; Yang, Guoqing
2017-11-10
The speckle pattern (line-by-line) sequential extraction (SPSE) metric is proposed based on one-dimensional speckle intensity level-crossing theory. Through sequential extraction of the received speckle information, speckle metrics for estimating the variation of the focusing spot size on a remote diffuse target are obtained. Based on simulation, we discuss the SPSE metric's range of application under theoretical conditions and show that the aperture size affects the metric performance of the observation system. The results of the analyses are verified by experiment. The method applies to the detection of relatively static targets (speckle jitter frequency less than the CCD sampling frequency). The SPSE metric can determine the variation of the focusing spot size over a long distance and, moreover, can estimate the spot size under some conditions. Therefore, monitoring and feedback of the far-field spot can be implemented in laser focusing system applications and help the system optimize its focusing performance.
Automated software system for checking the structure and format of ACM SIG documents
NASA Astrophysics Data System (ADS)
Mirza, Arsalan Rahman; Sah, Melike
2017-04-01
Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents is automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents, representing the structure and format of these documents in OWL (Web Ontology Language). The metadata is then extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, a system evaluation, and user study evaluations.
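To illustrate the kind of OOXML metadata such a system consumes, the following Python sketch reads paragraph text and style names directly from a .docx package using only the standard library. It is not the ADFCS extractor itself, and the 'Normal' default style name is an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used throughout word/document.xml
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def extract_styled_paragraphs(docx_path):
    """Return (style_name, text) for every paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read('word/document.xml'))
    paragraphs = []
    for p in root.iter(W + 'p'):
        style = p.find(f'{W}pPr/{W}pStyle')        # paragraph style, if any
        name = style.get(W + 'val') if style is not None else 'Normal'
        text = ''.join(t.text or '' for t in p.iter(W + 't'))
        paragraphs.append((name, text))
    return paragraphs

# e.g. check that the first paragraph uses the expected title style:
# name, text = extract_styled_paragraphs('submission.docx')[0]
```

Format-checking rules of the kind the paper describes can then be phrased as assertions over these (style, text) pairs, or over their RDF representation.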
Jackson, Richard; Kartoglu, Ismail; Stringer, Clive; Gorrell, Genevieve; Roberts, Angus; Song, Xingyi; Wu, Honghan; Agrawal, Asha; Lui, Kenneth; Groza, Tudor; Lewsley, Damian; Northwood, Doug; Folarin, Amos; Stewart, Robert; Dobson, Richard
2018-06-25
Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low cost structured and unstructured information retrieval and extraction architecture within King's College Hospital, the management of governance concerns and the associated use cases and cost saving opportunities that such components present. To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King's College London. On generated data designed to simulate real world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall. We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.
Autism, Context/Noncontext Information Processing, and Atypical Development
Skoyles, John R.
2011-01-01
Autism has been attributed to a deficit in contextual information processing. Attempts to understand autism in terms of such a defect, however, do not include more recent computational work upon context. This work has identified that context information processing depends upon the extraction and use of the information hidden in higher-order (or indirect) associations. Higher-order associations underlie the cognition of context rather than that of situations. This paper starts by examining the differences between higher-order and first-order (or direct) associations. Higher-order associations link entities not directly (as with first-order ones) but indirectly through all the connections they have via other entities. Extracting this information requires the processing of past episodes as a totality. As a result, this extraction depends upon specialised extraction processes separate from cognition. This information is then consolidated. Due to this difference, the extraction/consolidation of higher-order information can be impaired whilst cognition remains intact. Although not directly impaired, cognition will be indirectly impaired by knock-on effects such as cognition compensating for absent higher-order information with information extracted from first-order associations. This paper discusses the implications of this for the inflexible, literal/immediate, and inappropriate information processing of autistic individuals. PMID:22937255
A semi-supervised learning framework for biomedical event extraction based on hidden topics.
Zhou, Deyu; Zhong, Dayou
2015-05-01
Scientists have devoted decades of effort to understanding the interactions between proteins or RNA production. This information could enrich current knowledge of drug reactions or the development of certain diseases. Nevertheless, the lack of explicit structure in life-science literature, one of the most important sources of this information, prevents computer-based systems from accessing it. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models and require large-scale annotated corpora to estimate the models' parameters precisely; however, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning for biomedical event extraction is a feasible solution that is attracting growing interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences but also the hidden topics embedded in the sentences are used to describe the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold-standard corpus. Experimental results show that more than a 2.2% improvement in F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that, by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system, and that the similarity between sentences can be precisely described by the hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
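The topic side of this distance can be sketched with scikit-learn: embed all sentences in hidden-topic space with LDA and give each un-annotated sentence the label of its nearest annotated neighbour. The paper's distance also incorporates sentence structure; the sketch below uses topics alone, and all parameters are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import euclidean_distances

def propagate_annotations(annotated, labels, unannotated, n_topics=20):
    """Assign each un-annotated sentence the event label of its nearest
    annotated neighbour in hidden-topic space."""
    vectorizer = CountVectorizer(stop_words='english')
    X = vectorizer.fit_transform(annotated + unannotated)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topics = lda.fit_transform(X)               # per-sentence topic mixtures
    t_ann, t_un = topics[:len(annotated)], topics[len(annotated):]
    nearest = euclidean_distances(t_un, t_ann).argmin(axis=1)
    return [labels[i] for i in nearest]
```

The sentences labelled this way are then pooled with the annotated corpus to retrain the event extractor, which is the semi-supervised loop the abstract describes.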
PANTHER. Pattern ANalytics To support High-performance Exploitation and Reasoning.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czuchlewski, Kristina Rodriguez; Hart, William E.
Sandia has approached the analysis of big datasets with an integrated methodology that uses computer science, image processing, and human factors to exploit critical patterns and relationships in large datasets despite the variety and rapidity of information. The work is part of a three-year LDRD Grand Challenge called PANTHER (Pattern ANalytics To support High-performance Exploitation and Reasoning). To maximize data analysis capability, Sandia pursued scientific advances across three key technical domains: (1) geospatial-temporal feature extraction via image segmentation and classification; (2) geospatial-temporal analysis capabilities tailored to identify and process new signatures more efficiently; and (3) domain-relevant models of human perception and cognition informing the design of analytic systems. Our integrated results include advances in geographical information systems (GIS) in which we discover activity patterns in noisy, spatial-temporal datasets using geospatial-temporal semantic graphs. We employed computational geometry and machine learning to allow us to extract and predict spatial-temporal patterns and outliers from large aircraft and maritime trajectory datasets. We automatically extracted static and ephemeral features from real, noisy synthetic aperture radar imagery for ingestion into a geospatial-temporal semantic graph. We worked with analysts and investigated analytic workflows to (1) determine how experiential knowledge evolves and is deployed in high-demand, high-throughput visual search workflows, and (2) better understand visual search performance and attention. Through PANTHER, Sandia's fundamental rethinking of key aspects of geospatial data analysis permits the extraction of much richer information from large amounts of data. The project results enable analysts to examine mountains of historical and current data that would otherwise go untouched, while also gaining meaningful, measurable, and defensible insights into overlooked relationships and patterns. The capability is directly relevant to the nation's nonproliferation remote-sensing activities and has broad national security applications for military and intelligence-gathering organizations.
Tudor, Catalina O; Ross, Karen E; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N
2015-01-01
Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction networks involving 14-3-3 proteins identified from cancer-related versus diabetes-related articles. Comparison of the phosphorylation interaction network of kinases, phosphoproteins and interactants obtained from eFIP searches, along with enrichment analysis of the protein set, revealed several shared interactions, highlighting common pathways discussed in the context of both diseases. © The Author(s) 2015. Published by Oxford University Press.
Analysis of entropy extraction efficiencies in random number generation systems
NASA Astrophysics Data System (ADS)
Wang, Chao; Wang, Shuang; Chen, Wei; Yin, Zhen-Qiang; Han, Zheng-Fu
2016-05-01
Random numbers (RNs) have applications in many areas: lottery games, gambling, computer simulation, and, most importantly, cryptography [N. Gisin et al., Rev. Mod. Phys. 74 (2002) 145]. In cryptography theory, the theoretical security of the system calls for high quality RNs. Therefore, developing methods for producing unpredictable RNs with adequate speed is an attractive topic. Early on, despite the lack of theoretical support, pseudo RNs generated by algorithmic methods performed well and satisfied reasonable statistical requirements. However, as implemented, those pseudorandom sequences were completely determined by mathematical formulas and initial seeds, which cannot introduce extra entropy or information. In these cases, “random” bits are generated that are not at all random. Physical random number generators (RNGs), which, in contrast to algorithmic methods, are based on unpredictable physical random phenomena, have attracted considerable research interest. However, the way that we extract random bits from those physical entropy sources has a large influence on the efficiency and performance of the system. In this manuscript, we will review and discuss several randomness extraction schemes that are based on radiation or photon arrival times. We analyze the robustness, post-processing requirements and, in particular, the extraction efficiency of those methods to aid in the construction of efficient, compact and robust physical RNG systems.
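As a concrete example of one family of such schemes, the sketch below converts photon (or radiation) arrival timestamps into unbiased bits by comparing successive inter-arrival intervals and discarding ties, in the spirit of von Neumann debiasing; the timestamp format is an assumption, and practical generators add further post-processing on top of this. It also reports the extraction efficiency, which is at most half a bit per raw interval.

```python
import numpy as np

def bits_from_arrival_times(timestamps):
    """Extract unbiased bits from event arrival times by pairwise
    comparison of inter-arrival intervals: shorter-first -> 0,
    longer-first -> 1, equal intervals discarded."""
    dt = np.diff(np.asarray(timestamps, dtype=np.float64))
    a, b = dt[0::2], dt[1::2]                 # non-overlapping interval pairs
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    keep = a != b                             # von Neumann-style rejection
    bits = (a[keep] > b[keep]).astype(np.uint8)
    efficiency = bits.size / max(dt.size, 1)  # bits per raw interval
    return bits, efficiency
```

For exponentially distributed (Poisson-process) arrivals the two orderings are equally likely, so the comparison yields unbiased bits without knowing the source rate; the price is the low extraction efficiency that the abstract identifies as the central trade-off in such systems.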
Automated encoding of clinical documents based on natural language processing.
Friedman, Carol; Shagina, Lyudmila; Lussier, Yves; Hripcsak, George
2004-01-01
The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI.72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
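For illustration only, a toy sketch of the "most specific code" matching step described above: a structured finding plus its modifiers is looked up against a code table, preferring the most specific entry. The findings, modifiers, and codes in the table are hypothetical placeholders, not verified UMLS entries, and the real MedLEE matcher is far richer.

```python
# Hypothetical code table: (finding, modifier) -> code. Entries with a
# modifier are more specific than entries with None.
CODE_TABLE = {
    ("pleural effusion", "left"): "C-SPECIFIC-01",   # made-up codes
    ("pleural effusion", None): "C-GENERAL-01",
    ("effusion", None): "C-GENERAL-02",
}

def most_specific_code(finding, modifiers):
    # Try finding+modifier combinations first, then the bare finding.
    for m in modifiers + [None]:
        code = CODE_TABLE.get((finding, m))
        if code:
            return code
    return None

print(most_specific_code("pleural effusion", ["left"]))  # C-SPECIFIC-01
print(most_specific_code("pleural effusion", []))        # C-GENERAL-01
```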
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2017-08-18
The objective of this research is to compare relational and non-relational (NoSQL) database systems in their ability to store, recover, query and persist standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database), each in three different sizes, were created in order to evaluate and compare the response times (algorithmic complexity) of six queries of growing complexity, which were performed on them. Comparable results available in the literature were also considered. Relational and non-relational NoSQL database systems both show almost linear algorithmic complexity in query execution, but with very different linear slopes, that of the relational system being much steeper than those of the two NoSQL systems. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely large (secondary use, research applications). Document-based NoSQL databases generally perform better than native XML NoSQL databases. Visualization and editing of EHR extracts are also document-based tasks better suited to NoSQL database systems. However, the appropriate database solution depends greatly on each particular situation and specific problem.
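A minimal sketch of the kind of timing harness such a comparison implies. The `backends` and `queries` names in the commented loop are placeholders for the actual MySQL/MongoDB/eXist client drivers and the six test queries, which are not specified here.

```python
import time
import statistics

def time_query(run_query, repeats=10):
    """Mean and standard deviation of the wall-clock response time of a
    query. run_query is any zero-argument callable that executes the
    query against the target DBMS (relational or NoSQL)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Sketch of the comparison loop over backends and queries:
# for name, run in backends.items():        # e.g. MySQL, MongoDB, eXist
#     for q in queries:                     # the six test queries
#         mean, sd = time_query(lambda: run(q))
#         print(name, q, mean, sd)
```

Plotting the means against database size (5000, 10,000, 20,000 extracts) is what exposes the linear slopes compared in the abstract.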
Information-Based Analysis of Data Assimilation (Invited)
NASA Astrophysics Data System (ADS)
Nearing, G. S.; Gupta, H. V.; Crow, W. T.; Gong, W.
2013-12-01
Data assimilation is defined as the Bayesian conditioning of uncertain model simulations on observations for the purpose of reducing uncertainty about model states. Practical data assimilation methods make the application of Bayes' law tractable either by employing assumptions about the prior, posterior and likelihood distributions (e.g., the Kalman family of filters) or by using resampling methods (e.g., the bootstrap filter). We propose to quantify the efficiency of these approximations in an observing system simulation experiment (OSSE) setting using information theory and, in an OSSE or real-world validation setting, to measure the amount - and more importantly, the quality - of information extracted from observations during data assimilation. To analyze DA assumptions, uncertainty is quantified as the Shannon-type entropy of a discretized probability distribution. The maximum amount of information that can be extracted from observations about model states is the mutual information between states and observations, which is equal to the reduction in entropy in our estimate of the state due to Bayesian filtering. The difference between this potential and the actual reduction in entropy due to Kalman (or other) filtering measures the inefficiency of the filter assumptions. Residual uncertainty in DA posterior state estimates can be attributed to three sources: (i) non-injectivity of the observation operator, (ii) noise in the observations, and (iii) filter approximations. The contribution of each of these sources is measurable in an OSSE setting. The amount of information extracted from observations by data assimilation (or system identification, including parameter estimation) can also be measured by Shannon's theory. Since practical filters are approximations of Bayes' law, it is important to know whether the information that is extracted from observations by a filter is reliable. We define information as either good or bad, and propose to measure these two types of information using partial Kullback-Leibler divergences. Defined this way, good and bad information sum to total information. This segregation of information into good and bad components requires a validation target distribution; in a DA OSSE setting, this can be the true Bayesian posterior, but in a real-world setting the validation target might be determined by a set of in situ observations.
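The information measures involved are straightforward to compute once state and observation have been discretized. A small Python sketch, assuming the variables have already been binned into a joint probability table (the toy numbers are ours):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discretized distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) from a discretized joint probability table:
    I = H(X) + H(Y) - H(X,Y)."""
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.ravel())

def kl_divergence(p, q):
    """D(p || q); partial sums of these terms give the good/bad
    information split against a validation target. Assumes q > 0
    wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Toy joint distribution over (state bin, observation bin).
joint = np.array([[0.3, 0.1],
                  [0.1, 0.5]])
print(mutual_information(joint))  # upper bound on entropy reduction
```

The mutual information computed this way is the ceiling against which the entropy reduction actually achieved by a Kalman-type filter is compared.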
DOT National Transportation Integrated Search
2013-04-01
There are three tasks for this research: 1. Methodology to extract Road Usage Patterns from Phone Data: We combined the most complete record of daily mobility, based on large-scale mobile phone data, with detailed Geographic Information System (...
1992-04-01
AND SCHEDULING" TIM FINN, UNIVERSITY OF MARYLAND, BALTIMORE COUNTY E. " EXTRACTING RULES FROM SOFTWARE FOR KNOWLEDGE-BASES" NOAH S. PRYWES, UNIVERSITY...Databases for Planning and Scheduling" Tim Finin, Unisys Corporation 8:30 - 9:00 " Extracting Rules from Software for Knowledge Baseso Noah Prywes, U. of...Space Requirements are Tractable E.G.: FEM, Multiplication Routines, Sorting Programs Lebmwmy fo Al Roseew d. The Ohio Male Unlversity A-2 Type 2
Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play
ERIC Educational Resources Information Center
North, Jamie S.; Williams, A. Mark
2008-01-01
The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…
Can we replace curation with information extraction software?
Karp, Peter D
2016-01-01
Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs capture single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking; without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.
The design and implementation of web mining in web sites security
NASA Astrophysics Data System (ADS)
Li, Jian; Zhang, Guo-Yin; Gu, Guo-Chang; Li, Jian-Li
2003-06-01
Backdoors or information leaks in Web servers can be detected by applying Web mining techniques to abnormal Web log and Web application log data. The security of Web servers can thereby be enhanced and the damage of illegal access avoided. First, a system for discovering patterns of information leakage in CGI scripts from Web log data is proposed. Second, those patterns are provided to system administrators so that they can modify their code and enhance Web site security. The following aspects are described: one is to combine the Web application log with the Web log to extract more information, so that Web data mining can discover information that a firewall and an Intrusion Detection System cannot find. Another is an operation module for the Web site that enhances its security. In the cluster-server session module, a density-based clustering technique is used to reduce resource cost and obtain better efficiency.
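As a hedged illustration of the density-based clustering step, here is a sketch using scikit-learn's DBSCAN over toy per-session log features; the feature choices and thresholds are our own assumptions, not the paper's.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy per-session features: [requests, bytes transferred, distinct URLs].
sessions = np.array([
    [12, 3400, 5],
    [11, 3100, 5],
    [13, 3600, 6],
    [240, 90500, 88],   # an outlying session, e.g. a scanner probing CGI
])

# Density-based clustering: dense groups form clusters, sparse points
# are flagged as noise (label -1), which is the anomaly cue.
labels = DBSCAN(eps=500.0, min_samples=2).fit_predict(sessions)
print(labels)           # e.g. [0, 0, 0, -1]
```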
Infrared moving small target detection based on saliency extraction and image sparse representation
NASA Astrophysics Data System (ADS)
Zhang, Xiaomin; Ren, Kan; Gao, Jin; Li, Chaowei; Gu, Guohua; Wan, Minjie
2016-10-01
Moving small target detection in infrared images is a crucial technique in infrared search and tracking systems. This paper presents a novel small target detection technique based on frequency-domain saliency extraction and image sparse representation. First, we exploit the features of the Fourier spectrum image and the magnitude spectrum of the Fourier transform to make a rough extraction of salient regions, and use a threshold segmentation step to separate the regions that look salient from the background, which yields a binary image. Second, a new patch-image model and an over-complete dictionary are introduced into the detection system, and infrared small target detection is converted into an optimization problem of patch-image reconstruction based on sparse representation. More specifically, the test image and the binary image are decomposed into image patches following certain rules. We select potential target areas according to the binary patch-image, which contains the salient region information, and then exploit the over-complete infrared small-target dictionary to reconstruct the test-image blocks that may contain targets. The coefficients of a target image patch satisfy sparsity constraints. Finally, for image sequences, the Euclidean distance between target positions in consecutive frames is used to exploit inter-frame correlation, reducing the false-alarm rate and increasing the detection accuracy for moving small targets in infrared images.
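A compact sketch of the sparse-representation step, using orthogonal matching pursuit over an over-complete dictionary. Here the dictionary is random and the patch synthetic; the paper builds its dictionary from infrared small-target patches, so this is an illustration of the mechanism only.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))          # 8x8 patches, 256 atoms
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
x = D[:, [3, 100]] @ np.array([1.5, -0.7])  # a genuinely sparse patch

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False)
omp.fit(D, x)                               # sparse-code x over D
print(np.flatnonzero(omp.coef_))            # recovered support

# Detection cue: a patch containing a small target sparse-codes with a
# few concentrated coefficients; background clutter does not.
```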
Structural health monitoring feature design by genetic programming
NASA Astrophysics Data System (ADS)
Harvey, Dustin Y.; Todd, Michael D.
2014-09-01
Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and other high-capital or life-safety critical structures. Conventional data processing involves pre-processing and extraction of low-dimensional features from in situ time series measurements. The features are then input to a statistical pattern recognition algorithm to perform the relevant classification or regression task necessary to facilitate decisions by the SHM system. Traditional design of signal processing and feature extraction algorithms can be an expensive and time-consuming process requiring extensive system knowledge and domain expertise. Genetic programming, a heuristic program search method from evolutionary computation, was recently adapted by the authors to perform automated, data-driven design of signal processing and feature extraction algorithms for statistical pattern recognition applications. The proposed method, called Autofead, is particularly suitable to handle the challenges inherent in algorithm design for SHM problems where the manifestation of damage in structural response measurements is often unclear or unknown. Autofead mines a training database of response measurements to discover information-rich features specific to the problem at hand. This study provides experimental validation on three SHM applications including ultrasonic damage detection, bearing damage classification for rotating machinery, and vibration-based structural health monitoring. Performance comparisons with common feature choices for each problem area are provided demonstrating the versatility of Autofead to produce significant algorithm improvements on a wide range of problems.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-01-01
This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. Within the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems such as archetype relational mapping (ARM) on the same data. However, interpolation of the doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems. For example, NoSQL may be appropriate for document-based tasks such as handling the EHR extracts used in clinical practice, for editing and visualization, or in situations where the aim is not only to query medical information but also to restore the EHR in exactly its original form. PMID:29608174
Science information systems: Visualization
NASA Technical Reports Server (NTRS)
Wall, Ray J.
1991-01-01
Future programs in earth science, planetary science, and astrophysics will involve complex instruments that produce data at unprecedented rates and volumes. Current methods for data display, exploration, and discovery are inadequate. Visualization technology offers a means for the user to comprehend, explore, and examine complex data sets. The goal of this program is to increase the effectiveness and efficiency of scientists in extracting scientific information from large volumes of instrument data.
Djioua, Moussa; Plamondon, Réjean
2009-11-01
In this paper, we present a new analytical method for estimating the parameters of Delta-Lognormal functions and characterizing handwriting strokes. According to the Kinematic Theory of rapid human movements, these parameters contain information on both the motor commands and the timing properties of a neuromuscular system. The new algorithm, called XZERO, exploits relationships between the zero crossings of the first and second time derivatives of a lognormal function and its four basic parameters. The methodology is described and then evaluated under various testing conditions. The new tool allows a greater variety of stroke patterns to be processed automatically. Furthermore, for the first time, the extraction accuracy is quantified empirically, taking advantage of the exponential relationships that link the dispersion of the extraction errors with its signal-to-noise ratio. A new extraction system which combines this algorithm with two other previously published methods is also described and evaluated. This system provides researchers involved in various domains of pattern analysis and artificial intelligence with new tools for the basic study of single strokes as primitives for understanding rapid human movements.
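For reference, the lognormal building block assumed by the Kinematic Theory can be written as follows (notation schematic; the paper's parameterization may differ in detail):

```latex
\Lambda(t; t_0, \mu, \sigma) =
  \frac{1}{\sigma\sqrt{2\pi}\,(t - t_0)}
  \exp\!\left( -\frac{\left[\ln(t - t_0) - \mu\right]^{2}}{2\sigma^{2}} \right),
\qquad t > t_0 ,
```

and a delta-lognormal velocity profile is the difference of an agonist and an antagonist component, \(v(t) = D_1\,\Lambda(t; t_0, \mu_1, \sigma_1) - D_2\,\Lambda(t; t_0, \mu_2, \sigma_2)\). XZERO recovers the four basic parameters of each component from the zero crossings of the first and second time derivatives of the observed profile.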
Tecchio, Franca; Porcaro, Camillo; Barbati, Giulia; Zappasodi, Filippo
2007-01-01
A brain-computer interface (BCI) can be defined as any system that can track a person's intent as embedded in his or her brain activity and, from that alone, translate the intention into commands for a computer. Among the brain signal monitoring systems best suited for this challenging task, electroencephalography (EEG) and magnetoencephalography (MEG) are the most realistic: both are non-invasive, EEG is portable, and MEG can provide more specific information that could later be exploited through EEG signals as well. The first two BCI steps are to set up an appropriate experimental protocol while recording the brain signal and then to extract informative features from the recorded cerebral activity. To provide information useful in these BCI stages, our aim is to give an overview of a new procedure we recently developed, named functional source separation (FSS). Derived from blind source separation algorithms, it exploits the most valuable information provided by electrophysiological techniques, i.e. the waveform signal properties, while remaining blind to the biophysical nature of the signal sources. FSS returns the single-trial source activity, estimates the time course of a neuronal pool across different experimental states on the basis of a specific functional requirement in a specific time period, and uses simulated annealing as the optimization procedure, which allows non-differentiable functional constraints to be exploited. Moreover, a minor section is included, devoted to information acquired by MEG in stroke patients, to guide BCI applications aimed at sustaining motor behaviour in these patients. Relevant BCI features (spatial and time-frequency properties) are in fact altered by a stroke in the regions devoted to hand control. Moreover, a method to investigate the relationship between sensory and motor hand cortical network activities is described, providing information useful for developing BCI feedback control systems. This review provides a description of the FSS technique, a promising tool for the BCI community for online electrophysiological feature extraction, and offers information relevant to the development of BCI applications to sustain hand control in stroke patients. PMID:17331989
NASA Technical Reports Server (NTRS)
1982-01-01
A project to develop an effective mobility aid for blind pedestrians which acquires consecutive images of the scenes before a moving pedestrian, which locates and identifies the pedestrian's path and potential obstacles in the path, which presents path and obstacle information to the pedestrian, and which operates in real-time is discussed. The mobility aid has three principal components: an image acquisition system, an image interpretation system, and an information presentation system. The image acquisition system consists of a miniature, solid-state TV camera which transforms the scene before the blind pedestrian into an image which can be received by the image interpretation system. The image interpretation system is implemented on a microprocessor which has been programmed to execute real-time feature extraction and scene analysis algorithms for locating and identifying the pedestrian's path and potential obstacles. Identity and location information is presented to the pedestrian by means of tactile coding and machine-generated speech.
Wang, Li; Zhang, Yaoyun; Jiang, Min; Wang, Jingqi; Dong, Jiancheng; Liu, Yun; Tao, Cui; Jiang, Guoqian; Zhou, Yi; Xu, Hua
2018-07-01
In recent years, electronic health record systems have been widely implemented in China, making clinical data available electronically. However, little effort has been devoted to making drug information exchangeable among these systems. This study aimed to build a Normalized Chinese Clinical Drug (NCCD) knowledge base by applying and extending the information model of RxNorm to Chinese clinical drugs. Chinese drugs were collected from four major resources (China Food and Drug Administration, China Health Insurance Systems, Hospital Pharmacy Systems, and China Pharmacopoeia) for integration and normalization in NCCD. Chemical drugs were normalized using the information model in RxNorm without much change. Chinese patent drugs (i.e., Chinese herbal extracts), however, were represented using an expanded RxNorm model to incorporate the unique characteristics of these drugs. A hybrid approach combining automated natural language processing technologies and manual review by domain experts was then applied to drug attribute extraction, normalization, and further generation of drug names at different specification levels. Lastly, we report the statistics of NCCD, as well as evaluation results using several sets of randomly selected Chinese drugs. The current version of NCCD contains 16 976 chemical drugs and 2663 Chinese patent medicines, resulting in 19 639 clinical drugs, 250 267 unique concepts, and 2 602 760 relations. By manual review of 1700 chemical drugs and 250 Chinese patent drugs randomly selected from NCCD (about 10%), we showed that the hybrid approach could achieve an accuracy of 98.60% for drug name extraction and normalization. Using a collection of 500 chemical drugs and 500 Chinese patent drugs from other resources, we showed that NCCD achieved coverage of 97.0% for chemical drugs and 90.0% for Chinese patent drugs. These evaluation results demonstrate the potential of NCCD to improve interoperability across the various electronic drug systems in China.
Space and Earth Science Data Compression Workshop
NASA Technical Reports Server (NTRS)
Tilton, James C. (Editor)
1991-01-01
The workshop explored opportunities for data compression to enhance the collection and analysis of space and Earth science data. The focus was on scientists' data requirements, as well as constraints imposed by the data collection, transmission, distribution, and archival systems. The workshop consisted of several invited papers; two described information systems for space and Earth science data, four depicted analysis scenarios for extracting information of scientific interest from data collected by Earth orbiting and deep space platforms, and a final one was a general tutorial on image data compression.
A mobile care system with alert mechanism.
Lee, Ren-Guey; Chen, Kuei-Chien; Hsiao, Chun-Chieh; Tseng, Chwan-Lu
2007-09-01
Hypertension and arrhythmia are chronic diseases, which can be effectively prevented and controlled only if the physiological parameters of the patient are constantly monitored, along with full support of health education and professional medical care. In this paper, a role-based intelligent mobile care system with an alert mechanism for the chronic care environment is proposed and implemented. The roles in our system include patients, physicians, nurses, and healthcare providers. Each role represents a person who uses a mobile device such as a mobile phone to communicate with the server set up in the care center, so that he or she can move around without restriction. For commercial mobile phones with Bluetooth communication capability carried by chronic patients, we have developed physiological signal recognition algorithms that were implemented and built into the mobile phone without affecting its original communication functions. It is thus possible to integrate several front-end mobile care devices with Bluetooth communication capability to extract patients' various physiological parameters [such as blood pressure, pulse, saturation of haemoglobin (SpO2), and electrocardiogram (ECG)], to monitor multiple physiological signals without spatial constraints, and to upload important or abnormal physiological information to the healthcare center for storage and analysis, or to transmit the information to physicians and healthcare providers for further processing. Thus, the physiological signal extraction devices only have to deal with signal extraction and wireless transmission. Since they do not have to do signal processing, their form factor can be further reduced to reach the goal of microminiaturization and power saving. An alert management mechanism has been included in the back-end healthcare center to initiate various strategies for automatic emergency alerts after receiving emergency messages or after automatically recognizing emergency messages. Within the time intervals set in the system, and according to the medical history of a specific patient, our prototype system can inform various healthcare providers in sequence to provide healthcare service, using their replies to ensure the accuracy of alert information and the completeness of early warning notification, thereby further improving healthcare quality. Finally, based on the testing results and performance evaluation of our implemented system prototype, we conclude that it is possible to set up a complete intelligent healthcare chain with mobile monitoring and healthcare service via the assistance of our system.
ERIC Educational Resources Information Center
Swan, Gerry
2009-01-01
In 2003, a portfolio system was implemented to manage the data associated with the field experiences in a teacher education program at a research institution in the southeast region of the United States. In this longitudinal study, the implementation trends from usage data extracted from the system are used to discuss the implications for the use…
NASA Astrophysics Data System (ADS)
Ramakrishnan, Sowmya; Alvino, Christopher; Grady, Leo; Kiraly, Atilla
2011-03-01
We present a fully automatic system to extract 3D centerlines of ribs from thoracic CT scans. Our rib centerline system determines the positional information for the rib cage, consisting of extracted rib centerlines, the spinal canal centerline, and the pairing and labeling of ribs. We show an application of this output to produce an enhanced visualization of the rib cage by the method of Kiraly et al., in which the ribs are digitally unfolded along their centerlines. The centerline extraction consists of three stages: (a) pre-trace processing for rib localization, (b) rib centerline tracing, and (c) post-trace processing to merge the rib traces. We then distinguish ribs from non-ribs and determine anatomical rib labeling. Our novel centerline tracing technique uses the Random Walker algorithm to segment the structural boundary of the rib in successive 2D cross sections orthogonal to the longitudinal direction of the ribs. The rib centerline is then progressively traced along the rib using a 3D Kalman filter. The rib centerline extraction framework was evaluated on 149 CT datasets with varying slice spacing, dose, and a variety of reconstruction kernels, and the results of the evaluation are presented. The extraction takes approximately 20 seconds on a modern radiology workstation and performs robustly even in the presence of partial volume effects or rib pathologies such as bone metastases or fractures, making the system suitable for assisting clinicians in expediting routine rib reading for oncology and trauma applications.
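A minimal sketch of the progressive-tracing idea: a constant-velocity Kalman filter predicts the next centerline point and is updated with the centroid of each Random Walker cross-section segmentation. The matrices and noise levels below are illustrative assumptions, not the paper's values.

```python
import numpy as np

dt = 1.0                                   # step along the rib (voxels)
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)  # state: [x y z vx vy vz]
H = np.hstack([np.eye(3), np.zeros((3, 3))])
Q = 1e-3 * np.eye(6)                       # process noise (assumed)
R = 0.5 * np.eye(3)                        # measurement noise (assumed)

def kalman_step(x, P, z):
    # Predict the next centerline point under constant velocity.
    x, P = F @ x, F @ P @ F.T + Q
    # Update with the measured cross-section centroid z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)
for z in [np.array([1.0, 0.1, 0.0]), np.array([2.1, 0.2, 0.1])]:
    x, P = kalman_step(x, P, z)
print(x[:3])   # smoothed estimate of the current centerline point
```

The filter's role is the one named in the abstract: it keeps the trace on course across partial volume effects or local pathology where a single cross-section measurement is unreliable.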
NASA Astrophysics Data System (ADS)
Kim, Min Young; Cho, Hyung Suck; Kim, Jae H.
2002-10-01
In recent years, intelligent autonomous mobile robots have drawn tremendous interest as service robots for serving humans or as industrial robots for replacing humans. To carry out their tasks, robots must be able to sense and recognize the 3D space in which they live or work. In this paper, we deal with a 3D sensing system for the environment recognition of mobile robots. Structured lighting is utilized as the basis of the 3D visual sensor system because of its robustness to the nature of the navigation environment and the easy extraction of the feature information of interest. The proposed sensing system is configured as a trinocular vision system, composed of a flexible multi-stripe laser projector and two cameras. The principle of extracting the 3D information is the optical triangulation method. By modeling the projector as another camera and using the epipolar constraints defined by the three views, the point-to-point correspondence between the line feature points in each image is established. In this work, the principle of this sensor is described in detail, and a series of experimental tests is performed to show the simplicity, efficiency and accuracy of this sensor system for 3D environment sensing and recognition.
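For reference, the optical triangulation relation underlying such structured-light sensors, in schematic form (generic symbols, not the paper's calibration variables): with baseline \(b\) between projector and camera, and angles \(\alpha\) and \(\beta\) from the baseline to the projected ray and the viewing ray respectively, the depth of the illuminated point from the baseline is

```latex
Z = \frac{b \,\sin\alpha \,\sin\beta}{\sin(\alpha + \beta)} .
```

The formula makes the design trade-off visible: depth resolution improves with a longer baseline and larger triangulation angles, at the cost of more occlusion between projector and cameras.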
Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.
Dave, Jaydev K; Gingold, Eric L
2013-01-01
The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract from these files information from the structured elements in the DICOM metadata relevant to exposure. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
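The same extraction idea in Python with pydicom, as a hedged sketch (the study itself used an automated Matlab program). The keywords below are standard DICOM attributes, but which ones a given dose report actually carries varies by vendor, and the file name is a placeholder.

```python
import pydicom

# Read a retrieved dose-report object (path is a placeholder).
ds = pydicom.dcmread("dose_report.dcm")

# Print whichever exposure-related attributes this object carries.
for kw in ("StudyDate", "Modality", "KVP", "Exposure", "CTDIvol"):
    if kw in ds:
        print(kw, "=", ds.data_element(kw).value)

# Full Radiation Dose Structured Reports nest their values inside
# content sequences; those require walking ds.ContentSequence
# recursively rather than reading top-level elements.
```

Because the values come from structured elements rather than from rendered text, there is no optical-character-recognition step to introduce errors, which is the point the abstract makes.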
NASA Astrophysics Data System (ADS)
Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia
2018-05-01
Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.
Experience of the ARGO autonomous vehicle
NASA Astrophysics Data System (ADS)
Bertozzi, Massimo; Broggi, Alberto; Conte, Gianni; Fascioli, Alessandra
1998-07-01
This paper presents and discusses the first results obtained by the GOLD (Generic Obstacle and Lane Detection) system as an automatic driver of ARGO. ARGO is a Lancia Thema passenger car equipped with a vision-based system that extracts road and environmental information from the acquired scene. By means of stereo vision, obstacles on the road are detected and localized, while the processing of a single monocular image extracts the road geometry in front of the vehicle. The generality of the underlying approach makes it possible to detect generic obstacles (without constraints on shape, color, or symmetry) and to detect lane markings even in dark and strong-shadow conditions. The hardware system consists of a PC Pentium 200 MHz with MMX technology and a frame-grabber board able to acquire 3 b/w images simultaneously; the result of the processing (position of obstacles and geometry of the road) is used to drive an actuator on the steering wheel, while debug information is presented to the user on an on-board monitor and a LED-based control panel.
Rinaldi, Fabio; Ellendorff, Tilia Renate; Madan, Sumit; Clematide, Simon; van der Lek, Adrian; Mevissen, Theo; Fluck, Juliane
2016-01-01
Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The specific goal of track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology which gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions, relations, and introduced an evaluation framework which rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text. © The Author(s) 2016. Published by Oxford University Press.
Ghasemzadeh, Hassan; Loseu, Vitali; Jafari, Roozbeh
2010-03-01
Mobile sensor-based systems are emerging as promising platforms for healthcare monitoring. An important goal of these systems is to extract physiological information about the subject wearing the network. Such information can be used for life logging, quality of life measures, fall detection, extraction of contextual information, and many other applications. Data collected by these sensor nodes are overwhelming, and hence, an efficient data processing technique is essential. In this paper, we present a system using inexpensive, off-the-shelf inertial sensor nodes that constructs motion transcripts from biomedical signals and identifies movements by taking collaboration between the nodes into consideration. Transcripts are built of motion primitives and aim to reduce the complexity of the original data. We then label each primitive with a unique symbol and generate a sequence of symbols, known as motion template, representing a particular action. This model leads to a distributed algorithm for action recognition using edit distance with respect to motion templates. The algorithm reduces the number of active nodes during every classification decision. We present our results using data collected from five normal subjects performing transitional movements. The results clearly illustrate the effectiveness of our framework. In particular, we obtain a classification accuracy of 84.13% with only one sensor node involved in the classification process.
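A small sketch of the template-matching step: classic edit (Levenshtein) distance between the observed symbol sequence and each stored motion template, with nearest-template classification. The symbols and template names are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program,
    keeping only the previous row for O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical motion templates built from primitive symbols.
templates = {"sit-to-stand": "AABBC", "stand-to-sit": "CBBAA"}
observed = "AABBB"   # symbol sequence from the current movement
best = min(templates, key=lambda k: edit_distance(observed, templates[k]))
print(best)          # nearest template wins the classification
```

Working on short symbol strings rather than raw inertial signals is what lets the nodes exchange and compare candidates cheaply, consistent with the collaborative, low-bandwidth design described above.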
NASA Astrophysics Data System (ADS)
Zoratti, Paul K.; Gilbert, R. Kent; Majewski, Ronald; Ference, Jack
1995-12-01
Development of automotive collision warning systems has progressed rapidly over the past several years. A key enabling technology for these systems is millimeter-wave radar. This paper addresses a very critical millimeter-wave radar sensing issue for automotive radar, namely the scattering characteristics of common roadway objects such as vehicles, roadsigns, and bridge overpass structures. The data presented in this paper were collected on ERIM's Fine Resolution Radar Imaging Rotary Platform Facility and processed with ERIM's image processing tools. The value of this approach is that it provides system developers with a 2D radar image from which information about individual point scatterers `within a single target' can be extracted. This information on scattering characteristics will be utilized to refine threat assessment processing algorithms and automotive radar hardware configurations. (1) By evaluating the scattering characteristics identified in the radar image, radar signatures as a function of aspect angle for common roadway objects can be established. These signatures will aid in the refinement of threat assessment processing algorithms. (2) Utilizing ERIM's image manipulation tools, total RCS and RCS as a function of range and azimuth can be extracted from the radar image data. This RCS information will be essential in defining the operational envelope (e.g. dynamic range) within which any radar sensor hardware must be designed.
Extracting 3D Information from 1D and 2D Diagnostic Systems on the DIII-D Tokamak
NASA Astrophysics Data System (ADS)
Brookman, Michael
2017-10-01
The interpretation of tokamak data often hinges on assumptions of axisymmetry and flux-surface equilibria, neglecting 3D effects. This work discusses examples on the DIII-D tokamak where this assumption is an insufficient approximation, and explores the diagnostic information available to resolve 3D effects while preserving 1D profiles. Methods for extracting 3D data from the electron cyclotron emission radiometers, density profile reflectometer, and Thomson scattering system are discussed. Coordinating diagnostics around the tokamak shows the significance of 3D features, such as sawteeth [1] and resonant magnetic perturbations. A consequence of imposed 3D perturbations is a shift in major radius of measured profiles between diagnostics at different toroidal locations. Integrating different diagnostics requires a database containing information about their toroidal, poloidal, and radial locations. Through the data analysis framework OMFIT, it is possible to measure the magnitude of the apparent shifts from 3D effects and enforce consistency between diagnostics. Using the existing 1D and 2D diagnostic systems on DIII-D, this process allows the effects of the 3D perturbations on 1D profiles to be addressed. Supported by US DOE contracts DE-FC02-04ER54698, DE-FG03-97ER54415.
The use of analytical sedimentation velocity to extract thermodynamic linkage.
Cole, James L; Correia, John J; Stafford, Walter F
2011-11-01
For 25 years, the Gibbs Conference on Biothermodynamics has focused on the use of thermodynamics to extract information about the mechanism and regulation of biological processes. This includes the determination of equilibrium constants for macromolecular interactions by high precision physical measurements. These approaches further reveal thermodynamic linkages to ligand binding events. Analytical ultracentrifugation has been a fundamental technique in the determination of macromolecular reaction stoichiometry and energetics for 85 years. This approach is highly amenable to the extraction of thermodynamic couplings to small molecule binding in the overall reaction pathway. In the 1980s this approach was extended to the use of sedimentation velocity techniques, primarily by the analysis of tubulin-drug interactions by Na and Timasheff. This transport method necessarily incorporates the complexity of both hydrodynamic and thermodynamic nonideality. The advent of modern computational methods in the last 20 years has subsequently made the analysis of sedimentation velocity data for interacting systems more robust and rigorous. Here we review three examples where sedimentation velocity has been useful at extracting thermodynamic information about reaction stoichiometry and energetics. Approaches to extract linkage to small molecule binding and the influence of hydrodynamic nonideality are emphasized. These methods are shown to also apply to the collection of fluorescence data with the new Aviv FDS. Copyright © 2011 Elsevier B.V. All rights reserved.
Visual slant misperception and the Black-Hole landing situation
NASA Technical Reports Server (NTRS)
Perrone, J. A.
1983-01-01
A theory is presented which explains the tendency toward dangerously low approaches during night landing situations. The two-dimensional information at the pilot's eye contains sufficient information for the visual system to extract the angle of slant of the runway relative to the approach path. The analysis depends upon perspective information available at a certain distance out from the aimpoint, to either side of the runway edgelights. Under black-hole landing conditions, however, this information is not available, and it is proposed that the visual system use instead the only available information, the perspective gradient of the runway edgelights. An equation is developed which predicts the perceived approach angle when this incorrect parameter is used. The predictions are in close agreement with existing experimental data.
Relations between work and entropy production for general information-driven, finite-state engines
NASA Astrophysics Data System (ADS)
Merhav, Neri
2017-02-01
We consider a system model of a general finite-state machine (ratchet) that simultaneously interacts with three kinds of reservoirs: a heat reservoir, a work reservoir, and an information reservoir, the latter being taken to be a running digital tape whose symbols interact sequentially with the machine. As has been shown in earlier work, this finite-state machine can act as a demon (with memory), which creates a net flow of energy from the heat reservoir into the work reservoir (thus extracting useful work) at the price of increasing the entropy of the information reservoir. Under very few assumptions, we propose a simple derivation of a family of inequalities that relate the work extraction with the entropy production. These inequalities can be seen as either upper bounds on the extractable work or as lower bounds on the entropy production, depending on the point of view. Many of these bounds are relatively easy to calculate and they are tight in the sense that equality can be approached arbitrarily closely. In their basic forms, these inequalities are applicable to any finite number of cycles (and not only asymptotically), and for a general input information sequence (possibly correlated), which is not necessarily assumed even stationary. Several known results are obtained as special cases.
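In schematic form, the best-known member of this family of bounds (for the stationary, per-symbol regime) limits the extracted work by the entropy gained by the tape; the paper's inequalities generalize statements of this type to finite cycle counts and correlated, non-stationary inputs. Written here as a sketch, with notation ours:

```latex
W_{\mathrm{ext}} \;\le\; k_B T \ln 2 \,\bigl[ H_{\mathrm{out}} - H_{\mathrm{in}} \bigr] ,
```

where \(H_{\mathrm{in}}\) and \(H_{\mathrm{out}}\) are the per-symbol Shannon entropies (in bits) of the tape before and after interaction with the ratchet, \(T\) is the heat-reservoir temperature, and \(k_B\) is Boltzmann's constant.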
Phenomenological analysis of medical time series with regular and stochastic components
NASA Astrophysics Data System (ADS)
Timashev, Serge F.; Polyakov, Yuriy S.
2007-06-01
Flicker-Noise Spectroscopy (FNS), a general approach to the extraction and parameterization of resonant and stochastic components contained in medical time series, is presented. The basic idea of FNS is to treat the correlation links present in sequences of different irregularities, such as spikes, "jumps", and discontinuities in derivatives of different orders, on all levels of the spatiotemporal hierarchy of the system under study as main information carriers. The tools to extract and analyze the information are power spectra and difference moments (structural functions), which complement the information of each other. The structural function stochastic component is formed exclusively by "jumps" of the dynamic variable while the power spectrum stochastic component is formed by both spikes and "jumps" on every level of the hierarchy. The information "passport" characteristics that are determined by fitting the derived expressions to the experimental variations for the stochastic components of power spectra and structural functions are interpreted as the correlation times and parameters that describe the rate of "memory loss" on these correlation time intervals for different irregularities. The number of the extracted parameters is determined by the requirements of the problem under study. Application of this approach to the analysis of tremor velocity signals for a Parkinsonian patient is discussed.
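For orientation, the two complementary FNS tools have standard forms; up to normalization conventions (which vary between treatments), the second-order difference moment (structural function) and the power spectrum estimate are

```latex
\Phi^{(2)}(\tau) \;=\; \bigl\langle \left[ V(t) - V(t+\tau) \right]^{2} \bigr\rangle ,
\qquad
S(f) \;=\; \Bigl\langle \,\Bigl| \int_{0}^{T} V(t)\, e^{\,2\pi i f t}\, dt \Bigr|^{2} \Bigr\rangle ,
```

with the angle brackets denoting averaging over \(t\) or over realizations. As stated above, \(\Phi^{(2)}(\tau)\) responds only to the "jumps" of the dynamic variable, while \(S(f)\) carries contributions from both spikes and "jumps", which is why the two are fitted jointly.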
Ad-Hoc Queries over Document Collections - A Case Study
NASA Astrophysics Data System (ADS)
Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker
We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" and our system GOOLAP.info are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object-resolution failures cause incomplete records that cannot be joined into a record answering the query. Our focus is the identification of join-reordering heuristics that maximize the number of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join operations: the multi-way CJ-operator joins records from multiple relationships extracted from a single document, and the two-way join operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size and record density while lowering execution costs.
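An illustrative sketch of the DJ (dense join) idea under a toy schema of our own: extracted fragments are joined on a key, and any merged record still missing a required field is dropped, keeping the result dense. Field names and data are invented.

```python
# Required fields for a complete answer record (hypothetical schema).
REQUIRED = ("company", "ceo", "headquarters")

def dense_join(left, right, key):
    """Two-way DJ-style join: merge fragments on `key` and keep only
    records that are complete with respect to REQUIRED."""
    out = []
    for l in left:
        for r in right:
            if l.get(key) == r.get(key):
                merged = {**l, **r}
                if all(merged.get(f) for f in REQUIRED):
                    out.append(merged)   # keep complete records only
    return out

a = [{"company": "Acme", "ceo": "J. Doe"},
     {"company": "Beta", "ceo": None}]          # failed extraction
b = [{"company": "Acme", "headquarters": "Berlin"},
     {"company": "Beta", "headquarters": "Paris"}]
print(dense_join(a, b, "company"))   # Beta is dropped: missing CEO
```

Reordering joins so that density-preserving operators run early is what the paper's heuristics optimize against the extraction cost of each document.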
LiDAR Vegetation Investigation and Signature Analysis System (LVISA)
NASA Astrophysics Data System (ADS)
Höfle, Bernhard; Koenig, Kristina; Griesbaum, Luisa; Kiefer, Andreas; Hämmerle, Martin; Eitel, Jan; Koma, Zsófia
2015-04-01
Our physical environment undergoes constant changes in space and time with strongly varying triggers, frequencies, and magnitudes. Monitoring these environmental changes is crucial to improve our scientific understanding of complex human-environmental interactions and helps us to respond to environmental change by adaptation or mitigation. The three-dimensional (3D) description of Earth surface features and the detailed monitoring of surface processes using 3D spatial data have gained increasing attention within the last decades, such as in climate change research (e.g., glacier retreat), carbon sequestration (e.g., forest biomass monitoring), precision agriculture and natural hazard management. In all those areas, 3D data have helped to improve our process understanding by allowing us to quantify the structural properties of earth surface features and their changes over time. This advancement has been fostered by technological developments and the increased availability of 3D sensing systems. In particular, LiDAR (light detection and ranging) technology, also referred to as laser scanning, has made significant progress and has evolved into an operational tool in environmental research and geosciences. The main result of LiDAR measurements is a highly spatially resolved 3D point cloud. Each point within the LiDAR point cloud has an XYZ coordinate associated with it and often additional information such as the strength of the returned backscatter. The point cloud provided by LiDAR contains rich geospatial, structural, and potentially biochemical information about the surveyed objects. To deal with the inherently unorganized datasets and the large data volume (frequently millions of XYZ coordinates) of LiDAR datasets, a multitude of algorithms for automatic 3D object detection (e.g., of single trees) and physical surface description (e.g., biomass) have been developed. However, so far the exchange of datasets and approaches (i.e., extraction algorithms) among LiDAR users lags behind. We propose a novel concept, the LiDAR Vegetation Investigation and Signature Analysis System (LVISA), which shall enhance sharing of i) reference datasets of single vegetation objects with rich reference data (e.g., plant species, basic plant morphometric information) and ii) approaches for information extraction (e.g., single tree detection, tree species classification based on waveform LiDAR features). We will build an extensive LiDAR data repository for supporting the development and benchmarking of LiDAR-based object information extraction. LVISA uses international web service standards (Open Geospatial Consortium, OGC) for geospatial data access and also analysis (e.g., OGC Web Processing Services). This will allow the research community to identify plant-object-specific vegetation features from LiDAR data, while accounting for differences in LiDAR systems (e.g., beam divergence), settings (e.g., point spacing), and calibration techniques. It is the goal of LVISA to develop generic 3D information extraction approaches which can be seamlessly transferred to other datasets, timestamps and extraction tasks. The current prototype of LVISA can be visited and tested online via http://uni-heidelberg.de/lvisa. Video tutorials provide a quick overview of and entry into the functionality of LVISA.
We will present the current advances of LVISA and we will highlight future research and extension of LVISA, such as integrating low-cost LiDAR data and datasets acquired by highly temporal scanning of vegetation (e.g., continuous measurements). Everybody is invited to join the LVISA development and share datasets and analysis approaches in an interoperable way via the web-based LVISA geoportal.
Enhancing acronym/abbreviation knowledge bases with semantic information.
Torii, Manabu; Liu, Hongfang
2007-10-11
In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with their definitions (denoted as LFs) is highly needed. For the construction of such a terminology knowledge base, we investigate the feasibility of building a system that automatically assigns semantic categories to LFs extracted from text. Given a collection of pairs (SF, LF) derived from text, we i) assess the coverage of LFs and pairs (SF, LF) in the UMLS and justify the need for a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic categories and construct a system using machine learning. Utilizing ADAM, an existing collection of (SF, LF) pairs extracted from MEDLINE, our system achieved an f-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface which integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus.
NASA's online machine aided indexing system
NASA Technical Reports Server (NTRS)
Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.
1993-01-01
This report describes the NASA Lexical Dictionary (NLD), a machine aided indexing system used online at the National Aeronautics and Space Administration's Center for Aerospace Information (CASI). This system is comprised of a text processor that is based on the computational, non-syntactic analysis of input text and an extensive 'knowledge base' that serves to recognize and translate text-extracted concepts. The structure and function of the various NLD system components are described in detail. Methods used for the development of the knowledge base are discussed. Particular attention is given to a statistically based text analysis program that provides the knowledge base developer with a list of concept-specific phrases extracted from large textual corpora. Production and quality benefits resulting from the integration of machine aided indexing at CASI are discussed, along with a number of secondary applications of NLD-derived systems, including online spell checking and machine aided lexicography.
PDB Editor: a user-friendly Java-based Protein Data Bank file editor with a GUI.
Lee, Jonas; Kim, Sung Hou
2009-04-01
The Protein Data Bank file format is the format most widely used by protein crystallographers and biologists to disseminate and manipulate protein structures. Despite this, there are few user-friendly software packages available to efficiently edit and extract raw information from PDB files. This limitation often leads many protein crystallographers to waste significant time manually editing PDB files. PDB Editor, written in Java with a Swing GUI, allows the user to selectively search, select, extract and edit information in parallel. Furthermore, the program is a stand-alone application written in Java, which frees users from the hassles associated with platform/operating-system-dependent installation and usage. PDB Editor can be downloaded from http://sourceforge.net/projects/pdbeditorjl/.
Smelling time: a neural basis for olfactory scene analysis
Ache, Barry W.; Hein, Andrew M.; Bobkov, Yuriy V.; Principe, Jose C.
2016-01-01
Behavioral evidence from phylogenetically diverse animals and humans suggests that, by extracting temporal information inherent in the olfactory signal, olfaction could be much more involved in interpreting space and time than heretofore imagined. If this is the case, the olfactory system must have neural mechanisms capable of encoding time at intervals relevant to the turbulent odor world in which many animals live. We review evidence that animals can use populations of rhythmically active, or ‘bursting’, olfactory receptor neurons (bORNs) to extract and encode temporal information inherent in natural olfactory signals. We postulate that bORNs represent an unsuspected neural mechanism through which time can be accurately measured, and that ‘smelling time’ completes the requirements for true olfactory scene analysis. PMID:27594700
The Agent of extracting Internet Information with Lead Order
NASA Astrophysics Data System (ADS)
Mo, Zan; Huang, Chuliang; Liu, Aijun
To carry out e-commerce effectively, advanced technologies for accessing business information are urgently needed. An agent is described that addresses the problems of extracting Internet information caused by the non-standard and heterogeneous structure of Chinese websites. The agent comprises three modules, each responsible for one stage of the extraction process. An HTTP-tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to transform the extracted natural-language information into structured form is also discussed.
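The abstract names an HTTP tree and a Lead algorithm but defines neither, so the following is only an illustrative stand-in for the idea: model the site's link structure as a tree and emit a breadth-first retrieval order to play the role of the lead order. The link structure is hypothetical:

    # Emit a breadth-first "lead order" over a hypothetical HTTP tree.
    from collections import deque

    site_tree = {
        "/": ["/products", "/news"],
        "/products": ["/products/item1", "/products/item2"],
        "/news": [],
        "/products/item1": [],
        "/products/item2": [],
    }

    def lead_order(tree, root="/"):
        order, queue = [], deque([root])
        while queue:
            page = queue.popleft()
            order.append(page)
            queue.extend(tree.get(page, []))
        return order

    print(lead_order(site_tree))  # root first, then level by level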
Optimal Correlations in Many-Body Quantum Systems
NASA Astrophysics Data System (ADS)
Amico, L.; Rossini, D.; Hamma, A.; Korepin, V. E.
2012-06-01
Information and correlations in a quantum system are closely related through the process of measurement. We explore this relation in a many-body quantum setting, effectively bridging quantum metrology and condensed matter physics. To this aim we adopt the information-theoretic view of correlations and study the amount of correlations remaining after certain classes of positive-operator-valued measurements (POVMs) are performed locally. As a many-body system, we consider a one-dimensional array of interacting two-level systems (a spin chain) at zero temperature, where quantum effects are most pronounced. We demonstrate how the optimal strategy for extracting the correlations depends on the quantum phase through a subtle interplay between local interactions and coherence.
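The abstract does not give the exact correlation functional it optimizes, but a standard information-theoretic quantity of this type is the Henderson-Vedral classical correlation, offered here as a plausible formalization of "correlations after a local POVM":

    \[
      J(A|B) \;=\; \max_{\{M_b\}} \Big[ S(\rho_A) - \sum_b p_b\, S(\rho_{A|b}) \Big],
      \qquad
      p_b = \operatorname{Tr}\!\big[(\mathbb{1}_A \otimes M_b)\,\rho_{AB}\big],
    \]

where the maximum runs over POVMs \{M_b\} on one part of the chain, S is the von Neumann entropy, and \rho_{A|b} is the conditional state of subsystem A given outcome b. Finding the maximizing POVM as a function of the chain's couplings is then the "optimal strategy" problem the abstract refers to.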
Smart Networked Elements in Support of ISHM
NASA Technical Reports Server (NTRS)
Oostdyk, Rebecca; Mata, Carlos; Perotti, Jose M.
2008-01-01
At the core of Integrated System Health Management (ISHM) is the ability to extract information and knowledge from raw data. Conventional data acquisition systems sample and convert physical measurements to engineering units, which higher-level systems use to derive health status and other information about processes and systems. Although health management is essential at the top level, there are considerable advantages to implementing health-related functions at the sensor level. Distributing processing to lower levels reduces bandwidth requirements, enhances data fusion, and improves the resolution for detection and isolation of failures in a system, subsystem, component, or process. The Smart Networked Element (SNE) has been developed to implement intelligent functions and algorithms at the sensor level in support of ISHM.
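A minimal sketch of the sensor-level processing idea (illustrative, not the SNE's actual firmware): instead of streaming raw samples, the element keeps a local baseline and reports only a summary plus an out-of-range flag, which is how pushing processing downward reduces bandwidth. Window size and threshold are assumptions:

    # Smart element that summarizes raw samples locally and flags faults.
    from collections import deque
    from statistics import mean, pstdev

    class SmartElement:
        def __init__(self, window=100, sigma_limit=3.0):
            self.samples = deque(maxlen=window)
            self.sigma_limit = sigma_limit

        def ingest(self, value):
            self.samples.append(value)

        def health_report(self):
            *baseline, latest = self.samples  # judge newest sample against history
            mu, sd = mean(baseline), pstdev(baseline)
            anomalous = sd > 0 and abs(latest - mu) > self.sigma_limit * sd
            return {"baseline_mean": mu, "latest": latest, "anomalous": anomalous}

    sne = SmartElement()
    for v in [20.1, 20.3, 19.9, 20.0, 35.7]:  # last sample simulates a fault
        sne.ingest(v)
    print(sne.health_report())  # anomalous: True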
NASA Astrophysics Data System (ADS)
Ohar, Orest P.; Lizotte, Todd E.
2009-08-01
Over the years law enforcement has become increasingly complex, driving a need for better organization of knowledge within policing. The use of COMPSTAT and other Geospatial Information Systems (GIS) for crime mapping and analysis has provided opportunities for careful analysis of crime trends. By identifying hotspots within communities, data collected and entered into these systems can be analyzed to determine how, when, and where law enforcement assets can be deployed efficiently. This paper introduces in detail a powerful new law enforcement and forensic investigative technology called Intentional Firearm Microstamping (IFM). Once embedded in and deployed with firearms, IFM will provide data for identifying and tracking the sources of illegally trafficked firearms within the borders of the United States and across the border with Mexico. Intentional Firearm Microstamping is a micro-code technology that leverages a laser-based micromachining process to form optimally located, microscopic "intentional structures and marks" on components within a firearm. When the firearm is fired, these IFM structures transfer an identifying tracking code onto the expended cartridge case ejected from the firearm. The microstamped structures are laser-micromachined alphanumeric and encoded geometric tracking numbers linked to the serial number of the firearm. IFM codes can be extracted quickly and used without the need to recover the firearm. Furthermore, through the extraction process, IFM codes can be quantitatively verified to a higher level of certainty than traditional forensic matching techniques. IFM provides critical intelligence capable of identifying straw purchasers, trafficking routes, and networks across state borders, and can be applied to firearms illegally exported across international borders. This paper outlines IFM applications supporting intelligence-led policing initiatives and IFM implementation strategies, describes how IFM overcomes a firearm's stochastic properties, explains the code extraction technologies available to forensic investigators, and discusses applications where the extracted data will benefit geospatial information systems for forensic intelligence.
Extracting local information from crowds through betting markets
NASA Astrophysics Data System (ADS)
Weijs, Steven
2015-04-01
In this research, a set-up is considered in which users can bet against a forecasting agency to challenge its probabilistic forecasts. From an information-theoretic standpoint, a reward structure is considered that either provides the forecasting agency with better information, by paying successful providers of information for their winning bets, or funds excellent forecasting agencies through users who think they know better. Especially for local forecasts, the approach may help to diagnose model biases and to identify local predictive information that can be incorporated into the models. The challenges and opportunities for implementing such a system in practice are also discussed.
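A small numeric illustration of why such bets carry information (our example, not the paper's): if the agency posts probabilities q and pays fair odds 1/q, a user who splits their stake across outcomes in proportion to their own belief p has an expected log-return equal to the Kullback-Leibler divergence D(p||q), so winnings directly measure the information the user adds:

    # Expected log-return of betting belief p against fair odds 1/q.
    import math

    def expected_log_return(p, q):
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    agency = [0.5, 0.5]      # agency's rain / no-rain forecast
    local_user = [0.8, 0.2]  # user with better local information
    print(expected_log_return(local_user, agency))  # ~0.28 bits per bet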
IMAGE 100: The interactive multispectral image processing system
NASA Technical Reports Server (NTRS)
Schaller, E. S.; Towles, R. W.
1975-01-01
The need for rapid, cost-effective extraction of useful information from vast quantities of multispectral imagery available from aircraft or spacecraft has resulted in the design, implementation, and application of a state-of-the-art processing system known as IMAGE 100. Operating on the general principle that all objects or materials possess unique spectral characteristics, or signatures, the system uses this signature uniqueness to identify similar features in an image by simultaneously analyzing signatures in multiple frequency bands. Pseudo-colors, or themes, are assigned to features having identical spectral characteristics. These themes are displayed on a color CRT and may be recorded on tape, film, or other media. The system was designed around key features such as interactive operation, user-oriented displays and controls, and rapid-response machine processing. Owing to these features, the user can readily control and modify the analysis process based on his knowledge of the input imagery, making effective use of both conventional photographic interpretation skills and state-of-the-art machine analysis techniques in the extraction of useful information from multispectral imagery. This approach yields highly accurate multitheme classification of imagery in seconds or minutes rather than the hours often required by other processing methods.
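A minimal sketch of signature-based theme assignment in the spirit of the system described (not its actual hardware algorithm): each pixel is a vector of band values and receives the theme whose reference signature is nearest in spectral space. Signatures and band values are invented for illustration:

    # Assign each multispectral pixel to the nearest reference signature.
    import numpy as np

    signatures = {                      # hypothetical mean band values (4 bands)
        "water":      np.array([10, 12,  8,  5]),
        "vegetation": np.array([30, 45, 25, 90]),
        "bare_soil":  np.array([60, 55, 50, 40]),
    }

    def classify(image):                # image: height x width x bands
        names = list(signatures)
        refs = np.stack([signatures[n] for n in names])   # themes x bands
        dists = np.linalg.norm(image[..., None, :] - refs, axis=-1)
        return np.take(names, dists.argmin(axis=-1))      # theme per pixel

    pixels = np.array([[[11, 13, 9, 6], [29, 44, 26, 88]]])  # a 1 x 2 image
    print(classify(pixels))  # [['water' 'vegetation']]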
NASA Technical Reports Server (NTRS)
Lempriere, B. M.
1987-01-01
The procedures and results of a study of a conceptual system for measuring the debris environment on the space station are discussed. The study was conducted in two phases: the first consisted of experiments aimed at evaluating impact location from panel response data collected by acoustic emission sensors; the second analyzed the available statistical description of the environment to determine the probability of the measurement system producing useful data, and analyzed the results of the previous tests to evaluate the accuracy of location and the feasibility of extracting impactor characteristics from the panel response. The conclusions were that a single panel would not be exposed to any impact event, but that instrumenting the entire Logistics Module would provide a modest amount of data. The use of sensors with higher sensitivity than those used in the tests could be advantageous. The impact location could be determined with sufficient accuracy from panel response data. The response waveforms were shown to contain information on the impact characteristics, but the data set did not span a sufficient range of the variables needed to evaluate the feasibility of extracting that information.
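The location-from-panel-response step admits a compact illustration (our sketch, not the study's method): each sensor i records an arrival time t_i = t0 + dist_i / c, so the panel can be grid-searched for the point whose implied emission times t_i - dist_i / c agree best. Wave speed, sensor layout, and panel size are assumptions:

    # Locate an impact on a 1 m x 1 m panel from acoustic arrival times.
    import math

    C = 5000.0                                   # assumed plate wave speed, m/s
    sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

    def locate(arrivals, step=0.01):
        best, best_spread = None, float("inf")
        for i in range(int(1 / step) + 1):
            for j in range(int(1 / step) + 1):
                x, y = i * step, j * step
                t0s = [t - math.dist((x, y), s) / C
                       for t, s in zip(arrivals, sensors)]
                spread = max(t0s) - min(t0s)     # perfect agreement -> 0
                if spread < best_spread:
                    best, best_spread = (x, y), spread
        return best

    true_xy = (0.30, 0.70)
    arrivals = [math.dist(true_xy, s) / C for s in sensors]  # noiseless test
    print(locate(arrivals))  # ~(0.30, 0.70)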