Sample records for extracting specific information

  1. Automated generation of individually customized visualizations of diagnosis-specific medical information using novel techniques of information extraction

    NASA Astrophysics Data System (ADS)

    Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang

    2005-04-01

    Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient's medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and ultimately, outcomes.

  2. Information extraction during simultaneous motion processing.

    PubMed

    Rideaux, Reuben; Edwards, Mark

    2014-02-01

    When confronted with multiple moving objects the visual system can process them in two stages: an initial stage in which a limited number of signals are processed in parallel (i.e. simultaneously) followed by a sequential stage. We previously demonstrated that during the simultaneous stage, observers could discriminate between presentations containing up to 5 vs. 6 spatially localized motion signals (Edwards & Rideaux, 2013). Here we investigate what information is actually extracted during the simultaneous stage and whether the simultaneous limit varies with the detail of information extracted. This was achieved by measuring the ability of observers to extract varied information from low detail, i.e. the number of signals presented, to high detail, i.e. the actual directions present and the direction of a specific element, during the simultaneous stage. The results indicate that the resolution of simultaneous processing varies as a function of the information which is extracted, i.e. as the information extraction becomes more detailed, from the number of moving elements to the direction of a specific element, the capacity to process multiple signals is reduced. Thus, when assigning a capacity to simultaneous motion processing, this must be qualified by designating the degree of information extraction. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  3. Thinking graphically: Connecting vision and cognition during graph comprehension.

    PubMed

    Ratwani, Raj M; Trafton, J Gregory; Boehm-Davis, Deborah A

    2008-03-01

    Task analytic theories of graph comprehension account for the perceptual and conceptual processes required to extract specific information from graphs. Comparatively, the processes underlying information integration have received less attention. We propose a new framework for information integration that highlights visual integration and cognitive integration. During visual integration, pattern recognition processes are used to form visual clusters of information; these visual clusters are then used to reason about the graph during cognitive integration. In 3 experiments, the processes required to extract specific information and to integrate information were examined by collecting verbal protocol and eye movement data. Results supported the task analytic theories for specific information extraction and the processes of visual and cognitive integration for integrative questions. Further, the integrative processes scaled up as graph complexity increased, highlighting the importance of these processes for integration in more complex graphs. Finally, based on this framework, design principles to improve both visual and cognitive integration are described. PsycINFO Database Record (c) 2008 APA, all rights reserved

  4. Considerations on the Optimal and Efficient Processing of Information-Bearing Signals

    ERIC Educational Resources Information Center

    Harms, Herbert Andrew

    2013-01-01

    Noise is a fundamental hurdle that impedes the processing of information-bearing signals, specifically the extraction of salient information. Processing that is both optimal and efficient is desired; optimality ensures the extracted information has the highest fidelity allowed by the noise, while efficiency ensures limited resource usage. Optimal…

  5. Extracting and standardizing medication information in clinical text - the MedEx-UIMA system.

    PubMed

    Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C; Xu, Hua

    2014-01-01

    Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents by both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.

  6. An information extraction framework for cohort identification using electronic health records.

    PubMed

    Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G

    2013-01-01

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.

  7. An Information Extraction Framework for Cohort Identification Using Electronic Health Records

    PubMed Central

    Liu, Hongfang; Bielinski, Suzette J.; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B.; Jonnalagadda, Siddhartha R.; Ravikumar, K.E.; Wu, Stephen T.; Kullo, Iftikhar J.; Chute, Christopher G

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework. PMID:24303255

  8. Automatic seizure detection based on the combination of newborn multi-channel EEG and HRV information

    NASA Astrophysics Data System (ADS)

    Mesbah, Mostefa; Balakrishnan, Malarvili; Colditz, Paul B.; Boashash, Boualem

    2012-12-01

    This article proposes a new method for newborn seizure detection that uses information extracted from both multi-channel electroencephalogram (EEG) and a single channel electrocardiogram (ECG). The aim of the study is to assess whether additional information extracted from ECG can improve the performance of seizure detectors based solely on EEG. Two different approaches were used to combine this extracted information. The first approach, known as feature fusion, involves combining features extracted from EEG and heart rate variability (HRV) into a single feature vector prior to feeding it to a classifier. The second approach, called classifier or decision fusion, is achieved by combining the independent decisions of the EEG and the HRV-based classifiers. Tested on recordings obtained from eight newborns with identified EEG seizures, the proposed neonatal seizure detection algorithms achieved 95.20% sensitivity and 88.60% specificity for the feature fusion case and 95.20% sensitivity and 94.30% specificity for the classifier fusion case. These results are considerably better than those involving classifiers using EEG only (80.90%, 86.50%) or HRV only (85.70%, 84.60%).
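
    A minimal sketch of the two fusion strategies described in this record, assuming pre-computed EEG and HRV feature matrices; the classifier type, feature counts, and data below are placeholders, not the authors' actual setup:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Hypothetical pre-computed features: one row per epoch.
        X_eeg = np.random.rand(200, 12)   # 12 EEG features per epoch (placeholder data)
        X_hrv = np.random.rand(200, 5)    # 5 HRV features per epoch (placeholder data)
        y = np.random.randint(0, 2, 200)  # 1 = seizure, 0 = non-seizure (placeholder labels)

        # Feature fusion: concatenate the feature vectors and train a single classifier.
        X_fused = np.hstack([X_eeg, X_hrv])
        clf_fused = LogisticRegression(max_iter=1000).fit(X_fused, y)
        p_feature_fusion = clf_fused.predict(X_fused)

        # Classifier (decision) fusion: train separate classifiers, then combine their outputs.
        clf_eeg = LogisticRegression(max_iter=1000).fit(X_eeg, y)
        clf_hrv = LogisticRegression(max_iter=1000).fit(X_hrv, y)
        p_decision_fusion = ((clf_eeg.predict_proba(X_eeg)[:, 1] +
                              clf_hrv.predict_proba(X_hrv)[:, 1]) / 2 > 0.5).astype(int)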

  9. Extracting Information from Narratives: An Application to Aviation Safety Reports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Posse, Christian; Matzke, Brett D.; Anderson, Catherine M.

    2005-05-12

    Aviation safety reports are the best available source of information about why a flight incident happened. However, stream-of-consciousness writing permeates the narratives, making automation of the information extraction task difficult. We propose an approach and infrastructure based on a common pattern specification language to capture relevant information via normalized template expression matching in context. Template expression matching handles variants of multi-word expressions. Normalization improves the likelihood of correct hits by standardizing and cleaning the vocabulary used in narratives. Checking for the presence of negative modifiers in the proximity of a potential hit reduces the chance of false hits. We present the above approach in the context of a specific application, which is the extraction of human performance factors from NASA ASRS reports. While knowledge infusion from experts plays a critical role during the learning phase, early results show that in a production mode, the automated process provides information that is consistent with analyses by human subjects.
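
    The matching-with-negation idea lends itself to a very small sketch; the phrase list, negation terms, and proximity window below are hypothetical and are not taken from the ASRS application:

        import re

        NEGATORS = {"no", "not", "without", "denied"}         # hypothetical negation modifiers
        PATTERNS = [r"fatigue[d]?", r"lack of (sleep|rest)"]   # hypothetical template expressions

        def normalize(text):
            # Standardize case and collapse whitespace before matching.
            return re.sub(r"\s+", " ", text.lower())

        def find_factors(narrative, window=3):
            text = normalize(narrative)
            tokens = text.split()
            hits = []
            for pat in PATTERNS:
                for m in re.finditer(pat, text):
                    # Count tokens before the hit, then look back 'window' tokens for a negator.
                    start_tok = len(text[:m.start()].split())
                    context = tokens[max(0, start_tok - window):start_tok]
                    if not NEGATORS.intersection(context):
                        hits.append(m.group())
            return hits

        print(find_factors("Captain reported lack of sleep but was not fatigued."))
        # -> ['lack of sleep']  (the negated 'fatigued' mention is suppressed)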

  10. Dual-wavelength phase-shifting digital holography selectively extracting wavelength information from wavelength-multiplexed holograms.

    PubMed

    Tahara, Tatsuki; Mori, Ryota; Kikunaga, Shuhei; Arai, Yasuhiko; Takaki, Yasuhiro

    2015-06-15

    Dual-wavelength phase-shifting digital holography that selectively extracts wavelength information from five wavelength-multiplexed holograms is presented. Specific phase shifts for respective wavelengths are introduced to remove the crosstalk components and extract only the object wave at the desired wavelength from the holograms. Object waves in multiple wavelengths are selectively extracted by utilizing 2π ambiguity and the subtraction procedures based on phase-shifting interferometry. Numerical results show the validity of the proposed technique. The proposed technique is also experimentally demonstrated.

  11. Extracting and standardizing medication information in clinical text – the MedEx-UIMA system

    PubMed Central

    Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C.; Xu, Hua

    2014-01-01

    Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents by both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/. PMID:25954575

  12. MedXN: an open source medication extraction and normalization tool for clinical text

    PubMed Central

    Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang

    2014-01-01

    Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
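
    A stripped-down illustration of the decomposition step described above: a free-text medication mention is split into name, strength, and form with regular expressions. The patterns and the tiny drug dictionary are hypothetical stand-ins for the RxNorm dictionary lookup, not the MedXN implementation:

        import re

        DRUG_DICT = {"lisinopril", "metformin", "atorvastatin"}   # stand-in for RxNorm dictionary lookup

        STRENGTH_RE = re.compile(r"(?P<strength>\d+(\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b", re.IGNORECASE)
        FORM_RE = re.compile(r"\b(?P<form>tablet|capsule|solution|cream)\b", re.IGNORECASE)

        def decompose(mention):
            # Split a medication description into name and attributes.
            text = mention.lower()
            out = {"name": next((t for t in text.split() if t in DRUG_DICT), None)}
            m = STRENGTH_RE.search(text)
            if m:
                out.update(strength=m.group("strength"), unit=m.group("unit"))
            f = FORM_RE.search(text)
            if f:
                out["form"] = f.group("form")
            return out

        print(decompose("Lisinopril 10 mg oral tablet daily"))
        # -> {'name': 'lisinopril', 'strength': '10', 'unit': 'mg', 'form': 'tablet'}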

  13. Effects of band selection on endmember extraction for forestry applications

    NASA Astrophysics Data System (ADS)

    Karathanassi, Vassilia; Andreou, Charoula; Andronis, Vassilis; Kolokoussis, Polychronis

    2014-10-01

    In spectral unmixing theory, data reduction techniques play an important role as hyperspectral imagery contains an immense amount of data, posing many challenging problems such as data storage, computational efficiency, and the so-called "curse of dimensionality". Feature extraction and feature selection are the two main approaches for dimensionality reduction. Feature extraction techniques reduce the dimensionality of the hyperspectral data by applying transforms to the data. Feature selection techniques retain the physical meaning of the data by selecting a set of bands from the input hyperspectral dataset which mainly contain the information needed for spectral unmixing. Although feature selection techniques are well known for their dimensionality reduction potential, they are rarely used in the unmixing process. The majority of the existing state-of-the-art dimensionality reduction methods set criteria on the spectral information, which is derived from the whole wavelength range, in order to define the optimum spectral subspace. These criteria are not associated with any particular application but with the data statistics, such as correlation and entropy values. However, each application is associated with specific land cover materials, whose spectral characteristics present variations at specific wavelengths. In forestry, for example, many applications focus on tree leaves, in which specific pigments such as chlorophyll, xanthophyll, etc. determine the wavelengths where tree species, diseases, etc. can be detected. For such applications, when the unmixing process is applied, the tree species, diseases, etc. are considered the endmembers of interest. This paper focuses on investigating the effects of band selection on endmember extraction by exploiting the information of the vegetation absorbance spectral zones. More precisely, it is explored whether endmember extraction can be optimized when specific sets of initial bands related to leaf spectral characteristics are selected. Experiments comprise application of well-known signal subspace estimation and endmember extraction methods on hyperspectral imagery of a forest area. Evaluation of the extracted endmembers showed that more forest species can be extracted as endmembers using selected bands.

  14. Extraction of Vertical Profiles of Atmospheric Variables from Gridded Binary, Edition 2 (GRIB2) Model Output Files

    DTIC Science & Technology

    2018-01-18

    processing. Specifically, the method described herein uses wgrib2 commands along with a Python script or program to produce tabular text files that in...It makes use of software that is readily available and can be implemented on many computer systems combined with relatively modest additional...example), extracts appropriate information, and lists the extracted information in a readable tabular form. The Python script used here is described in
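
    The report's approach (wgrib2 driven from Python to produce tabular profile data) can be sketched roughly as follows; the GRIB2 file name, the variable/level selector, the exact wgrib2 flags, and the CSV column layout are assumptions that should be checked against the local wgrib2 build:

        import subprocess
        import csv

        GRIB_FILE = "model_output.grb2"   # hypothetical GRIB2 file name

        # Ask wgrib2 for temperature records on pressure levels, written as CSV
        # (-match and -csv are assumed to be available in this wgrib2 build).
        subprocess.run(
            ["wgrib2", GRIB_FILE, "-match", ":TMP:[0-9]+ mb:", "-csv", "profile.csv"],
            check=True)

        # Reshape the CSV rows into a simple tabular vertical profile.
        with open("profile.csv", newline="") as f:
            for row in csv.reader(f):
                # Assumed wgrib2 CSV columns: start time, valid time, variable,
                # level, longitude, latitude, value.
                print(row[3], row[6])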

  15. ccML, a new mark-up language to improve ISO/EN 13606-based electronic health record extracts practical edition.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Cáceres, Jesús; Somolinos, Roberto; Pascual, Mario; Martínez, Ignacio; Salvador, Carlos H; Monteagudo, José Luis

    2013-01-01

    The objective of this paper is to introduce a new language called ccML, designed to provide convenient pragmatic information to applications using the ISO/EN13606 reference model (RM), such as electronic health record (EHR) extracts editors. EHR extracts are presently built using the syntactic and semantic information provided in the RM and constrained by archetypes. The ccML extra information enables the automation of the medico-legal context information edition, which is over 70% of the total in an extract, without modifying the RM information. ccML is defined using a W3C XML schema file. Valid ccML files complement the RM with additional pragmatics information. The ccML language grammar is defined using formal language theory as a single-type tree grammar. The new language is tested using an EHR extracts editor application as proof-of-concept system. Seven ccML PVCodes (predefined value codes) are introduced in this grammar to cope with different realistic EHR edition situations. These seven PVCodes have different interpretation strategies, from direct look up in the ccML file itself, to more complex searches in archetypes or system precomputation. The possibility to declare generic types in ccML gives rise to ambiguity during interpretation. The criterion used to overcome ambiguity is that specificity should prevail over generality. The opposite would make the individual specific element declarations useless. A new mark-up language ccML is introduced that opens up the possibility of providing applications using the ISO/EN13606 RM with the necessary pragmatics information to be practical and realistic.

  16. Construction of Green Tide Monitoring System and Research on its Key Techniques

    NASA Astrophysics Data System (ADS)

    Xing, B.; Li, J.; Zhu, H.; Wei, P.; Zhao, Y.

    2018-04-01

    As a kind of marine natural disaster, the Green Tide has been appearing every year along the Qingdao Coast since the large-scale bloom in 2008, bringing great loss to this region. Therefore, it is of great value to obtain real-time dynamic information about green tide distribution. In this study, methods of optical remote sensing and microwave remote sensing are employed in green tide monitoring research. A specific remote sensing data processing flow and a green tide information extraction algorithm are designed according to the different characteristics of the optical and microwave data. For green tide spatial distribution information extraction, an automatic extraction algorithm for green tide distribution boundaries is designed based on the principle of mathematical morphology dilation/erosion. Key issues in information extraction, including the division of green tide regions, the obtaining of basic distributions, the limitation of distribution boundaries, and the elimination of islands, have been solved. The automatic generation of green tide distribution boundaries from the results of remote sensing information extraction is realized. Finally, a green tide monitoring system is built based on IDL/GIS secondary development in the integrated environment of RS and GIS, achieving the integration of RS monitoring and information extraction.
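
    A compact sketch of the dilation/erosion boundary idea on a binary green-tide mask, assuming the mask has already been produced by the remote sensing classification step; the mask here is a synthetic placeholder:

        import numpy as np
        from scipy.ndimage import binary_dilation, binary_erosion, binary_fill_holes

        # Hypothetical binary classification result: True = green tide pixel, False = sea.
        mask = np.zeros((50, 50), dtype=bool)
        mask[20:35, 10:30] = True

        # Fill interior holes ("islands") inside green tide patches, then take the
        # morphological gradient (dilation minus erosion) as the distribution boundary.
        filled = binary_fill_holes(mask)
        boundary = binary_dilation(filled) & ~binary_erosion(filled)

        print("boundary pixels:", int(boundary.sum()))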

  17. Information retrieval and terminology extraction in online resources for patients with diabetes.

    PubMed

    Seljan, Sanja; Baretić, Maja; Kucis, Vlasta

    2014-06-01

    Terminology use, as a means for information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes, need access to various online resources (in a foreign and/or native language) when searching for information on self-education in basic diabetic knowledge, on self-care activities regarding the importance of dietetic food, medications and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers can help in building terminology lists or lists of "browsing phrases" useful in information retrieval or in document indexing. Specific terminology lists represent an intermediate step between free-text search and controlled vocabulary, between users' demands and existing online resources in the native and foreign language. The research, aiming to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and divided into three interrelated parts: i) comparison of professional and popular terminology use, ii) evaluation of automatic statistically-based terminology extraction on English and Croatian texts, and iii) comparison and evaluation of extracted terminology performed on the English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a professional medical person, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, made as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods in English text. Evaluation of automatic and semi-automatic terminology extraction methods is performed by recall, precision and f-measure.
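
    The evaluation against a reference list reduces to simple set arithmetic; a minimal sketch with invented term lists:

        def prf(extracted, reference):
            # Precision, recall and F1 of an extracted term list against a reference list.
            extracted, reference = set(extracted), set(reference)
            tp = len(extracted & reference)
            precision = tp / len(extracted) if extracted else 0.0
            recall = tp / len(reference) if reference else 0.0
            f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
            return precision, recall, f1

        candidates = ["insulin pump", "blood glucose", "daily walk"]    # hypothetical extraction output
        gold = ["insulin pump", "blood glucose", "glycemic control"]    # hypothetical reference list
        print(prf(candidates, gold))   # approximately (0.667, 0.667, 0.667)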

  18. Pathology report data extraction from relational database using R, with extraction from reports on melanoma of skin as an example.

    PubMed

    Ye, Jay J

    2016-01-01

    Different methods have been described for data extraction from pathology reports with varying degrees of success. Here a technique for directly extracting data from a relational database is described. Our department uses synoptic reports modified from College of American Pathologists (CAP) Cancer Protocol Templates to report most of our cancer diagnoses. Choosing the melanoma of skin synoptic report as an example, the R scripting language extended with the RODBC package was used to query the pathology information system database. Reports containing the melanoma of skin synoptic report from the past 4 and a half years were retrieved and individual data elements were extracted. Using the retrieved list of cases, the database was queried a second time to retrieve/extract the lymph node staging information in the subsequent reports from the same patients. 426 synoptic reports corresponding to unique lesions of melanoma of skin were retrieved, and data elements of interest were extracted into an R data frame. The distribution of Breslow depth of melanomas grouped by year is used as an example of intra-report data extraction and analysis. When new pN staging information was present in the subsequent reports, 82% (77/94) was precisely retrieved (pN0, pN1, pN2 and pN3). An additional 15% (14/94) was retrieved with some ambiguity (positive, or knowing there was an update). The specificity was 100% for both. The relationship between Breslow depth and lymph node status was graphed as an example of lesion-specific multi-report data extraction and analysis. R extended with the RODBC package is a simple and versatile approach well suited for the above tasks. The success or failure of the retrieval and extraction depended largely on whether the reports were formatted and whether the contents of the elements were consistently phrased. This approach can be easily modified and adopted for other pathology information systems that use a relational database for data management.
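
    The paper's workflow uses R with the RODBC package; below is an analogous minimal sketch in Python with pyodbc, with a hypothetical DSN, table, and column names and a simple regular expression for the Breslow depth element, only to illustrate the general query-then-parse pattern:

        import re
        import pyodbc

        # Hypothetical ODBC data source and schema; adapt to the local pathology system.
        conn = pyodbc.connect("DSN=pathology_db;UID=reader;PWD=secret")
        cur = conn.cursor()
        cur.execute(
            "SELECT accession_no, report_text FROM surgical_reports "
            "WHERE report_text LIKE '%Melanoma of skin synoptic report%'")

        # Pull one synoptic data element (Breslow depth in mm) out of each report.
        breslow_re = re.compile(r"Breslow depth[^0-9]*([0-9.]+)\s*mm", re.IGNORECASE)
        depths = {}
        for accession_no, text in cur.fetchall():
            m = breslow_re.search(text)
            if m:
                depths[accession_no] = float(m.group(1))

        print(len(depths), "reports with an extractable Breslow depth")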

  19. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    PubMed

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.

  20. BioRAT: extracting biological information from full-length papers.

    PubMed

    Corney, David P A; Buxton, Bernard F; Langdon, William B; Jones, David T

    2004-11-22

    Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, and which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE. We show first, that BioRAT performs as well as existing systems, when applied to abstracts; and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.

  1. ccML, a new mark-up language to improve ISO/EN 13606-based electronic health record extracts practical edition

    PubMed Central

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Cáceres, Jesús; Somolinos, Roberto; Pascual, Mario; Martínez, Ignacio; Salvador, Carlos H; Monteagudo, José Luis

    2013-01-01

    Objective The objective of this paper is to introduce a new language called ccML, designed to provide convenient pragmatic information to applications using the ISO/EN13606 reference model (RM), such as electronic health record (EHR) extracts editors. EHR extracts are presently built using the syntactic and semantic information provided in the RM and constrained by archetypes. The ccML extra information enables the automation of the medico-legal context information edition, which is over 70% of the total in an extract, without modifying the RM information. Materials and Methods ccML is defined using a W3C XML schema file. Valid ccML files complement the RM with additional pragmatics information. The ccML language grammar is defined using formal language theory as a single-type tree grammar. The new language is tested using an EHR extracts editor application as proof-of-concept system. Results Seven ccML PVCodes (predefined value codes) are introduced in this grammar to cope with different realistic EHR edition situations. These seven PVCodes have different interpretation strategies, from direct look up in the ccML file itself, to more complex searches in archetypes or system precomputation. Discussion The possibility to declare generic types in ccML gives rise to ambiguity during interpretation. The criterion used to overcome ambiguity is that specificity should prevail over generality. The opposite would make the individual specific element declarations useless. Conclusion A new mark-up language ccML is introduced that opens up the possibility of providing applications using the ISO/EN13606 RM with the necessary pragmatics information to be practical and realistic. PMID:23019241

  2. Technology Demonstration Summary: CF Systems Organics Extraction System, New Bedford Harbor, Massachusetts

    EPA Science Inventory

    The Site Program demonstration of CF Systems' organics extraction technology was conducted to obtain specific operating and cost information that could be used in evaluating the potential applicability of the technology to Superfund sites. The demonstration was conducted concurr...

  3. Table Extraction from Web Pages Using Conditional Random Fields to Extract Toponym Related Data

    NASA Astrophysics Data System (ADS)

    Luthfi Hanifah, Hayyu'; Akbar, Saiful

    2017-01-01

    Tables are one of the ways to visualize information on web pages. The abundance of web pages that compose the World Wide Web has motivated information extraction and information retrieval research, including research on table extraction. Besides, there is a need for a system which is designed to specifically handle location-related information. Based on this background, this research is conducted to provide a way to extract location-related data from web tables so that it can be used in the development of a Geographic Information Retrieval (GIR) system. The location-related data will be identified by the toponym (location name). In this research, a rule-based approach with a gazetteer is used to recognize toponyms from web tables. Meanwhile, to extract data from a table, a combination of a rule-based approach and a statistical-based approach is used. In the statistical-based approach, a Conditional Random Fields (CRF) model is used to understand the schema of the table. The result of table extraction is presented in JSON format. If a web table contains a toponym, a field will be added to the JSON document to store the toponym values. This field can be used to index the table data in accordance with the toponym, which can then be used in the development of a GIR system.

  4. Sentiment Analysis Using Common-Sense and Context Information

    PubMed Central

    Mittal, Namita; Bansal, Pooja; Garg, Sonal

    2015-01-01

    Sentiment analysis research has been increasing tremendously in recent times due to the wide range of business and social applications. Sentiment analysis from unstructured natural language text has recently received considerable attention from the research community. In this paper, we propose a novel sentiment analysis model based on common-sense knowledge extracted from ConceptNet based ontology and context information. ConceptNet based ontology is used to determine the domain specific concepts which in turn produced the domain specific important features. Further, the polarities of the extracted concepts are determined using the contextual polarity lexicon which we developed by considering the context information of a word. Finally, semantic orientations of domain specific features of the review document are aggregated based on the importance of a feature with respect to the domain. The importance of the feature is determined by the depth of the feature in the ontology. Experimental results show the effectiveness of the proposed methods. PMID:25866505

  5. Sentiment analysis using common-sense and context information.

    PubMed

    Agarwal, Basant; Mittal, Namita; Bansal, Pooja; Garg, Sonal

    2015-01-01

    Sentiment analysis research has been increasing tremendously in recent times due to the wide range of business and social applications. Sentiment analysis from unstructured natural language text has recently received considerable attention from the research community. In this paper, we propose a novel sentiment analysis model based on common-sense knowledge extracted from ConceptNet based ontology and context information. ConceptNet based ontology is used to determine the domain specific concepts which in turn produced the domain specific important features. Further, the polarities of the extracted concepts are determined using the contextual polarity lexicon which we developed by considering the context information of a word. Finally, semantic orientations of domain specific features of the review document are aggregated based on the importance of a feature with respect to the domain. The importance of the feature is determined by the depth of the feature in the ontology. Experimental results show the effectiveness of the proposed methods.
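
    The final aggregation step described in this record can be sketched in a few lines; the feature polarities and depths below are invented for illustration, and deeper (more specific) concepts are weighted more heavily, which is one plausible reading of the depth-based importance:

        # Hypothetical domain-specific features extracted for one review, with
        # contextual polarity in [-1, 1] and depth in the ConceptNet-based ontology.
        features = {
            "battery life": {"polarity": -0.8, "depth": 4},
            "screen":       {"polarity": +0.6, "depth": 3},
            "phone":        {"polarity": +0.2, "depth": 1},
        }

        def document_orientation(features):
            # Weight each feature's polarity by its depth (specificity) in the ontology.
            total_weight = sum(f["depth"] for f in features.values())
            return sum(f["polarity"] * f["depth"] for f in features.values()) / total_weight

        print("positive" if document_orientation(features) > 0 else "negative")   # -> negative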

  6. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

    PubMed Central

    Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-01-01

    Objective The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. Materials and Methods We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. Results We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Discussion Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Conclusion Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. PMID:26911818

  7. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records.

    PubMed

    Tahsin, Tasnia; Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-09-01

    The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. 7 CFR 917.50 - Reports.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... package at which sold, including specific and detailed information relative to all discounts, allowances..., other than the Secretary upon request therefor, data or information obtained or extracted from such... the particular handler from whom received: Provided, That such data and information may be combined...

  9. From Specific Information Extraction to Inferences: A Hierarchical Framework of Graph Comprehension

    DTIC Science & Technology

    2004-09-01

    The skill to interpret the information displayed in graphs is so important to have, the National Council of Teachers of Mathematics has created...guidelines to ensure that students learn these skills (NCTM: Standards for Mathematics, 2003). These guidelines are based primarily on the extraction of...graphical perception. Human Computer Interaction, 8, 353-388. NCTM: Standards for Mathematics. (2003). Peebles, D., & Cheng, P. C.-H. (2002

  10. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org

  11. The contribution of the vaccine adverse event text mining system to the classification of possible Guillain-Barré syndrome reports.

    PubMed

    Botsis, T; Woo, E J; Ball, R

    2013-01-01

    We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. Our objective was to evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority.

  12. Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines.

    PubMed

    Raja, Kalpana; Natarajan, Jeyakumar

    2018-07-01

    Extraction of protein phosphorylation information from biomedical literature has gained much attention because of the importance in numerous biological processes. In this study, we propose a text mining methodology which consists of two phases, NLP parsing and SVM classification to extract phosphorylation information from literature. First, using NLP parsing we divide the data into three base-forms depending on the biomedical entities related to phosphorylation and further classify into ten sub-forms based on their distribution with phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify the extracted singles/pairs/triplets using a set of features applicable to each sub-form. The performance of our methodology was evaluated on three corpora namely PLC, iProLink and hPP corpus. We obtained promising results of >85% F-score on ten sub-forms of training datasets on cross validation test. Our system achieved overall F-score of 93.0% on iProLink and 96.3% on hPP corpus test datasets. Furthermore, our proposed system achieved best performance on cross corpus evaluation and outperformed the existing system with recall of 90.1%. The performance analysis of our unique system on three corpora reveals that it extracts protein phosphorylation information efficiently in both non-organism specific general datasets such as PLC and iProLink, and human specific dataset such as hPP corpus. Copyright © 2018 Elsevier B.V. All rights reserved.
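
    A toy illustration of the second phase (SVM classification of candidate phosphorylation entities), using sklearn's DictVectorizer and SVC on invented features; the feature names and labels are placeholders, not the feature set used by the authors:

        from sklearn.feature_extraction import DictVectorizer
        from sklearn.svm import SVC

        # Hypothetical features for candidate (kinase, substrate, site) triplets.
        candidates = [
            {"kw_distance": 2, "has_site_token": 1, "verb_phosphorylate": 1},
            {"kw_distance": 9, "has_site_token": 0, "verb_phosphorylate": 0},
            {"kw_distance": 3, "has_site_token": 1, "verb_phosphorylate": 1},
            {"kw_distance": 12, "has_site_token": 0, "verb_phosphorylate": 1},
        ]
        labels = [1, 0, 1, 0]   # 1 = true phosphorylation relation (invented labels)

        vec = DictVectorizer(sparse=False)
        X = vec.fit_transform(candidates)
        clf = SVC(kernel="linear").fit(X, labels)

        new = vec.transform([{"kw_distance": 4, "has_site_token": 1, "verb_phosphorylate": 1}])
        print(clf.predict(new))   # expected: [1]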

  13. Aviation obstacle auto-extraction using remote sensing information

    NASA Astrophysics Data System (ADS)

    Zimmer, N.; Lugsch, W.; Ravenscroft, D.; Schiefele, J.

    2008-10-01

    An obstacle, in the aviation context, may be any natural or man-made object, fixed or movable, permanent or temporary. Currently, the most common way to detect relevant aviation obstacles from an aircraft or helicopter for navigation purposes and collision avoidance is the use of merged infrared and synthetic information of obstacle data. Several algorithms have been established to utilize synthetic and infrared images to generate obstacle information. There might be situations, however, where the system is error-prone and may not be able to consistently determine the current environment. This situation can be avoided when the system knows the true position of the obstacle. The quality characteristics of the obstacle data strongly depend on the quality of the source data such as maps and official publications. In some countries, such as newly industrializing and developing countries, obstacle information of adequate quality and quantity is not available. The aviation world has two specifications - RTCA DO-276A and ICAO ANNEX 15 Ch. 10 - which describe the requirements for aviation obstacles. It is essential to meet these requirements to be compliant with the specifications and to support systems based on these specifications, e.g. 3D obstacle warning systems where accurate coordinates based on WGS-84 are a necessity. Existing and soon-to-exist high-quality aerial and satellite remote sensing data make it feasible to think about automated aviation obstacle data origination. This paper will describe the feasibility of auto-extracting aviation obstacles from remote sensing data considering limitations of image and extraction technologies. Quality parameters and possible resolution of auto-extracted obstacle data will be discussed and presented.

  14. Electronic processing of informed consents in a global pharmaceutical company environment.

    PubMed

    Vishnyakova, Dina; Gobeill, Julien; Oezdemir-Zaech, Fatma; Kreim, Olivier; Vachon, Therese; Clade, Thierry; Haenning, Xavier; Mikhailov, Dmitri; Ruch, Patrick

    2014-01-01

    We present an electronic capture tool to process informed consents, which must be recorded when running a clinical trial. This tool aims at the extraction of information expressing the duration of the consent given by the patient to authorize the exploitation of biomarker-related information collected during clinical trials. The system integrates a language detection module (LDM) to route a document into the appropriate information extraction module (IEM). The IEM is based on language-specific sets of linguistic rules for the identification of relevant textual facts. The achieved accuracy of both the LDM and the IEM is 99%. The architecture of the system is described in detail.
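
    The LDM-to-IEM routing described above can be mocked up as a dispatch table; langdetect is used here only as a stand-in for the system's own language detection module, and the rule functions are hypothetical placeholders:

        from langdetect import detect   # stand-in for the language detection module (LDM)

        def extract_consent_duration_en(text):
            # Placeholder English rule set (IEM); real rules would be far richer.
            return "10 years" if "ten years" in text.lower() else None

        def extract_consent_duration_fr(text):
            # Placeholder French rule set (IEM).
            return "10 ans" if "dix ans" in text.lower() else None

        IEM_BY_LANGUAGE = {"en": extract_consent_duration_en, "fr": extract_consent_duration_fr}

        def process(document_text):
            lang = detect(document_text)              # route the document by detected language
            extractor = IEM_BY_LANGUAGE.get(lang)
            return extractor(document_text) if extractor else None

        print(process("Samples may be stored and used for ten years after study completion."))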

  15. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  16. Thinking Graphically: Connecting Vision and Cognition during Graph Comprehension

    ERIC Educational Resources Information Center

    Ratwani, Raj M.; Trafton, J. Gregory; Boehm-Davis, Deborah A.

    2008-01-01

    Task analytic theories of graph comprehension account for the perceptual and conceptual processes required to extract specific information from graphs. Comparatively, the processes underlying information integration have received less attention. We propose a new framework for information integration that highlights visual integration and cognitive…

  17. Systematically Extracting Metal- and Solvent-Related Occupational Information from Free-Text Responses to Lifetime Occupational History Questionnaires

    PubMed Central

    Friesen, Melissa C.; Locke, Sarah J.; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A.; Purdue, Mark; Colt, Joanne S.

    2014-01-01

    Objectives: Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants’ jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Methods: Our study population comprised 2408 subjects, reporting 11991 jobs, from a case–control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert’s independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Results: Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44–51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9–14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Conclusions: Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. PMID:24590110
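
    The keyword-matching step (implemented by the authors as a SAS macro) can be illustrated with a small Python stand-in; the scenario name, keyword list, and field names below are hypothetical:

        # Hypothetical a priori keyword list for one exposure scenario (chlorinated solvents).
        SCENARIO_KEYWORDS = {
            "solvent_degreasing": ["degreas", "trichloroethylene", "tce", "vapor degrease"],
        }

        def flag_scenarios(job):
            # Concatenate the free-text OH responses for one job record and look for keywords.
            text = " ".join(job.get(field, "") for field in ("task", "tools", "chemicals")).lower()
            return [name for name, words in SCENARIO_KEYWORDS.items()
                    if any(w in text for w in words)]

        job = {"job_title": "machinist",
               "task": "cleaned metal parts",
               "tools": "vapor degreaser",
               "chemicals": "TCE"}
        print(flag_scenarios(job))   # -> ['solvent_degreasing']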

  18. Generating disease-pertinent treatment vocabularies from MEDLINE citations.

    PubMed

    Wang, Liqin; Del Fiol, Guilherme; Bray, Bruce E; Haug, Peter J

    2017-01-01

    Healthcare communities have identified a significant need for disease-specific information. Disease-specific ontologies are useful in assisting the retrieval of disease-relevant information from various sources. However, building these ontologies is labor intensive. Our goal is to develop a system for the automated generation of disease-pertinent concepts from a popular knowledge resource to support the building of disease-specific ontologies. A pipeline system was developed with an initial focus on generating disease-specific treatment vocabularies. It comprised components for disease-specific citation retrieval, predication extraction, treatment predication extraction, treatment concept extraction, and relevance ranking. A semantic schema was developed to support the extraction of treatment predications and concepts. Four ranking approaches (i.e., occurrence, interest, degree centrality, and weighted degree centrality) were proposed to measure the relevance of treatment concepts to the disease of interest. We measured the performance of the four ranking approaches in terms of the mean precision at the top 100 concepts for five diseases, as well as the precision-recall curves against two reference vocabularies. The performance of the system was also compared to two baseline approaches. The pipeline system achieved a mean precision of 0.80 for the top 100 concepts with the ranking by interest. There were no significant differences among the four ranking approaches (p=0.53). However, the pipeline-based system had significantly better performance than the two baselines. The pipeline system can be useful for the automated generation of disease-relevant treatment concepts from the biomedical literature. Copyright © 2016 Elsevier Inc. All rights reserved.
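
    As a rough illustration of the centrality-based ranking idea, the sketch below counts, for each treatment concept, its occurrences and the number of distinct objects it is linked to in TREATS predications; the toy predications are invented and the tie-breaking rule is an assumption, not the paper's exact formulation.

```python
from collections import defaultdict

# Toy treatment predications of the form (subject, predicate, object);
# the real system extracted these from disease-specific MEDLINE citations.
predications = [
    ("metformin", "TREATS", "type 2 diabetes"),
    ("metformin", "TREATS", "hyperglycemia"),
    ("insulin",   "TREATS", "type 2 diabetes"),
    ("exercise",  "ASSOCIATED_WITH", "type 2 diabetes"),
]

occurrence = defaultdict(int)   # how often each treatment concept appears
neighbours = defaultdict(set)   # distinct objects linked to each treatment concept

for subj, pred, obj in predications:
    if pred == "TREATS":        # keep only treatment predications
        occurrence[subj] += 1
        neighbours[subj].add(obj)

# Degree-centrality-style score: number of distinct linked objects,
# with raw occurrence as an assumed tie-breaker.
degree = {concept: len(objs) for concept, objs in neighbours.items()}
ranked = sorted(degree, key=lambda c: (degree[c], occurrence[c]), reverse=True)
print(ranked)   # ['metformin', 'insulin']
```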

  19. Mapping from Space - Ontology Based Map Production Using Satellite Imageries

    NASA Astrophysics Data System (ADS)

    Asefpour Vakilian, A.; Momeni, M.

    2013-09-01

    Determination of the maximum ability for feature extraction from satellite imageries based on an ontology procedure using cartographic feature determination is the main objective of this research. Therefore, a special ontology has been developed to extract the maximum volume of information available in different high resolution satellite imageries and compare it to the map information layers required at each specific scale according to the unified specification for surveying and mapping. Ontology seeks to provide an explicit and comprehensive classification of entities in all spheres of being. This study proposes a new method for automatic maximum map feature extraction and reconstruction from high resolution satellite images. For example, in order to extract building blocks to produce 1:5000 scale and smaller maps, the road networks located around the building blocks should be determined. Thus, a new building index has been developed based on concepts obtained from the ontology. Building blocks have been extracted with a completeness of about 83%. Then, road networks have been extracted and reconstructed to create a uniform network with less discontinuity. In this case, building blocks have been extracted with proper performance and the false positive value from the confusion matrix was reduced by about 7%. Results showed that vegetation cover and water features have been extracted completely (100%) and about 71% of limits have been extracted. Also, the proposed method had the ability to produce a map with the largest scale possible, equal to or smaller than 1:5000, from any multispectral high resolution satellite imagery.

  1. Development of an Information System for Diploma Works Management

    ERIC Educational Resources Information Center

    Georgieva-Trifonova, Tsvetanka

    2011-01-01

    In this paper, a client/server information system for the management of data and its extraction from a database containing information for diploma works of students is proposed. The developed system provides users the possibility of accessing information about different characteristics of the diploma works, according to their specific interests.…

  2. Component-resolved evaluation of the content of major allergens in therapeutic extracts for specific immunotherapy of honeybee venom allergy

    PubMed Central

    Blank, Simon; Etzold, Stefanie; Darsow, Ulf; Schiener, Maximilian; Eberlein, Bernadette; Russkamp, Dennis; Wolf, Sara; Graessel, Anke; Biedermann, Tilo; Ollert, Markus; Schmidt-Weber, Carsten B.

    2017-01-01

    Allergen-specific immunotherapy is the only curative treatment of honeybee venom (HBV) allergy, which is able to protect against further anaphylactic sting reactions. Recent analyses on a molecular level have demonstrated that HBV represents a complex allergen source that contains more relevant major allergens than formerly anticipated. Moreover, allergic patients show very diverse sensitization profiles with the different allergens. HBV-specific immunotherapy is conducted with HBV extracts which are derived from pure venom. The allergen content of these therapeutic extracts might differ due to natural variations of the source material or different downstream processing strategies of the manufacturers. Since variations of the allergen content of therapeutic HBV extracts might be associated with therapeutic failure, we addressed the component-resolved allergen composition of different therapeutic-grade HBV extracts which are approved for immunotherapy in numerous countries. The extracts were analyzed for their content of the major allergens Api m 1, Api m 2, Api m 3, Api m 5 and Api m 10. Using allergen-specific antibodies we were able to demonstrate the underrepresentation of relevant major allergens such as Api m 3, Api m 5 and Api m 10 in particular therapeutic extracts. Taken together, standardization of therapeutic extracts by determination of the total allergenic potency might imply the intrinsic pitfall of losing information about particular major allergens. Moreover, the variable allergen composition of different therapeutic HBV extracts might have an impact on therapy outcome and the clinical management of HBV-allergic patients with specific IgE to particular allergens. PMID:28494206

  3. Driver drowsiness classification using fuzzy wavelet-packet-based feature-extraction algorithm.

    PubMed

    Khushaba, Rami N; Kodagoda, Sarath; Lal, Sara; Dissanayake, Gamini

    2011-01-01

    Driver drowsiness and loss of vigilance are a major cause of road accidents. Monitoring physiological signals while driving provides the possibility of detecting and warning of drowsiness and fatigue. The aim of this paper is to maximize the amount of drowsiness-related information extracted from a set of electroencephalogram (EEG), electrooculogram (EOG), and electrocardiogram (ECG) signals during a simulation driving test. Specifically, we develop an efficient fuzzy mutual-information (MI)-based wavelet packet transform (FMIWPT) feature-extraction method for classifying the driver drowsiness state into one of several predefined drowsiness levels. The proposed method estimates the required MI using a novel approach based on fuzzy memberships, providing an accurate information-content estimation measure. The quality of the extracted features was assessed on datasets collected from 31 drivers on a simulation test. The experimental results proved the significance of FMIWPT in extracting features that highly correlate with the different drowsiness levels, achieving a classification accuracy of 95%-97% on average across all subjects.

  4. Missing binary data extraction challenges from Cochrane reviews in mental health and Campbell reviews with implications for empirical research.

    PubMed

    Spineli, Loukia M

    2017-12-01

    To report challenges encountered during the extraction process from Cochrane reviews in mental health and Campbell reviews and to indicate their implications for the empirical performance of different methods to handle missingness. We used a collection of meta-analyses on binary outcomes collated from previous work on missing outcome data. To evaluate the accuracy of their extraction, we developed specific criteria pertaining to the reporting of missing outcome data in systematic reviews. Using the most popular methods to handle missing binary outcome data, we investigated the implications of the accuracy of the extracted meta-analyses for the random-effects meta-analysis results. Of 113 meta-analyses from Cochrane reviews, 60 (53%) were judged as "unclearly" extracted (ie, no information on the outcome of completers but available information on how missing participants were handled) and 42 (37%) as "unacceptably" extracted (ie, no information on the outcome of completers as well as no information on how missing participants were handled). For the remaining meta-analyses, it was judged that data were "acceptably" extracted (ie, information on the completers' outcome was provided for all trials). Overall, "unclear" extraction overestimated the magnitude of the summary odds ratio and the between-study variance and additionally inflated the uncertainty of both meta-analytical parameters. The only eligible Campbell review was judged as "unclear." Depending on the extent of missingness, the reporting quality of the systematic reviews can greatly affect the accuracy of the extracted meta-analyses and, by extension, the empirical performance of different methods to handle missingness. Copyright © 2017 John Wiley & Sons, Ltd.
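
    To make the dependence on missing-data handling concrete, the sketch below computes a single trial's odds ratio under two simple strategies (available-case analysis versus imputing all missing participants as non-events); the counts are invented and these are generic approaches, not necessarily the exact methods compared in the study.

```python
import math

def odds_ratio(events_t, total_t, events_c, total_c):
    """Simple odds ratio for a two-arm trial with a binary outcome."""
    a, b = events_t, total_t - events_t   # treatment events / non-events
    c, d = events_c, total_c - events_c   # control events / non-events
    return (a * d) / (b * c)

# Hypothetical trial: completers' outcomes plus participants lost to follow-up.
treat = {"events": 30, "completers": 80, "missing": 20}
ctrl  = {"events": 20, "completers": 90, "missing": 10}

# Available-case analysis: use completers only.
or_acc = odds_ratio(treat["events"], treat["completers"],
                    ctrl["events"], ctrl["completers"])

# Imputed-case analysis: count every missing participant as a non-event.
or_imp = odds_ratio(treat["events"], treat["completers"] + treat["missing"],
                    ctrl["events"], ctrl["completers"] + ctrl["missing"])

print(round(or_acc, 2), round(or_imp, 2))   # 2.1 vs. 1.71 for the same trial
print(round(math.log(or_acc), 2))           # log OR, the scale used in random-effects meta-analysis
```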

  5. Activities of the Remote Sensing Information Sciences Research Group

    NASA Technical Reports Server (NTRS)

    Estes, J. E.; Botkin, D.; Peuquet, D.; Smith, T.; Star, J. L. (Principal Investigator)

    1984-01-01

    Topics on the analysis and processing of remotely sensed data in the areas of vegetation analysis and modelling, georeferenced information systems, machine assisted information extraction from image data, and artificial intelligence are investigated. Discussions on support field data and specific applications of the proposed technologies are also included.

  6. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

    PubMed Central

    Hunter, Lawrence; Lu, Zhiyong; Firby, James; Baumgartner, William A; Johnson, Helen L; Ogren, Philip V; Cohen, K Bretonnel

    2008-01-01

    Background Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at PMID:18237434

  7. Automatic generation of Web mining environments

    NASA Astrophysics Data System (ADS)

    Cibelli, Maurizio; Costagliola, Gennaro

    1999-02-01

    The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.

  8. KneeTex: an ontology-driven system for information extraction from MRI reports.

    PubMed

    Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

    2015-01-01

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F-measure of 97.81 %, the values of which are in line with human-like performance. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.

  9. Systematically extracting metal- and solvent-related occupational information from free-text responses to lifetime occupational history questionnaires.

    PubMed

    Friesen, Melissa C; Locke, Sarah J; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A; Purdue, Mark; Colt, Joanne S

    2014-06-01

    Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants' jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Our study population comprised 2408 subjects, reporting 11991 jobs, from a case-control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert's independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44-51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9-14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.

  10. Biologically active extracts with kidney affections applications

    NASA Astrophysics Data System (ADS)

    Pascu (Neagu), Mihaela; Pascu, Daniela-Elena; Cozea, Andreea; Bunaciu, Andrei A.; Miron, Alexandra Raluca; Nechifor, Cristina Aurelia

    2015-12-01

    This paper aims to select plant materials rich in bioflavonoid compounds from herbs known for their performance in the prevention and therapy of renal diseases, namely kidney stones and urinary infections (renal lithiasis, nephritis, urethritis, cystitis, etc.). It presents a comparative study of the composition of medicinal plant extracts belonging to the Ericaceae family: cranberry (fruit and leaves), Vaccinium vitis-idaea L., and bilberry (fruit), Vaccinium myrtillus L. The concentrated extracts obtained from the medicinal plants used in this work were analyzed from structural, morphological and compositional points of view using different techniques: chromatographic methods (HPLC), scanning electron microscopy, infrared and UV spectrophotometry, as well as a kinetic model. Liquid chromatography was able to identify the specific compound of the Ericaceae family present in all three extracts, arbutoside, as well as specific components of each species, mostly from the class of polyphenols. The identification and quantitative determination of the active ingredients in these extracts can give information related to their therapeutic effects.

  11. Building an automated SOAP classifier for emergency department reports.

    PubMed

    Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

    2012-02-01

    Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F(1) scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks. Copyright © 2011. Published by Elsevier Inc.
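
    A minimal sentence-level classifier in the spirit of the SOAP labelling task might be sketched as below; the four training sentences are invented and the TF-IDF plus logistic-regression baseline is a stand-in for the richer linguistic features used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; the study used emergency department
# sentences annotated by physicians and a wider variety of features.
sentences = [
    "Patient complains of chest pain since this morning.",   # subjective
    "Blood pressure 150/90, heart rate 98.",                  # objective
    "Likely acute coronary syndrome.",                        # assessment
    "Start aspirin and admit for observation.",               # plan
]
labels = ["subjective", "objective", "assessment", "plan"]

# Bag-of-words baseline: logistic regression over TF-IDF unigram/bigram features.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)
print(clf.predict(["Order a chest x-ray and follow up tomorrow."]))
```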

  12. Efficient feature extraction from wide-area motion imagery by MapReduce in Hadoop

    NASA Astrophysics Data System (ADS)

    Cheng, Erkang; Ma, Liya; Blaisse, Adam; Blasch, Erik; Sheaff, Carolyn; Chen, Genshe; Wu, Jie; Ling, Haibin

    2014-06-01

    Wide-Area Motion Imagery (WAMI) feature extraction is important for applications such as target tracking, traffic management and accident discovery. With the increasing amount of WAMI collections and feature extraction from the data, a scalable framework is needed to handle the large amount of information. Cloud computing is one of the approaches recently applied to large-scale or big-data problems. In this paper, MapReduce in Hadoop is investigated for large-scale feature extraction tasks for WAMI. Specifically, a large dataset of WAMI images is divided into several splits. Each split has a small subset of WAMI images. Feature extraction for the WAMI images in each split is distributed to slave nodes in the Hadoop system. Feature extraction of each image is performed individually in the assigned slave node. Finally, the feature extraction results are sent to the Hadoop Distributed File System (HDFS) to aggregate the feature information over the collected imagery. Experiments of feature extraction with and without MapReduce are conducted to illustrate the effectiveness of our proposed Cloud-Enabled WAMI Exploitation (CAWE) approach.
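
    The split-and-aggregate pattern described above can be sketched as a local simulation of the map and reduce phases; the feature extractor and file names below are placeholders, and a real deployment would run the mapper and reducer as Hadoop tasks over data stored in HDFS.

```python
# Local simulation of the map/reduce pattern; in a real Hadoop Streaming job,
# the mapper and reducer would each read records from stdin on slave nodes.

def extract_features(image_path):
    """Placeholder for the real WAMI feature extractor (e.g. interest points)."""
    return [(0, 0), (10, 5)]                      # pretend two features were found

def mapper(image_paths):
    for path in image_paths:                      # each split holds a subset of images
        yield path, len(extract_features(path))

def reducer(mapped):
    total = 0
    for _, count in mapped:                       # aggregate per-image feature counts
        total += count
    return total

split = ["frame_0001.png", "frame_0002.png", "frame_0003.png"]  # hypothetical file names
print(reducer(mapper(split)))                     # 6
```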

  13. The Contribution of the Vaccine Adverse Event Text Mining System to the Classification of Possible Guillain-Barré Syndrome Reports

    PubMed Central

    Botsis, T.; Woo, E. J.; Ball, R.

    2013-01-01

    Background We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. Objective To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). Methods We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. Results MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. Conclusion For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority. PMID:23650490
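
    The vector space comparison mentioned above can be illustrated with TF-IDF vectors and cosine similarity; the two text snippets are invented and this is the generic model, not the exact VaeTM/MedDRA comparison pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical texts: key features extracted from a report vs. terms from a case definition.
report_features = "bilateral limb weakness areflexia ascending paralysis"
case_definition = "bilateral flaccid limb weakness decreased reflexes monophasic course"

vec = TfidfVectorizer()
tfidf = vec.fit_transform([report_features, case_definition])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(round(score, 3))   # higher scores suggest a closer match to the case definition
```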

  14. Impact of JPEG2000 compression on spatial-spectral endmember extraction from hyperspectral data

    NASA Astrophysics Data System (ADS)

    Martín, Gabriel; Ruiz, V. G.; Plaza, Antonio; Ortiz, Juan P.; García, Inmaculada

    2009-08-01

    Hyperspectral image compression has received considerable interest in recent years. However, an important issue that has not been investigated in the past is the impact of lossy compression on spectral mixture analysis applications, which characterize mixed pixels in terms of a suitable combination of spectrally pure spectral substances (called endmembers) weighted by their estimated fractional abundances. In this paper, we specifically investigate the impact of JPEG2000 compression of hyperspectral images on the quality of the endmembers extracted by algorithms that incorporate both the spectral and the spatial information (useful for incorporating contextual information in the spectral endmember search). The two considered algorithms are the automatic morphological endmember extraction (AMEE) and the spatial spectral endmember extraction (SSEE) techniques. Experimental results are conducted using a well-known data set collected by AVIRIS over the Cuprite mining district in Nevada and with detailed ground-truth information available from U. S. Geological Survey. Our experiments reveal some interesting findings that may be useful to specialists applying spatial-spectral endmember extraction algorithms to compressed hyperspectral imagery.

  15. Information Retrieval Using Hadoop Big Data Analysis

    NASA Astrophysics Data System (ADS)

    Motwani, Deepak; Madan, Madan Lal

    This paper concerns big data analysis, the process of probing huge amounts of information in an attempt to uncover hidden patterns. Through big data analytics applications, public- and private-sector organizations have made a strategic decision to turn big data into competitive advantage. Extracting value from big data requires a process for pulling information from multiple, different sources; this process is known as extract, transform and load (ETL). The approach in this paper extracts information from log files and research papers, reducing the effort needed for pattern finding and document summarization from several sources. The work helps to better understand basic Hadoop concepts and improve the user experience for research. In this paper, we propose a Hadoop-based approach for analyzing log files to find concise information that is useful and time saving. The proposed approach is then applied to research papers in a specific domain to obtain summarized content for further improvement and the creation of new content.

  16. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires the knowledge of scanning parameters and patients’ information included in a DICOM file as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value in every ROI. The result is stored in DICOM format, for data and trend analysis. The developed GUI is easy and fast to use and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.
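
    The original tool was built in Matlab; a rough Python analogue of the metadata-extraction step, using pydicom and pandas with a hypothetical folder path and a small set of standard DICOM keywords, might look like this:

```python
import glob
import pydicom
import pandas as pd

# Collect selected DICOM header fields for every slice in a study folder
# (field names are standard DICOM keywords; the folder path is hypothetical).
rows = []
for path in glob.glob("study_folder/*.dcm"):
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    rows.append({
        "SOPInstanceUID": ds.get("SOPInstanceUID", ""),
        "PatientID": ds.get("PatientID", ""),
        "KVP": ds.get("KVP", ""),
        "ExposureTime": ds.get("ExposureTime", ""),
        "SliceThickness": ds.get("SliceThickness", ""),
    })

# Export all collected metadata in one pass, as the GUI described above does to Excel.
pd.DataFrame(rows).to_excel("dicom_metadata.xlsx", index=False)
```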

  17. Information based universal feature extraction

    NASA Astrophysics Data System (ADS)

    Amiri, Mohammad; Brause, Rüdiger

    2015-02-01

    In many real-world, image-based pattern recognition tasks, the extraction and usage of task-relevant features are the most crucial part of the diagnosis. In the standard approach, they mostly remain task-specific, although humans who perform such a task always use the same image features, trained in early childhood. It seems that universal feature sets exist, but they are not yet systematically found. In our contribution, we tried to find those universal image feature sets that are valuable for most image-related tasks. In our approach, we trained a neural network with natural and non-natural images of objects and background, using a Shannon information-based algorithm and learning constraints. The goal was to extract those features that give the most valuable information for the classification of visual objects and hand-written digits. This will give a good start and performance increase for all other image learning tasks, implementing a transfer learning approach. As a result, we found that we could indeed extract features which are valid in all three kinds of tasks.

  18. Program review presentation to Level 1, Interagency Coordination Committee

    NASA Technical Reports Server (NTRS)

    1982-01-01

    Progress in the development of crop inventory technology is reported. Specific topics include the results of a thematic mapper analysis, variable selection studies/early season estimator improvements, the agricultural information system simulator, large unit proportion estimation, and development of common features for multi-satellite information extraction.

  19. Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction mention extraction.

    PubMed

    Gupta, Shashank; Pawar, Sachin; Ramrakhiyani, Nitin; Palshikar, Girish Keshav; Varma, Vasudeva

    2018-06-13

    Social media is a useful platform to share health-related information due to its vast reach. This makes it a good candidate for public-health monitoring tasks, specifically for pharmacovigilance. We study the problem of extraction of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from Twitter. Medical information extraction from social media is challenging, mainly due to the short and highly informal nature of the text, as compared to more technical and formal medical reports. Current methods in ADR mention extraction rely on supervised learning methods, which suffer from the labeled-data scarcity problem. The state-of-the-art method uses deep neural networks, specifically a class of Recurrent Neural Network (RNN) known as the Long Short-Term Memory (LSTM) network. Deep neural networks, due to their large number of free parameters, rely heavily on large annotated corpora for learning the end task. But in the real world, it is hard to obtain large labeled datasets, mainly due to the heavy cost associated with manual annotation. To this end, we propose a novel semi-supervised learning-based RNN model, which can also leverage unlabeled data, present in abundance on social media. Through experiments we demonstrate the effectiveness of our method, achieving state-of-the-art performance in ADR mention extraction. In this study, we tackle the problem of labeled data scarcity for Adverse Drug Reaction mention extraction from social media and propose a novel semi-supervised learning-based method which can leverage the large unlabeled corpus available in abundance on the web. Through empirical study, we demonstrate that our proposed method outperforms a fully supervised learning-based baseline which relies on a large manually annotated corpus for good performance.

  20. Decoding rule search domain in the left inferior frontal gyrus

    PubMed Central

    Babcock, Laura; Vallesi, Antonino

    2018-01-01

    Traditionally, the left hemisphere has been thought to extract mainly verbal patterns of information, but recent evidence has shown that the left Inferior Frontal Gyrus (IFG) is active during inductive reasoning in both the verbal and spatial domains. We aimed to understand whether the left IFG supports inductive reasoning in a domain-specific or domain-general fashion. To do this we used Multi-Voxel Pattern Analysis to decode the representation of domain during a rule search task. Thirteen participants were asked to extract the rule underlying streams of letters presented in different spatial locations. Each rule was either verbal (letters forming words) or spatial (positions forming geometric figures). Our results show that domain was decodable in the left prefrontal cortex, suggesting that this region represents domain-specific information, rather than processes common to the two domains. A replication study with the same participants tested two years later confirmed these findings, though the individual representations changed, providing evidence for the flexible nature of representations. This study extends our knowledge on the neural basis of goal-directed behaviors and on how information relevant for rule extraction is flexibly mapped in the prefrontal cortex. PMID:29547623
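
    The decoding analysis can be sketched with a cross-validated linear classifier over voxel patterns; the data below are synthetic with an injected signal, standing in for preprocessed fMRI trial patterns from the left IFG, and the pipeline details (classifier, folds) are illustrative assumptions rather than the study's exact MVPA setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data: one pattern of voxel activations per trial,
# labelled by task domain (verbal vs. spatial).
rng = np.random.default_rng(0)
n_trials, n_voxels = 60, 200
X = rng.normal(size=(n_trials, n_voxels))
y = np.repeat(["verbal", "spatial"], n_trials // 2)
X[y == "verbal", :10] += 0.8          # inject a weak multivariate signal in a few voxels

# Decoding accuracy above chance (0.5) suggests the region carries domain information.
clf = SVC(kernel="linear")
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```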

  1. Comparison of Three Information Sources for Smoking Information in Electronic Health Records

    PubMed Central

    Wang, Liwei; Ruan, Xiaoyang; Yang, Ping; Liu, Hongfang

    2016-01-01

    OBJECTIVE The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage. PMID:27980387
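
    A stripped-down version of pattern-based smoking-status extraction with heuristic rules might look like the sketch below; the patterns and their precedence order are illustrative assumptions, not the study's actual rule set.

```python
import re

# Simplified pattern-based rules for smoking status; the study combined
# richer patterns over EMR text with heuristics and structured PPI data.
NEGATION = re.compile(r"\b(never|denies|no history of)\b[^.]*\bsmok", re.I)
PAST     = re.compile(r"\b(former|ex-?smoker|quit smoking)\b", re.I)
CURRENT  = re.compile(r"\b(current(ly)? smok|smokes?\s+\d+\s*(pack|cigarette))", re.I)

def smoking_status(note):
    if NEGATION.search(note):
        return "never smoker"
    if PAST.search(note):
        return "former smoker"
    if CURRENT.search(note):
        return "current smoker"
    return "unknown"

print(smoking_status("Patient currently smokes 1 pack per day."))   # current smoker
print(smoking_status("She denies any history of smoking."))         # never smoker
```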

  2. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov

    PubMed Central

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V

    2016-01-01

    Objective: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. Methods: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. Results and Discussion: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. PMID:27013523
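
    The gene-alteration pair extraction step can be illustrated with a single regular expression over eligibility-criteria text; the pattern and example sentence below are simplified assumptions, whereas the study's system combined richer rules and lexicons with manual review.

```python
import re

# Illustrative pattern for gene-alteration pairs in eligibility-criteria text.
PAIR = re.compile(r"\b([A-Z0-9]{2,6})\s+"                  # gene symbol, e.g. EGFR, BRAF
                  r"((?:[A-Z]\d+[A-Z*])|mutation|amplification|fusion|deletion)\b")

criteria = ("Patients must have EGFR L858R or BRAF V600E, "
            "or documented MET amplification.")
print(PAIR.findall(criteria))
# [('EGFR', 'L858R'), ('BRAF', 'V600E'), ('MET', 'amplification')]
```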

  3. Meta-Generalis: A Novel Method for Structuring Information from Radiology Reports

    PubMed Central

    Barbosa, Flavio; Traina, Agma Jucci

    2016-01-01

    Background: A structured report for imaging exams aims at increasing the precision in information retrieval and communication between physicians. However, it is more concise than free text and may limit specialists’ descriptions of important findings not covered by pre-defined structures. A computational ontological structure derived from free texts designed by specialists may be a solution for this problem. Therefore, the goal of our study was to develop a methodology for structuring information in radiology reports covering specifications required for the Brazilian Portuguese language, including the terminology to be used. Methods: We gathered 1,701 radiological reports of magnetic resonance imaging (MRI) studies of the lumbosacral spine from three different institutions. Techniques of text mining and ontological conceptualization of the extracted lexical units were used to structure information. Ten radiologists, specialists in lumbosacral MRI, evaluated the textual superstructure and terminology extracted using an electronic questionnaire. Results: The established methodology consists of six steps: 1) collection of radiology reports of a specific MRI examination; 2) textual decomposition; 3) normalization of lexical units; 4) identification of textual superstructures; 5) conceptualization of candidate-terms; and 6) evaluation of superstructures and extracted terminology by experts using an electronic questionnaire. Three different textual superstructures were identified, with terminological variations in the names of their textual categories. The number of candidate-terms conceptualized was 4,183, yielding 727 concepts. There were a total of 13,963 relationships between candidate-terms and concepts and 789 relationships among concepts. Conclusions: The proposed methodology allowed structuring information in a more intuitive and practical way. The identification of three textual superstructures, the extraction of lexical units, and their normalization and ontological conceptualization were achieved while maintaining references to their respective categories and to the free-text radiology reports. PMID:27580980

  4. Meta-generalis: A novel method for structuring information from radiology reports.

    PubMed

    Barbosa, Flavio; Traina, Agma Jucci; Muglia, Valdair Francisco

    2016-08-24

    A structured report for imaging exams aims at increasing the precision in information retrieval and communication between physicians. However, it is more concise than free text and may limit specialists' descriptions of important findings not covered by pre-defined structures. A computational ontological structure derived from free texts designed by specialists may be a solution for this problem. Therefore, the goal of our study was to develop a methodology for structuring information in radiology reports covering specifications required for the Brazilian Portuguese language, including the terminology to be used. We gathered 1,701 radiological reports of magnetic resonance imaging (MRI) studies of the lumbosacral spine from three different institutions. Techniques of text mining and ontological conceptualization of the extracted lexical units were used to structure information. Ten radiologists, specialists in lumbosacral MRI, evaluated the textual superstructure and terminology extracted using an electronic questionnaire. The established methodology consists of six steps: 1) collection of radiology reports of a specific MRI examination; 2) textual decomposition; 3) normalization of lexical units; 4) identification of textual superstructures; 5) conceptualization of candidate-terms; and 6) evaluation of superstructures and extracted terminology by experts using an electronic questionnaire. Three different textual superstructures were identified, with terminological variations in the names of their textual categories. The number of candidate-terms conceptualized was 4,183, yielding 727 concepts. There were a total of 13,963 relationships between candidate-terms and concepts and 789 relationships among concepts. The proposed methodology allowed structuring information in a more intuitive and practical way. The identification of three textual superstructures, the extraction of lexical units, and their normalization and ontological conceptualization were achieved while maintaining references to their respective categories and to the free-text radiology reports.

  5. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.

    PubMed

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua

    2016-07-01

    Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  6. Data base management system configuration specification. [computer storage devices

    NASA Technical Reports Server (NTRS)

    Neiers, J. W.

    1979-01-01

    The functional requirements and the configuration of the data base management system are described. Techniques and technology which will enable more efficient and timely transfer of useful data from the sensor to the user, extraction of information by the user, and exchange of information among the users are demonstrated.

  7. Information extraction from Italian medical reports: An ontology-driven approach.

    PubMed

    Viani, Natalia; Larizza, Cristiana; Tibollo, Valentina; Napolitano, Carlo; Priori, Silvia G; Bellazzi, Riccardo; Sacchi, Lucia

    2018-03-01

    In this work, we propose an ontology-driven approach to identify events and their attributes from episodes of care included in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible. The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports. The proposed approach performs well on the considered Italian medical corpus, with a percentage of correct annotations above 90% for most considered clinical events. We also assessed the possibility to adapt the system to the analysis of another language (i.e., English), with promising results. Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the considered task. In the future, it could be successfully used to automatically populate the TRIAD database. Copyright © 2017 Elsevier B.V. All rights reserved.
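
    A minimal sketch of the ontology-driven idea, with regular expressions attached to event concepts and their attributes, could look like the code below; the concepts, patterns, and example sentence are invented (and shown in English rather than Italian), so this is an illustration of the approach, not the system's actual ontology.

```python
import re

# Tiny ontology-like structure mapping clinical events to regular expressions.
ONTOLOGY = {
    "ECG": {
        "event_pattern": re.compile(r"\bECG\b", re.I),
        "attributes": {"qtc_ms": re.compile(r"QTc\s*[:=]?\s*(\d{3})\s*ms", re.I)},
    },
    "echocardiogram": {
        "event_pattern": re.compile(r"\becho(cardiogram)?\b", re.I),
        "attributes": {"ef_percent": re.compile(r"(?:EF|ejection fraction)\s*[:=]?\s*(\d{2})\s*%", re.I)},
    },
}

def annotate(text):
    """Return the events found in the text, each with its extracted attributes."""
    annotations = []
    for concept, entry in ONTOLOGY.items():
        if not entry["event_pattern"].search(text):
            continue
        attrs = {}
        for name, rx in entry["attributes"].items():
            match = rx.search(text)
            attrs[name] = match.group(1) if match else None
        annotations.append({"event": concept, "attributes": attrs})
    return annotations

print(annotate("ECG showed QTc = 470 ms; echocardiogram with EF 55%."))
```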

  8. Neuroanatomic organization of sound memory in humans.

    PubMed

    Kraut, Michael A; Pitcock, Jeffery A; Calhoun, Vince; Li, Juan; Freeman, Thomas; Hart, John

    2006-11-01

    The neural interface between sensory perception and memory is a central issue in neuroscience, particularly initial memory organization following perceptual analyses. We used functional magnetic resonance imaging to identify anatomic regions extracting initial auditory semantic memory information related to environmental sounds. Two distinct anatomic foci were detected in the right superior temporal gyrus when subjects identified sounds representing either animals or threatening items. Threatening animal stimuli elicited signal changes in both foci, suggesting a distributed neural representation. Our results demonstrate both category- and feature-specific responses to nonverbal sounds in early stages of extracting semantic memory information from these sounds. This organization allows for these category-feature detection nodes to extract early, semantic memory information for efficient processing of transient sound stimuli. Neural regions selective for threatening sounds are similar to those of nonhuman primates, demonstrating semantic memory organization for basic biological/survival primitives are present across species.

  9. Skin prick testing predicts peanut challenge outcome in previously allergic or sensitized children with low serum peanut-specific IgE antibody concentration.

    PubMed

    Nolan, Richard C; Richmond, Peter; Prescott, Susan L; Mallon, Dominic F; Gong, Grace; Franzmann, Annkathrin M; Naidoo, Rama; Loh, Richard K S

    2007-05-01

    Peanut allergy is transient in some children but it is not clear whether quantitating peanut-specific IgE by Skin Prick Test (SPT) adds additional information to fluorescent-enzyme immunoassay (FEIA) in discriminating between allergic and tolerant children. To investigate whether SPT with a commercial extract or fresh foods adds additional predictive information for peanut challenge in children with a low FEIA (<10 kUA/L) who were previously sensitized, or allergic, to peanuts. Children from a hospital-based allergy service who were previously sensitized or allergic to peanuts were invited to undergo a peanut challenge unless they had a serum peanut-specific IgE >10 kUA/L, a previous severe reaction, or a recent reaction to peanuts (within two years). SPT with a commercial extract and raw and roasted saline-soaked peanuts was performed immediately prior to open challenge in hospital with increasing quantities of peanut until a total of 26.7 g of peanut was consumed. A positive challenge consisted of an objective IgE-mediated reaction occurring during the observation period. Fifty-four children (median age 6.3 years) were admitted for a challenge. Nineteen challenges were positive, 27 negative, five were indeterminate and three did not proceed after SPT. Commercial and fresh food extracts provided similar diagnostic information. A wheal diameter of ≥7 mm with the commercial extract predicted an allergic outcome with specificity 97%, positive predictive value 93% and sensitivity 83%. There was a tendency for an increase in SPT wheal since initial diagnosis in children who remained allergic to peanuts, while it decreased in those with a negative challenge. The outcome of a peanut challenge in peanut-sensitized or previously allergic children with a low FEIA can be predicted by SPT. In this cohort, not challenging children with a SPT wheal of ≥7 mm would have avoided 15 of 18 positive challenges and denied a challenge to one out of 27 tolerant children.
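
    The reported accuracy figures follow from the 2x2 table of SPT cut-off against challenge outcome. Using the counts implied by the abstract's closing sentence (15 of 18 allergic and 1 of 27 tolerant children at or above the 7 mm cut-off), a small worked computation is shown below; the slight differences from the reported 97% specificity and 93% PPV presumably reflect rounding or slightly different denominators in the paper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV from a 2x2 table of test vs. outcome."""
    sensitivity = tp / (tp + fn)      # allergic children correctly flagged by the test
    specificity = tn / (tn + fp)      # tolerant children correctly below the cut-off
    ppv = tp / (tp + fp)              # chance a positive test means true allergy
    return sensitivity, specificity, ppv

# Counts implied by the abstract: 15 allergic children with wheal >= 7 mm (TP),
# 1 tolerant child >= 7 mm (FP), 3 allergic below the cut-off (FN), 26 tolerant below it (TN).
sens, spec, ppv = diagnostic_metrics(tp=15, fp=1, fn=3, tn=26)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```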

  10. Government Information Locator Service (GILS). Draft report to the Information Infrastructure Task Force

    NASA Technical Reports Server (NTRS)

    1994-01-01

    This is a draft report on the Government Information Locator Service (GILS) to the National Information Infrastructure (NII) task force. GILS is designed to take advantage of internetworking technology known as client-server architecture which allows information to be distributed among multiple independent information servers. Two appendices are provided -- (1) A glossary of related terminology and (2) extracts from a draft GILS profile for the use of the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Library Applications.

  11. Consent for third molar tooth extractions in Australia and New Zealand: a review of current practice.

    PubMed

    Badenoch-Jones, E K; Lynham, A J; Loessner, D

    2016-06-01

    Informed consent is the legal requirement to educate a patient about a proposed medical treatment or procedure so that he or she can make informed decisions. The purpose of the study was to examine the current practice for obtaining informed consent for third molar tooth extractions (wisdom teeth) by oral and maxillofacial surgeons in Australia and New Zealand. An online survey was sent to 180 consultant oral and maxillofacial surgeons in Australia and New Zealand. Surgeons were asked to answer (yes/no) whether they routinely warned of a specific risk of third molar tooth extraction in their written consent. Seventy-one replies were received (39%). The only risks that surgeons agreed should be routinely included in written consent were a general warning of infection (not alveolar osteitis), inferior alveolar nerve damage (temporary and permanent) and lingual nerve damage (temporary and permanent). There is significant variability among Australian and New Zealand oral and maxillofacial surgeons regarding risk disclosure for third molar tooth extractions. We aim to improve consistency in consent for third molar extractions by developing an evidence-based consent form. © 2016 Australian Dental Association.

  12. Passive Polarimetric Information Processing for Target Classification

    NASA Astrophysics Data System (ADS)

    Sadjadi, Firooz; Sadjadi, Farzad

    Polarimetric sensing is an area of active research in a variety of applications. In particular, the use of polarization diversity has been shown to improve performance in automatic target detection and recognition. Within the diverse scope of polarimetric sensing, the field of passive polarimetric sensing is of particular interest. This chapter presents several new methods for gathering information using such passive techniques. One method extracts three-dimensional (3D) information and surface properties using one or more sensors. Another method extracts scene-specific algebraic expressions that remain unchanged under polarization transformations (such as along the transmission path to the sensor).

  13. The use of digital spaceborne SAR data for the delineation of surface features indicative of malaria vector breeding habitats

    NASA Technical Reports Server (NTRS)

    Imhoff, M. L.; Vermillion, C. H.; Khan, F. A.

    1984-01-01

    An investigation to examine the utility of spaceborne radar image data to malaria vector control programs is described. Specific tasks involve an analysis of radar illumination geometry vs information content, the synergy of radar and multispectral data mergers, and automated information extraction techniques.

  14. Using NLP to identify cancer cases in imaging reports drawn from radiology information systems.

    PubMed

    Patrick, Jon; Asgari, Pooyan; Li, Min; Nguyen, Dung

    2013-01-01

    A natural language processing (NLP) classifier has been developed for the Victorian and NSW Cancer Registries with the purpose of automatically identifying cancer reports from imaging services, transmitting them to the Registries and then extracting pertinent cancer information. Large-scale trials conducted on over 40,000 reports show the sensitivity for identifying reportable cancer reports is above 98% with a specificity above 96%. Detection of tumour stream, report purpose, and a variety of extracted content is generally above 90% specificity. The differences between report layout and authoring strategies across imaging services appear to require different classifiers to retain this high level of accuracy. Linkage of the imaging data with existing registry records (hospital and pathology reports) to derive stage and recurrence of cancer has commenced and shown very promising results.

  15. Language translation, domain specific languages and ANTLR

    NASA Technical Reports Server (NTRS)

    Craymer, Loring; Parr, Terence

    2002-01-01

    We will discuss the features of ANTLR that make it an attractive tool for rapid development of domain specific language translators and present some practical examples of its use: extraction of information from the Cassini Command Language specification, the processing of structured binary data, and IVL--an English-like language for generating VRML scene graphs, which is used in configuring the jGuru.com server.

  16. Evaluating Health Information Systems Using Ontologies

    PubMed Central

    Anderberg, Peter; Larsson, Tobias C; Fricker, Samuel A; Berglund, Johan

    2016-01-01

    Background There are several frameworks that attempt to address the challenges of evaluation of health information systems by offering models, methods, and guidelines about what to evaluate, how to evaluate, and how to report the evaluation results. Model-based evaluation frameworks usually suggest universally applicable evaluation aspects but do not consider case-specific aspects. On the other hand, evaluation frameworks that are case specific, by eliciting user requirements, limit their output to the evaluation aspects suggested by the users in the early phases of system development. In addition, these case-specific approaches extract different sets of evaluation aspects from each case, making it challenging to collectively compare, unify, or aggregate the evaluation of a set of heterogeneous health information systems. Objectives The aim of this paper is to find a method capable of suggesting evaluation aspects for a set of one or more health information systems—whether similar or heterogeneous—by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework. Methods On the basis of the available literature in semantic networks and ontologies, a method (called Unified eValuation using Ontology; UVON) was developed that can organize, unify, and aggregate the quality attributes of several health information systems into a tree-style ontology structure. The method was extended to integrate its generated ontology with the evaluation aspects suggested by model-based evaluation frameworks. An approach was developed to extract evaluation aspects from the ontology that also considers evaluation case practicalities such as the maximum number of evaluation aspects to be measured or their required degree of specificity. The method was applied and tested in Future Internet Social and Technological Alignment Research (FI-STAR), a project of 7 cloud-based eHealth applications that were developed and deployed across European Union countries. Results The relevance of the evaluation aspects created by the UVON method for the FI-STAR project was validated by the corresponding stakeholders of each case. These evaluation aspects were extracted from a UVON-generated ontology structure that reflects both the internally declared required quality attributes in the 7 eHealth applications of the FI-STAR project and the evaluation aspects recommended by the Model for ASsessment of Telemedicine applications (MAST) evaluation framework. The extracted evaluation aspects were used to create questionnaires (for the corresponding patients and health professionals) to evaluate each individual case and the whole of the FI-STAR project. Conclusions The UVON method can provide a relevant set of evaluation aspects for a heterogeneous set of health information systems by organizing, unifying, and aggregating the quality attributes through ontological structures. Those quality attributes can be either suggested by evaluation models or elicited from the stakeholders of those systems in the form of system requirements. The method continues to be systematic, context sensitive, and relevant across a heterogeneous set of health information systems. PMID:27311735

  17. Evaluating Health Information Systems Using Ontologies.

    PubMed

    Eivazzadeh, Shahryar; Anderberg, Peter; Larsson, Tobias C; Fricker, Samuel A; Berglund, Johan

    2016-06-16

    There are several frameworks that attempt to address the challenges of evaluation of health information systems by offering models, methods, and guidelines about what to evaluate, how to evaluate, and how to report the evaluation results. Model-based evaluation frameworks usually suggest universally applicable evaluation aspects but do not consider case-specific aspects. On the other hand, evaluation frameworks that are case specific, by eliciting user requirements, limit their output to the evaluation aspects suggested by the users in the early phases of system development. In addition, these case-specific approaches extract different sets of evaluation aspects from each case, making it challenging to collectively compare, unify, or aggregate the evaluation of a set of heterogeneous health information systems. The aim of this paper is to find a method capable of suggesting evaluation aspects for a set of one or more health information systems-whether similar or heterogeneous-by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework. On the basis of the available literature in semantic networks and ontologies, a method (called Unified eValuation using Ontology; UVON) was developed that can organize, unify, and aggregate the quality attributes of several health information systems into a tree-style ontology structure. The method was extended to integrate its generated ontology with the evaluation aspects suggested by model-based evaluation frameworks. An approach was developed to extract evaluation aspects from the ontology that also considers evaluation case practicalities such as the maximum number of evaluation aspects to be measured or their required degree of specificity. The method was applied and tested in Future Internet Social and Technological Alignment Research (FI-STAR), a project of 7 cloud-based eHealth applications that were developed and deployed across European Union countries. The relevance of the evaluation aspects created by the UVON method for the FI-STAR project was validated by the corresponding stakeholders of each case. These evaluation aspects were extracted from a UVON-generated ontology structure that reflects both the internally declared required quality attributes in the 7 eHealth applications of the FI-STAR project and the evaluation aspects recommended by the Model for ASsessment of Telemedicine applications (MAST) evaluation framework. The extracted evaluation aspects were used to create questionnaires (for the corresponding patients and health professionals) to evaluate each individual case and the whole of the FI-STAR project. The UVON method can provide a relevant set of evaluation aspects for a heterogeneous set of health information systems by organizing, unifying, and aggregating the quality attributes through ontological structures. Those quality attributes can be either suggested by evaluation models or elicited from the stakeholders of those systems in the form of system requirements. The method continues to be systematic, context sensitive, and relevant across a heterogeneous set of health information systems.
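
    As an illustration of the unify-and-aggregate idea described in the two entries above (not the actual UVON implementation), the following Python sketch groups quality attributes elicited from several hypothetical systems under shared parent aspects in a small tree; the attribute names and the parent mapping are invented for the example:

      from collections import defaultdict

      # quality attributes elicited from each (hypothetical) eHealth application
      attributes = {
          "app_A": ["response time", "data encryption", "ease of learning"],
          "app_B": ["throughput", "access control", "ease of learning"],
      }

      # hypothetical mapping of specific attributes to more abstract parent aspects
      parent = {
          "response time": "performance", "throughput": "performance",
          "data encryption": "security", "access control": "security",
          "ease of learning": "usability",
      }

      # unify: group the systems' attributes under shared parent aspects
      tree = defaultdict(set)
      for system, attrs in attributes.items():
          for a in attrs:
              tree[parent[a]].add(a)

      # aggregate: report evaluation aspects at the parent level so that
      # heterogeneous systems can be compared with one questionnaire
      print(sorted(tree))   # ['performance', 'security', 'usability']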

  18. Terrain Extraction by Integrating Terrestrial Laser Scanner Data and Spectral Information

    NASA Astrophysics Data System (ADS)

    Lau, C. L.; Halim, S.; Zulkepli, M.; Azwan, A. M.; Tang, W. L.; Chong, A. K.

    2015-10-01

    The extraction of true terrain points from unstructured laser point cloud data is an important process in producing an accurate digital terrain model (DTM). However, most spatial filtering methods use only geometrical data to discriminate terrain points from non-terrain points. Point cloud filtering can also be improved by using the spectral information available with some scanners. Therefore, the objective of this study is to investigate the effectiveness of using the three channels (red, green and blue) of the colour image captured by the built-in digital camera available in some terrestrial laser scanners (TLS) for terrain extraction. In this study, data acquisition was conducted at a mini replica landscape in Universiti Teknologi Malaysia (UTM), Skudai campus, using a Leica ScanStation C10. The spectral information of the coloured point clouds from selected sample classes was extracted for spectral analysis. Coloured points that fall within the corresponding preset spectral thresholds are identified as belonging to that specific feature class. This terrain extraction process was implemented in Matlab. Results demonstrate that a passive image with higher spectral resolution is required in order to improve the output, because the low quality of the colour images captured by the sensor leads to poor separability in spectral reflectance. In conclusion, this study shows that spectral information can be used as a parameter for terrain extraction.
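
    A minimal Python sketch of the spectral-threshold step described above (written in Python rather than the authors' Matlab code); the threshold values and the random points standing in for a coloured TLS export are placeholders:

      import numpy as np

      # x, y, z, r, g, b per point; random data stands in for a coloured scan
      points = np.random.rand(1000, 6)
      rgb = points[:, 3:6]

      # preset spectral thresholds for the "terrain" class (hypothetical values)
      low = np.array([0.2, 0.15, 0.1])
      high = np.array([0.6, 0.5, 0.4])

      mask = np.all((rgb >= low) & (rgb <= high), axis=1)
      terrain_points = points[mask]   # candidate ground points for DTM generation
      print(terrain_points.shape)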

  19. Data on DNA gel sample load, gel electrophoresis, PCR and cost analysis.

    PubMed

    Kuhn, Ramona; Böllmann, Jörg; Krahl, Kathrin; Bryant, Isaac Mbir; Martienssen, Marion

    2018-02-01

    The data presented in this article provide supporting information to the related research article "Comparison of ten different DNA extraction procedures with respect to their suitability for environmental samples" (Kuhn et al., 2017) [1]. In that article, we compared the suitability of ten selected DNA extraction methods based on DNA quality, purity, quantity and applicability to universal PCR. Here we provide the data on the specific DNA gel sample load, all unreported gel images of crude DNA and PCR results, and the complete cost analysis for all tested extraction procedures and, in addition, for two commercial DNA extraction kits for soil and water.

  20. Extraction of reduced alteration information based on Aster data: a case study of the Bashibulake uranium ore district

    NASA Astrophysics Data System (ADS)

    Ye, Fa-wang; Liu, De-chang

    2008-12-01

    Practices of sandstone-type uranium exploration in recent years in China indicate that uranium mineralization alteration information is of great importance for selecting a new uranium target or for prospecting in the outer area of a known uranium ore district. Taking the BASHIBULAKE uranium ore district as a case study, this paper mainly presents the technical approach and methods for extracting the reduced alteration information caused by oil and gas in the BASHIBULAKE ore district using ASTER data. First, the regional geological setting and study status of the BASHIBULAKE uranium ore district are introduced in brief. Then, the spectral characteristics of altered sandstone and unaltered sandstone in the BASHIBULAKE ore district are analyzed in depth. Based on the spectral analysis, two technical approaches to extract the remote sensing reduced alteration information are proposed, and the un-mixing method is introduced to process the ASTER data and extract the reduced alteration information in the BASHIBULAKE ore district. From the enhanced images, three remote sensing anomaly zones are discovered, and their geological and prospecting significance is further confirmed by taking advantage of the multiple SWIR bands of the ASTER data. Finally, the distribution and intensity of the reduced alteration information in the Cretaceous system and its relationship with the genesis of the uranium deposit are discussed, and specific suggestions for uranium prospecting in the outer area of the BASHIBULAKE ore district are also proposed.
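
    The un-mixing step mentioned above generally amounts to estimating, for each pixel, the fractional contribution of a few endmember spectra. A minimal Python sketch of linear spectral unmixing under that assumption; the endmember spectra and pixel values are synthetic placeholders, not real ASTER measurements:

      import numpy as np

      bands = 9                              # ASTER VNIR + SWIR reflective bands
      endmembers = np.random.rand(bands, 3)  # e.g. altered sandstone, unaltered sandstone, background
      pixel = np.random.rand(bands)          # one observed pixel spectrum

      # plain least squares keeps the sketch short; constrained solvers are more physical
      abundances, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
      abundances = np.clip(abundances, 0, None)
      abundances /= abundances.sum()         # normalise to fractional abundances
      print(abundances)                      # fraction attributed to each endmember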

  1. Note: Sound recovery from video using SVD-based information extraction

    NASA Astrophysics Data System (ADS)

    Zhang, Dashan; Guo, Jie; Lei, Xiujun; Zhu, Chang'an

    2016-08-01

    This note reports an efficient singular value decomposition (SVD)-based vibration extraction approach that recovers sound information from silent high-speed video. A high-speed camera with frame rates in the range of 2 kHz-10 kHz is used to film the vibrating objects. Sub-images cut from the video frames are transformed into column vectors and reassembled into a new matrix. The SVD of this matrix produces orthonormal image bases (OIBs), and the image projections onto a specific OIB can be recovered as intelligible acoustic signals. The standard frequencies of 256 Hz and 512 Hz tuning forks are extracted offline from their vibrating surfaces, and a 3.35 s speech signal is recovered online, within 1 min, from a piece of paper stimulated by sound waves.
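
    A minimal Python sketch of the SVD step described in the note: sub-images are flattened into columns, the SVD of the resulting matrix yields orthonormal image bases (OIBs), and projecting the frames onto one basis gives a one-dimensional signal sampled at the camera frame rate. The frame data below are synthetic placeholders:

      import numpy as np

      fps, n_frames, h, w = 4000, 2000, 32, 32
      frames = np.random.rand(n_frames, h, w)            # stand-in for high-speed video sub-images

      A = frames.reshape(n_frames, -1).T                  # each column is one flattened sub-image
      A -= A.mean(axis=1, keepdims=True)                  # remove the static background

      U, s, Vt = np.linalg.svd(A, full_matrices=False)    # columns of U are the OIBs
      signal = U[:, 0] @ A                                # projection onto the dominant OIB
      # `signal` is a time series at fps Hz; filtering and scaling would follow
      # before interpreting it as recovered sound.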

  2. An extractables/leachables strategy facilitated by collaboration between drug product vendors and plastic material/system suppliers.

    PubMed

    Jenke, Dennis

    2007-01-01

    Leaching of plastic materials, packaging, or containment systems by finished drug products and/or their related solutions can happen when contact occurs between such materials, systems, and products. While the drug product vendor has the regulatory/legal responsibility to demonstrate that such leaching does not affect the safety, efficacy, and/or compliance of the finished drug product, the plastic's supplier can facilitate that demonstration by providing the drug product vendor with appropriate and relevant information. Although it is a reasonable expectation that suppliers would possess and share such facilitating information, it is not reasonable for vendors to expect suppliers to (1) reveal confidential information without appropriate safeguards and (2) possess information specific to the vendor's finished drug product. Any potential conflict between the vendor's desire for information and the supplier's willingness to either generate or supply such information can be mitigated if the roles and responsibilities of these two stakeholders are established up front. The vendor of the finished drug product is responsible for supplying regulators with a full and complete leachables assessment for its finished drug product. To facilitate (but not take the place of) the vendor's leachables assessment, suppliers of the materials, components, or systems can provide the vendor with a full and complete extractables assessment for their material/system. The vendor and supplier share the responsibility for reconciling or correlating the extractables and leachables data. While this document establishes the components of a full and complete extractables assessment, specifying the detailed process by which a full and complete extractables assessment is performed is beyond its scope.

  3. The identification of clinically important elements within medical journal abstracts: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results (PECODR).

    PubMed

    Dawes, Martin; Pluye, Pierre; Shea, Laura; Grad, Roland; Greenberg, Arlene; Nie, Jian-Yun

    2007-01-01

    Information retrieval in primary care is becoming more difficult as the volume of medical information held in electronic databases expands. The lexical structure of this information might permit automatic indexing and improved retrieval. To determine the possibility of identifying the key elements of clinical studies, namely Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results (PECODR), from abstracts of medical journals. We used a convenience sample of 20 synopses from the journal Evidence-Based Medicine (EBM) and their matching original journal article abstracts obtained from PubMed. Three independent primary care professionals identified PECODR-related extracts of text. Rules were developed to define each PECODR element and the selection process of characters, words, phrases and sentences. From the extracts of text related to PECODR elements, potential lexical patterns that might help identify those elements were proposed and assessed using NVivo software. A total of 835 PECODR-related text extracts containing 41,263 individual text characters were identified from 20 EBM journal synopses. There were 759 extracts in the corresponding PubMed abstracts containing 31,947 characters. PECODR elements were found in nearly all abstracts and synopses with the exception of duration. There was agreement on 86.6% of the extracts from the 20 EBM synopses and 85.0% on the corresponding PubMed abstracts. After consensus this rose to 98.4% and 96.9% respectively. We found potential text patterns in the Comparison, Outcome and Results elements of both EBM synopses and PubMed abstracts. Some phrases and words are used frequently and are specific for these elements in both synopses and abstracts. Results suggest a PECODR-related structure exists in medical abstracts and that there might be lexical patterns specific to these elements. More sophisticated computer-assisted lexical-semantic analysis might refine these results, and pave the way to automating PECODR indexing, and improve information retrieval in primary care.

  4. KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process

    NASA Technical Reports Server (NTRS)

    Gettig, Gary A.

    1988-01-01

    Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.

  5. Extracting biomedical events from pairs of text entities

    PubMed Central

    2015-01-01

    Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers, are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, enabling the advanced queries of researchers and practitioners in biology, medicine, and related fields to be answered. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a distinct discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478
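
    As a rough illustration of casting event extraction as multiclass classification of entity pairs (not the authors' feature set or data), the following Python sketch trains an SVM on toy feature dictionaries for candidate (trigger, argument) pairs; all feature names and values are invented:

      from sklearn.feature_extraction import DictVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # one feature dictionary per candidate pair of text entities (toy values)
      pairs = [
          {"trigger": "expression", "arg_type": "Protein", "dep_path": "nsubj"},
          {"trigger": "binding",    "arg_type": "Protein", "dep_path": "dobj"},
          {"trigger": "binding",    "arg_type": "Protein", "dep_path": "prep_to"},
          {"trigger": "walks",      "arg_type": "Protein", "dep_path": "nsubj"},
      ]
      labels = ["Gene_expression", "Binding", "Binding", "None"]   # "None" = not an event argument

      model = make_pipeline(DictVectorizer(), LinearSVC())
      model.fit(pairs, labels)
      print(model.predict([{"trigger": "expression", "arg_type": "Protein", "dep_path": "dobj"}]))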

  6. IPAT: a freely accessible software tool for analyzing multiple patent documents with inbuilt landscape visualizer.

    PubMed

    Ajay, Dara; Gangwal, Rahul P; Sangamwar, Abhay T

    2015-01-01

    Intelligent Patent Analysis Tool (IPAT) is an online data retrieval tool based on a text mining algorithm that extracts specific patent information in a predetermined pattern into an Excel sheet. The software is designed and developed to retrieve and analyze technology information from multiple patent documents and generate various patent landscape graphs and charts. The software is coded in C# in Visual Studio 2010; it extracts publicly available patent information from web pages such as Google Patents and simultaneously studies technology trends based on user-defined parameters. In other words, IPAT combined with manual categorization will act as an excellent technology assessment tool in competitive intelligence and due diligence for forecasting future R&D.

  7. Text-in-context: a method for extracting findings in mixed-methods mixed research synthesis studies.

    PubMed

    Sandelowski, Margarete; Leeman, Jennifer; Knafl, Kathleen; Crandell, Jamie L

    2013-06-01

    Our purpose in this paper is to propose a new method for extracting findings from research reports included in mixed-methods mixed research synthesis studies. International initiatives in the domains of systematic review and evidence synthesis have been focused on broadening the conceptualization of evidence, increased methodological inclusiveness and the production of evidence syntheses that will be accessible to and usable by a wider range of consumers. Initiatives in the general mixed-methods research field have been focused on developing truly integrative approaches to data analysis and interpretation. The data extraction challenges described here were encountered, and the method proposed for addressing these challenges was developed, in the first year of the ongoing (2011-2016) study: Mixed-Methods Synthesis of Research on Childhood Chronic Conditions and Family. To preserve the text-in-context of findings in research reports, we describe a method whereby findings are transformed into portable statements that anchor results to relevant information about sample, source of information, time, comparative reference point, magnitude and significance and study-specific conceptions of phenomena. The data extraction method featured here was developed specifically to accommodate mixed-methods mixed research synthesis studies conducted in nursing and other health sciences, but reviewers might find it useful in other kinds of research synthesis studies. This data extraction method itself constitutes a type of integration to preserve the methodological context of findings when statements are read individually and in comparison to each other. © 2012 Blackwell Publishing Ltd.

  8. Aqueous Chloride Operations Overview: Plutonium and Americium Purification/Recovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, Kyle Shelton; Kimball, David Bryan; Skidmore, Bradley Evan

    This is a set of slides intended for an information session as part of recruiting activities at Brigham Young University. It gives an overview of aqueous chloride operations, specifically plutonium and americium purification/recovery. The presentation details the steps taken to perform these processes, from plutonium size reduction, dissolution, solvent extraction and oxalate precipitation to calcination. For americium recovery, it details the CLEAR (chloride extraction and actinide recovery) Line, oxalate precipitation and calcination.

  9. Extracting business vocabularies from business process models: SBVR and BPMN standards-based approach

    NASA Astrophysics Data System (ADS)

    Skersys, Tomas; Butleris, Rimantas; Kapocius, Kestutis

    2013-10-01

    Approaches for the analysis and specification of business vocabularies and rules are very relevant topics in both the Business Process Management and Information Systems Development disciplines. However, in the common practice of Information Systems Development, business modeling activities are still mostly empirical in nature. In this paper, basic aspects of an approach for the semi-automated extraction of business vocabularies from business process models are presented. The approach is based on the novel business modeling-level OMG standards "Business Process Model and Notation" (BPMN) and "Semantics of Business Vocabulary and Business Rules" (SBVR), thus contributing to OMG's vision of Model-Driven Architecture (MDA) and to model-driven development in general.

  10. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts.

    PubMed

    Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo

    2017-12-01

    Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. Existing approaches to fold recognition mainly exploit features derived from alignments of the query protein against templates. These approaches have been shown to be successful for fold recognition at the family level, but usually fail at the superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at the superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated the similarity between the query protein and templates, and then assigned the query protein the fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNN to fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that, even using the extracted fold-specific features alone, our approach achieved a success rate comparable to the state-of-the-art approaches. When these features were further combined with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at the family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at the fold level, 6% higher at the superfamily level and 1% higher at the family level. An independent assessment on the SCOP_TEST dataset showed consistent performance improvement, indicating the robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with the fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features, and that combining them could greatly facilitate fold recognition at the superfamily/fold levels and template-based prediction of protein structures. Source code of DeepFR is freely available through https://github.com/zhujianwei31415/deepfr, and a web server is available through http://protein.ict.ac.cn/deepfr. zheng@itp.ac.cn or dbu@ict.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
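
    A minimal Python sketch of the template-assignment step only: given fold-specific feature vectors (in DeepFR these come from a deep convolutional network applied to predicted contact maps; here random vectors stand in for them), the query is assigned the fold of its most similar template by cosine similarity. The template fold labels are hypothetical:

      import numpy as np

      rng = np.random.default_rng(0)
      template_feats = rng.random((5, 128))            # placeholder feature vectors
      template_folds = ["a.1", "a.1", "b.40", "c.37", "d.58"]
      query_feat = rng.random(128)

      def cosine(u, v):
          return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

      scores = [cosine(query_feat, t) for t in template_feats]
      print(template_folds[int(np.argmax(scores))])    # predicted fold type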

  11. Multilingual Information Retrieval in Thoracic Radiology: Feasibility Study

    PubMed Central

    Castilla, André Coutinho; Furuie, Sérgio Shiguemi; Mendonça, Eneida A.

    2014-01-01

    Most essential information contained in the Electronic Medical Record is stored as text, imposing several difficulties on automated data extraction and retrieval. Natural language processing is an approach that can unlock clinical information from free texts. The proposed methodology uses the specialized natural language processor MEDLEE, developed for the English language. To use this processor on Portuguese medical texts, chest x-ray reports were machine translated into English. The result of serially coupling MT and NLP is tagged text, which needs further investigation for extracting clinical findings. The objective of this experiment was to investigate normal reports and reports with device descriptions on a set of 165 chest x-ray reports. We obtained sensitivity and specificity of 1 and 0.71 for the first condition and 0.97 and 0.97 for the second, respectively. The reference standard was formed by the opinion of two radiologists. The results of this experiment indicate the viability of extracting clinical findings from chest x-ray reports through coupling MT and NLP. PMID:17911745

  12. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports

    PubMed Central

    Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.

    2010-01-01

    Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy with each method, scoring 96.2%. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; both methods were comparable to the human benchmark of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text, clinical records for a variety of research or healthcare delivery organizations. PMID:21031012

  13. Time dependent calibration of a sediment extraction scheme.

    PubMed

    Roychoudhury, Alakendra N

    2006-04-01

    Sediment extraction methods to quantify metal concentration in aquatic sediments usually present limitations in accuracy and reproducibility because metal concentration in the supernatant is controlled to a large extent by the physico-chemical properties of the sediment that result in a complex interplay between the solid and the solution phase. It is suggested here that standardization of sediment extraction methods using pure mineral phases or reference material is futile and instead the extraction processes should be calibrated using site-specific sediments before their application. For calibration, time dependent release of metals should be observed for each leachate to ascertain the appropriate time for a given extraction step. Although such an approach is tedious and time consuming, using iron extraction as an example, it is shown here that apart from quantitative data such an approach provides additional information on factors that play an intricate role in metal dynamics in the environment. Single step ascorbate, HCl, oxalate and dithionite extractions were used for targeting specific iron phases from saltmarsh sediments and their response was observed over time in order to calibrate the extraction times for each extractant later to be used in a sequential extraction. For surficial sediments, an extraction time of 24 h, 1 h, 2 h and 3 h was ascertained for ascorbate, HCl, oxalate and dithionite extractions, respectively. Fluctuations in iron concentration in the supernatant over time were ubiquitous. The adsorption-desorption behavior is possibly controlled by the sediment organic matter, formation or consumption of active exchange sites during extraction and the crystallinity of iron mineral phase present in the sediments.

  14. CMedTEX: A Rule-based Temporal Expression Extraction and Normalization System for Chinese Clinical Notes.

    PubMed

    Liu, Zengjian; Tang, Buzhou; Wang, Xiaolong; Chen, Qingcai; Li, Haodi; Bu, Junzhao; Jiang, Jingzhi; Deng, Qiwen; Zhu, Suisong

    2016-01-01

    Time is an important aspect of information and is very useful for information utilization. The goal of this study was to analyze the challenges of temporal expression (TE) extraction and normalization in Chinese clinical notes by assessing the performance of a rule-based system developed by us on a manually annotated corpus (including 1,778 clinical notes of 281 hospitalized patients). To develop the system conveniently, we divided TEs into three categories: direct, indirect and uncertain TEs, and designed different rules for each category. Evaluation on the independent test set shows that our system achieves an F-score of 93.40% on TE extraction and an accuracy of 92.58% on TE normalization under the "exact-match" criterion. Compared with HeidelTime for Chinese newswire text, our system is much better, indicating that it is necessary to develop a TE extraction and normalization system specific to Chinese clinical notes because of domain differences.
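
    As an illustration only (not the CMedTEX rule set), a single rule of the kind such systems use for one "direct" temporal expression, a full Chinese date such as 2015年3月12日, extracted and normalized to ISO-8601 in Python; the example sentence is invented:

      import re

      DIRECT_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")

      def extract_and_normalize(text):
          results = []
          for m in DIRECT_DATE.finditer(text):
              year, month, day = (int(g) for g in m.groups())
              results.append((m.group(0), f"{year:04d}-{month:02d}-{day:02d}"))
          return results

      print(extract_and_normalize("患者于2015年3月12日入院"))
      # [('2015年3月12日', '2015-03-12')]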

  15. The Contribution of Language-Specific Knowledge in the Selection of Statistically-Coherent Word Candidates

    ERIC Educational Resources Information Center

    Toro, Juan M.; Pons, Ferran; Bion, Ricardo A. H.; Sebastian-Galles, Nuria

    2011-01-01

    Much research has explored the extent to which statistical computations account for the extraction of linguistic information. However, it remains to be studied how language-specific constraints are imposed over these computations. In the present study we investigated if the violation of a word-forming rule in Catalan (the presence of more than one…

  16. Automatic definition of the oncologic EHR data elements from NCIT in OWL.

    PubMed

    Cuggia, Marc; Bourdé, Annabel; Turlin, Bruno; Vincendeau, Sebastien; Bertaud, Valerie; Bohec, Catherine; Duvauferrier, Régis

    2011-01-01

    Semantic interoperability based on ontologies allows systems to combine their information and process it automatically. The ability to extract meaningful fragments from an ontology is key for ontology re-use, and the construction of a subset will help to structure clinical data entries. The aim of this work is to provide a method for extracting a set of concepts for a specific domain, in order to help define the data elements of an oncologic EHR. A generic extraction algorithm was developed to extract, from the NCIT and for a specific disease (i.e. prostate neoplasm), all the concepts of interest into a sub-ontology. We compared the extracted concepts to the manually encoded concepts contained in the multi-disciplinary meeting report form (MDMRF). We extracted two sub-ontologies: sub-ontology 1 using a single key concept and sub-ontology 2 using 5 additional keywords. The coverage of the MDMRF concepts by sub-ontology 2 was 51%. This low rate of coverage is due to the lack of definition or misclassification of NCIT concepts. By providing a subset of concepts focused on a particular domain, this extraction method helps to optimize the binding of data elements and to maintain and enrich a domain ontology.

  17. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach

    PubMed Central

    2012-01-01

    Background Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. Methods We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. Results We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. Conclusions We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data. PMID:22759462

  18. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach.

    PubMed

    Ratkovic, Zorana; Golik, Wiktoria; Warnier, Pierre

    2012-06-26

    Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data.

  19. Extracting Information about the Initial State from the Black Hole Radiation.

    PubMed

    Lochan, Kinjalk; Padmanabhan, T

    2016-02-05

    The crux of the black hole information paradox is related to the fact that the complete information about the initial state of a quantum field in a collapsing spacetime is not available to future asymptotic observers, belying the expectations from a unitary quantum theory. We study the imprints of the initial quantum state contained in a specific class of distortions of the black hole radiation and identify the classes of in states that can be partially or fully reconstructed from the information contained within. Even for the general in state, we can uncover some specific information. These results suggest that a classical collapse scenario ignores this richness of information in the resulting spectrum and a consistent quantum treatment of the entire collapse process might allow us to retrieve much more information from the spectrum of the final radiation.

  20. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

    PubMed

    Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

    2015-06-06

    Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse-level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating discourse-level analysis significantly improved the performance of extracting the protein-mutation-disease association. Future work includes the extension of MutD to full-text articles.

  1. Evaluation of certain food additives and contaminants. Eightieth report of the Joint FAO/WHO Expert Committee on Food Additives.

    PubMed

    2016-01-01

    This report represents the conclusions of a Joint FAO/WHO Expert Committee convened to evaluate the safety of various food additives and contaminants and to prepare specifications for identity and purity. The first part of the report contains a brief description of general considerations addressed at the meeting, including updates on matters of interest to the work of the Committee. A summary follows of the Committee's evaluations of technical, toxicological and/or dietary exposure data for seven food additives (benzoates; lipase from Fusarium heterosporum expressed in Ogataea polymorpha; magnesium stearate; maltotetraohydrolase from Pseudomonas stutzeri expressed in Bacillus licheniformis; mixed β-glucanase, cellulase and xylanase from Rasamsonia emersonii; mixed β-glucanase and xylanase from Disporotrichum dimorphosporum; polyvinyl alcohol (PVA)-polyethylene glycol (PEG) graft copolymer) and two groups of contaminants (non-dioxin-like polychlorinated biphenyls and pyrrolizidine alkaloids). Specifications for the following food additives were revised or withdrawn: advantame; annatto extracts (solvent-extracted bixin and solvent-extracted norbixin); food additives containing aluminium and/or silicon (aluminium silicate; calcium aluminium silicate; calcium silicate; silicon dioxide, amorphous; sodium aluminium silicate); and glycerol ester of gum rosin. Annexed to the report are tables or text summarizing the toxicological and dietary exposure information and information on specifications, as well as the Committee's recommendations on the food additives and contaminants considered at this meeting.

  2. Automated concept-level information extraction to reduce the need for custom software and rules development.

    PubMed

    D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D

    2011-01-01

    Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks. With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach to more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.
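
    The 'learn by example' idea of iterating over candidate configurations and keeping the best-scoring one can be sketched in Python as below; the snippets, labels and feature choices are toy stand-ins, and the original system drew features from clinical NLP pipelines rather than raw n-grams:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      texts = ["chest pain on exertion", "started metoprolol 25 mg", "cbc ordered",
               "shortness of breath", "continue lisinopril", "mri of the brain"]
      labels = ["problem", "treatment", "test", "problem", "treatment", "test"]

      configs = [
          ("unigrams + NB", make_pipeline(CountVectorizer(ngram_range=(1, 1)), MultinomialNB())),
          ("bigrams + NB",  make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())),
          ("unigrams + LR", make_pipeline(CountVectorizer(ngram_range=(1, 1)), LogisticRegression())),
      ]

      # evaluate each configuration and keep the top performer by macro F-measure
      best = max(configs, key=lambda c: cross_val_score(c[1], texts, labels,
                                                        cv=2, scoring="f1_macro").mean())
      print("best configuration:", best[0])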

  3. Remote sensing and extractable biological resources

    NASA Technical Reports Server (NTRS)

    Cronin, L. E.

    1972-01-01

    The nature and quantity of extractable biological resources available in the Chesapeake Bay are discussed. The application of miniaturized radio sensors to track the movement of fish and birds is described. The specific uses of remote sensors for detecting and mapping areas of algae, red tide, thermal pollution, and vegetation beds are presented. The necessity for obtaining information on the physical, chemical, and meteorological features of the entire bay in order to provide improved resources management is emphasized.

  4. Induced lexico-syntactic patterns improve information extraction from online medical forums.

    PubMed

    Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D

    2014-01-01

    To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
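
    A toy Python sketch of the induction loop described above (seed dictionary, then patterns, then new terms); the seed terms, sentences and scoring are invented and far simpler than the lexico-syntactic patterns actually learned:

      import re
      from collections import Counter

      seeds = {"headache", "nausea"}            # tiny seed SC dictionary
      sentences = [
          "i have been suffering from headache for weeks",
          "suffering from nausea after the new medication",
          "suffering from stabbing pain in my left side",
      ]

      # induce simple "word-before" patterns around seed matches
      patterns = Counter()
      for s in sentences:
          for term in seeds:
              for m in re.finditer(rf"(\w+) {term}", s):
                  patterns[m.group(1)] += 1

      # apply the most frequent pattern word to harvest new candidate terms
      prefix = patterns.most_common(1)[0][0]    # e.g. "from"
      candidates = {m.group(1) for s in sentences
                    for m in re.finditer(rf"{prefix} (\w+(?: \w+)?)", s)}
      print(candidates - seeds)                 # noisy candidates; a real system scores and filters them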

  5. The Development of Indonesian Labour Market Information System (LMIS) for Vocational Schools and Industries

    NASA Astrophysics Data System (ADS)

    Parinsi, M. T.; Palilingan, V. R.; Sukardi; Surjono, H. D.

    2018-02-01

    This paper aims to develop an information system for the labor market that specifically links vocational school (SMK) graduates and industries. The application was developed using the Research and Development (R&D) method from Borg and Gall and was conducted in North Sulawesi Province, Indonesia. The results are reliable and acceptable for graduate students. The Labor Market Information System (LMIS) can help industries find graduates matching company requirements in real time. SMKs may also benefit by extracting information from the application to prepare their students for specific work in industry. The next development of the application will be designed to be available not only to SMK graduate students.

  6. The utility of an automated electronic system to monitor and audit transfusion practice.

    PubMed

    Grey, D E; Smith, V; Villanueva, G; Richards, B; Augustson, B; Erber, W N

    2006-05-01

    Transfusion laboratories with transfusion committees have a responsibility to monitor transfusion practice and generate improvements in clinical decision-making and red cell usage. However, this can be problematic and expensive because data cannot be readily extracted from most laboratory information systems. To overcome this problem, we developed and introduced a system to electronically extract and collate extensive amounts of data from two laboratory information systems and to link them with ICD10 clinical codes in a new database using standard information technology. Three data files were generated from two laboratory information systems, ULTRA (version 3.2) and TM, using standard information technology scripts. These were patient pre- and post-transfusion haemoglobin, blood group and antibody screen, and cross match and transfusion data. These data, together with ICD10 codes for surgical cases, were imported into an MS ACCESS database and linked by means of a unique laboratory number. Queries were then run to extract the relevant information, which was processed in Microsoft Excel for graphical presentation. We assessed the utility of this data extraction system to audit transfusion practice in a 600-bed adult tertiary hospital over an 18-month period. A total of 52 MB of data were extracted from the two laboratory information systems for the 18-month period and, together with 2.0 MB of theatre ICD10 data, enabled case-specific transfusion information to be generated. The audit evaluated 15,992 blood group and antibody screens, 25,344 cross-matched red cell units and 15,455 transfused red cell units. Data evaluated included cross-matched to transfusion ratios and pre- and post-transfusion haemoglobin levels for a range of clinical diagnoses. Data showed significant differences between clinical units and by ICD10 code. This method of electronically extracting large amounts of data and linking them with clinical databases has provided a powerful and sustainable tool for monitoring transfusion practice. It has been successfully used to identify areas requiring education, training and clinical guidance and allows for comparison with national haemoglobin-based transfusion guidelines.
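
    A minimal Python sketch of the linkage step described above, joining extracted data files on a unique laboratory number and then aggregating by ICD10 code; the column names and values are hypothetical placeholders rather than the ULTRA/TM export format:

      import pandas as pd

      haemoglobin = pd.DataFrame({"lab_no": [1, 2], "pre_hb": [78, 95], "post_hb": [92, 101]})
      crossmatch = pd.DataFrame({"lab_no": [1, 2], "units_crossmatched": [4, 2], "units_transfused": [2, 0]})
      theatre = pd.DataFrame({"lab_no": [1, 2], "icd10": ["I25.1", "K80.2"]})

      audit = haemoglobin.merge(crossmatch, on="lab_no").merge(theatre, on="lab_no")
      transfused = audit["units_transfused"].where(audit["units_transfused"] > 0)
      audit["cx_tx_ratio"] = audit["units_crossmatched"] / transfused   # NaN when nothing transfused
      print(audit.groupby("icd10")[["pre_hb", "cx_tx_ratio"]].mean())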

  7. Nondeterministic data base for computerized visual perception

    NASA Technical Reports Server (NTRS)

    Yakimovsky, Y.

    1976-01-01

    A description is given of the knowledge representation data base in the perception subsystem of the Mars robot vehicle prototype. Two types of information are stored. The first is generic information that represents general rules that are conformed to by structures in the expected environments. The second kind of information is a specific description of a structure, i.e., the properties and relations of objects in the specific case being analyzed. The generic knowledge is represented so that it can be applied to extract and infer the description of specific structures. The generic model of the rules is substantially a Bayesian representation of the statistics of the environment, which means it is geared to representation of nondeterministic rules relating properties of, and relations between, objects. The description of a specific structure is also nondeterministic in the sense that all properties and relations may take a range of values with an associated probability distribution.
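
    The nondeterministic representation described here amounts to keeping a probability distribution over property values and updating it with observations. A toy sketch, with invented surface types and probabilities, shows how a generic prior rule and an observation likelihood combine by Bayes' rule into a specific, still probabilistic, description.

```python
# Toy sketch of the nondeterministic idea above: a generic rule is a prior
# distribution over an object property, and a specific observation updates it by
# Bayes' rule. The property values and probabilities are invented for
# illustration only.
prior = {"rock": 0.5, "soil": 0.4, "shadow": 0.1}            # generic knowledge
likelihood_dark = {"rock": 0.3, "soil": 0.2, "shadow": 0.9}  # P(dark texture | type)

def posterior(prior, likelihood):
    """Return P(type | observation) for a discrete property."""
    unnorm = {t: prior[t] * likelihood[t] for t in prior}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

# Specific description: after observing a dark texture, the property still takes
# a range of values, but with an updated probability distribution.
print(posterior(prior, likelihood_dark))
```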

  8. A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature.

    PubMed

    Tchoua, Roselyne B; Chard, Kyle; Audus, Debra; Qin, Jian; de Pablo, Juan; Foster, Ian

    2016-01-01

    A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.

  9. An automatic rat brain extraction method based on a deformable surface model.

    PubMed

    Li, Jiehua; Liu, Xiaofeng; Zhuo, Jiachen; Gullapalli, Rao P; Zara, Jason M

    2013-08-15

    The extraction of the brain from the skull in medical images is a necessary first step before image registration or segmentation. While pre-clinical MR imaging studies on small animals, such as rats, are increasing, fully automatic imaging processing techniques specific to small animal studies remain lacking. In this paper, we present an automatic rat brain extraction method, the Rat Brain Deformable model method (RBD), which adapts the popular human brain extraction tool (BET) through the incorporation of information on the brain geometry and MR image characteristics of the rat brain. The robustness of the method was demonstrated on T2-weighted MR images of 64 rats and compared with other brain extraction methods (BET, PCNN, PCNN-3D). The results demonstrate that RBD reliably extracts the rat brain with high accuracy (>92% volume overlap) and is robust against signal inhomogeneity in the images. Copyright © 2013 Elsevier B.V. All rights reserved.

  10. On-line object feature extraction for multispectral scene representation

    NASA Technical Reports Server (NTRS)

    Ghassemian, Hassan; Landgrebe, David

    1988-01-01

A new on-line unsupervised object-feature extraction method is presented that reduces the complexity and costs associated with the analysis of multispectral image data and with data transmission, storage, archival and distribution. The ambiguity in the object detection process can be reduced if the spatial dependencies that exist among adjacent pixels are intelligently incorporated into the decision-making process. A unity relation that must exist among the pixels of an object is defined. The Automatic Multispectral Image Compaction Algorithm (AMICA) uses the within-object pixel-feature gradient vector as valuable contextual information to construct the object's features, which preserve the class-separability information within the data. For on-line object extraction, the path hypothesis and the basic mathematical tools for its realization are introduced in terms of a specific similarity measure and adjacency relation. AMICA is applied to several sets of real image data, and the performance and reliability of the features are evaluated.
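
    The unity relation can be pictured as generic gradient-thresholded region growing: adjacent pixels join the same object only when their spectral difference is small, and each object is then summarised by a compact feature such as its mean spectrum. The tiny two-band image and threshold below are invented, and the code is a generic sketch rather than AMICA itself.

```python
# Rough sketch of the "unity relation" idea: adjacent pixels are merged into the
# same object when their spectral difference (pixel-feature gradient) stays below
# a threshold. Generic region growing, not AMICA; image and threshold are toy.
import numpy as np

img = np.array([   # tiny 2-band "multispectral" image, shape (rows, cols, bands)
    [[10, 50], [11, 51], [40, 90]],
    [[10, 49], [12, 52], [41, 92]],
], dtype=float)

THRESH = 5.0                       # max spectral distance for the unity relation
labels = -np.ones(img.shape[:2], dtype=int)

def grow(r0, c0, label):
    """Flood-fill from (r0, c0), attaching spectrally similar 4-neighbours."""
    stack = [(r0, c0)]
    labels[r0, c0] = label
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1] and labels[rr, cc] < 0:
                if np.linalg.norm(img[rr, cc] - img[r, c]) < THRESH:
                    labels[rr, cc] = label
                    stack.append((rr, cc))

current = 0
for r in range(img.shape[0]):
    for c in range(img.shape[1]):
        if labels[r, c] < 0:
            grow(r, c, current)
            current += 1

# Each object's feature is the mean spectrum of its pixels.
for lab in range(current):
    print(lab, img[labels == lab].mean(axis=0))
```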

  11. Mitigating Information Overload: The Impact of Context-Based Approach to the Design of Tools for Intelligence Analysts

    DTIC Science & Technology

    2008-03-01

amount of arriving data, extract actionable information, and integrate it with prior knowledge. Add to that the pressures of today’s fusion center climate and it becomes clear that analysts, police... fusion centers, including specifics about how these problems manifest at the Illinois State Police (ISP) Statewide Terrorism and Intelligence Center

  12. Topographic Information Requirements and Computer-Graphic Display Techniques for Nap-of-the-Earth Flight.

    DTIC Science & Technology

    1979-12-01

required of the Army aviator. The successful accomplishment of many of these activities depends upon the aviator’s ability to extract information from maps... [The remainder of the excerpt is fragmentary table content listing nap-of-the-earth (NOE) flight functions — Determine Position, Crew Coordination (Topographic), Radio Communication, and Post-Flight Debriefing — with columns for information requirement, specifics, source, and comments.]

  13. CRL/NMSU and Brandeis: Description of the MucBruce System as Used for MUC-4

    DTIC Science & Technology

    1992-01-01

developing a method for identifying articles of interest and extracting and storing specific kinds of information from large volumes of Japanese and... performance. Most of the information produced in our MUC templates is arrived at by probing the text which surrounds 'significant' words for the... strings with semantic information. The other two, the Relevant Template Filter and the Relevant Paragraph Filter, perform word frequency analysis to

  14. Towards a Relation Extraction Framework for Cyber-Security Concepts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Corinne L; Bridges, Robert A; Huffer, Kelly M

In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised NLP and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drift away from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining a precision of 0.82.
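
    The active-learning step can be pictured as querying the analyst about the candidate pattern the system is least sure of, so that a single human decision keeps the bootstrapped pattern set from drifting. The cyber-security patterns, confidence scores, and thresholds below are hypothetical placeholders, not outputs of the authors' system.

```python
# Small sketch of an active-learning query step: rank candidate patterns by
# confidence, auto-accept/reject the clear cases, and ask the analyst about the
# most uncertain one. Patterns and scores are made up for illustration.
candidate_patterns = {
    "<SOFTWARE> is vulnerable to <VULNERABILITY>": 0.92,   # confidence estimates
    "<VENDOR> released <PATCH>":                   0.55,
    "<EXPLOIT> targets <SOFTWARE>":                0.48,
    "<SOFTWARE> and <SOFTWARE>":                   0.10,
}

ACCEPT, REJECT = 0.8, 0.2
accepted = {p for p, s in candidate_patterns.items() if s >= ACCEPT}
rejected = {p for p, s in candidate_patterns.items() if s <= REJECT}
uncertain = {p: s for p, s in candidate_patterns.items() if REJECT < s < ACCEPT}

# Query the user on the single most uncertain decision (score nearest 0.5),
# which is where a human label does the most to prevent drift.
query = min(uncertain, key=lambda p: abs(uncertain[p] - 0.5))
answer = input(f"Keep pattern '{query}'? [y/n] ").strip().lower()
(accepted if answer == "y" else rejected).add(query)
print("accepted:", accepted)
```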

  15. [HPLC-ESI-MS(n) analysis of the water soluble extracts of Fructus Choerospondiatis].

    PubMed

    Shi, Run-ju; Dai, Yun; Fang, Min-feng; Zhao, Xin; Zheng, Jian-bin; Zheng, Xiao-hui

    2007-03-01

To establish an HPLC-ESI-MS(n) method for analyzing the chemical ingredients in the water-soluble extracts of Fructus Choerospondiatis. Water-soluble extracts of Fructus Choerospondiatis were obtained by heated recirculation. The multi-stage reaction mode (MRM) of HPLC-ESI-MS(n) was used to determine the content of gallic acid, and MS(n) was used to obtain information on characteristic multistage fragment ions in order to identify the chemical structures of the peaks in the total ion current spectrum. Eleven compounds were identified, one of which is a previously unreported ingredient. The method, which has high recovery and specificity, can offer experimental evidence for further research on the chemical ingredients extracted from Fructus Choerospondiatis.

  16. Composition of extracts of airborne grain dusts: lectins and lymphocyte mitogens.

    PubMed Central

    Olenchock, S A; Lewis, D M; Mull, J C

    1986-01-01

    Airborne grain dusts are heterogeneous materials that can elicit acute and chronic respiratory pathophysiology in exposed workers. Previous characterizations of the dusts include the identification of viable microbial contaminants, mycotoxins, and endotoxins. We provide information on the lectin-like activity of grain dust extracts and its possible biological relationship. Hemagglutination of erythrocytes and immunochemical modulation by antibody to specific lectins showed the presence of these substances in extracts of airborne dusts from barley, corn, and rye. Proliferation of normal rat splenic lymphocytes in vitro provided evidence for direct biological effects on the cells of the immune system. These data expand the knowledge of the composition of grain dusts (extracts), and suggest possible mechanisms that may contribute to respiratory disease in grain workers. PMID:3709474

  17. Text-in-Context: A Method for Extracting Findings in Mixed-Methods Mixed Research Synthesis Studies

    PubMed Central

    Leeman, Jennifer; Knafl, Kathleen; Crandell, Jamie L.

    2012-01-01

    Aim Our purpose in this paper is to propose a new method for extracting findings from research reports included in mixed-methods mixed research synthesis studies. Background International initiatives in the domains of systematic review and evidence synthesis have been focused on broadening the conceptualization of evidence, increased methodological inclusiveness and the production of evidence syntheses that will be accessible to and usable by a wider range of consumers. Initiatives in the general mixed-methods research field have been focused on developing truly integrative approaches to data analysis and interpretation. Data source The data extraction challenges described here were encountered and the method proposed for addressing these challenges was developed, in the first year of the ongoing (2011–2016) study: Mixed-Methods Synthesis of Research on Childhood Chronic Conditions and Family. Discussion To preserve the text-in-context of findings in research reports, we describe a method whereby findings are transformed into portable statements that anchor results to relevant information about sample, source of information, time, comparative reference point, magnitude and significance and study-specific conceptions of phenomena. Implications for nursing The data extraction method featured here was developed specifically to accommodate mixed-methods mixed research synthesis studies conducted in nursing and other health sciences, but reviewers might find it useful in other kinds of research synthesis studies. Conclusion This data extraction method itself constitutes a type of integration to preserve the methodological context of findings when statements are read individually and in comparison to each other. PMID:22924808

  18. Functional source separation and hand cortical representation for a brain–computer interface feature extraction

    PubMed Central

    Tecchio, Franca; Porcaro, Camillo; Barbati, Giulia; Zappasodi, Filippo

    2007-01-01

A brain–computer interface (BCI) can be defined as any system that can track a person's intent as embedded in his or her brain activity and, from that alone, translate the intention into computer commands. Among the brain signal monitoring systems best suited for this challenging task, electroencephalography (EEG) and magnetoencephalography (MEG) are the most realistic, since both are non-invasive, EEG is portable, and MEG can provide more specific information that could later be exploited also through EEG signals. The first two BCI steps require setting up an appropriate experimental protocol while recording the brain signal and then extracting interesting features from the recorded cerebral activity. To provide information useful in these BCI stages, our aim is to give an overview of a new procedure we recently developed, named functional source separation (FSS). As it derives from blind source separation algorithms, it exploits the most valuable information provided by electrophysiological techniques, i.e. the waveform signal properties, while remaining blind to the biophysical nature of the signal sources. FSS returns the single-trial source activity, estimates the time course of a neuronal pool across different experimental states on the basis of a specific functional requirement in a specific time period, and uses simulated annealing as the optimization procedure, allowing the exploitation of non-differentiable functional constraints. Moreover, a minor section is included, devoted to information acquired by MEG in stroke patients, to guide BCI applications aiming at sustaining motor behaviour in these patients. Relevant BCI features – spatial and time-frequency properties – are in fact altered by a stroke in the regions devoted to hand control. Moreover, a method to investigate the relationship between sensory and motor hand cortical network activities is described, providing information useful for developing BCI feedback control systems. This review provides a description of the FSS technique, a promising tool for the BCI community for online electrophysiological feature extraction, and offers useful information for developing BCI applications to sustain hand control in stroke patients. PMID:17331989

  19. The feasibility of using natural language processing to extract clinical information from breast pathology reports.

    PubMed

    Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S

    2012-01-01

    The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free text medical information such as seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and described the inherent complexities of the task.

  20. A Risk Assessment System with Automatic Extraction of Event Types

    NASA Astrophysics Data System (ADS)

    Capet, Philippe; Delavallade, Thomas; Nakamura, Takuya; Sandor, Agnes; Tarsitano, Cedric; Voyatzi, Stavroula

    In this article we describe the joint effort of experts in linguistics, information extraction and risk assessment to integrate EventSpotter, an automatic event extraction engine, into ADAC, an automated early warning system. By detecting as early as possible weak signals of emerging risks ADAC provides a dynamic synthetic picture of situations involving risk. The ADAC system calculates risk on the basis of fuzzy logic rules operated on a template graph whose leaves are event types. EventSpotter is based on a general purpose natural language dependency parser, XIP, enhanced with domain-specific lexical resources (Lexicon-Grammar). Its role is to automatically feed the leaves with input data.

  1. Workstation Analytics in Distributed Warfighting Experimentation: Results from Coalition Attack Guidance Experiment 3A

    DTIC Science & Technology

    2014-06-01

central location. Each of the SQLite databases is converted and stored in one MySQL database and the pcap files are parsed to extract call information...from the specific communications applications used during the experiment. This extracted data is then stored in the same MySQL database. With all...rhythm of the event. Figure 3 demonstrates the application usage over the course of the experiment for the EXDIR. As seen, the EXDIR spent the majority

  2. Hot Chili Peppers: Extraction, Cleanup, and Measurement of Capsaicin

    NASA Astrophysics Data System (ADS)

    Huang, Jiping; Mabury, Scott A.; Sagebiel, John C.

    2000-12-01

    Capsaicin, the pungent ingredient of the red pepper or Capsicum annuum, is widely used in food preparation. The purpose of this experiment was to acquaint students with the active ingredients of hot chili pepper (capsaicin and dihydrocapsaicin), the extraction, cleanup, and analysis of these chemicals, as a fun and informative analytical exercise. Fresh peppers were prepared and extracted with acetonitrile, removing plant co-extractives by addition to a C-18 solid-phase extraction cartridge. Elution of the capsaicinoids was accomplished with a methanol-acetic acid solution. Analysis was completed by reverse-phase HPLC with diode-array or variable wavelength detection and calibration with external standards. Levels of capsaicin and dihydrocapsaicin were typically found to correlate with literature values for a specific hot pepper variety. Students particularly enjoyed relating concentrations of capsaicinoids to their perceived valuation of "hotness".

  3. Patient-Reported Outcome (PRO) Assessment in Clinical Trials: A Systematic Review of Guidance for Trial Protocol Writers

    PubMed Central

    Calvert, Melanie; Kyte, Derek; Duffy, Helen; Gheorghe, Adrian; Mercieca-Bebber, Rebecca; Ives, Jonathan; Draper, Heather; Brundage, Michael; Blazeby, Jane; King, Madeleine

    2014-01-01

    Background Evidence suggests there are inconsistencies in patient-reported outcome (PRO) assessment and reporting in clinical trials, which may limit the use of these data to inform patient care. For trials with a PRO endpoint, routine inclusion of key PRO information in the protocol may help improve trial conduct and the reporting and appraisal of PRO results; however, it is currently unclear exactly what PRO-specific information should be included. The aim of this review was to summarize the current PRO-specific guidance for clinical trial protocol developers. Methods and Findings We searched the MEDLINE, EMBASE, CINHAL and Cochrane Library databases (inception to February 2013) for PRO-specific guidance regarding trial protocol development. Further guidance documents were identified via Google, Google scholar, requests to members of the UK Clinical Research Collaboration registered clinical trials units and international experts. Two independent investigators undertook title/abstract screening, full text review and data extraction, with a third involved in the event of disagreement. 21,175 citations were screened and 54 met the inclusion criteria. Guidance documents were difficult to access: electronic database searches identified just 8 documents, with the remaining 46 sourced elsewhere (5 from citation tracking, 27 from hand searching, 7 from the grey literature review and 7 from experts). 162 unique PRO-specific protocol recommendations were extracted from included documents. A further 10 PRO recommendations were identified relating to supporting trial documentation. Only 5/162 (3%) recommendations appeared in ≥50% of guidance documents reviewed, indicating a lack of consistency. Conclusions PRO-specific protocol guidelines were difficult to access, lacked consistency and may be challenging to implement in practice. There is a need to develop easily accessible consensus-driven PRO protocol guidance. Guidance should be aimed at ensuring key PRO information is routinely included in appropriate trial protocols, in order to facilitate rigorous collection/reporting of PRO data, to effectively inform patient care. PMID:25333995

  4. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense of information theory, has multiple uses in the detailed analysis of bioinformatic data. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. In particular, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
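
    The statistical-divergence feature selection step can be approximated by ranking n-mers with a symmetrised relative entropy between their smoothed frequency distributions in promoter versus non-promoter training sequences. The toy sequences, the choice of 3-mers, and the Laplace smoothing below are illustrative assumptions, not the SD-MSAEs configuration.

```python
# Sketch of divergence-based n-mer selection: rank n-mers by a symmetrised
# Kullback-Leibler contribution between promoter and non-promoter distributions.
# All sequences and parameters are toy values.
import math
from collections import Counter
from itertools import product

def nmer_dist(seqs, n=3):
    """Smoothed frequency distribution over all possible n-mers."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + n] for i in range(len(s) - n + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=n)]
    total = sum(counts.values())
    return {m: (counts[m] + 1) / (total + len(vocab)) for m in vocab}

promoters     = ["TATAAAGGCGC", "GCGCTATAAAT", "TATAATGCGCG"]
non_promoters = ["ACGTACGTACG", "TTGACCAGTTA", "CAGTCAGTCAG"]

p, q = nmer_dist(promoters), nmer_dist(non_promoters)
# Symmetrised Kullback-Leibler contribution of each n-mer.
score = {m: p[m] * math.log(p[m] / q[m]) + q[m] * math.log(q[m] / p[m]) for m in p}
top = sorted(score, key=score.get, reverse=True)[:5]
print(top)   # n-mers most informative for separating promoters from non-promoters
```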

  5. Literature-based condition-specific miRNA-mRNA target prediction.

    PubMed

    Oh, Minsik; Rhee, Sungmin; Moon, Ji Hwan; Chae, Heejoon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2017-01-01

    miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can re-produce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction methods. In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.

  6. Recent progress in automatically extracting information from the pharmacogenomic literature

    PubMed Central

    Garten, Yael; Coulet, Adrien; Altman, Russ B

    2011-01-01

The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses, we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206

  7. Representation control increases task efficiency in complex graphical representations.

    PubMed

    Moritz, Julia; Meyerhoff, Hauke S; Meyer-Dernbecher, Claudia; Schwan, Stephan

    2018-01-01

In complex graphical representations, the relevant information for a specific task is often distributed across multiple spatial locations. In such situations, understanding the representation requires internal transformation processes in order to extract the relevant information. However, digital technology enables observers to alter the spatial arrangement of depicted information and therefore to offload the transformation processes. The objective of this study was to investigate the use of such a representation control (i.e. the users' option to decide how information should be displayed) in order to accomplish an information extraction task in terms of solution time and accuracy. In the representation control condition, the participants were allowed to reorganize the graphical representation and reduce information density. In the control condition, no interactive features were offered. We observed that participants in the representation control condition solved tasks that required reorganization of the maps faster and more accurately than participants without representation control. The present findings demonstrate how processes of cognitive offloading, spatial contiguity, and information coherence interact in knowledge media intended for broad and diverse groups of recipients.

  8. Representation control increases task efficiency in complex graphical representations

    PubMed Central

    Meyerhoff, Hauke S.; Meyer-Dernbecher, Claudia; Schwan, Stephan

    2018-01-01

In complex graphical representations, the relevant information for a specific task is often distributed across multiple spatial locations. In such situations, understanding the representation requires internal transformation processes in order to extract the relevant information. However, digital technology enables observers to alter the spatial arrangement of depicted information and therefore to offload the transformation processes. The objective of this study was to investigate the use of such a representation control (i.e. the users' option to decide how information should be displayed) in order to accomplish an information extraction task in terms of solution time and accuracy. In the representation control condition, the participants were allowed to reorganize the graphical representation and reduce information density. In the control condition, no interactive features were offered. We observed that participants in the representation control condition solved tasks that required reorganization of the maps faster and more accurately than participants without representation control. The present findings demonstrate how processes of cognitive offloading, spatial contiguity, and information coherence interact in knowledge media intended for broad and diverse groups of recipients. PMID:29698443

  9. Search guidance is proportional to the categorical specificity of a target cue.

    PubMed

    Schmidt, Joseph; Zelinsky, Gregory J

    2009-10-01

    Visual search studies typically assume the availability of precise target information to guide search, often a picture of the exact target. However, search targets in the real world are often defined categorically and with varying degrees of visual specificity. In five target preview conditions we manipulated the availability of target visual information in a search task for common real-world objects. Previews were: a picture of the target, an abstract textual description of the target, a precise textual description, an abstract + colour textual description, or a precise + colour textual description. Guidance generally increased as information was added to the target preview. We conclude that the information used for search guidance need not be limited to a picture of the target. Although generally less precise, to the extent that visual information can be extracted from a target label and loaded into working memory, this information too can be used to guide search.

  10. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    PubMed Central

    2011-01-01

Background Bioinformatics data analysis often uses a linear mixture model representing samples as an additive mixture of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue. Results The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness-constrained factorization on a sample-by-sample basis. As opposed to that, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary, yet the features will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods. Moreover, the component selected by the proposed method as disease specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. As opposed to standard matrix factorization methods, this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from the disease and control specific components on a sample-by-sample basis. This yields selected components with reduced complexity and, generally, increases prediction accuracy. PMID:22208882

  11. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein is obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected external of the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets are combined to generate more information than available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.
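
    The event-detection stage built on residuals of an autoregressive model can be sketched as follows: fit AR coefficients to a channel by least squares, then flag samples whose one-step prediction error exceeds a multiple of the residual standard deviation. The synthetic signal, model order, and threshold are assumptions for illustration, not the DARHT settings (which also include an exogenous term and a frequency-domain outlier stage).

```python
# Illustrative sketch of residual-based event detection: fit a simple AR model
# to a sensor channel and flag samples with anomalously large one-step errors.
# Signal, order, and threshold are invented for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n, order = 2000, 8
signal = rng.normal(0, 1, n)
signal[1200:1205] += 12.0                 # inject a short-duration "event"

# Build a lagged design matrix and fit AR coefficients by least squares.
X = np.column_stack([signal[order - k - 1:n - k - 1] for k in range(order)])
y = signal[order:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ coef
thresh = 5 * np.std(resid)
events = np.where(np.abs(resid) > thresh)[0] + order
print("detected event samples:", events)
```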

  12. Definition of variables required for comprehensive description of drug dosage and clinical pharmacokinetics.

    PubMed

    Medem, Anna V; Seidling, Hanna M; Eichler, Hans-Georg; Kaltschmidt, Jens; Metzner, Michael; Hubert, Carina M; Czock, David; Haefeli, Walter E

    2017-05-01

    Electronic clinical decision support systems (CDSS) require drug information that can be processed by computers. The goal of this project was to determine and evaluate a compilation of variables that comprehensively capture the information contained in the summary of product characteristic (SmPC) and unequivocally describe the drug, its dosage options, and clinical pharmacokinetics. An expert panel defined and structured a set of variables and drafted a guideline to extract and enter information on dosage and clinical pharmacokinetics from textual SmPCs as published by the European Medicines Agency (EMA). The set of variables was iteratively revised and evaluated by data extraction and variable allocation of roughly 7% of all centrally approved drugs. The information contained in the SmPC was allocated to three information clusters consisting of 260 variables. The cluster "drug characterization" specifies the nature of the drug. The cluster "dosage" provides information on approved drug dosages and defines corresponding specific conditions. The cluster "clinical pharmacokinetics" includes pharmacokinetic parameters of relevance for dosing in clinical practice. A first evaluation demonstrated that, despite the complexity of the current free text SmPCs, dosage and pharmacokinetic information can be reliably extracted from the SmPCs and comprehensively described by a limited set of variables. By proposing a compilation of variables well describing drug dosage and clinical pharmacokinetics, the project represents a step forward towards the development of a comprehensive database system serving as information source for sophisticated CDSS.

  13. The perception of surface layout during low level flight

    NASA Technical Reports Server (NTRS)

    Perrone, John A.

    1991-01-01

    Although it is fairly well established that information about surface layout can be gained from motion cues, it is not so clear as to what information humans can use and what specific information they should be provided. Theoretical analyses tell us that the information is in the stimulus. It will take more experiments to verify that this information can be used by humans to extract surface layout from the 2D velocity flow field. The visual motion factors that can affect the pilot's ability to control an aircraft and to infer the layout of the terrain ahead are discussed.

  14. Quantitative evaluation of translational medicine based on scientometric analysis and information extraction.

    PubMed

    Zhang, Yin; Diao, Tianxi; Wang, Lei

    2014-12-01

Designed to advance the two-way translational process between basic research and clinical practice, translational medicine has become one of the most important areas in biomedicine. The quantitative evaluation of translational medicine is valuable for the decision making of global translational medical research and funding. Using scientometric analysis and information extraction techniques, this study quantitatively analyzed the scientific articles on translational medicine. The results showed that translational medicine had significant scientific output and impact, specific core fields and institutes, and outstanding academic status and benefit. While not considered in this study, patent data are another important indicator that should be integrated in the relevant research in the future. © 2014 Wiley Periodicals, Inc.

  15. The portrayal of Down syndrome in prenatal screening information pamphlets.

    PubMed

    Lawson, Karen L; Carlson, Kara; Shynkaruk, Jody M

    2012-08-01

    To examine the information about Down syndrome (DS) provided to pregnant women in Canada through a content analysis of prenatal screening information pamphlets. Prenatal screening information pamphlets were requested from Canadian prenatal testing centres. In total, 17 pamphlets were received (response rate = 65%). Statements presenting information descriptive of DS were identified from the pamphlets, and a content analysis was carried out. Specifically, each statement was analyzed with respect to both the content and the valence of the information presented on the basis of predetermined decision rules. To enhance reliability, four independent raters reviewed each statement, and any differences in coding were resolved through discussion. In total, 158 statements descriptive of DS were extracted from the pamphlets. The categorical analysis revealed that 91% of the extracted statements emphasized medical or clinical information about DS, whereas only 9% of the statements relayed information pertaining to psychosocial issues. The valence analysis revealed that nearly one half of the statements portrayed a negative message pertaining to DS, while only 2.4% of the statements conveyed a positive image of DS. The pamphlets provided to pregnant women do not appear to present a comprehensive, balanced portrayal of DS, which may serve to limit informed decision-making.

  16. Requirement of scientific documentation for the development of Naturopathy.

    PubMed

    Rastogi, Rajiv

    2006-01-01

The past few decades have witnessed an explosion of knowledge in almost every field. This has resulted not only in the advancement of individual subjects but has also influenced the growth of various allied subjects. The present paper explains how science advances both through efforts made in specific areas and through discoveries in allied fields that indirectly influence the subject proper. In Naturopathy, it seems that although nothing in particular has been added to the basic thoughts or fundamental principles of the subject, the entire understanding of treatment has been revolutionised under the influence of the scientific discoveries of the past few decades. The advent of information technology has further added to the boom of knowledge, and it often seems impossible to use this information for the good of human beings because it is not logically arranged in our minds. Against this background, the author tries to define documentation, stating that we have today an ocean of information and knowledge about various things - living or dead, plants, animals or human beings, geographical conditions or changing weather and environment. What is required is to extract the relevant knowledge and information needed to enrich the subject. The author compares documentation with the churning of milk to extract butter. Documentation, in fact, is the churning of an ocean of information to extract the specific, most appropriate, relevant and well-defined information and knowledge related to a particular subject. Besides discussing the definition of documentation, the paper highlights the areas of Naturopathy for which proper documentation is urgently needed. The paper also discusses the present status of Naturopathy in India, proposes short-term and long-term goals to be achieved, and plans strategies for achieving them. The most important aspect of the paper is a due understanding of the limitations of Naturopathy, together with a constant effort to improve it in line with the progress made in the various disciplines of science.

  17. Species-specific recognition of the carrier insect by dauer larvae of the nematode Caenorhabditis japonica.

    PubMed

    Okumura, Etsuko; Tanaka, Ryusei; Yoshiga, Toyoshi

    2013-02-15

    Host recognition is crucial during the phoretic stage of nematodes because it facilitates their association with hosts. However, limited information is available on the direct cues used for host recognition and host specificity in nematodes. Caenorhabditis japonica forms an intimate association with the burrower bug Parastrachia japonensis. Caenorhabditis japonica dauer larvae (DL), the phoretic stage of the nematode, are mainly found on adult P. japonensis females but no other species. To understand the mechanisms of species-specific and female carrier-biased ectophoresy in C. japonica, we investigated whether C. japonica DL could recognize their hosts using nematode loading and chemoattraction experiments. During the loading experiments, up to 300 C. japonica DL embarked on male and female P. japonensis, whereas none or very few utilized the other shield bugs Erthesina fullo and Macroscytus japonensis or the terrestrial isopod Armadillidium vulgare. In the chemoattraction experiments, hexane extracts containing the body surface components of nymphs and both adult P. japonensis sexes attracted C. japonica DL, whereas those of other shield bugs did not. Parastrachia japonensis extracts also arrested the dispersal of C. japonica DL released at a site where hexane extracts were spotted on an agar plate; i.e. >50% of DL remained at the site even 60 min after nematode inoculation whereas M. japonensis extracts or hexane alone did not have the same effect. These results suggest that C. japonica DL recognize their host species using direct chemical attractants from their specific host to maintain their association.

  18. From the Beauty of Genomic Landscapes to the Strength of Transcriptional Mechanisms.

    PubMed

    Natoli, Gioacchino

    2016-03-24

    Genomic analyses are commonly used to infer trends and broad rules underlying transcriptional control. The innovative approach by Tong et al. to interrogate genomic datasets allows extracting mechanistic information on the specific regulation of individual genes. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. ESP Needs Washback and the Fine Tuning of Driving Instruction

    ERIC Educational Resources Information Center

    Freiermuth, Mark R.

    2007-01-01

    Workplace needs are often difficult for English for Specific Purposes (ESP) teachers to assess due to a variety of obstacles that can restrict opportunities to analyze the existing needs. Nevertheless, the workers' needs may be recognized by employing techniques aimed at extracting information from the workers themselves. Japanese university…

  20. A Prolog System for Converting VHDL-Based Models to Generalized Extraction System (GES) Rules

    DTIC Science & Technology

    1991-06-01

  1. Extracting Information about the Rotator Cuff from Magnetic Resonance Images Using Deterministic and Random Techniques

    PubMed Central

    De Los Ríos, F. A.; Paluszny, M.

    2015-01-01

    We consider some methods to extract information about the rotator cuff based on magnetic resonance images; the study aims to define an alternative method of display that might facilitate the detection of partial tears in the supraspinatus tendon. Specifically, we are going to use families of ellipsoidal triangular patches to cover the humerus head near the affected area. These patches are going to be textured and displayed with the information of the magnetic resonance images using the trilinear interpolation technique. For the generation of points to texture each patch, we propose a new method that guarantees the uniform distribution of its points using a random statistical method. Its computational cost, defined as the average computing time to generate a fixed number of points, is significantly lower as compared with deterministic and other standard statistical techniques. PMID:25650281
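
    Trilinear interpolation, used here to texture the ellipsoidal patches from the MR volume, blends the eight voxels surrounding a fractional (x, y, z) position. The routine below is a generic sketch with a random stand-in volume, not the authors' implementation.

```python
# Minimal trilinear interpolation routine of the kind referred to above: sample a
# volume at a non-integer position by blending the eight surrounding voxels.
# The random volume and query point are placeholders.
import numpy as np

def trilinear(volume, x, y, z):
    """Interpolate volume (indexed [z, y, x]) at a fractional position."""
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    x1, y1, z1 = x0 + 1, y0 + 1, z0 + 1
    xd, yd, zd = x - x0, y - y0, z - z0

    c00 = volume[z0, y0, x0] * (1 - xd) + volume[z0, y0, x1] * xd
    c01 = volume[z1, y0, x0] * (1 - xd) + volume[z1, y0, x1] * xd
    c10 = volume[z0, y1, x0] * (1 - xd) + volume[z0, y1, x1] * xd
    c11 = volume[z1, y1, x0] * (1 - xd) + volume[z1, y1, x1] * xd

    c0 = c00 * (1 - yd) + c10 * yd
    c1 = c01 * (1 - yd) + c11 * yd
    return c0 * (1 - zd) + c1 * zd

volume = np.random.default_rng(1).random((32, 32, 32))   # stand-in MR volume
print(trilinear(volume, 10.3, 17.8, 5.5))                # intensity for one patch point
```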

  2. Electrochemical Probing through a Redox Capacitor To Acquire Chemical Information on Biothiols

    PubMed Central

    2016-01-01

    The acquisition of chemical information is a critical need for medical diagnostics, food/environmental monitoring, and national security. Here, we report an electrochemical information processing approach that integrates (i) complex electrical inputs/outputs, (ii) mediators to transduce the electrical I/O into redox signals that can actively probe the chemical environment, and (iii) a redox capacitor that manipulates signals for information extraction. We demonstrate the capabilities of this chemical information processing strategy using biothiols because of the emerging importance of these molecules in medicine and because their distinct chemical properties allow evaluation of hypothesis-driven information probing. We show that input sequences can be tailored to probe for chemical information both qualitatively (step inputs probe for thiol-specific signatures) and quantitatively. Specifically, we observed picomolar limits of detection and linear responses to concentrations over 5 orders of magnitude (1 pM–0.1 μM). This approach allows the capabilities of signal processing to be extended for rapid, robust, and on-site analysis of chemical information. PMID:27385047

  3. Electrochemical Probing through a Redox Capacitor To Acquire Chemical Information on Biothiols.

    PubMed

    Liu, Zhengchun; Liu, Yi; Kim, Eunkyoung; Bentley, William E; Payne, Gregory F

    2016-07-19

    The acquisition of chemical information is a critical need for medical diagnostics, food/environmental monitoring, and national security. Here, we report an electrochemical information processing approach that integrates (i) complex electrical inputs/outputs, (ii) mediators to transduce the electrical I/O into redox signals that can actively probe the chemical environment, and (iii) a redox capacitor that manipulates signals for information extraction. We demonstrate the capabilities of this chemical information processing strategy using biothiols because of the emerging importance of these molecules in medicine and because their distinct chemical properties allow evaluation of hypothesis-driven information probing. We show that input sequences can be tailored to probe for chemical information both qualitatively (step inputs probe for thiol-specific signatures) and quantitatively. Specifically, we observed picomolar limits of detection and linear responses to concentrations over 5 orders of magnitude (1 pM-0.1 μM). This approach allows the capabilities of signal processing to be extended for rapid, robust, and on-site analysis of chemical information.

  4. Multi-modal approach using Raman spectroscopy and optical coherence tomography for the discrimination of colonic adenocarcinoma from normal colon

    PubMed Central

    Ashok, Praveen C.; Praveen, Bavishna B.; Bellini, Nicola; Riches, Andrew; Dholakia, Kishan; Herrington, C. Simon

    2013-01-01

    We report a multimodal optical approach using both Raman spectroscopy and optical coherence tomography (OCT) in tandem to discriminate between colonic adenocarcinoma and normal colon. Although both of these non-invasive techniques are capable of discriminating between normal and tumour tissues, they are unable individually to provide both the high specificity and high sensitivity required for disease diagnosis. We combine the chemical information derived from Raman spectroscopy with the texture parameters extracted from OCT images. The sensitivity obtained using Raman spectroscopy and OCT individually was 89% and 78% respectively and the specificity was 77% and 74% respectively. Combining the information derived using the two techniques increased both sensitivity and specificity to 94% demonstrating that combining complementary optical information enhances diagnostic accuracy. These data demonstrate that multimodal optical analysis has the potential to achieve accurate non-invasive cancer diagnosis. PMID:24156073
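
    The fusion idea, concatenating chemically informative features with image-texture features and classifying the combined vector, can be sketched with synthetic data and an off-the-shelf classifier. The feature dimensions, logistic-regression model, and simulated class separations below are assumptions; the study's actual spectral preprocessing, texture parameters, and classifier are not reproduced here.

```python
# Sketch of multimodal feature fusion: concatenate spectral features (standing in
# for Raman) with texture features (standing in for OCT), classify the combined
# vector, and report sensitivity/specificity. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)                              # 0 = normal, 1 = tumour
raman = rng.normal(labels[:, None] * 0.8, 1.0, (n, 20))     # "chemical" features
oct_tex = rng.normal(labels[:, None] * 0.5, 1.0, (n, 5))    # "texture" features
fused = np.hstack([raman, oct_tex])                         # combined feature vector

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```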

  5. Volume and Value of Big Healthcare Data.

    PubMed

    Dinov, Ivo D

Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. The Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions.

  6. Volume and Value of Big Healthcare Data

    PubMed Central

    Dinov, Ivo D.

    2016-01-01

Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. The Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions. PMID:26998309

  7. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies

    PubMed Central

    Ni, Yan; Su, Mingming; Qiu, Yunping; Jia, Wei

    2017-01-01

    ADAP-GC is an automated computational pipeline for untargeted, GC-MS-based metabolomics studies. It takes raw mass spectrometry data as input and carries out a sequence of data processing steps including construction of extracted ion chromatograms, detection of chromatographic peak features, deconvolution of co-eluting compounds, and alignment of compounds across samples. Despite the increased accuracy from the original version to version 2.0 in terms of extracting metabolite information for identification and quantitation, ADAP-GC 2.0 requires appropriate specification of a number of parameters and has difficulty in extracting information of compounds that are in low concentration. To overcome these two limitations, ADAP-GC 3.0 was developed to improve both the robustness and sensitivity of compound detection. In this paper, we report how these goals were achieved and compare ADAP-GC 3.0 against three other software tools including ChromaTOF, AnalyzerPro, and AMDIS that are widely used in the metabolomics community. PMID:27461032

  8. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies.

    PubMed

    Ni, Yan; Su, Mingming; Qiu, Yunping; Jia, Wei; Du, Xiuxia

    2016-09-06

    ADAP-GC is an automated computational pipeline for untargeted, GC/MS-based metabolomics studies. It takes raw mass spectrometry data as input and carries out a sequence of data processing steps including construction of extracted ion chromatograms, detection of chromatographic peak features, deconvolution of coeluting compounds, and alignment of compounds across samples. Despite the increased accuracy from the original version to version 2.0 in terms of extracting metabolite information for identification and quantitation, ADAP-GC 2.0 requires appropriate specification of a number of parameters and has difficulty in extracting information on compounds that are in low concentration. To overcome these two limitations, ADAP-GC 3.0 was developed to improve both the robustness and sensitivity of compound detection. In this paper, we report how these goals were achieved and compare ADAP-GC 3.0 against three other software tools including ChromaTOF, AnalyzerPro, and AMDIS that are widely used in the metabolomics community.

  9. Rain/No-Rain Identification from Bispectral Satellite Information using Deep Neural Networks

    NASA Astrophysics Data System (ADS)

    Tao, Y.

    2016-12-01

    Satellite-based precipitation estimation products have the advantage of high resolution and global coverage. However, they still suffer from insufficient accuracy. Accurately estimating precipitation from satellite data depends on two key aspects: sufficient precipitation information in the satellite observations and proper methodologies to extract such information effectively. This study applies state-of-the-art machine learning methodologies to bispectral satellite information for Rain/No-Rain detection. Specifically, we use deep neural networks to extract features from infrared and water vapor channels and connect them to precipitation identification. To evaluate the effectiveness of the methodology, we first apply it to the infrared data only (Model DL-IR only), the most commonly used inputs for satellite-based precipitation estimation. Then we incorporate water vapor data (Model DL-IR + WV) to further improve the prediction performance. The radar Stage IV dataset is used as ground measurement for parameter calibration. The operational product, Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks Cloud Classification System (PERSIANN-CCS), is used as a reference to compare the performance of both models in both winter and summer seasons. The experiments show significant improvement for both models in precipitation identification. The overall performance gains in the Critical Success Index (CSI) are 21.60% and 43.66% over the verification periods for the Model DL-IR only and Model DL-IR + WV models compared to PERSIANN-CCS, respectively. Moreover, specific case studies show that the water vapor channel information and the deep neural networks effectively help recover a large number of missing precipitation pixels under warm clouds while reducing false alarms under cold clouds.
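
    A minimal sketch of the idea, pixel-wise Rain/No-Rain classification from infrared (IR) and water vapor (WV) brightness temperatures with a small neural network, is shown below. This is not the paper's deep model: the synthetic data, toy labeling rule, network size, and use of scikit-learn's MLPClassifier are all assumptions made for illustration; the Critical Success Index (CSI) computation matches the metric named in the abstract.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins: per-pixel IR and WV brightness temperatures with radar-derived labels.
rng = np.random.default_rng(1)
n = 5000
ir = rng.normal(250, 20, n)                      # IR channel (K)
wv = rng.normal(240, 15, n)                      # WV channel (K)
rain = ((ir < 240) & (wv < 235)).astype(int)     # toy labeling rule, not a physical model

X = np.column_stack([ir, wv])
X_train, X_test, y_train, y_test = train_test_split(X, rain, test_size=0.25, random_state=0)

# A small multilayer network standing in for the deep DL-IR + WV model.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
csi = tp / (tp + fn + fp)                        # Critical Success Index
print("accuracy:", round(clf.score(X_test, y_test), 3), "CSI:", round(csi, 3))
```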

  10. Extracting Objects for Aerial Manipulation on UAVs Using Low Cost Stereo Sensors

    PubMed Central

    Ramon Soria, Pablo; Bevec, Robert; Arrue, Begoña C.; Ude, Aleš; Ollero, Aníbal

    2016-01-01

    Giving unmanned aerial vehicles (UAVs) the possibility to manipulate objects vastly extends the range of possible applications. This applies to rotary wing UAVs in particular, where their capability of hovering enables a suitable position for in-flight manipulation. Their manipulation skills must be suitable for primarily natural, partially known environments, where UAVs mostly operate. We have developed an on-board object extraction method that calculates information necessary for autonomous grasping of objects, without the need to provide the model of the object’s shape. A local map of the work-zone is generated using depth information, where object candidates are extracted by detecting areas different to our floor model. Their image projections are then evaluated using support vector machine (SVM) classification to recognize specific objects or reject bad candidates. Our method builds a sparse cloud representation of each object and calculates the object’s centroid and the dominant axis. This information is then passed to a grasping module. Our method works under the assumption that objects are static and not clustered, have visual features and the floor shape of the work-zone area is known. We used low-cost cameras to create the depth information; although they produce noisy point clouds, our method has proved robust enough to process this data and return accurate results. PMID:27187413

  11. Extracting Objects for Aerial Manipulation on UAVs Using Low Cost Stereo Sensors.

    PubMed

    Ramon Soria, Pablo; Bevec, Robert; Arrue, Begoña C; Ude, Aleš; Ollero, Aníbal

    2016-05-14

    Giving unmanned aerial vehicles (UAVs) the possibility to manipulate objects vastly extends the range of possible applications. This applies to rotary wing UAVs in particular, where their capability of hovering enables a suitable position for in-flight manipulation. Their manipulation skills must be suitable for primarily natural, partially known environments, where UAVs mostly operate. We have developed an on-board object extraction method that calculates information necessary for autonomous grasping of objects, without the need to provide the model of the object's shape. A local map of the work-zone is generated using depth information, where object candidates are extracted by detecting areas different to our floor model. Their image projections are then evaluated using support vector machine (SVM) classification to recognize specific objects or reject bad candidates. Our method builds a sparse cloud representation of each object and calculates the object's centroid and the dominant axis. This information is then passed to a grasping module. Our method works under the assumption that objects are static and not clustered, have visual features and the floor shape of the work-zone area is known. We used low-cost cameras to create the depth information; although they produce noisy point clouds, our method has proved robust enough to process this data and return accurate results.
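
    The centroid and dominant-axis computation mentioned in these records can be sketched from a sparse point cloud using a principal-component step (covariance eigenvectors). This is only an illustration of that geometric step, not the authors' on-board pipeline; the synthetic cloud below is hypothetical.

```python
import numpy as np

def object_pose_from_cloud(points):
    """Estimate the centroid and dominant axis of an object's sparse 3-D point cloud.

    `points` is an (N, 3) array; the dominant axis is the principal direction of the
    cloud, taken as the eigenvector of the covariance with the largest eigenvalue.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues returned in ascending order
    dominant_axis = eigvecs[:, -1]           # direction of largest variance
    return centroid, dominant_axis

# Hypothetical noisy cloud of an elongated object lying roughly along the x-axis.
rng = np.random.default_rng(2)
cloud = rng.normal(0, [0.10, 0.02, 0.02], size=(500, 3)) + np.array([1.0, 0.5, 0.2])
centroid, axis = object_pose_from_cloud(cloud)
print("centroid:", centroid.round(3), "dominant axis:", axis.round(3))
```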

  12. Biological event composition

    PubMed Central

    2012-01-01

    Background In recent years, biological event extraction has emerged as a key natural language processing task, aiming to address the information overload problem in accessing the molecular biology literature. The BioNLP shared task competitions have contributed to this recent interest considerably. The first competition (BioNLP'09) focused on extracting biological events from Medline abstracts from a narrow domain, while the theme of the latest competition (BioNLP-ST'11) was generalization and a wider range of text types, event types, and subject domains were considered. We view event extraction as a building block in larger discourse interpretation and propose a two-phase, linguistically-grounded, rule-based methodology. In the first phase, a general, underspecified semantic interpretation is composed from syntactic dependency relations in a bottom-up manner. The notion of embedding underpins this phase and it is informed by a trigger dictionary and argument identification rules. Coreference resolution is also performed at this step, allowing extraction of inter-sentential relations. The second phase is concerned with constraining the resulting semantic interpretation by shared task specifications. We evaluated our general methodology on core biological event extraction and speculation/negation tasks in three main tracks of BioNLP-ST'11 (GENIA, EPI, and ID). Results We achieved competitive results in GENIA and ID tracks, while our results in the EPI track leave room for improvement. One notable feature of our system is that its performance across abstracts and article bodies is stable. Coreference resolution results in minor improvement in system performance. Due to our interest in discourse-level elements, such as speculation/negation and coreference, we provide a more detailed analysis of our system performance in these subtasks. Conclusions The results demonstrate the viability of a robust, linguistically-oriented methodology, which clearly distinguishes general semantic interpretation from shared task specific aspects, for biological event extraction. Our error analysis pinpoints some shortcomings, which we plan to address in future work within our incremental system development methodology. PMID:22759461

  13. Sequence patterns mediating functions of disordered proteins.

    PubMed

    Exarchos, Konstantinos P; Kourou, Konstantina; Exarchos, Themis P; Papaloukas, Costas; Karamouzis, Michalis V; Fotiadis, Dimitrios I

    2015-01-01

    Disordered proteins lack specific 3D structure in their native state and have been implicated with numerous cellular functions as well as with the induction of severe diseases, e.g., cardiovascular and neurodegenerative diseases as well as diabetes. Due to their conformational flexibility they are often found to interact with a multitude of protein molecules; this one-to-many interaction which is vital for their versatile functioning involves short consensus protein sequences, which are normally detected using slow and cumbersome experimental procedures. In this work we exploit information from disorder-oriented protein interaction networks focused specifically on humans, in order to assemble, by means of overrepresentation, a set of sequence patterns that mediate the functioning of disordered proteins; hence, we are able to identify how a single protein achieves such functional promiscuity. Next, we study the sequential characteristics of the extracted patterns, which exhibit a striking preference towards a very limited subset of amino acids; specifically, residues leucine, glutamic acid, and serine are particularly frequent among the extracted patterns, and we also observe a nontrivial propensity towards alanine and glycine. Furthermore, based on the extracted patterns we set off to infer potential functional implications in order to verify our findings and potentially further extrapolate our knowledge regarding the functioning of disordered proteins. We observe that the extracted patterns are primarily involved with regulation, binding and posttranslational modifications, which constitute the most prominent functions of disordered proteins.

  14. a Framework of Change Detection Based on Combined Morphological Features and Multi-Index Classification

    NASA Astrophysics Data System (ADS)

    Li, S.; Zhang, S.; Yang, D.

    2017-09-01

    Remote sensing images are particularly well suited for analysis of land cover change. In this paper, we present a new framework for detection of changing land cover using satellite imagery. Morphological features and a multi-index are used to extract typical objects from the imagery, including vegetation, water, bare land, buildings, and roads. Our method, based on connected domains, is different from traditional methods; it uses image segmentation to extract morphological features, while the enhanced vegetation index (EVI) and the normalized difference water index (NDWI) are used to extract vegetation and water, and a fragmentation index is used to correct the water extraction results. HSV transformation and threshold segmentation are used to detect and remove the effects of shadows from the extraction results. Change detection is performed on these results. One of the advantages of the proposed framework is that semantic information is extracted automatically using low-level morphological features and indexes. Another advantage is that the proposed method detects specific types of change without any training samples. A test on ZY-3 images demonstrates that our framework has a promising capability to detect change.
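
    The index-based extraction step can be illustrated with a short sketch of NDWI and EVI computed from per-pixel reflectance bands. The band arrays, band assignments, and threshold values below are hypothetical; the abstract does not give the actual ZY-3 thresholds.

```python
import numpy as np

def ndwi(green, nir):
    """Normalized difference water index: higher values indicate open water."""
    return (green - nir) / (green + nir + 1e-9)

def evi(nir, red, blue):
    """Enhanced vegetation index with the standard coefficient set (G=2.5, C1=6, C2=7.5, L=1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical reflectance bands in [0, 1]; thresholds are illustrative only.
rng = np.random.default_rng(3)
green, red, blue, nir = (rng.uniform(0.0, 0.6, (100, 100)) for _ in range(4))

water_mask = ndwi(green, nir) > 0.3
vegetation_mask = evi(nir, red, blue) > 0.4
print("water pixels:", int(water_mask.sum()), "vegetation pixels:", int(vegetation_mask.sum()))
```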

  15. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.

  16. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis

    PubMed Central

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242

  17. Profiling Lung Cancer Patients Using Electronic Health Records.

    PubMed

    Menasalvas Ruiz, Ernestina; Tuñas, Juan Manuel; Bermejo, Guzmán; Gonzalo Martín, Consuelo; Rodríguez-González, Alejandro; Zanin, Massimiliano; González de Pedro, Cristina; Méndez, Marta; Zaretskaia, Olga; Rey, Jesús; Parejo, Consuelo; Cruz Bermudez, Juan Luis; Provencio, Mariano

    2018-05-31

    Although Electronic Health Records contain a large amount of information about the patient's condition and response to treatment, which can potentially revolutionize clinical practice, such information is seldom used, owing to the complexity of its extraction and analysis. We here report on a first integration of an NLP framework for the analysis of clinical records of lung cancer patients making use of a telephone assistance service of a major Spanish hospital. We specifically show how some relevant data, about patient demographics and health condition, can be extracted; and how some relevant analyses can be performed, aimed at improving the usefulness of the service. We thus demonstrate that the use of EHR texts, and their integration inside a data analysis framework, is technically feasible and worthy of further study.

  18. Bearing diagnostics: A method based on differential geometry

    NASA Astrophysics Data System (ADS)

    Tian, Ye; Wang, Zili; Lu, Chen; Wang, Zhipeng

    2016-12-01

    The structures around bearings are complex, and the working environment is variable. These conditions cause the collected vibration signals to exhibit nonlinear, non-stationary, and chaotic characteristics, which make noise reduction, feature extraction, fault diagnosis, and health assessment significantly challenging. Thus, a set of differential geometry-based methods with superiorities in nonlinear analysis is presented in this study. For noise reduction, the Local Projection method is modified by both selecting the neighborhood radius based on empirical mode decomposition and determining noise subspace constrained by neighborhood distribution information. For feature extraction, Hessian locally linear embedding is introduced to acquire manifold features from the manifold topological structures, and singular values of eigenmatrices as well as several specific frequency amplitudes in spectrograms are extracted subsequently to reduce the complexity of the manifold features. For fault diagnosis, information geometry-based support vector machine is applied to classify the fault states. For health assessment, the manifold distance is employed to represent the health information; the Gaussian mixture model is utilized to calculate the confidence values, which directly reflect the health status. Case studies on Lorenz signals and vibration datasets of bearings demonstrate the effectiveness of the proposed methods.
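
    The health-assessment step, a Gaussian mixture model turning a degradation feature into confidence values, can be sketched as follows. This is only a schematic stand-in for that step: the one-dimensional "manifold distance" feature, the drift profile, and the normalization of the likelihood into a confidence score are all assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic one-dimensional health feature (e.g., a manifold distance): small during
# healthy baseline operation, drifting upward as the bearing degrades.
rng = np.random.default_rng(4)
baseline = rng.normal(1.0, 0.1, size=(300, 1))                              # healthy reference data
monitored = np.linspace(1.0, 3.0, 100).reshape(-1, 1) + rng.normal(0, 0.1, (100, 1))

# Fit the mixture on baseline data; the likelihood of new samples under this model
# is rescaled into a confidence value that the bearing remains in a healthy state.
gmm = GaussianMixture(n_components=2, random_state=0).fit(baseline)
log_likelihood = gmm.score_samples(monitored)
confidence = np.exp(log_likelihood - gmm.score_samples(baseline).max())     # heuristic rescaling
print("confidence at start vs. end of monitoring:", confidence[0].round(3), confidence[-1].round(3))
```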

  19. Challenges in Extracting Information From Large Hydrogeophysical-monitoring Datasets

    NASA Astrophysics Data System (ADS)

    Day-Lewis, F. D.; Slater, L. D.; Johnson, T.

    2012-12-01

    Over the last decade, new automated geophysical data-acquisition systems have enabled collection of increasingly large and information-rich geophysical datasets. Concurrent advances in field instrumentation, web services, and high-performance computing have made real-time processing, inversion, and visualization of large three-dimensional tomographic datasets practical. Geophysical-monitoring datasets have provided high-resolution insights into diverse hydrologic processes including groundwater/surface-water exchange, infiltration, solute transport, and bioremediation. Despite the high information content of such datasets, extraction of quantitative or diagnostic hydrologic information is challenging. Visual inspection and interpretation for specific hydrologic processes is difficult for datasets that are large, complex, and (or) affected by forcings (e.g., seasonal variations) unrelated to the target hydrologic process. New strategies are needed to identify salient features in spatially distributed time-series data and to relate temporal changes in geophysical properties to hydrologic processes of interest while effectively filtering unrelated changes. Here, we review recent work using time-series and digital-signal-processing approaches in hydrogeophysics. Examples include applications of cross-correlation, spectral, and time-frequency (e.g., wavelet and Stockwell transforms) approaches to (1) identify salient features in large geophysical time series; (2) examine correlation or coherence between geophysical and hydrologic signals, even in the presence of non-stationarity; and (3) condense large datasets while preserving information of interest. Examples demonstrate analysis of large time-lapse electrical tomography and fiber-optic temperature datasets to extract information about groundwater/surface-water exchange and contaminant transport.
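
    One of the approaches named above, cross-correlation between a geophysical time series and a hydrologic signal, can be sketched briefly. The two daily series, the imposed 5-day lag, and the noise level below are synthetic assumptions used only to show how the lag of best alignment is recovered.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

# Hypothetical daily series: a hydrologic forcing (river stage) and a geophysical
# response (bulk electrical conductivity) lagging it by a few days, plus noise.
rng = np.random.default_rng(5)
t = np.arange(365)
stage = np.sin(2 * np.pi * t / 60) + 0.2 * rng.normal(size=t.size)
conductivity = np.roll(stage, 5) + 0.2 * rng.normal(size=t.size)   # 5-day lag

# Normalized cross-correlation identifies the lag at which the two signals align best.
a = (stage - stage.mean()) / stage.std()
b = (conductivity - conductivity.mean()) / conductivity.std()
xcorr = correlate(b, a, mode="full") / len(a)
lags = correlation_lags(len(b), len(a), mode="full")
print("best lag (days):", lags[np.argmax(xcorr)])
```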

  20. The fusiform face area: a cortical region specialized for the perception of faces

    PubMed Central

    Kanwisher, Nancy; Yovel, Galit

    2006-01-01

    Faces are among the most important visual stimuli we perceive, informing us not only about a person's identity, but also about their mood, sex, age and direction of gaze. The ability to extract this information within a fraction of a second of viewing a face is important for normal social interactions and has probably played a critical role in the survival of our primate ancestors. Considerable evidence from behavioural, neuropsychological and neurophysiological investigations supports the hypothesis that humans have specialized cognitive and neural mechanisms dedicated to the perception of faces (the face-specificity hypothesis). Here, we review the literature on a region of the human brain that appears to play a key role in face perception, known as the fusiform face area (FFA). Section 1 outlines the theoretical background for much of this work. The face-specificity hypothesis falls squarely on one side of a longstanding debate in the fields of cognitive science and cognitive neuroscience concerning the extent to which the mind/brain is composed of: (i) special-purpose (‘domain-specific’) mechanisms, each dedicated to processing a specific kind of information (e.g. faces, according to the face-specificity hypothesis), versus (ii) general-purpose (‘domain-general’) mechanisms, each capable of operating on any kind of information. Face perception has long served both as one of the prime candidates of a domain-specific process and as a key target for attack by proponents of domain-general theories of brain and mind. Section 2 briefly reviews the prior literature on face perception from behaviour and neurophysiology. This work supports the face-specificity hypothesis and argues against its domain-general alternatives (the individuation hypothesis, the expertise hypothesis and others). Section 3 outlines the more recent evidence on this debate from brain imaging, focusing particularly on the FFA. We review the evidence that the FFA is selectively engaged in face perception, by addressing (and rebutting) five of the most widely discussed alternatives to this hypothesis. In §4, we consider recent findings that are beginning to provide clues into the computations conducted in the FFA and the nature of the representations the FFA extracts from faces. We argue that the FFA is engaged both in detecting faces and in extracting the necessary perceptual information to recognize them, and that the properties of the FFA mirror previously identified behavioural signatures of face-specific processing (e.g. the face-inversion effect). Section 5 asks how the computations and representations in the FFA differ from those occurring in other nearby regions of cortex that respond strongly to faces and objects. The evidence indicates clear functional dissociations between these regions, demonstrating that the FFA shows not only functional specificity but also area specificity. We end by speculating in §6 on some of the broader questions raised by current research on the FFA, including the developmental origins of this region and the question of whether faces are unique versus whether similarly specialized mechanisms also exist for other domains of high-level perception and cognition. PMID:17118927

  1. Evaluation of the application of ERTS-1 data to the regional land use planning process. [Northeast Wisconsin

    NASA Technical Reports Server (NTRS)

    Clapp, J. L. (Principal Investigator); Green, T., III; Hanson, G. F.; Kiefer, R. W.; Niemann, B. J., Jr.

    1974-01-01

    The author has identified the following significant results. Employing simple and economical extraction methods, ERTS can provide valuable data to the planners at the state or regional level with a frequency never before possible. Interactive computer methods of working directly with ERTS digital information show much promise for providing land use information at a more specific level, since the data format and production rate of ERTS justify improved methods of analysis.

  2. Land cover change detection using a GIS-guided, feature-based classification of Landsat thematic mapper data. [Geographic Information System

    NASA Technical Reports Server (NTRS)

    Enslin, William R.; Ton, Jezching; Jain, Anil

    1987-01-01

    Landsat TM data were combined with land cover and planimetric data layers contained in the State of Michigan's geographic information system (GIS) to identify changes in forestlands, specifically new oil/gas wells. A GIS-guided feature-based classification method was developed. The regions extracted by the best image band/operator combination were studied using a set of rules based on the characteristics of the GIS oil/gas pads.

  3. Retrieval of radiology reports citing critical findings with disease-specific customization.

    PubMed

    Lacson, Ronilda; Sugarbaker, Nathanael; Prevedello, Luciano M; Ivan, Ip; Mar, Wendy; Andriole, Katherine P; Khorasani, Ramin

    2012-01-01

    Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. This paper: 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications - an open-source toolkit, A Nearly New Information Extraction system (ANNIE); and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) - to illustrate the varying levels of customization required for different disease entities and; 2) evaluates each application's performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases, pulmonary nodule, pneumothorax, and pulmonary embolus. Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90-0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks.

  4. Retrieval of Radiology Reports Citing Critical Findings with Disease-Specific Customization

    PubMed Central

    Lacson, Ronilda; Sugarbaker, Nathanael; Prevedello, Luciano M; Ivan, IP; Mar, Wendy; Andriole, Katherine P; Khorasani, Ramin

    2012-01-01

    Background: Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. Purpose: This paper: 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications – an open-source toolkit, A Nearly New Information Extraction system (ANNIE); and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) – to illustrate the varying levels of customization required for different disease entities and; 2) evaluates each application’s performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases, pulmonary nodule, pneumothorax, and pulmonary embolus. Results: Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90-0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Conclusion: Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks. PMID:22934127

  5. Detection and Evaluation of Cheating on College Exams Using Supervised Classification

    ERIC Educational Resources Information Center

    Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire

    2012-01-01

    Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…

  6. A Validation of Parafoveal Semantic Information Extraction in Reading Chinese

    ERIC Educational Resources Information Center

    Zhou, Wei; Kliegl, Reinhold; Yan, Ming

    2013-01-01

    Parafoveal semantic processing has recently been well documented in reading Chinese sentences, presumably because of language-specific features. However, because of a large variation of fixation landing positions on pretarget words, some preview words actually were located in foveal vision when readers' eyes landed close to the end of the…

  7. Identification of chemogenomic features from drug–target interaction networks using interpretable classifiers

    PubMed Central

    Tabei, Yasuo; Pauwels, Edouard; Stoven, Véronique; Takemoto, Kazuhiro; Yamanishi, Yoshihiro

    2012-01-01

    Motivation: Drug effects are mainly caused by the interactions between drug molecules and their target proteins including primary targets and off-targets. Identification of the molecular mechanisms behind overall drug–target interactions is crucial in the drug design process. Results: We develop a classifier-based approach to identify chemogenomic features (the underlying associations between drug chemical substructures and protein domains) that are involved in drug–target interaction networks. We propose a novel algorithm for extracting informative chemogenomic features by using L1 regularized classifiers over the tensor product space of possible drug–target pairs. It is shown that the proposed method can extract a very limited number of chemogenomic features without losing the performance of predicting drug–target interactions, and that the extracted features are biologically meaningful. The extracted substructure–domain association network enables us to suggest ligand chemical fragments specific for each protein domain and ligand core substructures important for a wide range of protein families. Availability: Software is available at the supplemental website. Contact: yamanishi@bioreg.kyushu-u.ac.jp Supplementary Information: Datasets and all results are available at http://cbio.ensmp.fr/~yyamanishi/l1binary/ . PMID:22962471
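
    The core idea, an L1-regularized classifier over the tensor product of drug-substructure and protein-domain profiles, can be sketched in a few lines. This is not the authors' algorithm or data: the binary profiles, the toy interaction rule, and the use of scikit-learn's L1 logistic regression are assumptions made to show how the sparse weights pick out substructure-domain pairs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary profiles: drug chemical substructures (D) and protein domains (P);
# y marks which drug-target pairs interact.
rng = np.random.default_rng(6)
n_pairs, n_sub, n_dom = 400, 20, 15
D = rng.integers(0, 2, (n_pairs, n_sub))
P = rng.integers(0, 2, (n_pairs, n_dom))
y = ((D[:, 0] & P[:, 3]) | (D[:, 5] & P[:, 7])).astype(int)   # toy interaction rule

# Tensor-product feature space: one feature per (substructure, domain) combination.
X = np.einsum("ij,ik->ijk", D, P).reshape(n_pairs, n_sub * n_dom)

# The L1 penalty drives most pair weights to zero, leaving a small set of
# chemogenomic features (substructure-domain associations).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print("non-zero substructure-domain features:", [(i // n_dom, i % n_dom) for i in selected])
```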

  8. Advanced eddy current test signal analysis for steam generator tube defect classification and characterization

    NASA Astrophysics Data System (ADS)

    McClanahan, James Patrick

    Eddy Current Testing (ECT) is a Non-Destructive Examination (NDE) technique that is widely used in power generating plants (both nuclear and fossil) to test the integrity of heat exchanger (HX) and steam generator (SG) tubing. Specifically for this research, laboratory-generated, flawed tubing data were examined. The purpose of this dissertation is to develop and implement an automated method for the classification and an advanced characterization of defects in HX and SG tubing. These two improvements enhanced the robustness of characterization as compared to traditional bobbin-coil ECT data analysis methods. A more robust classification and characterization of the tube flaw in-situ (while the SG is on-line but not when the plant is operating) should provide valuable information to the power industry. The following are the conclusions reached from this research. A feature extraction program acquiring relevant information from both the mixed, absolute and differential data was successfully implemented. The CWT was utilized to extract more information from the mixed, complex differential data. Image processing techniques, used to extract the information contained in the generated CWT, classified the data with a high success rate. The data were accurately classified, utilizing the compressed feature vector and using a Bayes classification system. An estimation of the upper bound for the probability of error, using the Bhattacharyya distance, was successfully applied to the Bayesian classification. The classified data were separated according to flaw-type (classification) to enhance characterization. The characterization routine used dedicated, flaw-type specific ANNs that made the characterization of the tube flaw more robust. The inclusion of outliers may help complete the feature space so that classification accuracy is increased. Given that the eddy current test signals appear very similar, there may not be sufficient information to make an extremely accurate (>95%) classification or an advanced characterization using this system. It is necessary to have a larger database for more accurate system learning.
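
    The Bhattacharyya-distance error bound mentioned above can be shown with a short worked sketch for two Gaussian class models with equal priors; the class means and covariances below are made up for illustration, not taken from the dissertation's data.

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussian class models."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

# Hypothetical feature statistics for two flaw classes with equal priors.
mu_a, mu_b = np.array([0.0, 0.0]), np.array([1.5, 1.0])
cov_a, cov_b = np.eye(2), np.diag([1.0, 2.0])

d_b = bhattacharyya_distance(mu_a, cov_a, mu_b, cov_b)
error_upper_bound = 0.5 * np.exp(-d_b)   # Bayes-error bound sqrt(P1*P2)*exp(-D_B) with P1 = P2 = 0.5
print("Bhattacharyya distance:", round(float(d_b), 3), "error bound:", round(float(error_upper_bound), 3))
```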

  9. A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

    PubMed Central

    Ortega-Martorell, Sandra; Ruiz, Héctor; Vellido, Alfredo; Olier, Iván; Romero, Enrique; Julià-Sapé, Margarida; Martín, José D.; Jarman, Ian H.; Arús, Carles; Lisboa, Paulo J. G.

    2013-01-01

    Background The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyzes single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Principal Findings Non-negative matrix factorization techniques have recently shown their potential for the identification of meaningful sources from brain tissue spectroscopy data. In this study, we use a convex variant of these methods that is capable of handling negatively-valued data and generating sources that can be interpreted as tumor class prototypes. A novel approach to convex non-negative matrix factorization is proposed, in which prior knowledge about class information is utilized in model optimization. Class-specific information is integrated into this semi-supervised process by setting the metric of a latent variable space where the matrix factorization is carried out. The reported experimental study comprises 196 cases from different tumor types drawn from two international, multi-center databases. The results indicate that the proposed approach outperforms a purely unsupervised process by achieving near perfect correlation of the extracted sources with the mean spectra of the tumor types. It also improves tissue type classification. Conclusions/Significance We show that source extraction by unsupervised matrix factorization benefits from the integration of the available class information, so operating in a semi-supervised learning manner, for discriminative source identification and brain tumor labeling from single-voxel spectroscopy data. We are confident that the proposed methodology has wider applicability for biomedical signal processing. PMID:24376744
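
    A minimal, purely unsupervised baseline for the source-extraction idea can be sketched with standard non-negative matrix factorization. Note the assumptions: the spectra below are synthetic two-source mixtures, the data are kept non-negative (the paper's convex variant relaxes that), and scikit-learn's NMF stands in for the semi-supervised convex factorization actually proposed.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical magnitude spectra: each row is one voxel's spectrum, built as a mixture
# of two underlying "tissue" sources plus a small non-negative noise term.
rng = np.random.default_rng(7)
freq = np.linspace(0, 1, 200)
source_a = np.exp(-((freq - 0.3) ** 2) / 0.002)
source_b = np.exp(-((freq - 0.7) ** 2) / 0.002)
weights = rng.uniform(0, 1, size=(100, 2))
spectra = weights @ np.vstack([source_a, source_b]) + 0.01 * rng.random((100, 200))

# Factor the spectra into 2 sources; components_ approximate the class prototypes.
model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
mixing = model.fit_transform(spectra)
prototypes = model.components_
print("recovered source peak positions:", freq[prototypes.argmax(axis=1)])
```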

  10. [Application of regular expression in extracting key information from Chinese medicine literatures about re-evaluation of post-marketing surveillance].

    PubMed

    Wang, Zhifei; Xie, Yanming; Wang, Yongyan

    2011-10-01

    Computerized extraction of information from the Chinese medicine literature is more convenient than hand searching: it can simplify the search process and improve accuracy. Among the many automated extraction methods now in use, regular expressions are particularly efficient for extracting useful information in research. This article focuses on the application of regular expressions to information extraction from the Chinese medicine literature. Two practical examples are reported, using regular expressions to extract "case number (non-terminology)" and "efficacy rate (subgroups for related information identification)", illustrating how information can be extracted from the Chinese medicine literature with this method.
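
    A small sketch of the approach is shown below. The two patterns are illustrative English-language stand-ins for the "case number" and "efficacy rate" targets described in the abstract; the article's actual patterns operate on Chinese-language text and are not reproduced here.

```python
import re

# Illustrative stand-ins for the two extraction targets named in the abstract.
text = ("A total of 120 cases were enrolled in the treatment group and 118 cases "
        "in the control group. The total efficacy rate was 91.7% versus 78.8%.")

case_number = re.compile(r"(\d+)\s+cases")                   # "case number" (non-terminology)
efficacy_rate = re.compile(r"efficacy rate[^\d]*([\d.]+)%")  # "efficacy rate" with its value

print(case_number.findall(text))     # ['120', '118']
print(efficacy_rate.findall(text))   # ['91.7']
```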

  11. Characterization of cogon grass (Imperata cylindrica) pollen extract and preliminary analysis of grass group 1, 4 and 5 homologues using monoclonal antibodies to Phleum pratense.

    PubMed

    Kumar, L; Sridhara, S; Singh, B P; Gangal, S V

    1998-11-01

    Previous studies have established the role of Imperata cylindrica (Ic) pollen in type I allergic disorders. However, no systematic information is available on the allergen composition of Ic pollen extract. To characterize the IgE-binding proteins of Ic pollen extract and to detect the presence of grass group 1, 4 and 5 allergen homologues, if any. Pollen extract of Ic was analyzed by in vivo and in vitro procedures such as intradermal tests (ID), enzyme-linked immunosorbent assay (ELISA), ELISA-inhibition, thin-layer isoelectric focusing (TLIEF), sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE) and immunoblotting. Dot blot assay was carried out to check the presence of well-known group 1, 4, and 5 allergen homologues in Ic pollen extract. Out of 303 patients with respiratory allergies who were skin-tested, 27 showed sensitivity to Ic pollen extract. Specific IgE levels were elevated in all 15 serum samples tested. The extract prepared for this study was found to be highly potent since it required only 400 ng of homologous proteins for 50% inhibition of binding in ELISA inhibition assays. TLIEF of Ic pollen extract showed 44 silver-stained bands (pI 3.5-7.0) while SDS-PAGE resolved it into 24 Coomassie-Brilliant-Blue-stained bands (MW 100-10 kD). Immunoblotting with individual patient sera recognized 7 major IgE-binding bands (MW 85, 62, 57, 43, 40, 28 and 16 kD) in Ic pollen extract. A panel of monoclonal antibodies, specific to group 1, 4 and 5 allergens from Phleum pratense pollen extract, identified group 5 and group 4 homologues in Ic pollen extract. Ic pollen extract was characterized for the protein profile by TLIEF and SDS-PAGE. IgE reactivity was determined by ELISA and immunoblot. Monoclonal antibodies to group 5 and group 4 allergens reacted weakly showing that this pollen contains group 5 and group 4 homologous allergens.

  12. Enhancing Comparative Effectiveness Research With Automated Pediatric Pneumonia Detection in a Multi-Institutional Clinical Repository: A PHIS+ Pilot Study.

    PubMed

    Meystre, Stephane; Gouripeddi, Ramkiran; Tieder, Joel; Simmons, Jeffrey; Srivastava, Rajendu; Shah, Samir

    2017-05-15

    Community-acquired pneumonia is a leading cause of pediatric morbidity. Administrative data are often used to conduct comparative effectiveness research (CER) with sufficient sample sizes to enhance detection of important outcomes. However, such studies are prone to misclassification errors because of the variable accuracy of discharge diagnosis codes. The aim of this study was to develop an automated, scalable, and accurate method to determine the presence or absence of pneumonia in children using chest imaging reports. The multi-institutional PHIS+ clinical repository was developed to support pediatric CER by expanding an administrative database of children's hospitals with detailed clinical data. To develop a scalable approach to find patients with bacterial pneumonia more accurately, we developed a Natural Language Processing (NLP) application to extract relevant information from chest diagnostic imaging reports. Domain experts established a reference standard by manually annotating 282 reports to train and then test the NLP application. Findings of pleural effusion, pulmonary infiltrate, and pneumonia were automatically extracted from the reports and then used to automatically classify whether a report was consistent with bacterial pneumonia. Compared with the annotated diagnostic imaging reports reference standard, the most accurate implementation of machine learning algorithms in our NLP application allowed extracting relevant findings with a sensitivity of .939 and a positive predictive value of .925. It allowed classifying reports with a sensitivity of .71, a positive predictive value of .86, and a specificity of .962. When compared with each of the domain experts manually annotating these reports, the NLP application allowed for significantly higher sensitivity (.71 vs .527) and similar positive predictive value and specificity. NLP-based pneumonia information extraction of pediatric diagnostic imaging reports performed better than domain experts in this pilot study. NLP is an efficient method to extract information from a large collection of imaging reports to facilitate CER. ©Stephane Meystre, Ramkiran Gouripeddi, Joel Tieder, Jeffrey Simmons, Rajendu Srivastava, Samir Shah. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.05.2017.
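
    The evaluation metrics reported above (sensitivity, positive predictive value, specificity) follow directly from the report-level confusion counts; a minimal sketch is given below. The counts used are illustrative placeholders, not the study's actual numbers.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), positive predictive value (precision), and specificity."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    specificity = tn / (tn + fp)
    return sensitivity, ppv, specificity

# Hypothetical counts for report-level pneumonia classification against the
# expert-annotated reference standard (numbers are illustrative only).
sens, ppv, spec = classification_metrics(tp=50, fp=8, tn=200, fn=20)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}, specificity={spec:.2f}")
```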

  13. Knowledge-Guided Robust MRI Brain Extraction for Diverse Large-Scale Neuroimaging Studies on Humans and Non-Human Primates

    PubMed Central

    Wang, Yaping; Nie, Jingxin; Yap, Pew-Thian; Li, Gang; Shi, Feng; Geng, Xiujuan; Guo, Lei; Shen, Dinggang

    2014-01-01

    Accurate and robust brain extraction is a critical step in most neuroimaging analysis pipelines. In particular, for the large-scale multi-site neuroimaging studies involving a significant number of subjects with diverse age and diagnostic groups, accurate and robust extraction of the brain automatically and consistently is highly desirable. In this paper, we introduce population-specific probability maps to guide the brain extraction of diverse subject groups, including both healthy and diseased adult human populations, both developing and aging human populations, as well as non-human primates. Specifically, the proposed method combines an atlas-based approach, for coarse skull-stripping, with a deformable-surface-based approach that is guided by local intensity information and population-specific prior information learned from a set of real brain images for more localized refinement. Comprehensive quantitative evaluations were performed on the diverse large-scale populations of ADNI dataset with over 800 subjects (55∼90 years of age, multi-site, various diagnosis groups), OASIS dataset with over 400 subjects (18∼96 years of age, wide age range, various diagnosis groups), and NIH pediatrics dataset with 150 subjects (5∼18 years of age, multi-site, wide age range as a complementary age group to the adult dataset). The results demonstrate that our method consistently yields the best overall results across almost the entire human life span, with only a single set of parameters. To demonstrate its capability to work on non-human primates, the proposed method is further evaluated using a rhesus macaque dataset with 20 subjects. Quantitative comparisons with popularly used state-of-the-art methods, including BET, Two-pass BET, BET-B, BSE, HWA, ROBEX and AFNI, demonstrate that the proposed method performs favorably with superior performance on all testing datasets, indicating its robustness and effectiveness. PMID:24489639

  14. Does Iconicity in Pictographs Matter? The Influence of Iconicity and Numeracy on Information Processing, Decision Making, and Liking in an Eye-Tracking Study.

    PubMed

    Kreuzmair, Christina; Siegrist, Michael; Keller, Carmen

    2017-03-01

    Researchers recommend the use of pictographs in medical risk communication to improve people's risk comprehension and decision making. However, it is not yet clear whether the iconicity used in pictographs to convey risk information influences individuals' information processing and comprehension. In an eye-tracking experiment with participants from the general population (N = 188), we examined whether specific types of pictograph icons influence the processing strategy viewers use to extract numerical information. In addition, we examined the effect of iconicity and numeracy on probability estimation, recall, and icon liking. This experiment used a 2 (iconicity: blocks vs. restroom icons) × 2 (scenario: medical vs. nonmedical) between-subject design. Numeracy had a significant effect on information processing strategy, but we found no effect of iconicity or scenario. Results indicated that both icon types enabled high and low numerates to use their default way of processing and extracting the gist of the message from the pictorial risk communication format: high numerates counted icons, whereas low numerates used large-area processing. There was no effect of iconicity in the probability estimation. However, people who saw restroom icons had a higher probability of correctly recalling the exact risk level. Iconicity had no effect on icon liking. Although the effects are small, our findings suggest that person-like restroom icons in pictographs seem to have some advantages for risk communication. Specifically, in nonpersonalized prevention brochures, person-like restroom icons may maintain reader motivation for processing the risk information. © 2016 Society for Risk Analysis.

  15. Challenges in Managing Information Extraction

    ERIC Educational Resources Information Center

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…

  16. Hylemetry versus Biometry: a new method to certificate the lithography authenticity

    NASA Astrophysics Data System (ADS)

    Schirripa Spagnolo, Giuseppe; Cozzella, Lorenzo; Simonetti, Carla

    2011-06-01

    When we buy an artwork, a certificate of authenticity contains specific details about it. Unfortunately, these certificates are often exchanged between similar artworks: the same document is supplied by the seller to certify originality, so the buyer ends up with a copy of an original certificate attesting that a non-original artwork is an original one. A solution to this problem would be a system that links the certificate to a specific artwork. This requires finding characteristics of a single artwork that are unique, unrepeatable, and unchangeable. In this paper we propose a new lithography certification method based on the distribution of the color spots that compose the lithography itself. Thanks to the high-resolution acquisition media available today, analysis methods typical of speckle metrology can be applied. In the verification phase it is only necessary to acquire the same portion of the lithography, extract the verification information, use the private key to obtain the corresponding information from the certificate, and compare the two using a comparison threshold. Because the acquired image may be rotated and translated, image-correlation techniques from speckle metrology are applied to estimate and correct these errors, so that the extracted and acquired images can be compared under the best conditions, ensuring correct originality verification.
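
    The comparison step can be illustrated with a zero-mean normalized cross-correlation score thresholded for verification. This is a simplified stand-in for the speckle-metrology correlation described above: the patches are synthetic, rotation/translation compensation is omitted, and the noise level is arbitrary.

```python
import numpy as np

def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation between two equally sized grayscale patches.

    Scores near 1 indicate the acquired region matches the reference extracted from the
    certificate; a decision threshold on this score would drive the verification step.
    """
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

# Hypothetical high-resolution scans of the same lithography region (reference vs. acquired).
rng = np.random.default_rng(8)
reference = rng.random((128, 128))
acquired = reference + 0.05 * rng.normal(size=reference.shape)   # same artwork, slight noise
forgery = rng.random((128, 128))                                 # different color-spot pattern

print("genuine score:", round(normalized_cross_correlation(reference, acquired), 3))
print("forgery score:", round(normalized_cross_correlation(reference, forgery), 3))
```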

  17. Dexamethasone: Idiopathic Thrombocytopenic Purpura in Children and Adolescents

    PubMed Central

    Generali, Joyce A.; Cada, Dennis J.

    2013-01-01

    This Hospital Pharmacy feature is extracted from Off-Label Drug Facts, a quarterly publication available from Wolters Kluwer Health. Off-Label Drug Facts is a practitioner-oriented resource for information about specific drug uses that are unapproved by the US Food and Drug Administration. This new guide to the literature enables the health care professional or clinician to quickly identify published studies on off-label uses and determine if a specific use is rational in a patient care scenario. References direct the reader to the full literature for more comprehensive information before patient care decisions are made. Direct questions or comments regarding Off-Label Drug Uses to jgeneral@kumc.edu. PMID:24421447

  18. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas

    The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.

  19. Space Research Data Management in the National Aeronautics and Space Administration

    NASA Technical Reports Server (NTRS)

    Ludwig, G. H.

    1986-01-01

    Space related scientific research has passed through a natural evolutionary process. The task of extracting the meaningful information from the raw data is highly involved and will require data processing capabilities that do not exist today. The results are presented of a three year examination of this subject, using an earlier report as a starting point. The general conclusion is that there are areas in which NASA's data management practices can be improved, and specific actions are recommended. These actions will enhance NASA's ability to extract more of the potential data and to capitalize on future opportunities.

  20. Extracting noun phrases for all of MEDLINE.

    PubMed Central

    Bennett, N. A.; He, Q.; Powell, K.; Schatz, B. R.

    1999-01-01

    A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization; part-of-speech tagging; noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature. PMID:10566444
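
    The three modules named in this record (tokenization, part-of-speech tagging, noun phrase identification) can be illustrated with a short chunking sketch. This is not the authors' parser: the NLTK toolkit, the chunk grammar, and the example sentence are assumptions made only to show the structure of such a pipeline, and the listed NLTK resources must be downloaded once beforehand.

```python
import nltk

# One-time setup (assumed): nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"     # determiner + adjectives + one or more nouns
chunker = nltk.RegexpParser(grammar)

abstract = "Elevated serum cholesterol is a risk factor for coronary heart disease."
tokens = nltk.word_tokenize(abstract)            # 1. tokenization
tagged = nltk.pos_tag(tokens)                    # 2. part-of-speech tagging
tree = chunker.parse(tagged)                     # 3. noun phrase identification

noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(noun_phrases)   # extracted noun phrases, e.g. 'a risk factor', 'coronary heart disease'
```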

  1. Text Detection, Tracking and Recognition in Video: A Comprehensive Survey.

    PubMed

    Yin, Xu-Cheng; Zuo, Ze-Yu; Tian, Shu; Liu, Cheng-Lin

    2016-04-14

    Intelligent analysis of video data is currently in wide demand because video is a major source of sensory data in our lives. Text is a prominent and direct source of information in video, while recent surveys of text detection and recognition in imagery [1], [2] focus mainly on text extraction from scene images. Here, this paper presents a comprehensive survey of text detection, tracking and recognition in video with three major contributions. First, a generic framework is proposed for video text extraction that uniformly describes detection, tracking, recognition, and their relations and interactions. Second, within this framework, a variety of methods, systems and evaluation protocols of video text extraction are summarized, compared, and analyzed. Existing text tracking techniques, tracking based detection and recognition techniques are specifically highlighted. Third, related applications, prominent challenges, and future directions for video text extraction (especially from scene videos and web videos) are also thoroughly discussed.

  2. Bioactivity-guided fractionation for the butyrylcholinesterase inhibitory activity of furanocoumarins from Angelica archangelica L. roots and fruits.

    PubMed

    Wszelaki, Natalia; Paradowska, Katarzyna; Jamróz, Marta K; Granica, Sebastian; Kiss, Anna K

    2011-09-14

    Isolation and identification of the inhibitors of butyrylcholinesterase (BChE), obtained from the extracts of roots and fruits of Angelica archangelica L., are reported. Our results confirmed the weak inhibitory effect of Angelica roots on acetylcholinesterase activity. BChE inhibition was much more pronounced at a concentration of 100 μg/mL for hexane extracts and attained a higher rate than 50%. The TLC bioautography guided fractionation and spectroscopic analysis led to the isolation and identification of imperatorin from the fruit's hexane extract and of heraclenol-2'-O-angelate from the root's hexane extract. Both compounds showed significant BChE inhibition activity with IC(50) = 14.4 ± 3.2 μM and IC(50) = 7.5 ± 1.8 μM, respectively. Only C8-substituted and C5-unsubstituted furanocoumarins were active, which could supply information about the initial structures of specific BChE inhibitors.

  3. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

    PubMed Central

    Roos, Marco; Marshall, M Scott; Gibson, Andrew P; Schuemie, Martijn; Meij, Edgar; Katrenko, Sophia; van Hage, Willem Robert; Krommydas, Konstantinos; Adriaans, Pieter W

    2009-01-01

    Background Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. Results We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence. Conclusion We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation. PMID:19796406

  4. Announcing the Launch of CPTAC’s Proteogenomics DREAM Challenge | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    This week, we are excited to announce the launch of the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) Proteogenomics Computational DREAM Challenge.  The aim of this Challenge is to encourage the generation of computational methods for extracting information from the cancer proteome and for linking those data to genomic and transcriptomic information.  The specific goals are to predict proteomic and phosphoproteomic data from multiple other data types, including transcriptomic and genetic data.

  5. a Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information

    NASA Astrophysics Data System (ADS)

    Lian, Shizhong; Chen, Jiangping; Luo, Minghai

    2016-06-01

    Water information cannot be accurately extracted from TM images in which true information is lost because of obscuring clouds and missing data stripes. Because water is continuously distributed under natural conditions, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different kinds of disturbance caused by clouds and missing data stripes are simulated. Water information is extracted from the simulated images using global histogram matching, local histogram matching, and the probability-based statistical method. Experiments show that a smaller Areal Error and a higher Boundary Recall can be obtained using this method compared with the conventional methods.
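
    As a hedged illustration of the global histogram matching step used as a baseline above, the following sketch (not the authors' code) maps the grey-level distribution of a distorted band onto that of a clean reference band before a simple water threshold is applied; the array sizes, distributions, and threshold are assumptions.

        import numpy as np

        def histogram_match(source, reference):
            """Map the grey-level distribution of `source` onto that of `reference`
            (both 2-D arrays); a common pre-step before thresholding water pixels."""
            src_vals, src_idx, src_counts = np.unique(source.ravel(),
                                                      return_inverse=True,
                                                      return_counts=True)
            ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)

            # Cumulative distribution functions of both images
            src_cdf = np.cumsum(src_counts).astype(np.float64) / source.size
            ref_cdf = np.cumsum(ref_counts).astype(np.float64) / reference.size

            # For each source grey level, pick the reference level with the closest CDF value
            matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
            return matched_vals[src_idx].reshape(source.shape)

        # Toy example: a "damaged" band is matched to a clean band, then thresholded
        rng = np.random.default_rng(0)
        clean = rng.normal(100, 20, (64, 64))
        damaged = rng.normal(60, 10, (64, 64))          # simulated radiometric distortion
        restored = histogram_match(damaged, clean)
        water_mask = restored < 80                      # assumed water threshold
        print(water_mask.mean())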

  6. Paper-Based MicroRNA Expression Profiling from Plasma and Circulating Tumor Cells.

    PubMed

    Leong, Sai Mun; Tan, Karen Mei-Ling; Chua, Hui Wen; Huang, Mo-Chao; Cheong, Wai Chye; Li, Mo-Huang; Tucker, Steven; Koay, Evelyn Siew-Chuan

    2017-03-01

    Molecular characterization of circulating tumor cells (CTCs) holds great promise for monitoring metastatic progression and characterizing metastatic disease. However, leukocyte and red blood cell contamination of routinely isolated CTCs makes CTC-specific molecular characterization extremely challenging. Here we report the use of a paper-based medium for efficient extraction of microRNAs (miRNAs) from limited amounts of biological samples such as rare CTCs harvested from cancer patient blood. Specifically, we devised a workflow involving the use of Flinders Technology Associates (FTA) ® Elute Card with a digital PCR-inspired "partitioning" method to extract and purify miRNAs from plasma and CTCs. We demonstrated the sensitivity of this method to detect miRNA expression from as few as 3 cancer cells spiked into human blood. Using this method, background miRNA expression was excluded from contaminating blood cells, and CTC-specific miRNA expression profiles were derived from breast and colorectal cancer patients. Plasma separated out during purification of CTCs could likewise be processed using the same paper-based method for miRNA detection, thereby maximizing the amount of patient-specific information that can be derived from a single blood draw. Overall, this paper-based extraction method enables an efficient, cost-effective workflow for maximized recovery of small RNAs from limited biological samples for downstream molecular analyses. © 2016 American Association for Clinical Chemistry.

  7. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    PubMed

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io .

  8. Earth Observatory Satellite system definition study. Report 5: System design and specifications. Volume 2: EOS-A system specification

    NASA Technical Reports Server (NTRS)

    1974-01-01

    The objectives of the Earth Observatory Satellite (EOS) program are defined. The system specifications for the satellite payload are examined. The broad objectives of the EOS-A program are as follows: (1) to develop space-borne sensors for the measurement of land resources, (2) to evolve spacecraft systems and subsystems which will permit earth observation with greater accuracy, coverage, spatial resolution, and continuity than existing systems, (3) to develop improved information processing, extraction, display, and distribution systems, and (4) to use space transportation systems for resupply and retrieval of the EOS.

  9. Information extraction system

    DOEpatents

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations, as well as relationships and events, from text documents is described herein.

  10. Extraction of endoscopic images for biomedical figure classification

    NASA Astrophysics Data System (ADS)

    Xue, Zhiyun; You, Daekeun; Chachra, Suchet; Antani, Sameer; Long, L. R.; Demner-Fushman, Dina; Thoma, George R.

    2015-03-01

    Modality filtering is an important feature in biomedical image searching systems and may significantly improve the retrieval performance of the system. This paper presents a new method for extracting endoscopic image figures from photograph images in biomedical literature, which are found to have highly diverse content and large variability in appearance. Our proposed method consists of three main stages: tissue image extraction, endoscopic image candidate extraction, and ophthalmic image filtering. For tissue image extraction we use image patch level clustering and MRF relabeling to detect images containing skin/tissue regions. Next, we find candidate endoscopic images by exploiting the round shape characteristics that commonly appear in these images. However, this step needs to compensate for images where endoscopic regions are not entirely round. In the third step we filter out the ophthalmic images which have shape characteristics very similar to the endoscopic images. We do this by using text information, specifically, anatomy terms, extracted from the figure caption. We tested and evaluated our method on a dataset of 115,370 photograph figures, and achieved promising precision and recall rates of 87% and 84%, respectively.
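
    The round-shape cue used for endoscopic candidate extraction can be illustrated with a generic Hough circle transform; this is only a stand-in for the authors' shape analysis, and the synthetic figure and parameter values below are assumptions.

        import cv2
        import numpy as np

        # Synthetic grey-level "figure" containing one bright circular region
        img = np.zeros((300, 300), dtype=np.uint8)
        cv2.circle(img, (150, 150), 90, 255, thickness=-1)
        img = cv2.GaussianBlur(img, (9, 9), 2)

        # Hough transform: candidate round regions (endoscopic views are roughly circular)
        circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                                   param1=100, param2=30, minRadius=40, maxRadius=140)

        if circles is not None:
            for x, y, r in np.round(circles[0]).astype(int):
                print(f"candidate endoscopic region: centre=({x},{y}), radius={r}")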

  11. Validation assessment of shoreline extraction on medium resolution satellite image

    NASA Astrophysics Data System (ADS)

    Manaf, Syaifulnizam Abd; Mustapha, Norwati; Sulaiman, Md Nasir; Husin, Nor Azura; Shafri, Helmi Zulhaidi Mohd

    2017-10-01

    Monitoring coastal zones helps provide information about the conditions of the coastal zones, such as erosion or accretion. Moreover, monitoring the shorelines can help measure the severity of such conditions. Such measurement can be performed accurately by using Earth observation satellite images rather than by using traditional ground surveys. To date, shorelines can be extracted from satellite images with a high degree of accuracy by using satellite image classification techniques based on machine learning to identify the land and water classes of the shorelines. In this study, the researchers validated the shorelines extracted by 11 classifiers against a reference shoreline provided by the local authority. Specifically, the validation assessment was performed to examine the difference between the extracted shorelines and the reference shoreline. The research findings showed that the linear SVM was the most effective image classification technique, as evidenced by the lowest mean distance between the extracted shoreline and the reference shoreline. Furthermore, the findings showed that the accuracy of the extracted shoreline was not directly proportional to the accuracy of the image classification.
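
    As an illustration of the machine-learning land/water classification underlying shoreline extraction, the sketch below trains a linear SVM on per-pixel band values and labels a small patch; the two-band toy data and labels are assumptions rather than the study's dataset.

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)

        # Toy per-pixel features: [near-infrared, green] reflectance for water and land
        water = rng.normal(loc=[0.05, 0.10], scale=0.02, size=(200, 2))
        land = rng.normal(loc=[0.35, 0.20], scale=0.05, size=(200, 2))
        X = np.vstack([water, land])
        y = np.array([0] * 200 + [1] * 200)        # 0 = water, 1 = land

        clf = SVC(kernel="linear").fit(X, y)

        # Classify a small image patch pixel by pixel; the land/water boundary traces the shoreline
        patch = rng.normal(loc=[0.2, 0.15], scale=0.12, size=(10, 10, 2))
        labels = clf.predict(patch.reshape(-1, 2)).reshape(10, 10)
        print(labels)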

  12. Computer virus information update CIAC-2301

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Orvis, W.J.

    1994-01-15

    While CIAC periodically issues bulletins about specific computer viruses, these bulletins do not cover all the computer viruses that affect desktop computers. The purpose of this document is to identify most of the known viruses for the MS-DOS and Macintosh platforms and give an overview of the effects of each virus. The authors also include information on some Windows, Atari, and Amiga viruses. This document is revised periodically as new virus information becomes available. This document replaces all earlier versions of the CIAC Computer Virus Information Update. The date on the front cover indicates the date on which the information in this document was extracted from CIAC's Virus database.

  13. Effects of DNA Extraction Procedures on Bacteroides Profiles in Fecal Samples From Various Animals Determined by Terminal Restriction Fragment Length Polymorphism Analysis

    EPA Science Inventory

    A major assumption in microbial source tracking is that some fecal bacteria are specific to a host animal, and thus provide unique microbial fingerprints that can be used to differentiate hosts. However, the DNA information obtained from a particular sample may be biased depending on the DNA extraction procedure used.

  14. Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques

    NASA Technical Reports Server (NTRS)

    Smith, Michael A.; Kanade, Takeo

    1997-01-01

    Digital video is rapidly becoming important for education, entertainment, and a host of multimedia applications. With the size of the video collections growing to thousands of hours, technology is needed to effectively browse segments in a short time without losing the content of the video. We propose a method to extract the significant audio and video information and create a "skim" video which represents a very short synopsis of the original. The goal of this work is to show the utility of integrating language and image understanding techniques for video skimming by extraction of significant information, such as specific objects, audio keywords and relevant video structure. The resulting skim video is much shorter, where compaction is as high as 20:1, and yet retains the essential content of the original segment.

  15. Metadata registry and management system based on ISO 11179 for cancer clinical trials information system

    PubMed Central

    Park, Yu Rang; Kim*, Ju Han

    2006-01-01

    Standardized management of data elements (DEs) for Case Report Forms (CRFs) is crucial in Clinical Trials Information Systems (CTISs). Traditional CTISs utilize organization-specific definitions and storage methods for DEs and CRFs. We developed a metadata-based DE management system for clinical trials, the Clinical and Histopathological Metadata Registry (CHMR), using the international standard for metadata registries (ISO 11179) for the management of cancer clinical trials information. CHMR was evaluated in cancer clinical trials with 1625 DEs extracted from the College of American Pathologists Cancer Protocols for 20 major cancers. PMID:17238675

  16. Automatic indexing of compound words based on mutual information for Korean text retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pan Koo Kim; Yoo Kun Cho

    In this paper, we present an automatic indexing technique for compound words suitable for an agglutinative language, specifically Korean. First, we present the construction conditions for composing compound words as indexing terms. We also present decomposition rules applicable to consecutive nouns to extract the full content of the text. Finally, we propose mutual information, an information-theoretic measure of the usefulness of a term, to calculate the degree of word association within compound words. By applying this method, our system raised the precision rate for compound words from 72% to 87%.
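
    The mutual-information measure for scoring word association can be written down directly; in the hedged sketch below the tiny corpus is invented, English tokens stand in for Korean nouns, and smoothing is omitted.

        import math
        from collections import Counter

        # Toy corpus of noun sequences (stand-ins for consecutive Korean nouns)
        docs = [["information", "retrieval", "system"],
                ["information", "retrieval", "model"],
                ["database", "system"],
                ["information", "system"]]

        unigrams = Counter(w for d in docs for w in d)
        bigrams = Counter((a, b) for d in docs for a, b in zip(d, d[1:]))
        n_uni = sum(unigrams.values())
        n_bi = sum(bigrams.values())

        def mutual_information(x, y):
            """Pointwise mutual information of the compound (x, y):
            MI = log2( P(x, y) / (P(x) * P(y)) )."""
            p_xy = bigrams[(x, y)] / n_bi
            p_x = unigrams[x] / n_uni
            p_y = unigrams[y] / n_uni
            return math.log2(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

        print(mutual_information("information", "retrieval"))  # strongly associated compound
        print(mutual_information("database", "model"))         # unseen compound -> -inf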

  17. Dynamic information processing states revealed through neurocognitive models of object semantics

    PubMed Central

    Clarke, Alex

    2015-01-01

    Recognising objects relies on highly dynamic, interactive brain networks to process multiple aspects of object information. To fully understand how different forms of information about objects are represented and processed in the brain requires a neurocognitive account of visual object recognition that combines a detailed cognitive model of semantic knowledge with a neurobiological model of visual object processing. Here we ask how specific cognitive factors are instantiated in our mental processes and how they dynamically evolve over time. We suggest that coarse semantic information, based on generic shared semantic knowledge, is rapidly extracted from visual inputs and is sufficient to drive rapid category decisions. Subsequent recurrent neural activity between the anterior temporal lobe and posterior fusiform supports the formation of object-specific semantic representations – a conjunctive process primarily driven by the perirhinal cortex. These object-specific representations require the integration of shared and distinguishing object properties and support the unique recognition of objects. We conclude that a valuable way of understanding the cognitive activity of the brain is through testing the relationship between specific cognitive measures and dynamic neural activity. This kind of approach allows us to move towards uncovering the information processing states of the brain and how they evolve over time. PMID:25745632

  18. Identifying Nanoscale Structure-Function Relationships Using Multimodal Atomic Force Microscopy, Dimensionality Reduction, and Regression Techniques.

    PubMed

    Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S

    2018-05-31

    Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data analytic approach combining multiple microscopy modes, using compositional information in infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra are different from FTIR spectra, but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure with any one technique.
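
    The combination of dimensionality reduction and regression described above can be sketched as principal component regression: PCA compresses per-pixel spectra and a linear regression on the retained components predicts local current. The synthetic spectra, number of components, and noise level below are assumptions.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(2)

        # Synthetic hyperspectral stack: 500 pixels x 64 spectral channels,
        # generated from two latent "composition" factors plus noise
        composition = rng.random((500, 2))
        spectra = composition @ rng.normal(size=(2, 64)) + 0.05 * rng.normal(size=(500, 64))

        # Target: local current, assumed to depend linearly on composition
        current = composition @ np.array([1.5, -0.4]) + 0.02 * rng.normal(size=500)

        # Principal component regression: keep the leading components,
        # then regress current on the component scores
        pcr = make_pipeline(PCA(n_components=2), LinearRegression())
        pcr.fit(spectra, current)
        print("R^2 on training data:", round(pcr.score(spectra, current), 3))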

  19. Protein-based stable isotope probing.

    PubMed

    Jehmlich, Nico; Schmidt, Frank; Taubert, Martin; Seifert, Jana; Bastida, Felipe; von Bergen, Martin; Richnow, Hans-Hermann; Vogt, Carsten

    2010-12-01

    We describe a stable isotope probing (SIP) technique that was developed to link microbe-specific metabolic function to phylogenetic information. Carbon (13C)- or nitrogen (15N)-labeled substrates (typically with >98% heavy label) were used in cultivation experiments, and the incorporation of the heavy isotope into proteins during growth (protein-SIP) was determined. The amount of incorporation provides a measure of the assimilation of a substrate, and the sequence information from peptide analysis obtained by mass spectrometry delivers phylogenetic information about the microorganisms responsible for the metabolism of the particular substrate. In this article, we provide guidelines for incubating microbial cultures with labeled substrates and a protocol for protein-SIP. The protocol guides readers through the proteomics pipeline, including protein extraction, gel-free and gel-based protein separation, the subsequent mass spectrometric analysis of peptides and the calculation of the incorporation of stable isotopes into peptides. Extraction of proteins and the mass fingerprint measurements of unlabeled and labeled fractions can be performed in 2-3 d.

  20. Asymptotic Cramer-Rao bounds for Morlet wavelet filter bank transforms of FM signals

    NASA Astrophysics Data System (ADS)

    Scheper, Richard

    2002-03-01

    Wavelet filter banks are potentially useful tools for analyzing and extracting information from frequency modulated (FM) signals in noise. Chief among the advantages of such filter banks is the tendency of wavelet transforms to concentrate signal energy while simultaneously dispersing noise energy over the time-frequency plane, thus raising the effective signal to noise ratio of filtered signals. Over the past decade, much effort has gone into devising new algorithms to extract the relevant information from transformed signals while identifying and discarding the transformed noise. Therefore, estimates of the ultimate performance bounds on such algorithms would serve as valuable benchmarks in the process of choosing optimal algorithms for given signal classes. Discussed here is the specific case of FM signals analyzed by Morlet wavelet filter banks. By making use of the stationary phase approximation of the Morlet transform, and assuming that the measured signals are well resolved digitally, the asymptotic form of the Fisher Information Matrix is derived. From this, Cramer-Rao bounds are analytically derived for simple cases.
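
    For reference, the two standard ingredients of such a derivation are the Morlet wavelet (with its continuous wavelet transform) and the Cramer-Rao inequality relating estimator variance to the Fisher information matrix; the LaTeX below uses generic notation and is not taken from the paper.

        \psi(t) = \pi^{-1/4}\, e^{i\omega_0 t}\, e^{-t^2/2},
        \qquad
        W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt,

        \operatorname{var}\big(\hat{\theta}_i\big) \;\ge\; \big[\mathbf{I}^{-1}(\boldsymbol{\theta})\big]_{ii},
        \qquad
        \big[\mathbf{I}(\boldsymbol{\theta})\big]_{ij}
          = \mathbb{E}\!\left[ \frac{\partial \ln p(\mathbf{y};\boldsymbol{\theta})}{\partial \theta_i}\,
                               \frac{\partial \ln p(\mathbf{y};\boldsymbol{\theta})}{\partial \theta_j} \right].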

  1. The Urbis Project: Identification and Characterization of Potential Urban Development Areas as a Web-Based Service

    NASA Astrophysics Data System (ADS)

    Manzke, Nina; Kada, Martin; Kastler, Thomas; Xu, Shaojuan; de Lange, Norbert; Ehlers, Manfred

    2016-06-01

    Urban sprawl and the related landscape fragmentation is a Europe-wide challenge in the context of sustainable urban planning. The URBan land recycling Information services for Sustainable cities (URBIS) project aims for the development, implementation, and validation of web-based information services for urban vacant land in European functional urban areas in order to provide end-users with site specific characteristics and to facilitate the identification and evaluation of potential development areas. The URBIS services are developed based on open geospatial data. In particular, the Copernicus Urban Atlas thematic layers serve as the main data source for an initial inventory of sites. In combination with remotely sensed data like SPOT5 images and ancillary datasets like OpenStreetMap, detailed site specific information is extracted. Services are defined for three main categories: i) baseline services, which comprise an initial inventory and typology of urban land, ii) update services, which provide a regular inventory update as well as an analysis of urban land use dynamics and changes, and iii) thematic services, which deliver specific information tailored to end-users' needs.

  2. MARS: bringing the automation of small-molecule bioanalytical sample preparations to a new frontier.

    PubMed

    Li, Ming; Chou, Judy; Jing, Jing; Xu, Hui; Costa, Aldo; Caputo, Robin; Mikkilineni, Rajesh; Flannelly-King, Shane; Rohde, Ellen; Gan, Lawrence; Klunk, Lewis; Yang, Liyu

    2012-06-01

    In recent years, there has been a growing interest in automating small-molecule bioanalytical sample preparations specifically using the Hamilton MicroLab® STAR liquid-handling platform. In the most extensive work reported thus far, multiple small-molecule sample preparation assay types (protein precipitation extraction, SPE and liquid-liquid extraction) have been integrated into a suite that is composed of graphical user interfaces and Hamilton scripts. Using that suite, bioanalytical scientists have been able to automate various sample preparation methods to a great extent. However, there are still areas that could benefit from further automation, specifically, the full integration of analytical standard and QC sample preparation with study sample extraction in one continuous run, real-time 2D barcode scanning on the Hamilton deck and direct Laboratory Information Management System database connectivity. We developed a new small-molecule sample-preparation automation system that improves in all of the aforementioned areas. The improved system presented herein further streamlines the bioanalytical workflow, simplifies batch run design, reduces analyst intervention and eliminates sample-handling error.

  3. Personalized Guideline-Based Treatment Recommendations Using Natural Language Processing Techniques.

    PubMed

    Becker, Matthias; Böckmann, Britta

    2017-01-01

    Clinical guidelines and clinical pathways are accepted and proven instruments for quality assurance and process optimization. Today, electronic representation of clinical guidelines exists as unstructured text, but is not well-integrated with patient-specific information from electronic health records. Consequently, the generic content of clinical guidelines is accessible, but it is not possible to visualize the patient's position on the clinical pathway, and decision support cannot be provided by personalized guidelines for the next treatment step. The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) provides common reference terminology as well as the semantic link for combining the pathways and the patient-specific information. This paper proposes a model-based approach to support the development of guideline-compliant pathways combined with patient-specific structured and unstructured information using SNOMED CT. To identify SNOMED CT concepts, software was developed to extract SNOMED CT codes from structured and unstructured German data and to map them to clinical pathways annotated in accordance with the systematized nomenclature.

  4. Database integration in a multimedia-modeling environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dorow, Kevin E.

    2002-09-02

    Integration of data from disparate remote sources has direct applicability to modeling, which can support Brownfield assessments. To accomplish this task, a data integration framework needs to be established. A key element in this framework is the metadata that creates the relationship between the pieces of information that are important in the multimedia modeling environment and the information that is stored in the remote data source. The design philosophy is to allow modelers and database owners to collaborate by defining this metadata in such a way that allows interaction between their components. The main parts of this framework include tools to facilitate metadata definition, database extraction plan creation, automated extraction plan execution / data retrieval, and a central clearing house for metadata and modeling / database resources. Cross-platform compatibility (using Java) and standard communications protocols (http / https) allow these parts to run in a wide variety of computing environments (Local Area Networks, Internet, etc.), and, therefore, this framework provides many benefits. Because of the specific data relationships described in the metadata, the amount of data that have to be transferred is kept to a minimum (only the data that fulfill a specific request are provided as opposed to transferring the complete contents of a data source). This allows for real-time data extraction from the actual source. Also, the framework sets up collaborative responsibilities such that the different types of participants have control over the areas in which they have domain knowledge: the modelers are responsible for defining the data relevant to their models, while the database owners are responsible for mapping the contents of the database using the metadata definitions. Finally, the data extraction mechanism allows for the ability to control access to the data and what data are made available.

  5. A protocol for pressurized liquid extraction and processing methods to isolate modern and ancient bone cholesterol for compound-specific stable isotope analysis.

    PubMed

    Laffey, Ann O; Krigbaum, John; Zimmerman, Andrew R

    2017-02-15

    Bone lipid compound-specific isotope analysis (CSIA) and bone collagen and apatite stable isotope ratio analysis are important sources of ecological and paleodietary information. Pressurized liquid extraction (PLE) is quicker and utilizes less solvent than traditional methods of lipid extraction such as soxhlet and ultrasonication. This study facilitates dietary analysis by optimizing and testing a standardized methodology for PLE of bone cholesterol. Modern and archaeological bones were extracted by PLE using varied temperatures, solvent solutions, and sample weights. The efficiency of PLE was assessed via quantification of cholesterol yields. Stable isotopic ratio integrity was evaluated by comparing isotopic signatures (δ13C and δ18O values) of cholesterol derived from whole bone, bone collagen and bone apatite. Gas chromatography/mass spectrometry (GC/MS) and gas chromatography isotope ratio mass spectrometry (GC/IRMS) were conducted on purified collagen and lipid extracts to assess isotopic responses to PLE. Lipid yield was optimized at two PLE extraction cycles of 75 °C using dichloromethane/methanol (2:1 v/v) as a solvent with 0.25-0.75 g bone sample. Following lipid extraction, saponification combined with the derivatization of the neutral fraction using trimethylsilylation yielded nearly twice the cholesterol of non-saponified or non-derivatized samples. It was also found that lipids extracted from purified bone collagen and apatite could be used for cholesterol CSIA. There was no difference in the bulk δ13C values of collagen extracted from bone with or without lipid. However, there was a significant depletion in 18O of bone apatite due to lipid presence or processing. These results should assist sample selection and provide an effective, alternative extraction method for bone cholesterol that may be used for isotopic and paleodietary analysis. Copyright © 2016 John Wiley & Sons, Ltd.

  6. Is There a European View on Health Economic Evaluations? Results from a Synopsis of Methodological Guidelines Used in the EUnetHTA Partner Countries.

    PubMed

    Heintz, Emelie; Gerber-Grote, Andreas; Ghabri, Salah; Hamers, Francoise F; Rupel, Valentina Prevolnik; Slabe-Erker, Renata; Davidson, Thomas

    2016-01-01

    The objectives of this study were to review current methodological guidelines for economic evaluations of all types of technologies in the 33 countries with organizations involved in the European Network for Health Technology Assessment (EUnetHTA), and to provide a general framework for economic evaluation at a European level. Methodological guidelines for health economic evaluations used by EUnetHTA partners were collected through a survey. Information from each guideline was extracted using a pre-tested extraction template. On the basis of the extracted information, a summary describing the methods used by the EUnetHTA countries was written for each methodological item. General recommendations were formulated for methodological issues where the guidelines of the EUnetHTA partners were in agreement or where the usefulness of economic evaluations may be increased by presenting the results in a specific way. At least one contact person from all 33 EUnetHTA countries (100 %) responded to the survey. In total, the review included 51 guidelines, representing 25 countries (eight countries had no methodological guideline for health economic evaluations). On the basis of the results of the extracted information from all 51 guidelines, EUnetHTA issued ten main recommendations for health economic evaluations. The presented review of methodological guidelines for health economic evaluations and the consequent recommendations will hopefully improve the comparability, transferability and overall usefulness of economic evaluations performed within EUnetHTA. Nevertheless, there are still methodological issues that need to be investigated further.

  7. Sortal anaphora resolution to enhance relation extraction from biomedical literature.

    PubMed

    Kilicoglu, Halil; Rosemblat, Graciela; Fiszman, Marcelo; Rindflesch, Thomas C

    2016-04-14

    Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50 % of the changes involving uninformative relations being replaced by more specific and informative ones, while 35 % of the changes had no effect, and only 15 % were negative. We estimate that anaphora resolution results in changes in about 1.5 % of approximately 82 million semantic relations extracted from the entire PubMed. Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.

  8. Information extraction from multi-institutional radiology reports.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
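
    As a deliberately simplified stand-in for the discriminative sequence classifiers used in this work, the sketch below tags report tokens with a per-token logistic regression over hand-built contextual features; the miniature training data, feature set, and label scheme are invented, and a production system would instead use a true sequence model (e.g., a CRF) trained on annotated reports.

        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression

        def token_features(tokens, i):
            """Simple contextual features for token i (a toy feature set)."""
            return {
                "word": tokens[i].lower(),
                "is_capitalized": tokens[i][0].isupper(),
                "prev": tokens[i - 1].lower() if i > 0 else "<s>",
                "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
            }

        # Miniature annotated "reports": label each token as a finding, anatomy mention, or other
        train = [
            (["Mild", "edema", "in", "the", "left", "lung", "."],
             ["O", "FINDING", "O", "O", "ANATOMY", "ANATOMY", "O"]),
            (["No", "fracture", "of", "the", "ribs", "."],
             ["O", "FINDING", "O", "O", "ANATOMY", "O"]),
        ]

        X = [token_features(toks, i) for toks, _ in train for i in range(len(toks))]
        y = [lab for _, labs in train for lab in labs]

        vec = DictVectorizer()
        clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

        test = ["Small", "nodule", "in", "the", "right", "lung", "."]
        feats = vec.transform([token_features(test, i) for i in range(len(test))])
        print(list(zip(test, clf.predict(feats))))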

  9. Review of Extracting Information From the Social Web for Health Personalization

    PubMed Central

    Karlsen, Randi; Bonander, Jason

    2011-01-01

    In recent years the Web has come into its own as a social platform where health consumers are actively creating and consuming Web content. Moreover, as the Web matures, consumers are gaining access to personalized applications adapted to their health needs and interests. The creation of personalized Web applications relies on extracted information about the users and the content to personalize. The Social Web itself provides many sources of information that can be used to extract information for personalization apart from traditional Web forms and questionnaires. This paper provides a review of different approaches for extracting information from the Social Web for health personalization. We reviewed research literature across different fields addressing the disclosure of health information in the Social Web, techniques to extract that information, and examples of personalized health applications. In addition, the paper includes a discussion of technical and socioethical challenges related to the extraction of information for health personalization. PMID:21278049

  10. v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text

    PubMed Central

    Divita, Guy; Carter, Marjorie E.; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H.; Gundlapalli, Adi V.

    2016-01-01

    Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records. PMID:27683667

  11. Usability-driven pruning of large ontologies: the case of SNOMED CT.

    PubMed

    López-García, Pablo; Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan

    2012-06-01

    To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Graph-traversal heuristics provided high coverage (71-96% of terms in the test sets of discharge summaries) at the expense of subset size (17-51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24-55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available.
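
    A graph-traversal heuristic of the kind compared above can be illustrated with a breadth-first walk from the signature concepts up a toy is-a hierarchy; the concept names, hierarchy, and traversal depth below are assumptions for illustration only.

        from collections import deque

        # Toy is-a hierarchy: child concept -> list of parent concepts
        parents = {
            "myocardial infarction": ["ischemic heart disease"],
            "ischemic heart disease": ["heart disease"],
            "heart disease": ["disorder of cardiovascular system"],
            "disorder of cardiovascular system": ["disease"],
            "atrial fibrillation": ["heart disease"],
        }

        def extract_subset(signature, max_depth=2):
            """Collect every concept reachable from the signature within `max_depth`
            upward steps; larger depths give higher coverage but larger subsets."""
            subset, queue = set(signature), deque((c, 0) for c in signature)
            while queue:
                concept, depth = queue.popleft()
                if depth == max_depth:
                    continue
                for parent in parents.get(concept, []):
                    if parent not in subset:
                        subset.add(parent)
                        queue.append((parent, depth + 1))
            return subset

        # Signature taken from a (hypothetical) coded discharge summary
        print(sorted(extract_subset({"myocardial infarction", "atrial fibrillation"})))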

  12. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

    Constructing a virtual international strategy environment requires many kinds of information, covering the economy, politics, the military, diplomacy, culture, science, and so on. It is therefore very important to build a highly efficient management system for automatic information extraction, classification, recombination, and analysis as the foundation and a component of the military strategy hall. This paper first uses an improved boosting algorithm to classify the collected initial information, and then uses a strategy intelligence extraction algorithm to extract strategy intelligence from that information to help strategists analyze it.

  13. Prediction of isometric motor tasks and effort levels based on high-density EMG in patients with incomplete spinal cord injury

    NASA Astrophysics Data System (ADS)

    Jordanić, Mislav; Rojas-Martínez, Mónica; Mañanas, Miguel Angel; Francesc Alonso, Joan

    2016-08-01

    Objective. The development of modern assistive and rehabilitation devices requires reliable and easy-to-use methods to extract neural information for control of devices. Group-specific pattern recognition identifiers are influenced by inter-subject variability. Based on high-density EMG (HD-EMG) maps, our research group has already shown that inter-subject muscle activation patterns exist in a population of healthy subjects. The aim of this paper is to analyze muscle activation patterns associated with four tasks (flexion/extension of the elbow, and supination/pronation of the forearm) at three different effort levels in a group of patients with incomplete Spinal Cord Injury (iSCI). Approach. Muscle activation patterns were evaluated by the automatic identification of these four isometric tasks along with the identification of levels of voluntary contractions. Two types of classifiers were considered in the identification: linear discriminant analysis and support vector machine. Main results. Results show that performance of classification increases when combining features extracted from intensity and spatial information of HD-EMG maps (accuracy = 97.5%). Moreover, when compared to a population with injuries at different levels, a lower variability between activation maps was obtained within a group of patients with similar injury suggesting stronger task-specific and effort-level-specific co-activation patterns, which enable better prediction results. Significance. Despite the challenge of identifying both the four tasks and the three effort levels in patients with iSCI, promising results were obtained which support the use of HD-EMG features for providing useful information regarding motion and force intention.

  14. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media

    PubMed Central

    Cameron, Delroy; Smith, Gary A.; Daniulaityte, Raminta; Sheth, Amit P.; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z.; Falck, Russel

    2013-01-01

    Objectives The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel Semantic Web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO) (pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC). A combination of lexical, pattern-based and semantics-based techniques is used together with the domain knowledge to extract fine-grained semantic information from UGC. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Methods Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, routes of administration, etc. The DAO is also used to help recognize three types of data, namely: 1) entities, 2) relationships and 3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information from UGC, and querying, search, trend analysis and overall content analysis of social media related to prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. 
Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. PMID:23892295

  15. High-Resolution Remote Sensing Image Building Extraction Based on Markov Model

    NASA Astrophysics Data System (ADS)

    Zhao, W.; Yan, L.; Chang, Y.; Gong, L.

    2018-04-01

    As resolution increases, remote sensing images carry a greater information load, more noise, and more complex feature geometry and texture, which makes the extraction of building information more difficult. To address this problem, this paper designs a building extraction method for high-resolution remote sensing images based on a Markov model. The method introduces Contourlet-domain map clustering together with a Markov model, captures and enhances the contour and texture information of image features in multiple directions, and further designs a spectral feature index that can characterize "pseudo-buildings" in built-up areas. Through multi-scale segmentation and extraction of image features, fine extraction proceeds from the built-up area down to individual buildings. Experiments show that this method can suppress noise in high-resolution remote sensing images, reduce interference from non-target ground texture, and remove shadow, vegetation, and other pseudo-building information; compared with traditional pixel-level information extraction, it performs better in building extraction precision, accuracy, and completeness.

  16. Bird sound spectrogram decomposition through Non-Negative Matrix Factorization for the acoustic classification of bird species.

    PubMed

    Ludeña-Choez, Jimmy; Quispe-Soncco, Raisa; Gallardo-Antolín, Ascensión

    2017-01-01

    Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds, and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorporates this specific information through the non-negative decomposition of bird sound spectrograms. It consists of the following two different stages: short-time feature extraction and temporal feature integration. In the first stage, which aims at providing a better spectral representation of bird sounds on a frame-by-frame basis, two methods are evaluated. In the first method, cepstral-like features (NMF_CC) are extracted by using a filter bank that is automatically learned by means of the application of Non-Negative Matrix Factorization (NMF) on bird audio spectrograms. In the second method, the features are directly derived from the activation coefficients of the spectrogram decomposition as performed through NMF (H_CC). The second stage summarizes the most relevant information contained in the short-time features by computing several statistical measures over long segments. The experiments show that the use of NMF_CC and H_CC in conjunction with temporal integration significantly improves the performance of a Support Vector Machine (SVM)-based ABSC system with respect to conventional MFCC.

  17. Bird sound spectrogram decomposition through Non-Negative Matrix Factorization for the acoustic classification of bird species

    PubMed Central

    Quispe-Soncco, Raisa

    2017-01-01

    Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds, and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorporates this specific information through the non-negative decomposition of bird sound spectrograms. It consists of the following two different stages: short-time feature extraction and temporal feature integration. In the first stage, which aims at providing a better spectral representation of bird sounds on a frame-by-frame basis, two methods are evaluated. In the first method, cepstral-like features (NMF_CC) are extracted by using a filter bank that is automatically learned by means of the application of Non-Negative Matrix Factorization (NMF) on bird audio spectrograms. In the second method, the features are directly derived from the activation coefficients of the spectrogram decomposition as performed through NMF (H_CC). The second stage summarizes the most relevant information contained in the short-time features by computing several statistical measures over long segments. The experiments show that the use of NMF_CC and H_CC in conjunction with temporal integration significantly improves the performance of a Support Vector Machine (SVM)-based ABSC system with respect to conventional MFCC. PMID:28628630
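
    The non-negative spectrogram decomposition at the heart of this front-end can be sketched as factorizing the magnitude spectrogram V ≈ WH, where the columns of W act as a learned spectral filter bank and the rows of H are the per-frame activations from which H_CC-style features are derived. In the sketch below, the synthetic chirp signal and the number of components are assumptions.

        import numpy as np
        from scipy.signal import spectrogram
        from sklearn.decomposition import NMF

        # Synthetic "bird-like" signal: a frequency-modulated chirp plus noise
        fs = 22050
        t = np.arange(0, 1.0, 1 / fs)
        signal = np.sin(2 * np.pi * (2000 + 1500 * t) * t) + 0.05 * np.random.randn(t.size)

        # Magnitude spectrogram V (frequencies x frames), non-negative by construction
        freqs, frames, V = spectrogram(signal, fs=fs, nperseg=512, noverlap=256)

        # Non-negative factorization V ~= W @ H
        nmf = NMF(n_components=8, init="nndsvda", max_iter=400, random_state=0)
        W = nmf.fit_transform(V)          # learned spectral basis ("filter bank")
        H = nmf.components_               # per-frame activations -> frame-level features

        print(W.shape, H.shape)           # (n_freqs, 8) and (8, n_frames)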

  18. Antimicrobial potential of Australian macrofungi extracts against foodborne and other pathogens.

    PubMed

    Bala, Neeraj; Aitken, Elizabeth A B; Cusack, Andrew; Steadman, Kathryn J

    2012-03-01

    Basidiomycetous macrofungi have therapeutic potential due to antimicrobial activity but little information is available for Australian macrofungi. Therefore, the present study investigated 12 Australian basidiomycetous macrofungi, previously shown to have promising activity against Staphylococcus aureus and Escherichia coli, for their antimicrobial potential against a range of other clinically relevant micro-organisms. Fruiting bodies were collected from across Queensland, Australia, freeze-dried and sequentially extracted with water and ethanol. The crude extracts were tested at 10 mg/mL and 1 mg/mL against six pathogens, including two Gram-positive and two Gram-negative bacteria along with two fungi, using a high throughput 96-well microplate bioassay. A degree of specificity in activity was exhibited by the water extract of Ramaria sp. (Gomphaceae) and the ethanol extracts of Psathyrella sp. (Psathyrellaceae) and Hohenbuehelia sp., which inhibited the growth of the two fungal pathogens used in the assay. Similarly, the ethanol extract of Fomitopsis lilacinogilva (Fomitopsidaceae) was active only against the Gram-positive bacterium Bacillus cereus. Activity against a wider range of the microorganisms used in the assay was exhibited by the ethanol extract of Ramaria sp. and the water extract of Hohenbuehelia sp. (Pleurotaceae). These macrofungi can serve as new sources for the discovery and development of much-needed new antimicrobials. Copyright © 2011 John Wiley & Sons, Ltd.

  19. Automatic Requirements Specification Extraction from Natural Language (ARSENAL)

    DTIC Science & Technology

    2014-10-01

    Natural language is used to communicate technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, and imprecise. Key concerns for ARSENAL include the accuracy of the natural language processing stage, the degree of automation, and robustness to noise.

  20. Gallbladder Boundary Segmentation from Ultrasound Images Using Active Contour Model

    NASA Astrophysics Data System (ADS)

    Ciecholewski, Marcin

    Extracting the shape of the gallbladder from an ultrasonography (US) image allows superfluous information that is immaterial to the diagnostic process to be eliminated. In this project an active contour model was used to extract the shape of the gallbladder, both for cases free of lesions and for those showing specific disease entities, namely lithiasis, polyps, and changes in the shape of the organ, such as folds or turns of the gallbladder. The approximate shape of the gallbladder was found by applying the motion equation model. The tests conducted have shown that for the 220 US images of the gallbladder, the area error rate (AER) amounted to 18.15%.
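
    The active-contour (snake) step can be illustrated with scikit-image: an initial circular contour is evolved toward the boundary of a bright region standing in for the gallbladder. The synthetic image, initialization, and elasticity/rigidity weights below are assumptions, not the project's settings.

        import numpy as np
        from skimage.draw import ellipse
        from skimage.filters import gaussian
        from skimage.segmentation import active_contour

        # Synthetic ultrasound-like image: a bright elliptical "gallbladder" on a dark background
        img = np.zeros((200, 200))
        rr, cc = ellipse(100, 100, 45, 30)
        img[rr, cc] = 1.0
        img = gaussian(img, sigma=3, preserve_range=True)

        # Initial contour: a circle around the expected organ position
        s = np.linspace(0, 2 * np.pi, 200)
        init = np.column_stack([100 + 70 * np.sin(s), 100 + 70 * np.cos(s)])

        # Evolve the snake; alpha and beta control elasticity and rigidity
        snake = active_contour(img, init, alpha=0.015, beta=10, gamma=0.001)
        print(snake.shape)   # (200, 2) contour points approximating the boundary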

  1. A mask quality control tool for the OSIRIS multi-object spectrograph

    NASA Astrophysics Data System (ADS)

    López-Ruiz, J. C.; Vaz Cedillo, Jacinto Javier; Ederoclite, Alessandro; Bongiovanni, Ángel; González Escalera, Víctor

    2012-09-01

    The OSIRIS multi-object spectrograph uses a set of user-customised masks, which are manufactured on demand. The manufacturing process consists of drilling the specified slits on the mask with the required accuracy. Ensuring that slits are in the right place when observing is of vital importance. We present a tool for checking the quality of the mask manufacturing process, based on analyzing instrument images obtained with the manufactured masks in place. The tool extracts the slit information from these images, relates specifications with the extracted slit information, and finally communicates to the operator whether the manufactured mask fulfills the expectations of the mask designer. The proposed tool has been built using scripting languages and standard libraries such as opencv, pyraf and scipy. The software architecture, advantages and limits of this tool in the lifecycle of a multi-object acquisition are presented.

  2. Searching and Extracting Data from the EMBL-EBI Complex Portal.

    PubMed

    Meldal, Birgit H M; Orchard, Sandra

    2018-01-01

    The Complex Portal ( www.ebi.ac.uk/complexportal ) is an encyclopedia of macromolecular complexes. Complexes are assigned unique, stable IDs, are species specific, and list all participating members with links to an appropriate reference database (UniProtKB, ChEBI, RNAcentral). Each complex is annotated extensively with its functions, properties, structure, stoichiometry, tissue expression profile, and subcellular location. Links to domain-specific databases allow the user to access additional information and enable data searching and filtering. Complexes can be saved and downloaded in PSI-MI XML, MI-JSON, and tab-delimited formats.

  3. A rapid extraction of landslide disaster information research based on GF-1 image

    NASA Astrophysics Data System (ADS)

    Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na

    2015-08-01

    In recent years, landslide disasters have occurred frequently because of seismic activity, bringing great harm to people's lives and drawing close attention from the state and wide concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing imagery, with its rich texture and geometric information, can effectively improve the accuracy of information extraction. It is therefore feasible to extract information on earthquake-triggered landslides with serious surface damage and large scale. Taking Wenchuan county as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the estimation of scale parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution imagery and selecting spectral, textural, geometric and landform features, extraction rules were established to extract landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information from high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.

  4. Deep Learning from EEG Reports for Inferring Underspecified Information

    PubMed Central

    Goodwin, Travis R.; Harabagiu, Sanda M.

    2017-01-01

    Secondary use of electronic health records (EHRs) often relies on the ability to automatically identify and extract information from EHRs. Unfortunately, EHRs are known to suffer from a variety of idiosyncrasies – most prevalently, they have been shown to often omit or underspecify information. Adapting traditional machine learning methods for inferring underspecified information relies on manually specifying features characterizing the specific information to recover (e.g. particular findings, test results, or physician's impressions). By contrast, in this paper, we present a method for jointly (1) automatically extracting word- and report-level features and (2) inferring underspecified information from EHRs. Our approach accomplishes these two tasks jointly by combining recent advances in deep neural learning with access to textual data in electroencephalogram (EEG) reports. We evaluate the performance of our model on the problem of inferring the neurologist's overall impression (normal or abnormal) from EEG reports and report an accuracy of 91.4%, precision of 94.4%, recall of 91.2%, and F1 measure of 92.8% (a 40% improvement over the performance obtained using Doc2Vec). These promising results demonstrate the power of our approach, while error analysis reveals remaining obstacles as well as areas for future improvement. PMID:28815118
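
    The architecture below is only a schematic stand-in for the deep model described in the abstract: word embeddings pooled into a report-level vector and mapped to a normal/abnormal label; layer sizes and names are assumptions.

        # Minimal PyTorch sketch of a report-level classifier over word features.
        import torch.nn as nn

        class ReportClassifier(nn.Module):
            def __init__(self, vocab_size, embed_dim=128, n_classes=2):
                super().__init__()
                self.embed = nn.EmbeddingBag(vocab_size, embed_dim)   # word-level
                self.head = nn.Sequential(nn.Linear(embed_dim, 64),   # report-level
                                          nn.ReLU(), nn.Linear(64, n_classes))

            def forward(self, token_ids, offsets):
                # token_ids: concatenated word indices for a batch of reports;
                # offsets: start position of each report within token_ids.
                return self.head(self.embed(token_ids, offsets))

        # Train with nn.CrossEntropyLoss against the normal/abnormal labels.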

  5. Development and evaluation of task-specific NLP framework in China.

    PubMed

    Ge, Caixia; Zhang, Yinsheng; Huang, Zhenzhen; Jia, Zheng; Ju, Meizhi; Duan, Huilong; Li, Haomin

    2015-01-01

    Natural language processing (NLP) has been designed to convert narrative text into structured data. Although some general NLP architectures have been developed, a task-specific NLP framework to facilitate the effective use of data is still a challenge in lexical-resource-limited regions, such as China. The purpose of this study is to design and develop a task-specific NLP framework to extract targeted information from particular documents by adopting dedicated algorithms on current limited lexical resources. In this framework, a shared and evolving ontology mechanism was designed. The results show that such a free-text-driven platform will accelerate NLP technology acceptance in China.

  6. Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

    ERIC Educational Resources Information Center

    Sun, Chong

    2012-01-01

    More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…

  7. Conjugation of Specifically Developed Antibodies for High- and Low-Molecular-Weight Glutenins with Fluorescent Quantum Dots as a Tool for Their Detection in Wheat Flour Dough.

    PubMed

    Bonilla, Jose C; Ryan, Valerie; Yazar, Gamze; Kokini, Jozef L; Bhunia, Arun K

    2018-04-25

    The importance of gluten proteins, gliadins and glutenins, is well known in the quality of wheat products. To gain more specific information about the role of glutenins in wheat dough, the two major subunits of glutenin, high- and low-molecular-weight (HMW and LMW) glutenins, were extracted, isolated, and identified by mass spectrometry. Antibodies for HMW and LMW glutenins were developed using the proteomic information on the characterized glutenin subunits. The antibodies were found to be specific to each subunit by western immunoblots and were then conjugated to quantum dots (QDs) using site-click conjugation, a new method that preserves antibody integrity. A fluorescence-linked immunosorbent assay confirmed successful QD conjugation. The QD-conjugated antibodies were applied to dough samples, where they recognized glutenin subunits and were visualized using a confocal laser scanning microscope.

  8. Extracting Useful Semantic Information from Large Scale Corpora of Text

    ERIC Educational Resources Information Center

    Mendoza, Ray Padilla, Jr.

    2012-01-01

    Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…

  9. Validation of Shared and Specific Independent Component Analysis (SSICA) for Between-Group Comparisons in fMRI

    PubMed Central

    Maneshi, Mona; Vahdat, Shahabeddin; Gotman, Jean; Grova, Christophe

    2016-01-01

    Independent component analysis (ICA) has been widely used to study functional magnetic resonance imaging (fMRI) connectivity. However, the application of ICA in multi-group designs is not straightforward. We have recently developed a new method named “shared and specific independent component analysis” (SSICA) to perform between-group comparisons in the ICA framework. SSICA is sensitive to extract those components which represent a significant difference in functional connectivity between groups or conditions, i.e., components that could be considered “specific” for a group or condition. Here, we investigated the performance of SSICA on realistic simulations, and task fMRI data and compared the results with one of the state-of-the-art group ICA approaches to infer between-group differences. We examined SSICA robustness with respect to the number of allowable extracted specific components and between-group orthogonality assumptions. Furthermore, we proposed a modified formulation of the back-reconstruction method to generate group-level t-statistics maps based on SSICA results. We also evaluated the consistency and specificity of the extracted specific components by SSICA. The results on realistic simulated and real fMRI data showed that SSICA outperforms the regular group ICA approach in terms of reconstruction and classification performance. We demonstrated that SSICA is a powerful data-driven approach to detect patterns of differences in functional connectivity across groups/conditions, particularly in model-free designs such as resting-state fMRI. Our findings in task fMRI show that SSICA confirms results of the general linear model (GLM) analysis and when combined with clustering analysis, it complements GLM findings by providing additional information regarding the reliability and specificity of networks. PMID:27729843

  10. Information Science Panel joint meeting with Imaging Science Panel

    NASA Technical Reports Server (NTRS)

    1982-01-01

    Specific activity in information extraction science (taken to include data handling) is needed to: help identify the bounds of practical missions; identify potential data handling and analysis scenarios; identify the required enabling technology; and identify the requirements for a design data base to be used by the disciplines in determining potential parameters for future missions. It was defined that specific analysis topics were a function of the discipline involved, and therefore no attempt was made to define any specific analysis developments required. Rather, it was recognized that a number of generic data handling requirements exist whose solutions cannot be typically supported by the disciplines. The areas of concern were therefore defined as: data handling aspects of system design considerations; enabling technology for data handling, with specific attention to rectification and registration; and enabling technology for analysis. Within each of these areas, the following topics were addressed: state of the art (current status and contributing factors); critical issues; and recommendations for research and/or development.

  11. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    NASA Astrophysics Data System (ADS)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that combines artificial intelligence, statistical modeling and data mining with the data generated by an educational institution. EDM uses computational approaches to analyse educational data in order to address educational questions. For an education system to stand out among other nations, its framework must undergo a major transition and be redesigned. The hidden patterns in various information repositories can be extracted by adopting data mining techniques. To summarize the performance of students from their records, we examine the use of data mining in academia. The Apriori algorithm is applied extensively to the student database for classification across various categories, and K-means is applied to the same data to group students into clusters. Apriori mines association rules to extract similar patterns and their relationships across sets of records drawn from academic information repositories. The parameters used in this study give more weight to psychological traits than academic features. Undesirable student conduct can be clearly identified using data mining frameworks; thus, the algorithms prove effective for profiling students in any educational environment. The ultimate objective of the study is to assess whether a student is prone to violence.
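
    The two mining steps can be illustrated outside Weka with standard Python tooling; the column names, thresholds and file name below are hypothetical.

        # Sketch: Apriori rule mining over categorical student attributes and
        # K-means grouping of numeric attributes (illustrative, not the study's code).
        import pandas as pd
        from mlxtend.frequent_patterns import apriori, association_rules
        from sklearn.cluster import KMeans

        records = pd.read_csv("students.csv")                    # hypothetical export

        onehot = pd.get_dummies(records[["grade_band", "attendance", "conduct"]])
        frequent = apriori(onehot.astype(bool), min_support=0.1, use_colnames=True)
        rules = association_rules(frequent, metric="confidence", min_threshold=0.7)

        numeric = records[["test_score", "psych_score"]].to_numpy()
        clusters = KMeans(n_clusters=4, n_init=10).fit_predict(numeric)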

  12. Text Mining for Protein Docking

    PubMed Central

    Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.

    2015-01-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
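
    The abstract-filtering step lends itself to a compact illustration: bag-of-words features feeding a linear SVM that separates docking-relevant from irrelevant abstracts. Training labels are assumed to be available; this is a sketch, not the published pipeline.

        # Bag-of-words + SVM filter for retrieved abstracts (illustrative).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        def build_abstract_filter(abstracts, labels):
            # labels: 1 = mentions docking-relevant binding information, 0 = not
            model = make_pipeline(CountVectorizer(stop_words="english", min_df=2),
                                  LinearSVC())
            return model.fit(abstracts, labels)

        # filter_model = build_abstract_filter(train_abstracts, train_labels)
        # keep = [a for a in new_abstracts if filter_model.predict([a])[0] == 1]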

  13. Hyperpolarized xenon NMR and MRI signal amplification by gas extraction

    PubMed Central

    Zhou, Xin; Graziani, Dominic; Pines, Alexander

    2009-01-01

    A method is reported for enhancing the sensitivity of NMR of dissolved xenon by detecting the signal after extraction to the gas phase. We demonstrate hyperpolarized xenon signal amplification by gas extraction (Hyper-SAGE) in both NMR spectra and magnetic resonance images with time-of-flight information. Hyper-SAGE takes advantage of a change in physical phase to increase the density of polarized gas in the detection coil. At equilibrium, the concentration of gas-phase xenon is ≈10 times higher than that of the dissolved-phase gas. After extraction the xenon density can be further increased by several orders of magnitude by compression and/or liquefaction. Additionally, being a remote detection technique, the Hyper-SAGE effect is further enhanced in situations where the sample of interest would occupy only a small proportion of the traditional NMR receiver. Coupled with targeted xenon biosensors, Hyper-SAGE offers another path to highly sensitive molecular imaging of specific cell markers by detection of exhaled xenon gas. PMID:19805177

  14. The BioLexicon: a large-scale terminological resource for biomedical text mining

    PubMed Central

    2011-01-01

    Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring. PMID:21992002

  15. The BioLexicon: a large-scale terminological resource for biomedical text mining.

    PubMed

    Thompson, Paul; McNaught, John; Montemagni, Simonetta; Calzolari, Nicoletta; del Gratta, Riccardo; Lee, Vivian; Marchi, Simone; Monachini, Monica; Pezik, Piotr; Quochi, Valeria; Rupp, C J; Sasaki, Yutaka; Venturi, Giulia; Rebholz-Schuhmann, Dietrich; Ananiadou, Sophia

    2011-10-12

    Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.

  16. An extraction method for mountainous area settlement place information from GF-1 high resolution optical remote sensing imagery under semantic constraints

    NASA Astrophysics Data System (ADS)

    Guo, H., II

    2016-12-01

    Spatial distribution information on mountainous area settlement places is of great significance to earthquake emergency work because most of the key earthquake hazard areas of China are located in mountainous terrain. Remote sensing has the advantages of large coverage and low cost, and it is an important way to obtain the spatial distribution of mountainous area settlement places. At present, most studies apply object-oriented methods that jointly consider geometric, spectral and texture information to extract settlement place information; in this article, semantic constraints are added on top of the object-oriented approach. The experimental data are a single scene from the domestic high-resolution satellite GF-1, with a resolution of 2 meters. The processing consists of three steps: the first is preprocessing, including orthorectification and image fusion; the second is object-oriented information extraction, including image segmentation and information extraction; the last is removing erroneous elements under semantic constraints. To formulate these semantic constraints, the distribution characteristics of mountainous area settlement places must be analyzed and the spatial logic relations between settlement places and other objects must be considered. The accuracy assessment shows that the extraction accuracy of the object-oriented method is 49% and rises to 86% after the use of semantic constraints. As these figures show, the method under semantic constraints can effectively improve the accuracy of mountainous area settlement place information extraction. The results show that it is feasible to extract mountainous area settlement place information from GF-1 imagery, demonstrating the practicality of using domestic high-resolution optical remote sensing imagery in earthquake emergency preparedness.

  17. Analysis of Non-Tactical Vehicle Utilization at Fort Carson Colorado

    DTIC Science & Technology

    2012-01-01

    regenerative braking energy recovery. The mass of the vehicles monitored in this study was not known. However, some useful information may be... regenerative energy recovery potential for specific duty cycles was also quantified through a cumulative assessment of the number and severity of deceleration...extracted on usage time, distance, vehicle speed and geographic location in order to compare vehicle driving profiles. The regenerative energy recovery

  18. Proton chemical shift tensors determined by 3D ultrafast MAS double-quantum NMR spectroscopy

    NASA Astrophysics Data System (ADS)

    Zhang, Rongchun; Mroue, Kamal H.; Ramamoorthy, Ayyalusamy

    2015-10-01

    Proton NMR spectroscopy in the solid state has recently attracted much attention owing to the significant enhancement in spectral resolution afforded by the remarkable advances in ultrafast magic angle spinning (MAS) capabilities. In particular, proton chemical shift anisotropy (CSA) has become an important tool for obtaining specific insights into inter/intra-molecular hydrogen bonding. However, even at the highest currently feasible spinning frequencies (110-120 kHz), 1H MAS NMR spectra of rigid solids still suffer from poor resolution and severe peak overlap caused by the strong 1H-1H homonuclear dipolar couplings and narrow 1H chemical shift (CS) ranges, which render it difficult to determine the CSA of specific proton sites in the standard CSA/single-quantum (SQ) chemical shift correlation experiment. Herein, we propose a three-dimensional (3D) 1H double-quantum (DQ) chemical shift/CSA/SQ chemical shift correlation experiment to extract the CS tensors of proton sites whose signals are not well resolved along the single-quantum chemical shift dimension. As extracted from the 3D spectrum, the F1/F3 (DQ/SQ) projection provides valuable information about 1H-1H proximities, which might also reveal the hydrogen-bonding connectivities. In addition, the F2/F3 (CSA/SQ) correlation spectrum, which is similar to the regular 2D CSA/SQ correlation experiment, yields chemical shift anisotropic line shapes at different isotropic chemical shifts. More importantly, since the F2/F1 (CSA/DQ) spectrum correlates the CSA with the DQ signal induced by two neighboring proton sites, the CSA spectrum sliced at a specific DQ chemical shift position contains the CSA information of two neighboring spins indicated by the DQ chemical shift. If these two spins have different CS tensors, both tensors can be extracted by numerical fitting. We believe that this robust and elegant single-channel proton-based 3D experiment provides useful atomistic-level structural and dynamical information for a variety of solid systems that possess high proton density.

  19. Proton chemical shift tensors determined by 3D ultrafast MAS double-quantum NMR spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Rongchun; Mroue, Kamal H.; Ramamoorthy, Ayyalusamy, E-mail: ramamoor@umich.edu

    2015-10-14

    Proton NMR spectroscopy in the solid state has recently attracted much attention owing to the significant enhancement in spectral resolution afforded by the remarkable advances in ultrafast magic angle spinning (MAS) capabilities. In particular, proton chemical shift anisotropy (CSA) has become an important tool for obtaining specific insights into inter/intra-molecular hydrogen bonding. However, even at the highest currently feasible spinning frequencies (110–120 kHz), ¹H MAS NMR spectra of rigid solids still suffer from poor resolution and severe peak overlap caused by the strong ¹H–¹H homonuclear dipolar couplings and narrow ¹H chemical shift (CS) ranges, which render it difficult to determine the CSA of specific proton sites in the standard CSA/single-quantum (SQ) chemical shift correlation experiment. Herein, we propose a three-dimensional (3D) ¹H double-quantum (DQ) chemical shift/CSA/SQ chemical shift correlation experiment to extract the CS tensors of proton sites whose signals are not well resolved along the single-quantum chemical shift dimension. As extracted from the 3D spectrum, the F1/F3 (DQ/SQ) projection provides valuable information about ¹H–¹H proximities, which might also reveal the hydrogen-bonding connectivities. In addition, the F2/F3 (CSA/SQ) correlation spectrum, which is similar to the regular 2D CSA/SQ correlation experiment, yields chemical shift anisotropic line shapes at different isotropic chemical shifts. More importantly, since the F2/F1 (CSA/DQ) spectrum correlates the CSA with the DQ signal induced by two neighboring proton sites, the CSA spectrum sliced at a specific DQ chemical shift position contains the CSA information of two neighboring spins indicated by the DQ chemical shift. If these two spins have different CS tensors, both tensors can be extracted by numerical fitting. We believe that this robust and elegant single-channel proton-based 3D experiment provides useful atomistic-level structural and dynamical information for a variety of solid systems that possess high proton density.

  20. Testing the Technology Acceptance Model: HIV case managers' intention to use a continuity of care record with context-specific links.

    PubMed

    Schnall, Rebecca; Bakken, Suzanne

    2011-09-01

    To assess the applicability of the Technology Acceptance Model (TAM) constructs in explaining HIV case managers' behavioural intention to use a continuity of care record (CCR) with context-specific links designed to meet their information needs. Data were collected from 94 case managers who provide care to persons living with HIV (PLWH) using an online survey comprising three components: (1) demographic information: age, gender, ethnicity, race, Internet usage and computer experience; (2) mock-up of CCR with context-specific links; and (3) items related to TAM constructs. Data analysis included: principal components factor analysis (PCA), assessment of internal consistency reliability and univariate and multivariate analysis. PCA extracted three factors (Perceived Ease of Use, Perceived Usefulness and Perceived Barriers to Use), explained variance = 84.9%, Cronbach's α = 0.69-0.91. In a linear regression model, Perceived Ease of Use, Perceived Usefulness and Perceived Barriers to Use explained 43.6% (p < 0.001) of the variance in Behavioural Intention to use a CCR with context-specific links. Our study contributes to the evidence base regarding TAM in health care through expanding the type of professional surveyed, study setting and Health Information Technology assessed.
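
    The analysis pipeline reads naturally as a two-step script: extract three components from the survey items and regress behavioural intention on the component scores. The sketch below uses PCA as a stand-in for the factor analysis and hypothetical column names.

        # Hedged sketch of the PCA + regression analysis (not the study's code).
        import pandas as pd
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression

        survey = pd.read_csv("tam_survey.csv")                   # hypothetical export
        items = survey.filter(like="item_")                      # Likert-scale items
        scores = PCA(n_components=3).fit_transform(items)        # PEOU, PU, barriers

        reg = LinearRegression().fit(scores, survey["behavioural_intention"])
        r2 = reg.score(scores, survey["behavioural_intention"])  # variance explained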

  1. Application of MPEG-7 descriptors for content-based indexing of sports videos

    NASA Astrophysics Data System (ADS)

    Hoeynck, Michael; Auweiler, Thorsten; Ohm, Jens-Rainer

    2003-06-01

    The amount of multimedia data available worldwide is increasing every day. There is a vital need to annotate multimedia data in order to allow universal content access and to provide content-based search-and-retrieval functionalities. Since supervised video annotation can be time consuming, an automatic solution is appreciated. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports, and present our application for the automatic annotation of equestrian sports videos. Thereby, we especially concentrate on MPEG-7 based feature extraction and content description. We apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information and taking specific domain knowledge into account. Having determined single shot positions as well as the visual highlights, the information is jointly stored together with additional textual information in an MPEG-7 description scheme. Using this information, we generate content summaries which can be utilized in a user front-end in order to provide content-based access to the video stream, but further content-based queries and navigation on a video-on-demand streaming server.

  2. Characterizing rainfall in the Tenerife island

    NASA Astrophysics Data System (ADS)

    Díez-Sierra, Javier; del Jesus, Manuel; Losada Rodriguez, Inigo

    2017-04-01

    In many locations, rainfall data are collected through networks of meteorological stations. The data collection process is nowadays automated in many places, leading to the development of big databases of rainfall data covering extensive areas of territory. However, managers, decision makers and engineering consultants tend not to extract most of the information contained in these databases due to the lack of specific software tools for their exploitation. Here we present the modeling and development effort put in place in the Tenerife island in order to develop MENSEI-L, a software tool capable of automatically analyzing a complete rainfall database to simplify the extraction of information from observations. MENSEI-L makes use of weather type information derived from atmospheric conditions to separate the complete time series into homogeneous groups where statistical distributions are fitted. Normal and extreme regimes are obtained in this manner. MENSEI-L is also able to complete missing data in the time series and to generate synthetic stations by using Kriging techniques. These techniques also serve to generate the spatial regimes of precipitation, both normal and extreme ones. MENSEI-L makes use of weather type information to also provide a stochastic three-day probability forecast for rainfall.
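
    The gap-filling and synthetic-station steps rest on spatial interpolation; a Gaussian-process regressor is mathematically close to the Kriging used by MENSEI-L and makes for a compact, hedged illustration (the station inputs and kernel settings below are assumptions, not the tool's implementation).

        # Estimate rainfall at unobserved locations from surrounding stations.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        def interpolate_rainfall(station_xy, station_values, target_xy):
            kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=1.0)
            gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
            gp.fit(np.asarray(station_xy), np.asarray(station_values))
            mean, std = gp.predict(np.asarray(target_xy), return_std=True)
            return mean, std          # interpolated values and their uncertainty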

  3. Autism, Context/Noncontext Information Processing, and Atypical Development

    PubMed Central

    Skoyles, John R.

    2011-01-01

    Autism has been attributed to a deficit in contextual information processing. Attempts to understand autism in terms of such a defect, however, do not include more recent computational work upon context. This work has identified that context information processing depends upon the extraction and use of the information hidden in higher-order (or indirect) associations. Higher-order associations underlie the cognition of context rather than that of situations. This paper starts by examining the differences between higher-order and first-order (or direct) associations. Higher-order associations link entities not directly (as with first-order ones) but indirectly through all the connections they have via other entities. Extracting this information requires the processing of past episodes as a totality. As a result, this extraction depends upon specialised extraction processes separate from cognition. This information is then consolidated. Due to this difference, the extraction/consolidation of higher-order information can be impaired whilst cognition remains intact. Although not directly impaired, cognition will be indirectly impaired by knock on effects such as cognition compensating for absent higher-order information with information extracted from first-order associations. This paper discusses the implications of this for the inflexible, literal/immediate, and inappropriate information processing of autistic individuals. PMID:22937255

  4. Pomological Traits, Sensory Profile and Nutraceutical Properties of Nine Cultivars of Loquat (Eriobotrya japonica Lindl.) Fruits Grown in Mediterranean Area.

    PubMed

    Gentile, C; Reig, C; Corona, O; Todaro, A; Mazzaglia, A; Perrone, A; Gianguzzi, G; Agusti, M; Farina, V

    2016-09-01

    In this paper the diversity of fruit quality within nine loquat cultivars, including five established international cultivars (Algerie, Golden Nugget, Peluche, Bueno, El Buenet) and four local cultivars (Sanfilippara, Nespolone di Trabia, BRT20 and Claudia), was investigated in order to discriminate the variation in pomological characteristics, sensory profile, and antioxidant properties. Finally, to evaluate potential bioactivity, antiproliferative activity of hydrophilic extracts from loquat fruits was assessed, at dietary relevant concentrations, against three human epithelial cell lines. Even though the international cultivars confirmed an appropriate level of commercial quality in association with high levels of antioxidant compounds, the local cultivars revealed the best performance across a wide range of chemical-physical and sensory characteristics. Concerning bioactivity, our results indicate that hydrophilic extracts from all tested cultivars showed concentration-dependent antiproliferative activity with significant variability of effects between different cell lines and between different cultivars. HeLa cells were the most sensitive, and hydrophilic extracts from Peluche showed the highest inhibitory effect, followed by Nespolone di Trabia and Claudia. The results of this trial provide useful information on the pomological traits and the previously uncharacterized nutritional and functional properties of loquat fruits. Our data, besides helping to promote specific local cultivars, could serve to establish a database that will permit improved utilization of specific genetic resources in breeding programs.

  5. Assistance to neurosurgical planning: using a fuzzy spatial graph model of the brain for locating anatomical targets in MRI

    NASA Astrophysics Data System (ADS)

    Villéger, Alice; Ouchchane, Lemlih; Lemaire, Jean-Jacques; Boire, Jean-Yves

    2007-03-01

    Symptoms of neurodegenerative pathologies such as Parkinson's disease can be relieved through Deep Brain Stimulation. This neurosurgical technique relies on high precision positioning of electrodes in specific areas of the basal ganglia and the thalamus. These subcortical anatomical targets must be located at pre-operative stage, from a set of MRI acquired under stereotactic conditions. In order to assist surgical planning, we designed a semi-automated image analysis process for extracting anatomical areas of interest. Complementary information, provided by both patient's data and expert knowledge, is represented as fuzzy membership maps, which are then fused by means of suitable possibilistic operators in order to achieve the segmentation of targets. More specifically, theoretical prior knowledge on brain anatomy is modelled within a 'virtual atlas' organised as a spatial graph: a list of vertices linked by edges, where each vertex represents an anatomical structure of interest and contains relevant information such as tissue composition, whereas each edge represents a spatial relationship between two structures, such as their relative directions. The model is built using heterogeneous sources of information such as qualitative descriptions from the expert, or quantitative information from prelabelled images. For each patient, tissue membership maps are extracted from MR data through a classification step. Prior model and patient's data are then matched by using a research algorithm (or 'strategy') which simultaneously computes an estimation of the location of every structure. The method was tested on 10 clinical images, with promising results. Location and segmentation results were statistically assessed, opening perspectives for enhancements.

  6. Reconstructing biochemical pathways from time course data.

    PubMed

    Srividhya, Jeyaraman; Crampin, Edmund J; McSharry, Patrick E; Schnell, Santiago

    2007-03-01

    Time series data on biochemical reactions reveal transient behavior, away from chemical equilibrium, and contain information on the dynamic interactions among reacting components. However, this information can be difficult to extract using conventional analysis techniques. We present a new method to infer biochemical pathway mechanisms from time course data using a global nonlinear modeling technique to identify the elementary reaction steps which constitute the pathway. The method involves the generation of a complete dictionary of polynomial basis functions based on the law of mass action. Using these basis functions, there are two approaches to model construction, namely the general to specific and the specific to general approach. We demonstrate that our new methodology reconstructs the chemical reaction steps and connectivity of the glycolytic pathway of Lactococcus lactis from time course experimental data.
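
    The dictionary idea can be sketched directly: build mass-action candidate terms (species and their pairwise products), estimate time derivatives numerically, and keep the terms selected by a sparse regression. This is in the spirit of the method described, not the authors' algorithm; the regularisation strength is an assumption.

        # Mass-action dictionary plus sparse regression over time-course data.
        import numpy as np
        from sklearn.linear_model import Lasso

        def mass_action_dictionary(C):
            # C: (n_times, n_species) concentration time series
            terms = [np.ones(len(C))] + [C[:, i] for i in range(C.shape[1])]
            for i in range(C.shape[1]):
                for j in range(i, C.shape[1]):
                    terms.append(C[:, i] * C[:, j])      # bimolecular candidates
            return np.column_stack(terms)

        def infer_reaction_terms(t, C, alpha=1e-3):
            dCdt = np.gradient(C, t, axis=0)             # numerical derivatives
            theta = mass_action_dictionary(C)
            return np.array([Lasso(alpha=alpha, max_iter=10000)
                             .fit(theta, dCdt[:, k]).coef_
                             for k in range(C.shape[1])])   # sparse coefficients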

  7. A Modified Protocol with Improved Detection Rate for Mis-Matched Donor HLA from Low Quantities of DNA in Urine Samples from Kidney Graft Recipients.

    PubMed

    Kwok, Janette; Choi, Leo C W; Ho, Jenny C Y; Chan, Gavin S W; Mok, Maggie M Y; Lam, Man-Fei; Chak, Wai-Leung; Cheuk, Au; Chau, Ka-Foon; Tong, Matthew; Chan, Kwok-Wah; Chan, Tak-Mao

    2016-01-01

    Urine from kidney transplant recipients has proven to be a viable source for donor DNA. However, an optimized protocol would be required to determine mis-matched donor HLA specificities in view of the scarcity of DNA obtained in some cases. In this study, fresh early morning urine specimens were obtained from 155 kidney transplant recipients with known donor HLA phenotype. DNA was extracted and typing of HLA-A, B and DRB1 loci by polymerase chain reaction-specific sequence primers was performed using tailor-made conditions according to the concentration of extracted DNA. HLA typing of DNA extracted from urine revealed both recipient and donor HLA phenotypes, allowing the deduction of the unknown donor HLA and hence the degree of HLA mis-match. By adopting the modified procedures, mis-matched donor HLA phenotypes were successfully deduced in all of 35 tested urine samples at DNA quantities spanning the range of 620-24,000 ng. This urine-based method offers a promising and reliable non-invasive means for the identification of mis-matched donor HLA antigens in kidney transplant recipients with unknown donor HLA phenotype or otherwise inadequate donor information.

  8. Gender-specific automatic valence recognition of affective olfactory stimulation through the analysis of the electrodermal activity.

    PubMed

    Greco, Alberto; Lanata, Antonio; Valenza, Gaetano; Di Francesco, Fabio; Scilingo, Enzo Pasquale

    2016-08-01

    This study reports on the development of a gender-specific classification system able to discern between two valence levels of smell, through information gathered from electrodermal activity (EDA) dynamics. Specifically, two odorants were administered to 32 healthy volunteers (16 males) while monitoring EDA. The cvxEDA model was used to process the EDA signal and extract features from both tonic and phasic components. The feature set was used as input to a K-NN classifier implementing a leave-one-subject-out procedure. Results show strong differences in the accuracy of valence recognition between men (62.5%) and women (78%). We can conclude that affective olfactory stimulation significantly affects EDA dynamics, with a highly specific gender dependency.
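
    Only the classification stage is easy to illustrate generically; how the tonic/phasic features are obtained (cvxEDA in the study) is left abstract here, and the feature matrix and k value are assumptions.

        # K-NN valence classification with leave-one-subject-out validation.
        import numpy as np
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        def loso_accuracy(features, valence_labels, subject_ids, k=5):
            clf = KNeighborsClassifier(n_neighbors=k)
            scores = cross_val_score(clf, features, valence_labels,
                                     groups=subject_ids, cv=LeaveOneGroupOut())
            return np.mean(scores)    # mean held-out-subject accuracy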

  9. Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

    ERIC Educational Resources Information Center

    North, Jamie S.; Williams, A. Mark

    2008-01-01

    The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…

  10. Review on the Extraction Methods of Crude oil from all Generation Biofuels in last few Decades

    NASA Astrophysics Data System (ADS)

    Bhargavi, G.; Nageswara Rao, P.; Renganathan, S.

    2018-03-01

    The ever-growing demand for energy fuels, the economics of oil, the depletion of energy resources and environmental protection are inevitable challenges that must be solved in the coming decades in order to sustain the lives of humans and other creatures. Switching to alternative fuels that are renewable, biodegradable, and economically and environmentally friendly can meet at least part of the fuel demand while also mitigating climate change. In this context, the production of biofuels has gained prominence. The term biofuels broadly refers to fuels derived from living matter, either animal or plant. Among the candidate biofuels, biodiesel is one of the most promising alternatives for diesel engines. Biodiesel is renewable, environmentally friendly, biodegradable and safe to use across a wide range of applications, and it has therefore become a major focus of intensive global research and development on alternative energy. The present review is focused specifically on biodiesel. Biodiesel production involves two major steps: lipid extraction followed by esterification/transesterification. For the extraction of lipids, several extraction techniques have been put forward across the different generations and feedstocks used. This review provides theoretical background on the two major extraction methods, mechanical and chemical, and discusses the practical issues of each, such as extraction efficiency, extraction time, oil sources, and their pros and cons. Gathering information on oil extraction methods in this way may be helpful for further research advancements that ease biofuel production.

  11. Can we replace curation with information extraction software?

    PubMed

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.

  12. Clinical records anonymisation and text extraction (CRATE): an open-source software system.

    PubMed

    Cardinal, Rudolf N

    2017-04-26

    Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
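
    Pseudonym generation of the kind described (a secure, keyed mapping from patient identifier to research identifier) is commonly done with an HMAC; the snippet below is a generic illustration, not necessarily CRATE's exact scheme or parameters.

        # Keyed-hash pseudonymisation: the same input always maps to the same
        # research ID, and the mapping cannot be reversed without the secret key.
        import hmac, hashlib

        SECRET_KEY = b"load-from-secure-config"        # placeholder, never hard-code

        def pseudonymise(patient_id: str) -> str:
            digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"),
                              hashlib.sha256).hexdigest()
            return "RID_" + digest[:16]                # truncated for readability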

  13. An audit of the reliability of influenza vaccination and medical information extracted from eHealth records in general practice.

    PubMed

    Regan, Annette K; Gibbs, Robyn A; Effler, Paul V

    2018-05-31

    To evaluate the reliability of information in general practice (GP) electronic health records (EHRs), 2100 adult patients were randomly selected for interview regarding the presence of specific medical conditions and recent influenza vaccination. Agreement between self-report and data extracted from EHRs was compared using Cohen's kappa coefficient (k) and interpreted in accordance with Altman's Kappa Benchmarking criteria; 377 (18%) patients declined participation, and 608 (29%) could not be contacted. Of 1115 (53%) remaining, 856 (77%) were active patients (≥3 visits to the GP practice in the last two years) who provided complete information for analysis. Although a higher proportion of patients self-reported being vaccinated or having a medical condition compared to the EHR (50.7% vs 36.9%, and 39.4% vs 30.3%, respectively), there was "good" agreement between self-report and EHR for both vaccination status (κ = 0.67) and medical conditions (κ = 0.66). These findings suggest EHR may be useful for public health surveillance. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
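
    The agreement statistic itself is a one-liner; the toy vectors below are illustrative, not the study's data.

        # Cohen's kappa between self-reported and EHR-recorded vaccination status.
        from sklearn.metrics import cohen_kappa_score

        self_report = [1, 1, 0, 1, 0, 1, 0, 0]      # 1 = vaccinated, 0 = not
        ehr_record  = [1, 0, 0, 1, 0, 1, 0, 1]
        kappa = cohen_kappa_score(self_report, ehr_record)   # 0.50 for this toy data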

  14. Combining Experiments and Simulations of Extraction Kinetics and Thermodynamics in Advanced Separation Processes for Used Nuclear Fuel

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nilsson, Mikael

    This 3-year project was a collaboration between University of California Irvine (UC Irvine), Pacific Northwest National Laboratory (PNNL), Idaho National Laboratory (INL), Argonne National Laboratory (ANL) and an international collaborator at Forschungszentrum Jülich (FZJ). The project was led from UC Irvine under the direction of Profs. Mikael Nilsson and Hung Nguyen. The leads at PNNL, INL, ANL and FZJ were Dr. Liem Dang, Dr. Peter Zalupski, Dr. Nathaniel Hoyt and Dr. Giuseppe Modolo, respectively. Involved in this project at UC Irvine were three full-time PhD graduate students, Tro Babikian, Ted Yoo, and Quynh Vo, and one MS student, Alba Font Bosch. The overall objective of this project was to study how the kinetics and thermodynamics of metal ion extraction can be described by molecular dynamics (MD) simulations and how the simulations can be validated by experimental data. Furthermore, the project included applied separation studies, testing the extraction systems in a single-stage annular centrifugal contactor and coupling the experimental data with computational fluid dynamics (CFD) simulations. Specific objectives of the proposed research were to: study and establish a rigorous connection between MD simulations based on polarizable force fields and extraction thermodynamic and kinetic data; compare and validate CFD simulations of extraction processes for An/Ln separation using different sizes (and types) of annular centrifugal contactors; and provide a theoretical/simulation and experimental base for scale-up of batch-wise extraction to continuous contactors. We approached objectives 1 and 2 in parallel. For objective 1 we started by studying a well-established extraction system with a relatively simple extraction mechanism, namely tributyl phosphate. We found that well-optimized simulations can inform experiments, and new information on TBP behavior was presented in this project, as will be discussed below. The second objective proved a larger challenge and most of the efforts were devoted to experimental studies.

  15. Ionization Electron Signal Processing in Single Phase LArTPCs II. Data/Simulation Comparison and Performance in MicroBooNE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Adams, C.; et al.

    The single-phase liquid argon time projection chamber (LArTPC) provides a large amount of detailed information in the form of fine-grained drifted ionization charge from particle traces. To fully utilize this information, the deposited charge must be accurately extracted from the raw digitized waveforms via a robust signal processing chain. Enabled by the ultra-low noise levels associated with cryogenic electronics in the MicroBooNE detector, the precise extraction of ionization charge from the induction wire planes in a single-phase LArTPC is qualitatively demonstrated on MicroBooNE data with event display images, and quantitatively demonstrated via waveform-level and track-level metrics. Improved performance of induction plane calorimetry is demonstrated through the agreement of extracted ionization charge measurements across different wire planes for various event topologies. In addition to the comprehensive waveform-level comparison of data and simulation, a calibration of the cryogenic electronics response is presented and solutions to various MicroBooNE-specific TPC issues are discussed. This work presents an important improvement in LArTPC signal processing, the foundation of reconstruction and therefore physics analyses in MicroBooNE.

  16. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

    PubMed Central

    Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G

    2010-01-01

    We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies—the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853

  17. Informatics in radiology: automated Web-based graphical dashboard for radiology operational business intelligence.

    PubMed

    Nagy, Paul G; Warnock, Max J; Daly, Mark; Toland, Christopher; Meenan, Christopher D; Mezrich, Reuben S

    2009-11-01

    Radiology departments today are faced with many challenges to improve operational efficiency, performance, and quality. Many organizations rely on antiquated, paper-based methods to review their historical performance and understand their operations. With increased workloads, geographically dispersed image acquisition and reading sites, and rapidly changing technologies, this approach is increasingly untenable. A Web-based dashboard was constructed to automate the extraction, processing, and display of indicators and thereby provide useful and current data for twice-monthly departmental operational meetings. The feasibility of extracting specific metrics from clinical information systems was evaluated as part of a longer-term effort to build a radiology business intelligence architecture. Operational data were extracted from clinical information systems and stored in a centralized data warehouse. Higher-level analytics were performed on the centralized data, a process that generated indicators in a dynamic Web-based graphical environment that proved valuable in discussion and root cause analysis. Results aggregated over a 24-month period since implementation suggest that this operational business intelligence reporting system has provided significant data for driving more effective management decisions to improve productivity, performance, and quality of service in the department.

  18. Biometric verification in dynamic writing

    NASA Astrophysics Data System (ADS)

    George, Susan E.

    2002-03-01

    Pen-tablet devices capable of capturing the dynamics of writing record temporal and pressure information as well as the spatial pattern. This paper explores biometric verification based upon the dynamics of writing where writers are distinguished not on the basis of what they write (i.e., the signature), but how they write. We have collected samples of dynamic writing from 38 Chinese writers. Each writer was asked to provide 10 copies of a paragraph of text and the same number of signature samples. From the data we have extracted stroke-based primitives from the sentence data utilizing pen-up/down information and heuristic rules about the shape of the character. The x, y and pressure values of each primitive were interpolated into an even temporal range based upon a 20 msec sampling rate. We applied the Daubechies 1 wavelet transform to the x signal, y signal and pressure signal using the coefficients as inputs to a multi-layer perceptron trained with back-propagation on the sentence data. We found a sensitivity of 0.977 and specificity of 0.990 recognizing writers based on test primitives extracted from sentence data and measures of 0.916 and 0.961, respectively, from test primitives extracted from signature data.
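
    A minimal sketch of the analysis pipeline described (Daubechies-1 wavelet coefficients of the per-stroke x, y, and pressure signals fed to a multilayer perceptron), assuming PyWavelets and scikit-learn and using synthetic arrays in place of the pen-tablet recordings:

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def primitive_features(x, y, pressure):
    """Daubechies-1 (db1) wavelet coefficients of the x, y and pressure
    signals of one stroke primitive, concatenated into a feature vector."""
    feats = []
    for signal in (x, y, pressure):
        approx, detail = pywt.dwt(signal, "db1")
        feats.extend(approx)
        feats.extend(detail)
    return np.array(feats)

# Synthetic stand-in for interpolated stroke primitives (32 samples each).
rng = np.random.default_rng(0)
X = np.array([primitive_features(*rng.normal(size=(3, 32))) for _ in range(200)])
labels = rng.integers(0, 5, size=200)   # hypothetical writer identities

clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=0)
clf.fit(X, labels)
```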

  19. Motion processing with two eyes in three dimensions.

    PubMed

    Rokers, Bas; Czuba, Thaddeus B; Cormack, Lawrence K; Huk, Alexander C

    2011-02-11

    The movement of an object toward or away from the head is perhaps the most critical piece of information an organism can extract from its environment. Such 3D motion produces horizontally opposite motions on the two retinae. Little is known about how or where the visual system combines these two retinal motion signals, relative to the wealth of knowledge about the neural hierarchies involved in 2D motion processing and binocular vision. Canonical conceptions of primate visual processing assert that neurons early in the visual system combine monocular inputs into a single cyclopean stream (lacking eye-of-origin information) and extract 1D ("component") motions; later stages then extract 2D pattern motion from the cyclopean output of the earlier stage. Here, however, we show that 3D motion perception is in fact affected by the comparison of opposite 2D pattern motions between the two eyes. Three-dimensional motion sensitivity depends systematically on pattern motion direction when dichoptically viewing gratings and plaids, and a novel "dichoptic pseudoplaid" stimulus provides strong support for use of interocular pattern motion differences by precluding potential contributions from conventional disparity-based mechanisms. These results imply the existence of eye-of-origin information in later stages of motion processing and therefore motivate the incorporation of such eye-specific pattern-motion signals in models of motion processing and binocular integration.

  20. Communication at an online infertility expert forum: provider responses to patients' emotional and informational cues.

    PubMed

    Aarts, J W M; van Oers, A M; Faber, M J; Cohlen, B J; Nelen, W L D M; Kremer, J A M; van Dulmen, A M

    2015-01-01

    Online patient-provider communication has become increasingly popular in fertility care. However, it is not known to what extent patients express cues or concerns and how providers respond. In this study, we investigated cues and responses that occur in online patient-provider communication at an infertility-specific expert forum. We extracted 106 threads from the multidisciplinary expert forum of two Dutch IVF clinics. We performed the following analyses: (1) thematic analysis of patients' questions; and (2) rating patients' emotional and informational cues and subsequent professionals' responses using an adaptation of the validated Medical Interview Aural Rating Scale. Frequencies of themes, frequencies of cues and responses, and sequences (what cue is followed by what response) were extracted. Sixty-five infertile patients and 19 providers participated. The most common themes included medication and lifestyle. Patients gave more informational than emotional cues (106 versus 64). Responses to informational cues were mostly adequate (61%). The most common response to emotional cues was empathic acknowledgment (72%). Results indicate that an online expert forum could have a positive effect on patient outcomes, which should guide future research. Offering infertile patients an expert forum to communicate with providers can be a promising supplement to usual care in both providing information and addressing patients' concerns.

  1. Usability-driven pruning of large ontologies: the case of SNOMED CT

    PubMed Central

    Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan

    2012-01-01

    Objectives To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Materials and Methods Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Results Graph-traversal heuristics provided high coverage (71–96% of terms in the test sets of discharge summaries) at the expense of subset size (17–51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24–55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Discussion Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Conclusion Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available. PMID:22268217

  2. Automatic movie skimming with general tempo analysis

    NASA Astrophysics Data System (ADS)

    Lee, Shih-Hung; Yeh, Chia-Hung; Kuo, C. C. J.

    2003-11-01

    In this research, story units are extracted by general tempo analysis, including the tempos of audio and visual information. Although many schemes have been proposed to successfully segment video data into shots using basic low-level features, how to group shots into meaningful units called story units is still a challenging problem. By focusing on a certain type of video such as sport or news, we can explore models with the specific application domain knowledge. For movie contents, many heuristic rules based on audiovisual clues have been proposed with limited success. We propose a method to extract story units using general tempo analysis. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.

  3. Multivariate analysis for scanning tunneling spectroscopy data

    NASA Astrophysics Data System (ADS)

    Yamanishi, Junsuke; Iwase, Shigeru; Ishida, Nobuyuki; Fujita, Daisuke

    2018-01-01

    We applied principal component analysis (PCA) to two-dimensional tunneling spectroscopy (2DTS) data obtained on a Si(111)-(7 × 7) surface to explore the effectiveness of multivariate analysis for interpreting 2DTS data. We demonstrated that several components that originated mainly from specific atoms at the Si(111)-(7 × 7) surface can be extracted by PCA. Furthermore, we showed that hidden components in the tunneling spectra can be decomposed (peak separation), which is difficult to achieve with normal 2DTS analysis without the support of theoretical calculations. Our analysis showed that multivariate analysis can be an additional powerful way to analyze 2DTS data and extract hidden information from a large amount of spectroscopic data.
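
    A minimal sketch of applying PCA to a 2DTS data cube (one spectrum per grid point), with assumed array shapes and random numbers standing in for real tunneling spectra:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 2DTS cube: 64 x 64 grid points, 256-point dI/dV spectrum each.
cube = np.random.default_rng(0).normal(size=(64, 64, 256))
spectra = cube.reshape(-1, 256)            # one row per grid point

pca = PCA(n_components=5)
scores = pca.fit_transform(spectra)        # per-pixel weight of each component
components = pca.components_               # spectral shape of each component
maps = scores.reshape(64, 64, 5)           # spatial maps, e.g. highlighting specific atoms
print(pca.explained_variance_ratio_)
```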

  4. [Study on Chemical Constituents of Fat-soluble Extraction from Lepidium meyenii].

    PubMed

    Fan, Cai-hong; Ge, Fa-huan

    2015-02-01

    To study the chemical constituents of the fat-soluble extract from Lepidium meyenii root. Different extraction methods were studied, including supercritical carbon dioxide extraction, circumfluence extraction and steam distillation. Chemical constituents of the fat-soluble extract from Lepidium meyenii were analyzed by GC/MS. The numbers of compounds isolated by the above four methods were 38, 31, 14, 21 (specific gravity less than 1 in steam distillation), and 25 (specific gravity greater than 1 in steam distillation), accounting for 85.79%, 81.18%, 62.08%, 98.36% (specific gravity less than 1 in steam distillation) and 81.54% (specific gravity greater than 1 in steam distillation) of each total peak area, respectively. This study lays a foundation for further study and development of functional factors in Lepidium meyenii root.

  5. Differences among heat-treated, raw, and commercial peanut extracts by skin testing and immunoblotting.

    PubMed

    Maleki, Soheila J; Casillas, Adrian M; Kaza, Ujwala; Wilson, Brian A; Nesbit, Jacqueline B; Reimoneqnue, Chantrel; Cheng, Hsiaopo; Bahna, Sami L

    2010-12-01

    Peanut allergenicity has been reported to be influenced by heat treatment, yet the commonly available extracts for skin prick testing (SPT) are derived from raw extracts. To assess the effect of heat treatment on the SPT reactivity and specific IgE binding to peanut. Three commercial extracts and 3 laboratory-prepared extracts, including raw, roasted, and boiled, were used for SPT in 19 patients with suspected peanut allergy and in 4 individuals who eat peanut without any symptoms. Serum samples were obtained to measure total IgE in addition to specific IgE binding to the study extracts by immunoblotting. Peanut allergy was confirmed with challenge test unless the individual had a convincing history of a severe reaction. Eleven study participants were considered peanut allergic based on a strong history or positive challenge test result. SPT with the prepared and commercial reagents showed that the boiled extract had the highest specificity (67% vs 42%-63% for the other extracts). The prepared extracts showed similar SPT sensitivity (81%). Three patients with a history of severe reaction and elevated specific IgE levels to peanut to the 3 study extracts had variable SPT reactivity to 1 or more of the commercial extracts. IgE binding to Ara h 2 was found in nearly all patients, regardless of their clinical reactivity. None of the extracts tested showed optimal diagnostic reliability regarding both sensitivity and specificity. Perhaps testing should be performed with multiple individual extracts prepared by different methods. Copyright © 2010 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
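
    For reference, the sensitivity and specificity figures quoted above follow the standard definitions; the snippet below only illustrates the formulas with made-up counts, not the study's raw data:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative numbers only: 11 allergic and 8 non-allergic participants,
# with hypothetical counts of 9 true-positive and 6 true-negative SPT results.
print(sensitivity_specificity(tp=9, fn=2, tn=6, fp=2))  # (~0.82, 0.75)
```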

  6. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    PubMed

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract from these files information from the structured elements in the DICOM metadata relevant to exposure. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
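
    A minimal sketch of the same idea in Python using pydicom (the study used a Matlab program; this is an assumed equivalent, and the file name is hypothetical). It walks the content sequence of a CT radiation dose structured report and prints coded concept names with any attached numeric values:

```python
import pydicom

def walk_content(items, depth=0):
    """Recursively print concept names and numeric values from a DICOM
    dose structured report's content sequence."""
    for item in items:
        name = item.ConceptNameCodeSequence[0].CodeMeaning
        value = None
        if "MeasuredValueSequence" in item:
            mv = item.MeasuredValueSequence[0]
            value = (mv.NumericValue, mv.MeasurementUnitsCodeSequence[0].CodeValue)
        print("  " * depth, name, value)
        if "ContentSequence" in item:
            walk_content(item.ContentSequence, depth + 1)

ds = pydicom.dcmread("ct_dose_report.dcm")   # hypothetical RDSR file
walk_content(ds.ContentSequence)
```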

  7. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information

    NASA Astrophysics Data System (ADS)

    Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia

    2018-05-01

    Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.

  8. A prototype system to support evidence-based practice.

    PubMed

    Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E; Clayton, Jennifer; Thoma, George R

    2008-11-06

    Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients' problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically proactively delivered evidence and its usefulness in development of care plans.

  9. MRMer, an interactive open source and cross-platform system for data extraction and visualization of multiple reaction monitoring experiments.

    PubMed

    Martin, Daniel B; Holzman, Ted; May, Damon; Peterson, Amelia; Eastham, Ashley; Eng, Jimmy; McIntosh, Martin

    2008-11-01

    Multiple reaction monitoring (MRM) mass spectrometry identifies and quantifies specific peptides in a complex mixture with very high sensitivity and speed and thus has promise for the high throughput screening of clinical samples for candidate biomarkers. We have developed an interactive software platform, called MRMer, for managing highly complex MRM-MS experiments, including quantitative analyses using heavy/light isotopic peptide pairs. MRMer parses and extracts information from MS files encoded in the platform-independent mzXML data format. It extracts and infers precursor-product ion transition pairings, computes integrated ion intensities, and permits rapid visual curation for analyses exceeding 1000 precursor-product pairs. Results can be easily output for quantitative comparison of consecutive runs. Additionally, MRMer incorporates features that permit the quantitative analysis of experiments including heavy and light isotopic peptide pairs. MRMer is open source and provided under the Apache 2.0 license.
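
    A small sketch of the core quantitative step described: integrating a transition's chromatographic intensity over a retention-time window and forming a heavy/light ratio. This uses numpy on assumed arrays and is not MRMer's mzXML parsing code:

```python
import numpy as np

def integrated_intensity(retention_times, intensities, rt_window):
    """Trapezoidal integration of a transition's intensity trace over a
    retention-time window, as used for heavy/light peptide ratios."""
    lo, hi = rt_window
    mask = (retention_times >= lo) & (retention_times <= hi)
    return np.trapz(intensities[mask], retention_times[mask])

# Hypothetical heavy/light transition traces for one peptide.
rt = np.linspace(20.0, 22.0, 120)                       # minutes
light = np.exp(-0.5 * ((rt - 21.0) / 0.05) ** 2) * 1e5
heavy = np.exp(-0.5 * ((rt - 21.0) / 0.05) ** 2) * 4e4
ratio = (integrated_intensity(rt, light, (20.8, 21.2)) /
         integrated_intensity(rt, heavy, (20.8, 21.2)))
print(f"light/heavy ratio = {ratio:.2f}")               # 2.50
```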

  10. A Prototype System to Support Evidence-based Practice

    PubMed Central

    Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E.; Clayton, Jennifer; Thoma, George R.

    2008-01-01

    Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients’ problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically proactively delivered evidence and its usefulness in development of care plans. PMID:18998835

  11. The Agent of extracting Internet Information with Lead Order

    NASA Astrophysics Data System (ADS)

    Mo, Zan; Huang, Chuliang; Liu, Aijun

    To support e-commerce more effectively, advanced technologies for accessing business information are urgently needed. An agent is described that addresses the problems of extracting internet information caused by the non-standard and inconsistent structure of Chinese websites. The agent comprises three modules, each responsible for one stage of the extraction process. A method based on an HTTP tree together with a Lead algorithm is proposed to generate a lead order, with which the required web pages can be retrieved easily. How the extracted natural-language information is transformed into a structured form is also discussed.

  12. Energy Dissipation in Ex-Vivo Porcine Liver during Electrosurgery

    PubMed Central

    Karaki, Wafaa; Akyildiz, Ali; De, Suvranu

    2017-01-01

    This paper explores energy dissipation in ex-vivo liver tissue during radiofrequency current excitation with application in electrosurgery. Tissue surface temperature for monopolar electrode configuration is measured using infrared thermometry. The experimental results are fitted to a finite element model for transient heat transfer taking into account energy storage and conduction in order to extract information about “apparent” specific heat, which encompasses storage and phase change. The average apparent specific heat determined for low temperatures is in agreement with published data. However, at temperatures approaching the boiling point of water, apparent specific heat increases by a factor of five, indicating that vaporization plays an important role in the energy dissipation through latent heat loss. PMID:27479955

  13. Green ultrasound-assisted extraction of anthocyanin and phenolic compounds from purple sweet potato using response surface methodology

    NASA Astrophysics Data System (ADS)

    Zhu, Zhenzhou; Guan, Qingyan; Guo, Ying; He, Jingren; Liu, Gang; Li, Shuyi; Barba, Francisco J.; Jaffrin, Michel Y.

    2016-01-01

    Response surface methodology was used to optimize experimental conditions for ultrasound-assisted extraction of valuable components (anthocyanins and phenolics) from purple sweet potatoes using water as a solvent. The Box-Behnken design was used for optimizing extraction responses of anthocyanin extraction yield, phenolic extraction yield, and specific energy consumption. Conditions to obtain maximal anthocyanin extraction yield, maximal phenolic extraction yield, and minimal specific energy consumption were different; an overall desirability function was used to search for overall optimal conditions: extraction temperature of 68 °C, ultrasonic treatment time of 52 min, and a liquid/solid ratio of 20. The optimized anthocyanin extraction yield, phenolic extraction yield, and specific energy consumption were 4.91 mg/100 g fresh weight, 3.24 mg/g fresh weight, and 2.07 kWh/g, respectively, with a desirability of 0.99. This study indicates that ultrasound-assisted extraction should contribute to a green process for valorization of purple sweet potatoes.
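
    A short sketch of the overall desirability approach used to trade off the three responses (a Derringer-type formulation with illustrative response values and ranges, not the study's fitted response-surface models):

```python
import numpy as np

def desirability_max(y, y_min, y_max):
    """Larger-is-better desirability: 0 below y_min, 1 above y_max."""
    return np.clip((y - y_min) / (y_max - y_min), 0.0, 1.0)

def desirability_min(y, y_min, y_max):
    """Smaller-is-better desirability: 1 below y_min, 0 above y_max."""
    return np.clip((y_max - y) / (y_max - y_min), 0.0, 1.0)

# Illustrative responses at one candidate setting (not the paper's data):
d_anthocyanin = desirability_max(4.9, y_min=3.0, y_max=5.0)   # yield, maximize
d_phenolic    = desirability_max(3.2, y_min=2.0, y_max=3.3)   # yield, maximize
d_energy      = desirability_min(2.1, y_min=2.0, y_max=4.0)   # consumption, minimize

# Overall desirability is the geometric mean of the individual desirabilities.
overall = (d_anthocyanin * d_phenolic * d_energy) ** (1 / 3)
print(round(overall, 2))
```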

  14. PREDOSE: a semantic web platform for drug abuse epidemiology using social media.

    PubMed

    Cameron, Delroy; Smith, Gary A; Daniulaityte, Raminta; Sheth, Amit P; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z; Falck, Russel

    2013-12-01

    The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO--pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC), through combination of lexical, pattern-based and semantics-based techniques. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration. The DAO is also used to help recognize three types of data, namely: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information, which facilitate search, trend analysis and overall content analysis using social media on prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. 
Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera

    PubMed Central

    Bao, Jiatong; Jia, Yunyi; Cheng, Yu; Tang, Hongru; Xi, Ning

    2016-01-01

    Controlling robots by natural language (NL) is increasingly attracting attention for its versatility, convenience, and lack of need for extensive user training. Grounding is a crucial challenge of this problem to enable robots to understand NL instructions from humans. This paper mainly explores the object grounding problem and concretely studies how to detect target objects by the NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. With the metric information of all segmented objects, the object attributes and relations between objects are further extracted. The NL instructions that incorporate multiple cues for object specifications are parsed into domain-specific annotations. The annotations from NL and extracted information from the RGB-D camera are matched in a computational state estimation framework to search all possible object grounding states. The final grounding is accomplished by selecting the states which have the maximum probabilities. An RGB-D scene dataset associated with different groups of NL instructions based on different cognition levels of the robot is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicability in robotic applications. PMID:27983604

  16. QTLTableMiner++: semantic mining of QTL tables in scientific articles.

    PubMed

    Singh, Gurnoor; Kuzniar, Arnold; van Mulligen, Erik M; Gavai, Anand; Bachem, Christian W; Visser, Richard G F; Finkers, Richard

    2018-05-25

    A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications. Traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner ++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. This tool takes scientific articles from the Europe PMC repository as input, extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Furthermore, table columns are classified into three categories namely column descriptors, properties and values based on column headers and data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform and the results are stored in a relational database and a text file. The performance of the QTM tool was assessed by precision and recall based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall and in potato with 82.82% precision and 98.94% recall. QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.

  17. Longitudinal Analysis of New Information Types in Clinical Notes

    PubMed Central

    Zhang, Rui; Pakhomov, Serguei; Melton, Genevieve B.

    2014-01-01

    It is increasingly recognized that redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous, significant, and may negatively impact the secondary use of these notes for research and patient care. We investigated several automated methods to identify redundant versus relevant new information in clinical reports. These methods may provide a valuable approach to extract clinically pertinent information and further improve the accuracy of clinical information extraction systems. In this study, we used UMLS semantic types to extract several types of new information, including problems, medications, and laboratory information. Automatically identified new information highly correlated with manual reference standard annotations. Methods to identify different types of new information can potentially help to build up more robust information extraction systems for clinical researchers as well as aid clinicians and researchers in navigating clinical notes more effectively and quickly identify information pertaining to changes in health states. PMID:25717418

  18. Two traditional maize inbred lines of contrasting technological abilities are discriminated by the seed flour proteome.

    PubMed

    Pinheiro, Carla; Sergeant, Kjell; Machado, Cátia M; Renaut, Jenny; Ricardo, Cândido P

    2013-07-05

    The seed proteome of two traditional maize inbred lines (pb269 and pb369) contrasting in grain hardness and in preferable use for bread-making was evaluated. The pb269 seeds, of flint type (i.e., hard endosperm), are preferably used by manufacturers, while pb369 (dent, soft endosperm) is rejected. The hypothesis that the content and relative amounts of specific proteins in the maize flour are relevant for such discrimination of the inbred lines was tested. The flour proteins were sequentially extracted following the Osborne fractionation (selective solubilization), and the four Osborne fractions were submitted to two-dimensional electrophoresis (2DE). The total amount of protein extracted from the seeds was not significantly different, but pb369 flour exhibited significantly higher proportions of salt-extracted proteins (globulins) and ethanol-extracted proteins (alcohol-soluble prolamins). The proteome analysis allowed discrimination between the two inbred lines, with pb269 demonstrating higher heterogeneity than pb369. From the 967 spots (358 common to both lines, 208 specific to pb269, and 401 specific to pb369), 588 were submitted to mass spectrometry (MS). Through the combined use of trypsin and chymotrypsin it was possible to identify proteins in 436 spots. The functional categorization in combination with multivariate analysis highlighted the most discriminant biological processes (carbohydrate metabolic process, response to stress, chitin catabolic process, oxidation-reduction process) and molecular function (nutrient reservoir activity). The inbred lines exhibited quantitative and qualitative differences in these categories. Differences were also revealed in the amounts, proportions, and distribution of several groups of storage proteins, which can have an impact on the organization of the protein body and endosperm hardness. For some proteins (granule-bound starch synthase-1, cyclophilin, zeamatin), a change in the protein solubility rather than in the total amount extracted was observed, which reveals distinct in vivo associations and/or changes in binding strength between the inbred lines. Our approach produced information that relates protein content, relative protein content, and specific protein types to endosperm hardness and to the preferable use for "broa" bread-making.

  19. Optimal Information Extraction of Laser Scanning Dataset by Scale-Adaptive Reduction

    NASA Astrophysics Data System (ADS)

    Zang, Y.; Yang, B.

    2018-04-01

    3D laser scanning technology is widely used to collect the surface information of objects. For various applications, we need to extract a point cloud of good perceptual quality from the scanned points. To solve this problem, most existing methods extract important points based on a fixed scale. However, the geometric features of a 3D object come from various geometric scales. We propose a multi-scale construction method based on radial basis functions. For each scale, important points are extracted from the point cloud based on their importance. We apply the Just-Noticeable-Difference perception metric to measure the degradation at each geometric scale. Finally, scale-adaptive optimal information extraction is realized. Experiments are undertaken to evaluate the effectiveness of the proposed method, suggesting a reliable solution for optimal information extraction of objects.

  20. Thresholds of information leakage for speech security outside meeting rooms.

    PubMed

    Robinson, Matthew; Hopkins, Carl; Worrall, Ken; Jackson, Tim

    2014-09-01

    This paper describes an approach to provide speech security outside meeting rooms where a covert listener might attempt to extract confidential information. Decision-based experiments are used to establish a relationship between an objective measurement of the Speech Transmission Index (STI) and a subjective assessment relating to the threshold of information leakage. This threshold is defined for a specific percentage of English words that are identifiable with a maximum safe vocal effort (e.g., "normal" speech) used by the meeting participants. The results demonstrate that it is possible to quantify an offset that links STI with a specific threshold of information leakage which describes the percentage of words identified. The offsets for male talkers are shown to be approximately 10 dB larger than for female talkers. Hence for speech security it is possible to determine offsets for the threshold of information leakage using male talkers as the "worst case scenario." To define a suitable threshold of information leakage, the results show that a robust definition can be based upon 1%, 2%, or 5% of words identified. For these percentages, results are presented for offset values corresponding to different STI values in a range from 0.1 to 0.3.

  1. Query-oriented evidence extraction to support evidence-based medicine practice.

    PubMed

    Sarker, Abeed; Mollá, Diego; Paris, Cecile

    2016-02-01

    Evidence-based medicine practice requires medical practitioners to rely on the best available evidence, in addition to their expertise, when making clinical decisions. The medical domain boasts a large amount of published medical research data, indexed in various medical databases such as MEDLINE. As the size of this data grows, practitioners increasingly face the problem of information overload, and past research has established the time-associated obstacles faced by evidence-based medicine practitioners. In this paper, we focus on the problem of automatic text summarisation to help practitioners quickly find query-focused information from relevant documents. We utilise an annotated corpus that is specialised for the task of evidence-based summarisation of text. In contrast to past summarisation approaches, which mostly rely on surface level features to identify salient pieces of texts that form the summaries, our approach focuses on the use of corpus-based statistics, and domain-specific lexical knowledge for the identification of summary contents. We also apply a target-sentence-specific summarisation technique that reduces the problem of underfitting that persists in generic summarisation models. In automatic evaluations run over a large number of annotated summaries, our extractive summarisation technique statistically outperforms various baseline and benchmark summarisation models with a percentile rank of 96.8%. A manual evaluation shows that our extractive summarisation approach is capable of selecting content with high recall and precision, and may thus be used to generate bottom-line answers to practitioners' queries. Our research shows that the incorporation of specialised data and domain-specific knowledge can significantly improve text summarisation performance in the medical domain. Due to the vast amounts of medical text available, and the high growth of this form of data, we suspect that such summarisation techniques will address the time-related obstacles associated with evidence-based medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
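
    A minimal sketch of query-focused extractive sentence scoring. It uses tf-idf cosine similarity to the query via scikit-learn as a simple stand-in for the corpus statistics and domain-specific lexical knowledge described above; the sentences are toy examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_summary(query, sentences, n_sentences=2):
    """Rank candidate sentences by tf-idf cosine similarity to the query
    and return the top-scoring ones as an extractive summary."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([query] + sentences)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    ranked = sorted(zip(scores, sentences), reverse=True)
    return [sentence for _, sentence in ranked[:n_sentences]]

sentences = [
    "Beta blockers reduced mortality in patients with heart failure.",
    "The trial enrolled 2647 patients across 12 centres.",
    "Adverse events were more frequent in the placebo arm.",
]
print(extract_summary("Do beta blockers improve survival in heart failure?", sentences))
```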

  2. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters

    PubMed Central

    Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei

    2016-01-01

    Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images. The accuracy of remote sensing special subject information depends on this extraction. Using WorldView-2 high-resolution data, the following processes were conducted in this study to establish an optimal segmentation parameter method for object-oriented image segmentation and high-resolution image information extraction. Firstly, the best combination of the bands and weights was determined for the information extraction of high-resolution remote sensing images. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor parameter and compact factor parameters were computed with the use of the control variables and the combination of the heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting the information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme. PMID:27362762

  3. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters.

    PubMed

    Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei

    2016-01-01

    Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images. The accuracy of remote sensing special subject information depends on this extraction. Using WorldView-2 high-resolution data, the following processes were conducted in this study to establish an optimal segmentation parameter method for object-oriented image segmentation and high-resolution image information extraction. Firstly, the best combination of the bands and weights was determined for the information extraction of high-resolution remote sensing images. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor parameter and compact factor parameters were computed with the use of the control variables and the combination of the heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting the information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme.

  4. Extraction of hexavalent chromium from chromated copper arsenate treated wood under alkaline conditions.

    PubMed

    Radivojevic, Suzana; Cooper, Paul A

    2008-05-15

    Information on chromium (Cr) oxidation states is essential for the assessment of environmental and health risks associated with the overall life-cycle of chromated copper arsenate (CCA) treated wood products because of differences in toxicity between trivalent [Cr(III)] and hexavalent [Cr(VI)] chromium compounds. Hypothetical Cr(VI) fixation products were investigated in CCA type C treated sawdust of aspen and red pine during or following preservative fixation by extraction with Cr(VI)-specific extractants. Cr(VI) was found only in alkaline extracts of treated wood. A major source of Cr(VI) was method-induced oxidation of fixed Cr(III) during alkaline extraction, as confirmed by demonstrated oxidation of Cr(III) from CrCl3 treated wood. Oxidation of nontoxic and immobile Cr(III) to toxic and mobile Cr(VI) was facilitated by the presence of wood at pH > 8.5. Thermodynamic equilibrium between Cr(III) and Cr(VI) is affected by pH, temperature, rates of dissolution of Cr(III) compounds, and oxygen availability. Results of this study recommend against alkaline extraction protocols for determination of Cr(VI) in treated wood. This Cr oxidation mechanism can act as a previously unrecognized route for generation of hazardous Cr(VI) if CCA treated wood is exposed to alkaline conditions during its production, use, or waste management.

  5. Research on Optimal Observation Scale for Damaged Buildings after Earthquake Based on Optimal Feature Space

    NASA Astrophysics Data System (ADS)

    Chen, J.; Chen, W.; Dou, A.; Li, W.; Sun, Y.

    2018-04-01

    A new information extraction method for damaged buildings, rooted in an optimal feature space, is put forward on the basis of the traditional object-oriented method. In this new method, the ESP (estimate of scale parameter) tool is used to optimize image segmentation. Then the distance matrix and minimum separation distance of all kinds of surface features are calculated through sample selection to find the optimal feature space, which is finally applied to extract damaged buildings from post-earthquake imagery. The overall extraction accuracy reaches 83.1%, with a kappa coefficient of 0.813. Compared with the traditional object-oriented method, the new information extraction method greatly improves extraction accuracy and efficiency and has good potential for wider use in the information extraction of damaged buildings. In addition, the new method can be applied to images of damaged buildings at different resolutions to seek the optimal observation scale through accuracy evaluation. The optimal observation scale for damaged buildings is estimated to be between 1 m and 1.2 m, which provides a reference for future information extraction of damaged buildings.
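
    A small sketch of the kind of separability measure implied by the distance matrix and minimum separation distance between surface-feature classes, computed here from per-class mean feature vectors of training samples. The class names and feature values are hypothetical, and this is not the paper's exact formulation:

```python
import numpy as np

def class_separation(features_by_class):
    """Pairwise Euclidean distances between class-mean feature vectors and
    the minimum separation over all class pairs."""
    names = list(features_by_class)
    means = np.array([np.mean(features_by_class[c], axis=0) for c in names])
    dists = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
    i, j = np.triu_indices(len(names), k=1)
    k = np.argmin(dists[i, j])
    return dists, (names[i[k]], names[j[k]], dists[i[k], j[k]])

samples = {                       # hypothetical per-class feature samples
    "intact building": np.random.default_rng(0).normal(0.0, 0.1, (20, 4)),
    "damaged building": np.random.default_rng(1).normal(0.6, 0.1, (20, 4)),
    "bare soil": np.random.default_rng(2).normal(1.5, 0.1, (20, 4)),
}
matrix, closest_pair = class_separation(samples)
print(closest_pair)   # the hardest-to-separate pair drives feature-space selection
```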

  6. Plant-uptake of uranium: Hydroponic and soil system studies

    USGS Publications Warehouse

    Ramaswami, A.; Carr, P.; Burkhardt, M.

    2001-01-01

    Limited information is available on screening and selection of terrestrial plants for uptake and translocation of uranium from soil. This article evaluates the removal of uranium from water and soil by selected plants, comparing plant performance in hydroponic systems with that in two soil systems (a sandy-loam soil and an organic-rich soil). Plants selected for this study were Sunflower (Helianthus giganteus), Spring Vetch (Vicia sativa), Hairy Vetch (Vicia villosa), Juniper (Juniperus monosperma), Indian Mustard (Brassica juncea), and Bush Bean (Phaseolus nanus). Plant performance was evaluated both in terms of the percent uranium extracted from the three systems, as well as the biological absorption coefficient (BAC) that normalized uranium uptake to plant biomass. Study results indicate that uranium extraction efficiency decreased sharply across hydroponic, sandy and organic soil systems, indicating that soil organic matter sequestered uranium, rendering it largely unavailable for plant uptake. These results indicate that site-specific soils must be used to screen plants for uranium extraction capability; plant behavior in hydroponic systems does not correlate well with that in soil systems. One plant species, Juniper, exhibited consistent uranium extraction efficiencies and BACs in both sandy and organic soils, suggesting unique uranium extraction capabilities.

  7. Neutron Polarization Analysis for Biphasic Solvent Extraction Systems

    DOE PAGES

    Motokawa, Ryuhei; Endo, Hitoshi; Nagao, Michihiro; ...

    2016-06-16

    Here we performed neutron polarization analysis (NPA) of extracted organic phases containing complexes, comprised of Zr(NO3)4 and tri-n-butyl phosphate, which enabled decomposition of the intensity distribution of small-angle neutron scattering (SANS) into the coherent and incoherent scattering components. The coherent scattering intensity, containing structural information, and the incoherent scattering compete over a wide range of magnitude of scattering vector, q, specifically when q is larger than q* ≈ 1/Rg, where Rg is the radius of gyration of the scatterer. Therefore, it is important to determine the incoherent scattering intensity exactly to perform an accurate structural analysis from SANS data when Rg is small, such as the aforementioned extracted coordination species. Although NPA is the best method for evaluating the incoherent scattering component for accurately determining the coherent scattering in SANS, this method is not used frequently in SANS data analysis because it is technically challenging. In this study, we successfully demonstrated that experimental determination of the incoherent scattering using NPA is suitable for sample systems containing a small scatterer with a weak coherent scattering intensity, such as extracted complexes in biphasic solvent extraction systems.
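
    For readers unfamiliar with NPA, the separation rests on the standard assumption that nuclear spin-incoherent scattering flips the neutron spin with probability 2/3 (magnetic scattering neglected); a sketch of the resulting relations, with notation introduced here rather than taken from the paper:

```latex
I_{\mathrm{SF}}(q) = \tfrac{2}{3}\, I_{\mathrm{inc}}, \qquad
I_{\mathrm{NSF}}(q) = I_{\mathrm{coh}}(q) + \tfrac{1}{3}\, I_{\mathrm{inc}}
\quad\Longrightarrow\quad
I_{\mathrm{coh}}(q) = I_{\mathrm{NSF}}(q) - \tfrac{1}{2}\, I_{\mathrm{SF}}(q), \qquad
I_{\mathrm{inc}} = \tfrac{3}{2}\, I_{\mathrm{SF}}(q)
```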

  8. Using latent semantic analysis and the predication algorithm to improve extraction of meanings from a diagnostic corpus.

    PubMed

    Jorge-Botana, Guillermo; Olmos, Ricardo; León, José Antonio

    2009-11-01

    There is currently a widespread interest in indexing and extracting taxonomic information from large text collections. An example is the automatic categorization of informally written medical or psychological diagnoses, followed by the extraction of epidemiological information or even terms and structures needed to formulate guiding questions as a heuristic tool for helping doctors. Vector space models have been successfully used to this end (Lee, Cimino, Zhu, Sable, Shanker, Ely & Yu, 2006; Pakhomov, Buntrock & Chute, 2006). In this study we use a computational model known as Latent Semantic Analysis (LSA) on a diagnostic corpus with the aim of retrieving definitions (in the form of lists of semantic neighbors) of common structures it contains (e.g. "storm phobia", "dog phobia") or less common structures that might be formed by logical combinations of categories and diagnostic symptoms (e.g. "gun personality" or "germ personality"). In the quest to bring definitions into line with the meaning of structures and make them in some way representative, various problems commonly arise while recovering content using vector space models. We propose some approaches which bypass these problems, such as Kintsch's (2001) predication algorithm and some corrections to the way lists of neighbors are obtained, which have already been tested on semantic spaces in a non-specific domain (Jorge-Botana, León, Olmos & Hassan-Montero, under review). The results support the idea that the predication algorithm may also be useful for extracting more precise meanings of certain structures from scientific corpora, and that the introduction of some corrections based on vector length may increase its efficiency on non-representative terms.
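
    A minimal sketch of building an LSA space with truncated SVD and retrieving semantic neighbours of a query structure, using scikit-learn as a stand-in for the paper's LSA setup; the corpus below is a toy placeholder, not the diagnostic corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [                       # toy diagnostic snippets, not the study corpus
    "patient reports intense fear during thunderstorms",
    "fear of dogs since childhood with avoidance behaviour",
    "compulsive hand washing driven by fear of germs",
    "panic attacks in crowded places",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = lsa.fit_transform(X)            # documents projected into LSA space

query = vectorizer.transform(["storm phobia thunderstorms fear"])
query_vec = lsa.transform(query)
sims = cosine_similarity(query_vec, doc_vecs).ravel()
for score, text in sorted(zip(sims, corpus), reverse=True):
    print(f"{score:+.2f}  {text}")         # nearest neighbours define the structure
```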

  9. A gradient-based approach for automated crest-line detection and analysis of sand dune patterns on planetary surfaces

    NASA Astrophysics Data System (ADS)

    Lancaster, N.; LeBlanc, D.; Bebis, G.; Nicolescu, M.

    2015-12-01

    Dune-field patterns are believed to behave as self-organizing systems, but what causes the patterns to form is still poorly understood. The most obvious (and in many cases the most significant) aspect of a dune system is the pattern of dune crest lines. Extracting meaningful features such as crest length, orientation, spacing, bifurcations, and merging of crests from image data can reveal important information about the specific dune-field morphological properties, development, and response to changes in boundary conditions, but manual methods are labor-intensive and time-consuming. We are developing the capability to recognize and characterize patterns of sand dunes on planetary surfaces. Our goal is to develop a robust methodology and the necessary algorithms for automated or semi-automated extraction of dune morphometric information from image data. Our main approach uses image processing methods to extract gradient information from satellite images of dune fields. Typically, the gradients have a dominant magnitude and orientation. In many cases, the images have two major dominant gradient orientations, for the sunny and shaded side of the dunes. A histogram of the gradient orientations is used to determine the dominant orientation. A threshold is applied to the image based on gradient orientations which agree with the dominant orientation. The contours of the binary image can then be used to determine the dune crest-lines, based on pixel intensity values. Once the crest-lines have been extracted, the morphological properties can be computed. We have tested our approach on a variety of images of linear and crescentic (transverse) dunes and compared dune detection algorithms with manually-digitized dune crest lines, achieving true positive rates of 0.57-0.99 and false positive rates of 0.30-0.67, indicating that our approach is generally robust.
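
    A small sketch of the gradient-orientation-histogram step described: compute image gradients, find the dominant orientation from a magnitude-weighted histogram, and threshold pixels whose orientation agrees with it. It assumes a grayscale numpy array loaded elsewhere and is a simplification of the authors' pipeline:

```python
import numpy as np

def dominant_orientation_mask(image, n_bins=36, tolerance_deg=15.0):
    """Find the dominant gradient orientation of a grayscale image and
    return a mask of pixels whose gradient agrees with it."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360.0

    # Magnitude-weighted orientation histogram; the peak bin is dominant.
    hist, edges = np.histogram(orientation, bins=n_bins, range=(0, 360),
                               weights=magnitude)
    peak = np.argmax(hist)
    dominant = 0.5 * (edges[peak] + edges[peak + 1])

    # Angular difference wrapped to [0, 180), plus a magnitude cut.
    diff = np.abs((orientation - dominant + 180.0) % 360.0 - 180.0)
    mask = (diff < tolerance_deg) & (magnitude > magnitude.mean())
    return dominant, mask   # contours of `mask` approximate the sunlit dune faces

# Hypothetical usage on a satellite tile loaded elsewhere:
# dominant, mask = dominant_orientation_mask(tile)
```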

  10. Automated extraction of chemical structure information from digital raster images

    PubMed Central

    Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro

    2009-01-01

    Background To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed. But their algorithmic performance and utility in cheminformatic research have not been investigated. Results This paper aims to provide critical reviews for these systems and also report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface (and the algorithm parameters can be readily changed) to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy on extracting molecular substructure patterns. Conclusion The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles. PMID:19196483

  11. Analysis of Technique to Extract Data from the Web for Improved Performance

    NASA Astrophysics Data System (ADS)

    Gupta, Neena; Singh, Manish

    2010-11-01

    The World Wide Web is rapidly guiding the world into a new electronic era in which anyone can publish anything in electronic form and extract almost any information. Extraction of information from semi-structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts the records from HTML files automatically. Ontologies can achieve a high degree of accuracy in data extraction. We analyze a method for data extraction, OBDE (Ontology-Based Data Extraction), which automatically extracts the query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.

  12. A study on building data warehouse of hospital information system.

    PubMed

    Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo

    2011-08-01

    Existing hospital information systems with simple statistical functions cannot meet current management needs. It is well known that hospital resources are distributed with private property rights among hospitals, such as in the case of the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system. The method can also be employed for a distributed-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: datacenter layer, system-function layer, and user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information from the establishment of the data theme to the design of a data model to the establishment of a data warehouse. Online analytical processing tools assist user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and summarizes from clinical and hospital information for decision making. This paper proposes the use of a data warehouse for a hospital information system, specifically a data warehouse for the theme of hospital information to determine latitude, modeling and so on. The processing of patient information is given as an example that demonstrates the usefulness of this method in the case of hospital information management. Data warehouse technology is an evolving technology, and more and more decision support information extracted by data mining and with decision-making technology is required for further research.

  13. Chemical modification of birch allergen extract leads to a reduction in allergenicity as well as immunogenicity.

    PubMed

    Würtzen, Peter Adler; Lund, Lise; Lund, Gitte; Holm, Jens; Millner, Anders; Henmar, Helene

    2007-01-01

    In Europe, specific immunotherapy is currently conducted with vaccines containing allergen preparations based on intact extracts. In addition to this, chemically modified allergen extracts (allergoids) are used for specific allergy treatment. Reduced allergenicity and thereby reduced risk of side effects in combination with retained ability to activate T cells and induce protective allergen-specific antibody responses has been claimed for allergoids. In the current study, we compared intact allergen extracts and allergoids with respect to allergenicity and immunogenicity. The immunological response to birch allergen extract, alum-adsorbed extract, birch allergoid and alum-adsorbed allergoid was investigated in vitro in human basophil histamine release assay and by stimulation of human allergen-specific T cell lines. In vivo, Bet v 1-specific IgG titers in mice were determined after repetitive immunizations. In all patients tested (n = 8), allergoid stimulations led to reduced histamine release compared to the intact allergen extract. However, the allergoid preparations were not recognized by Bet v 1-specific T cell lines (n = 7), which responded strongly to the intact allergen extract. Mouse immunizations showed a clearly reduced IgG induction by allergoids and a strongly potentiating effect of the alum adjuvant. Optimal IgG titers were obtained after 3 immunizations with intact allergen extracts, while 5 immunizations were needed to obtain maximal response to the allergoid. The reduced histamine release observed for allergoid preparations may be at the expense of immunological efficacy because the chemical modifications lead to a clear reduction in T cell activation and the ability to induce allergen-specific IgG antibody responses. Copyright 2007 S. Karger AG, Basel.

  14. Knowledge Representation Of CT Scans Of The Head

    NASA Astrophysics Data System (ADS)

    Ackerman, Laurens V.; Burke, M. W.; Rada, Roy

    1984-06-01

    We have been investigating diagnostic knowledge models which assist in the automatic classification of medical images by combining information extracted from each image with knowledge specific to that class of images. In a more general sense we are trying to integrate verbal and pictorial descriptions of disease via representations of knowledge, study automatic hypothesis generation as related to clinical medicine, evolve new mathematical image measures while integrating them into the total diagnostic process, and investigate ways to augment the knowledge of the physician. Specifically, we have constructed an artificial intelligence knowledge model using the technique of a production system blending pictorial and verbal knowledge about the respective CT scan and patient history. It is an attempt to tie together different sources of knowledge representation, picture feature extraction and hypothesis generation. Our knowledge reasoning and representation system (KRRS) works with data at the conscious reasoning level of the practicing physician while at the visual perceptional level we are building another production system, the picture parameter extractor (PPE). This paper describes KRRS and its relationship to PPE.

  15. Attenuated Total Reflection Mid-Infrared (ATR-MIR) Spectroscopy and Chemometrics for the Identification and Classification of Commercial Tannins.

    PubMed

    Ricci, Arianna; Parpinello, Giuseppina P; Olejar, Kenneth J; Kilmartin, Paul A; Versari, Andrea

    2015-11-01

    Attenuated total reflection Fourier transform infrared (FT-IR) spectroscopy was used to characterize 40 commercial tannins, including condensed and hydrolyzable chemical classes, provided as powder extracts from suppliers. Spectral data were processed to detect typical molecular vibrations of tannins bearing different chemical groups and of varying botanical origin (univariate qualitative analysis). The mid-infrared region between 4000 and 520 cm(-1) was analyzed, with a particular emphasis on the vibrational modes in the fingerprint region (1800-520 cm(-1)), which provide detailed information about skeletal structures and specific substituents. The region 1800-1500 cm(-1) contained signals due to hydrolyzable structures, while bands due to condensed tannins appeared at 1300-900 cm(-1) and exhibited specific hydroxylation patterns useful to elucidate the structure of the flavonoid monomeric units. The spectra were investigated further using principal component analysis for discriminative purposes, to enhance the ability of infrared spectroscopy in the classification and quality control of commercial dried extracts and to enhance their industrial exploitation.
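    The chemometric step named above (principal component analysis for discrimination) can be sketched briefly. The preprocessing, array layout, and number of components below are assumptions for illustration rather than the authors' exact workflow.

```python
# Minimal sketch: project tannin ATR-MIR spectra onto principal components
# for exploratory classification. `spectra` is assumed to be an
# (n_samples, n_wavenumbers) array restricted to the fingerprint region.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_scores(spectra, n_components=3):
    scaled = StandardScaler().fit_transform(spectra)   # column-wise autoscaling
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)                 # sample coordinates in PC space
    return scores, pca.explained_variance_ratio_
```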

  16. "The caterpillar": a novel reading passage for assessment of motor speech disorders.

    PubMed

    Patel, Rupal; Connaghan, Kathryn; Franco, Diana; Edsall, Erika; Forgit, Dory; Olsen, Laura; Ramage, Lianna; Tyler, Emily; Russell, Scott

    2013-02-01

    A review of the salient characteristics of motor speech disorders and common assessment protocols revealed the need for a novel reading passage tailored specifically to differentiate between and among the dysarthrias (DYSs) and apraxia of speech (AOS). "The Caterpillar" passage was designed to provide a contemporary, easily read, contextual speech sample with specific tasks (e.g., prosodic contrasts, words of increasing length and complexity) targeted to inform the assessment of motor speech disorders. Twenty-two adults, 15 with DYS or AOS and 7 healthy controls (HC), were recorded reading "The Caterpillar" passage to demonstrate its utility in examining motor speech performance. Analysis of performance across a subset of segmental and prosodic variables illustrated that "The Caterpillar" passage showed promise for extracting individual profiles of impairment that could augment current assessment protocols and inform treatment planning in motor speech disorders.

  17. A Proteomic Workflow Using High-Throughput De Novo Sequencing Towards Complementation of Genome Information for Improved Comparative Crop Science.

    PubMed

    Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie

    2016-01-01

    The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations, inherent to cultivars, hamper protein identification and thus considerably complicate the qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses, by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for high-confidence mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17 %) of peptide spectrum matches.

  18. Diagnosis of food allergies: the impact of oral food challenge testing.

    PubMed

    Ito, Komei

    2013-01-01

    A diagnosis of food allergies should be made based on the observation of allergic symptoms following the intake of suspected foods and the presence of allergen-specific IgE antibodies. The oral food challenge (OFC) test is the most reliable clinical procedure for diagnosing food allergies. Specific IgE testing of allergen components as well as classical crude allergen extracts helps to make a more specific diagnosis of food allergies. The Japanese Society of Pediatric Allergy and Clinical Immunology issued the 'Japanese Pediatric Guideline for Food Allergy 2012' to provide information regarding the standardized diagnosis and management of food allergies. This review summarizes recent progress in the diagnosis of food allergies, focusing on the use of specific IgE tests and the OFC procedure in accordance with the Japanese guidelines.

  19. A randomized control trial comparing the visual and verbal communication methods for reducing fear and anxiety during tooth extraction.

    PubMed

    Gazal, Giath; Tola, Ahmed W; Fareed, Wamiq M; Alnazzawi, Ahmad A; Zafar, Muhammad S

    2016-04-01

    To evaluate the value of using visual information for reducing the level of dental fear and anxiety in patients undergoing tooth extraction under local anesthesia (LA). A total of 64 patients were randomly allocated to one of the study groups after reading the information sheet and signing the formal consent. Patients in the control group received only verbal information and routine warnings. Patients in the study group were shown a tooth extraction video. The level of dental fear and anxiety was reported by the patients on standard 100 mm visual analog scales (VAS), ranging from "no dental fear and anxiety" (0 mm) to "severe dental fear and anxiety" (100 mm). Dental fear and anxiety were evaluated pre-operatively, after the visual/verbal information, and post-extraction. There was a significant difference between the mean dental fear and anxiety scores of the two groups post-extraction (p-value < 0.05). Patients in the tooth extraction video group were more comfortable after dental extraction than those in the verbal information and routine warning group. In the tooth extraction video group there were significant decreases in dental fear and anxiety scores between the pre-operative scores and either the post-video information scores or the postoperative scores (p-values < 0.05). Younger patients recorded higher dental fear and anxiety scores than older ones (P < 0.05). Dental fear and anxiety associated with dental extractions under local anesthesia can be reduced by showing a tooth extraction video to the patients preoperatively.

  20. Pattern recognition of satellite cloud imagery for improved weather prediction

    NASA Technical Reports Server (NTRS)

    Gautier, Catherine; Somerville, Richard C. J.; Volfson, Leonid B.

    1986-01-01

    The major accomplishment was the successful development of a method for extracting time derivative information from geostationary meteorological satellite imagery. This research is a proof-of-concept study which demonstrates the feasibility of using pattern recognition techniques and a statistical cloud classification method to estimate time rate of change of large-scale meteorological fields from remote sensing data. The cloud classification methodology is based on typical shape function analysis of parameter sets characterizing the cloud fields. The three specific technical objectives, all of which were successfully achieved, are as follows: develop and test a cloud classification technique based on pattern recognition methods, suitable for the analysis of visible and infrared geostationary satellite VISSR imagery; develop and test a methodology for intercomparing successive images using the cloud classification technique, so as to obtain estimates of the time rate of change of meteorological fields; and implement this technique in a testbed system incorporating an interactive graphics terminal to determine the feasibility of extracting time derivative information suitable for comparison with numerical weather prediction products.

  1. Domain-independent information extraction in unstructured text

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Irwin, N.H.

    Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development Project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.
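    The report describes the approach in prose only. A rough sketch of the underlying idea, clustering word co-occurrence profiles from a training corpus into semantic categories, is given below; the vectorizer settings, the co-occurrence definition, and the number of categories are all assumptions for illustration.

```python
# Rough sketch: build word "profiles" from document-level co-occurrence counts
# and cluster them into semantic categories. Illustrative only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

def cluster_word_profiles(documents, n_categories=20):
    vec = CountVectorizer(lowercase=True, stop_words="english")
    doc_term = vec.fit_transform(documents)            # documents x vocabulary
    cooc = (doc_term.T @ doc_term).toarray()           # word x word co-occurrence
    np.fill_diagonal(cooc, 0)                          # ignore self co-occurrence
    labels = KMeans(n_clusters=n_categories, n_init=10, random_state=0).fit_predict(cooc)
    vocab = vec.get_feature_names_out()
    return {word: int(label) for word, label in zip(vocab, labels)}
```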

  2. Complex temporal topic evolution modelling using the Kullback-Leibler divergence and the Bhattacharyya distance.

    PubMed

    Andrei, Victor; Arandjelović, Ognjen

    2016-12-01

    The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aide in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within it. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation and topic convergence and merging, in addition to the more widely recognized emergence and disappearance and gradual evolution. The proposed framework is evaluated on a public medical literature corpus.
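    The two measures named in the title have standard closed forms. A small sketch of how two epoch-wise topics, represented as word distributions p and q over a shared vocabulary, might be compared is shown below; the smoothing constant is an assumption.

```python
# Sketch: compare two topic-word distributions with the Kullback-Leibler
# divergence and the Bhattacharyya distance. Distributions are smoothed so
# that the KL divergence stays finite when a word is absent from one topic.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def bhattacharyya_distance(p, q):
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
    return float(-np.log(bc))
```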

  3. Utilization of remotely-sensed data in the management of inland wetlands

    NASA Technical Reports Server (NTRS)

    Carter, V.; Smith, D. G.

    1973-01-01

    The author has identified the following significant results. ERTS-1 data and aerial photography are proving to be a useful tool for the inventory and management of inland wetlands. Two examples of the application of remotely-sensed data to specific wetland management needs or requirements are discussed. Studies of the Great Dismal Swamp are utilizing ERTS-1 imagery and color IR photography in: (1) study area selection; (2) field inspection; (3) vegetation mapping; (4) identification of drainage characteristics and moisture regime; (5) location of intensive study areas; and (6) detection of change. Thematic extractions of ERTS-1 data made using the United States Geological Survey's Autographic Theme Extraction System are aiding analyses of swamp hydrologic regime and providing information pertinent to quick recognition and inventory of wetlands from ERTS-1. DCP'S in south Florida wetlands provide near-real time data for water resources managers. Data relayed by satellite can be entered into models to provide predictive data and water storage information for long-term and short-term decision making.

  4. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    PubMed

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of effort to understanding the interactions between proteins or RNA production. This information might empower current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to its lack of explicit structure, the literature in the life sciences, one of the most important sources of this information, prevents computer-based systems from accessing it. Therefore, biomedical event extraction, automatically acquiring knowledge of molecular events in research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate the models' parameters. However, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data based on semi-supervised learning for biomedical event extraction is a feasible solution and has attracted growing interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned event annotations based on their distances to the sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold standard corpus. Experimental results show that more than 2.2% improvement in F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and that the similarity between sentences might be precisely described by hidden topics and the structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
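    As a hedged sketch of the central idea, the distance used to transfer annotations from annotated to un-annotated sentences could combine a structural similarity with a hidden-topic similarity, for example as a weighted sum. Both component measures and the weighting below are assumptions, not the authors' exact formulation.

```python
# Illustrative sketch: score how close an un-annotated sentence is to an
# annotated one by mixing structural overlap with hidden-topic similarity.
import numpy as np

def hellinger(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def sentence_distance(tokens_a, tokens_b, topics_a, topics_b, alpha=0.5):
    # Structural part: 1 - Jaccard overlap of the token sets (a crude proxy).
    set_a, set_b = set(tokens_a), set(tokens_b)
    jaccard = len(set_a & set_b) / max(len(set_a | set_b), 1)
    structural = 1.0 - jaccard
    # Topic part: Hellinger distance between per-sentence topic distributions.
    topical = hellinger(topics_a, topics_b)
    return alpha * structural + (1.0 - alpha) * topical
```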

  5. Information Extraction from Unstructured Text for the Biodefense Knowledge Center

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samatova, N F; Park, B; Krishnamurthy, R

    2005-04-29

    The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow early detection of emerging biological threats to homeland security. It requires highly structured information extracted from a variety of data sources. However, the quantity of new and vital information available from everyday sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are much anticipated. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas the two key phrase extraction technologies developed during the course of the project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. Also, we are enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. In order to address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.

  6. An Effective Approach to Biomedical Information Extraction with Limited Training Data

    ERIC Educational Resources Information Center

    Jonnalagadda, Siddhartha

    2011-01-01

    In the current millennium, extensive use of computers and the internet caused an exponential increase in information. Few research areas are as important as information extraction, which primarily involves extracting concepts and the relations between them from free text. Limitations in the size of training data, lack of lexicons and lack of…

  7. Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

    ERIC Educational Resources Information Center

    Finch, Dezon Kile

    2012-01-01

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…

  8. Rare tradition of the folk medicinal use of Aconitum spp. is kept alive in Solčavsko, Slovenia.

    PubMed

    Povšnar, Marija; Koželj, Gordana; Kreft, Samo; Lumpert, Mateja

    2017-08-08

    Aconitum species are poisonous plants that have been used in Western medicine for centuries. In the nineteenth century, these plants were part of official and folk medicine in the Slovenian territory. According to current ethnobotanical studies, folk use of Aconitum species is rarely reported in Europe. The purpose of this study was to research the folk medicinal use of Aconitum species in Solčavsko, Slovenia; to collect recipes for the preparation of Aconitum spp., indications for use, and dosing; and to investigate whether the folk use of aconite was connected to poisoning incidents. In Solčavsko, a remote alpine area in northern Slovenia, we performed semi-structured interviews with 19 informants in Solčavsko, 3 informants in Luče, and two retired physicians who worked in that area. Three samples of homemade ethanolic extracts were obtained from informants, and the concentration of aconitine was measured. In addition, four extracts were prepared according to reported recipes. All 22 informants knew of Aconitum spp. and their therapeutic use, and 5 of them provided a detailed description of the preparation and use of "voukuc", an ethanolic extract made from aconite roots. Seven informants were unable to describe the preparation in detail, since they knew of the extract only from the narration of others or they remembered it from childhood. Most likely, the roots of Aconitum tauricum and Aconitum napellus were used for the preparation of the extract, and the solvent was homemade spirits. Four informants kept the extract at home; two extracts were prepared recently (1998 and 2015). Three extracts were analyzed, and 2 contained aconitine. Informants reported many indications for the use of the extract; it was used internally and, in some cases, externally as well. The extract was also used in animals. The extract was measured in drops, but the number of drops differed among the informants. The informants reported nine poisonings with Aconitum spp., but none of them occurred as a result of medicinal use of the extract. In this study, we determined that folk knowledge of the medicinal use of Aconitum spp. is still present in Solčavsko, but Aconitum preparations are used only infrequently.

  9. New Method for Knowledge Management Focused on Communication Pattern in Product Development

    NASA Astrophysics Data System (ADS)

    Noguchi, Takashi; Shiba, Hajime

    In the field of manufacturing, the importance of utilizing knowledge and know-how has been growing. Against this background, there is a need for new methods to efficiently accumulate and extract effective knowledge and know-how. To facilitate the extraction of the knowledge and know-how needed by engineers, we first defined business process information, which includes schedule/progress information, document data, information about communication among the parties concerned, and information that links these three types together. Based on our definitions, we proposed an IT system (FlexPIM: Flexible and collaborative Process Information Management) to register and accumulate business process information with the least effort. In order to efficiently extract effective information from huge volumes of accumulated business process information, focusing attention on “actions” and communication patterns, we propose a new extraction method using communication patterns. The validity of this method has been verified for several communication patterns.

  10. Automated prostate cancer localization without the need for peripheral zone extraction using multiparametric MRI.

    PubMed

    Liu, Xin; Yetik, Imam Samil

    2011-06-01

    Multiparametric magnetic resonance imaging (MRI) has been shown to have higher localization accuracy than transrectal ultrasound (TRUS) for prostate cancer. Therefore, automated cancer segmentation using multiparametric MRI is receiving a growing interest, since MRI can provide both morphological and functional images for tissue of interest. However, all automated methods to this date are applicable to a single zone of the prostate, and the peripheral zone (PZ) of the prostate needs to be extracted manually, which is a tedious and time-consuming job. In this paper, our goal is to remove the need of PZ extraction by incorporating the spatial and geometric information of prostate tumors with multiparametric MRI derived from T2-weighted MRI, diffusion-weighted imaging (DWI) and dynamic contrast enhanced MRI (DCE-MRI). In order to remove the need of PZ extraction, the authors propose a new method to incorporate the spatial information of the cancer. This is done by introducing a new feature called location map. This new feature is constructed by applying a nonlinear transformation to the spatial position coordinates of each pixel, so that the location map implicitly represents the geometric position of each pixel with respect to the prostate region. Then, this new feature is combined with multiparametric MR images to perform tumor localization. The proposed algorithm is applied to multiparametric prostate MRI data obtained from 20 patients with biopsy-confirmed prostate cancer. The proposed method which does not need the masks of PZ was found to have prostate cancer detection specificity of 0.84, sensitivity of 0.80 and dice coefficient value of 0.42. The authors have found that fusing the spatial information allows us to obtain tumor outline without the need of PZ extraction with a considerable success (better or similar performance to methods that require manual PZ extraction). Our experimental results quantitatively demonstrate the effectiveness of the proposed method, depicting that the proposed method has a slightly better or similar localization performance compared to methods which require the masks of PZ.
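    The location map is described only at a high level. One plausible reading, sketched below under clearly labeled assumptions, is a nonlinear transform of pixel coordinates normalized to the prostate region, so that position becomes an extra feature channel alongside the MR-derived features.

```python
# Hypothetical sketch of a "location map" feature: transform each pixel's
# (row, col) position into a value encoding where it sits relative to the
# prostate mask. The Gaussian falloff is an assumption, not the authors' formula.
import numpy as np

def location_map(prostate_mask, sigma=0.35):
    rows, cols = np.nonzero(prostate_mask)
    cy, cx = rows.mean(), cols.mean()                     # region centroid
    scale = max(rows.max() - rows.min(), cols.max() - cols.min()) / 2.0
    yy, xx = np.indices(prostate_mask.shape)
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2) / scale  # normalized radius
    return np.exp(-(r ** 2) / (2 * sigma ** 2))           # nonlinear positional feature
```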

  11. Prognostic value of tissue-type plasminogen activator (tPA) and its complex with the type-1 inhibitor (PAI-1) in breast cancer

    PubMed Central

    Witte, J H de; Sweep, C G J; Klijn, J G M; Grebenschikov, N; Peters, H A; Look, M P; Tienoven, ThH van; Heuvel, J J T M; Vries, J Bolt-De; Benraad, ThJ; Foekens, J A

    1999-01-01

    The prognostic value of tissue-type plasminogen activator (tPA) measured in samples derived from 865 patients with primary breast cancer using a recently developed enzyme-linked immunosorbent assay (ELISA) was evaluated. Since the assay could easily be adapted to the assessment of the complex of tPA with its type-1 inhibitor (PAI-1), it was investigated whether the tPA:PAI-1 complex also provides prognostic information. To this end, cytosolic extracts and corresponding detergent extracts of 100 000 g pellets obtained after ultracentrifugation when preparing the cytosolic fractions for routine steroid hormone receptor determination were assayed. Statistically significant correlations were found between the cytosolic levels and those determined in the pellet extracts (Spearman correlation coefficient rs = 0.75, P < 0.001 for tPA and r = 0.50, P < 0.001 for tPA:PAI-1 complex). In both Cox univariate and multivariate analysis elevated levels of (total) tPA determined in the pellet extracts, but not in cytosols, were associated with prolonged relapse-free (RFS) and overall survival (OS). In contrast, high levels of the tPA:PAI-1 complex measured in cytosols, but not in the pellet extracts, were associated with a poor RFS and OS. The prognostic information provided by the cytosolic tPA:PAI-1 complex was comparable to that provided by cytosolic (total) PAI-1. Furthermore, the estimated levels of free, uncomplexed tPA and PAI-1, in cytosols and in pellet extracts, were related to patient prognosis in a similar way as the (total) levels of tPA and PAI-1 respectively. Determination of specific forms of components of the plasminogen activation system, i.e. tPA:PAI-1 complex and free, uncomplexed tPA and/or PAI-1, may be considered a useful adjunct to the analyses of the separate components (tPA and/or PAI-1) and provide valuable additional prognostic information with respect to survival of breast cancer patients. © 1999 Cancer Research Campaign PMID:10390010

  12. MIMS - MEDICAL INFORMATION MANAGEMENT SYSTEM

    NASA Technical Reports Server (NTRS)

    Frankowski, J. W.

    1994-01-01

    MIMS, Medical Information Management System is an interactive, general purpose information storage and retrieval system. It was first designed to be used in medical data management, and can be used to handle all aspects of data related to patient care. Other areas of application for MIMS include: managing occupational safety data in the public and private sectors; handling judicial information where speed and accuracy are high priorities; systemizing purchasing and procurement systems; and analyzing organizational cost structures. Because of its free format design, MIMS can offer immediate assistance where manipulation of large data bases is required. File structures, data categories, field lengths and formats, including alphabetic and/or numeric, are all user defined. The user can quickly and efficiently extract, display, and analyze the data. Three means of extracting data are provided: certain short items of information, such as social security numbers, can be used to uniquely identify each record for quick access; records can be selected which match conditions defined by the user; and specific categories of data can be selected. Data may be displayed and analyzed in several ways which include: generating tabular information assembled from comparison of all the records on the system; generating statistical information on numeric data such as means, standard deviations and standard errors; and displaying formatted listings of output data. The MIMS program is written in Microsoft FORTRAN-77. It was designed to operate on IBM Personal Computers and compatibles running under PC or MS DOS 2.00 or higher. MIMS was developed in 1987.

  13. If you watch it move, you'll recognize it in 3D: Transfer of depth cues between encoding and retrieval.

    PubMed

    Papenmeier, Frank; Schwan, Stephan

    2016-02-01

    Viewing objects with stereoscopic displays provides additional depth cues through binocular disparity supporting object recognition. So far, it was unknown whether this results from the representation of specific stereoscopic information in memory or a more general representation of an object's depth structure. Therefore, we investigated whether continuous object rotation acting as depth cue during encoding results in a memory representation that can subsequently be accessed by stereoscopic information during retrieval. In Experiment 1, we found such transfer effects from continuous object rotation during encoding to stereoscopic presentations during retrieval. In Experiments 2a and 2b, we found that the continuity of object rotation is important because only continuous rotation and/or stereoscopic depth but not multiple static snapshots presented without stereoscopic information caused the extraction of an object's depth structure into memory. We conclude that an object's depth structure and not specific depth cues are represented in memory. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. TOXMAP: A GIS-Based Gateway to Environmental Health Resources

    PubMed Central

    Hochstein, Colette; Szczur, Marti

    2009-01-01

    The National Library of Medicine (NLM) has an extensive collection of environmental health information, including bibliographic and technical data on hazardous chemical substances, in its TOXNET databases. TOXNET also provides access to the United States Environmental Protection Agency (EPA)’s Toxics Release Inventory (TRI) data, which covers release of specific chemicals via air, water, and land, and by underground injection, as reported by industrial facilities around the United States. NLM has developed a Web-based geographic information system (GIS), TOXMAP, which allows users to create dynamic maps that show where TRI chemicals are released and that provides direct links to information about the chemicals in TOXNET. By extracting the associated regional geographic text terms from the displayed map (e.g., rivers, towns, county, state), TOXMAP also provides customized chemical and/or region-specific searches of NLM’s bibliographic biomedical resources. This paper focuses on TOXMAP’s features, data accuracy issues, challenges, user feedback techniques, and future directions. PMID:16893844

  15. Embedding strategies for effective use of information from multiple sequence alignments.

    PubMed Central

    Henikoff, S.; Henikoff, J. G.

    1997-01-01

    We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
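    As an illustration of the scoring idea only (not the authors' modified Smith-Waterman search), a conserved region represented by a PSSM can be scored against a candidate window as a sum of position-specific values, with the motif slid along a database sequence.

```python
# Sketch: score a candidate protein segment against a position-specific
# scoring matrix (PSSM). `pssm` is a list of dicts mapping residue -> score,
# one dict per motif column; unknown residues get a default penalty.
def pssm_score(segment, pssm, default=-4.0):
    if len(segment) != len(pssm):
        raise ValueError("segment length must match PSSM length")
    return sum(column.get(residue, default)
               for residue, column in zip(segment, pssm))

# Usage sketch: slide the PSSM along a database sequence and keep the best window.
def best_window(sequence, pssm):
    width = len(pssm)
    scores = [(pssm_score(sequence[i:i + width], pssm), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores) if scores else (float("-inf"), -1)
```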

  16. PPInterFinder--a mining tool for extracting causal relations on human proteins from literature.

    PubMed

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problems in biomedical text mining is to mine protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented PPInterFinder--a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most of the existing systems. DATABASE URL: http://www.biomining-bu.in/ppinterfinder/
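    The three phases are described in prose only. A toy sketch of the keyword co-occurrence step is given below; the keyword list and the crude positional check are assumptions for illustration and are not PPInterFinder's actual rule set or patterns.

```python
# Toy sketch of PPI candidate detection by relation-keyword co-occurrence:
# a sentence is kept when two protein names and a relation keyword co-occur,
# and the keyword must fall between the two protein mentions.
import re

RELATION_KEYWORDS = {"interacts", "binds", "phosphorylates", "activates"}  # assumed list

def candidate_ppi(sentence, protein_names):
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    proteins = [t for t in tokens if t in protein_names]
    keywords = [t for t in tokens if t.lower() in RELATION_KEYWORDS]
    if len(proteins) < 2 or not keywords:
        return None
    # Keep the pair only if a relation keyword occurs between the two mentions.
    first = tokens.index(proteins[0])
    last = len(tokens) - 1 - tokens[::-1].index(proteins[-1])
    between = {t.lower() for t in tokens[first + 1:last]}
    if between & RELATION_KEYWORDS:
        return proteins[0], keywords[0], proteins[-1]
    return None
```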

  17. Validated UPLC-MS/MS method for quantification of seven compounds in rat plasma and tissues: Application to pharmacokinetic and tissue distribution studies in rats after oral administration of extract of Eclipta prostrata L.

    PubMed

    Du, Guangyan; Fu, Lingling; Jia, Jun; Pang, Xu; Yu, Haiyang; Zhang, Youcai; Fan, Guanwei; Gao, Xiumei; Han, Lifeng

    2018-06-01

    A rapid, sensitive and specific ultra-high-performance liquid chromatography coupled with tandem mass spectrometry (UPLC-MS/MS) method was developed to investigate the pharmacokinetics and tissue distribution of Eclipta prostrata extract. Rats were orally administrated the 70% ethanol extract of E. prostrata, and their plasma as well as various organs were collected. The concentrations of seven main compounds, ecliptasaponin IV, ecliptasaponin A, apigenin, 3'-hydroxybiochanin A, luteolin, luteolin-7-O-glucoside and wedelolactone, were quantified by UPLC-MS/MS through multiple reactions monitoring method. The precisions (RSD) of the analytes were all <15.00%. The extraction recoveries ranged from 74.65 to 107.45% with RSD ≤ 15.36%. The matrix effects ranged from 78.00 to 118.06% with RSD ≤ 15.04%. To conclude, the present pharmacokinetic and tissue distribution studies provided useful information for the clinical usage of Eclipta prostrata L. Copyright © 2018 John Wiley & Sons, Ltd.

  18. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature

    PubMed Central

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problem in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented, PPInterFinder—a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of PPI pair. We find that PPInterFinder is capable of predicting PPIs with the accuracy of 66.05% on AIMED corpus and outperforms most of the existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628

  19. Generic Information Can Retrieve Known Biological Associations: Implications for Biomedical Knowledge Discovery

    PubMed Central

    van Haagen, Herman H. H. B. M.; 't Hoen, Peter A. C.; Mons, Barend; Schultes, Erik A.

    2013-01-01

    Motivation Weighted semantic networks built from text-mined literature can be used to retrieve known protein-protein or gene-disease associations, and have been shown to anticipate associations years before they are explicitly stated in the literature. Our text-mining system recognizes over 640,000 biomedical concepts: some are specific (i.e., names of genes or proteins) others generic (e.g., ‘Homo sapiens’). Generic concepts may play important roles in automated information retrieval, extraction, and inference but may also result in concept overload and confound retrieval and reasoning with low-relevance or even spurious links. Here, we attempted to optimize the retrieval performance for protein-protein interactions (PPI) by filtering generic concepts (node filtering) or links to generic concepts (edge filtering) from a weighted semantic network. First, we defined metrics based on network properties that quantify the specificity of concepts. Then using these metrics, we systematically filtered generic information from the network while monitoring retrieval performance of known protein-protein interactions. We also systematically filtered specific information from the network (inverse filtering), and assessed the retrieval performance of networks composed of generic information alone. Results Filtering generic or specific information induced a two-phase response in retrieval performance: initially the effects of filtering were minimal but beyond a critical threshold network performance suddenly drops. Contrary to expectations, networks composed exclusively of generic information demonstrated retrieval performance comparable to unfiltered networks that also contain specific concepts. Furthermore, an analysis using individual generic concepts demonstrated that they can effectively support the retrieval of known protein-protein interactions. For instance the concept “binding” is indicative for PPI retrieval and the concept “mutation abnormality” is indicative for gene-disease associations. Conclusion Generic concepts are important for information retrieval and cannot be removed from semantic networks without negative impact on retrieval performance. PMID:24260124
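    The node- and edge-filtering experiments above can be pictured with a small sketch. The version below uses weighted degree as a stand-in for the paper's concept-specificity metrics (which are not reproduced here), so both the metric and the cutoff fraction are assumptions.

```python
# Sketch: remove the most generic concepts (here approximated by highest
# weighted degree) from a weighted semantic network and keep the rest.
import networkx as nx

def filter_generic_nodes(graph, fraction=0.01):
    # Weighted degree as a crude "genericity" score: hub concepts score high.
    genericity = dict(graph.degree(weight="weight"))
    cutoff = int(len(genericity) * fraction)
    most_generic = sorted(genericity, key=genericity.get, reverse=True)[:cutoff]
    filtered = graph.copy()
    filtered.remove_nodes_from(most_generic)
    return filtered, most_generic
```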

  20. Information Extraction Using Controlled English to Support Knowledge-Sharing and Decision-Making

    DTIC Science & Technology

    2012-06-01

    ...or language variants. CE-based information extraction will greatly facilitate the processes in the cognitive and social domains that enable forces... ...processor is run to turn the atomic CE into a more "stylistically felicitous" CE, using techniques such as: aggregating all information about an entity

  1. Real-Time Information Extraction from Big Data

    DTIC Science & Technology

    2015-10-01

    Institute for Defense Analyses. Real-Time Information Extraction from Big Data. Jagdeep Shah, Robert M. Rolfe, Francisco L. Loaiza-Lemos. October 7, 2015. Abstract: We are drowning under the 3 Vs (volume, velocity and variety) of big data. Real-time information extraction from big...

  2. An Automated High-Throughput System to Fractionate Plant Natural Products for Drug Discovery

    PubMed Central

    Tu, Ying; Jeffries, Cynthia; Ruan, Hong; Nelson, Cynthia; Smithson, David; Shelat, Anang A.; Brown, Kristin M.; Li, Xing-Cong; Hester, John P.; Smillie, Troy; Khan, Ikhlas A.; Walker, Larry; Guy, Kip; Yan, Bing

    2010-01-01

    The development of an automated, high-throughput fractionation procedure to prepare and analyze natural product libraries for drug discovery screening is described. Natural products obtained from plant materials worldwide were extracted and first prefractionated on polyamide solid-phase extraction cartridges to remove polyphenols, followed by high-throughput automated fractionation, drying, weighing, and reformatting for screening and storage. The analysis of fractions with UPLC coupled with MS, PDA and ELSD detectors provides information that facilitates characterization of compounds in active fractions. Screening of a portion of fractions yielded multiple assay-specific hits in several high-throughput cellular screening assays. This procedure modernizes the traditional natural product fractionation paradigm by seamlessly integrating automation, informatics, and multimodal analytical interrogation capabilities. PMID:20232897

  3. Making a protein extract from plant pathogenic fungi for gel- and LC-based proteomics.

    PubMed

    Fernández, Raquel González; Redondo, Inmaculada; Jorrin-Novo, Jesus V

    2014-01-01

    Proteomic technologies have become a successful tool to provide relevant information on fungal biology. In the case of plant pathogenic fungi, this approach allows a deeper knowledge of the interaction and the biological cycle of the pathogen, as well as the identification of pathogenicity and virulence factors. These two elements open up new possibilities for crop disease diagnosis and environment-friendly crop protection. Phytopathogenic fungi, due to their particular cellular characteristics, can be considered a recalcitrant biological material, which makes it difficult to obtain quality protein samples for proteomic analysis. This chapter focuses on protein extraction for gel- and LC-based proteomics, with specific protocols from our current research on Botrytis cinerea.

  4. JPRS Report, Science & Technology, Japan, 27th Aircraft Symposium

    DTIC Science & Technology

    1990-10-29

    screen; the relative attitude is then determined. 2) Video Sensor System: Specific patterns (grapple target, etc.) drawn on the target spacecraft, or the... entire target spacecraft, is imaged by camera. Navigation information is obtained by on-board image processing, such as extraction of contours and... standard figure called "grapple target" located in the vicinity of the grapple fixture on the target spacecraft is imaged by camera. Contour lines and

  5. Extraction of Data from a Hospital Information System to Perform Process Mining.

    PubMed

    Neira, Ricardo Alfredo Quintano; de Vries, Gert-Jan; Caffarel, Jennifer; Stretton, Erin

    2017-01-01

    The aim of this work is to share our experience in extracting relevant data from a hospital information system in preparation for a research study using process mining techniques. The steps performed were: research definition, mapping the normative processes, identification of table and field names in the database, and extraction of the data. We then offer lessons learned during the data extraction phase. Any errors made in the extraction phase will propagate and have implications for subsequent analyses. Thus, it is essential to take the time needed and devote sufficient attention to detail to perform all activities with the goal of ensuring high quality of the extracted data. We hope this work will be informative for other researchers planning and executing data extraction for process mining research studies.

  6. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from protein-RNA complexes present in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.

  7. Videomicroscopic extraction of specific information on cell proliferation and migration in vitro

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Debeir, Olivier; Megalizzi, Veronique; Warzee, Nadine

    2008-10-01

    In vitro cell imaging is a useful exploratory tool for cell behavior monitoring with a wide range of applications in cell biology and pharmacology. Combined with appropriate image analysis techniques, this approach has been shown to provide useful information on the detection and dynamic analysis of cell events. In this context, numerous efforts have been focused on cell migration analysis. In contrast, the cell division process has been the subject of fewer investigations. The present work focuses on this latter aspect and shows that, in complement to cell migration data, interesting information related to cell division can be extracted from phase-contrast time-lapse image series, in particular cell division duration, which is not provided by standard cell assays using endpoint analyses. We illustrate our approach by analyzing the effects induced by two sigma-1 receptor ligands (haloperidol and 4-IBP) on the behavior of two glioma cell lines using two in vitro cell models, i.e., the low-density individual cell model and the high-density scratch wound model. This illustration also shows that the data provided by our approach are suggestive as to the mechanism of action of compounds, and are thus capable of informing the appropriate selection of further time-consuming and more expensive biological evaluations required to elucidate a mechanism.

  8. Place in Perspective: Extracting Online Information about Points of Interest

    NASA Astrophysics Data System (ADS)

    Alves, Ana O.; Pereira, Francisco C.; Rodrigues, Filipe; Oliveirinha, João

    During the last few years, the amount of online descriptive information about places has reached reasonable dimensions for many cities in the world. Being such information mostly in Natural Language text, Information Extraction techniques are needed for obtaining the meaning of places that underlies these massive amounts of commonsense and user made sources. In this article, we show how we automatically label places using Information Extraction techniques applied to online resources such as Wikipedia, Yellow Pages and Yahoo!.

  9. Extraction of indirectly captured information for use in a comparison of offline pH measurement technologies.

    PubMed

    Ritchie, Elspeth K; Martin, Elaine B; Racher, Andy; Jaques, Colin

    2017-06-10

    Understanding the causes of discrepancies in pH readings of a sample can allow more robust pH control strategies to be implemented. It was found that 59.4% of differences between two offline pH measurement technologies for an historical dataset lay outside an expected instrument error range of ±0.02 pH. A new variable, Osmo_Res, was created using multiple linear regression (MLR) to extract information indirectly captured in the recorded measurements for osmolality. Principal component analysis and time series analysis were used to validate the expansion of the historical dataset with the new variable Osmo_Res. MLR was used to identify variables strongly correlated (p < 0.05) with differences in pH readings by the two offline pH measurement technologies. These included concentrations of specific chemicals (e.g. glucose) and Osmo_Res, indicating culture medium and bolus feed additions as possible causes of discrepancies between the offline pH measurement technologies. Temperature was also identified as statistically significant. It is suggested that this was a result of differences in the pH-temperature compensations employed by the pH measurement technologies. In summary, a method for extracting indirectly captured information has been demonstrated, and it has been shown that competing pH measurement technologies were not necessarily interchangeable at the desired level of control (±0.02 pH). Copyright © 2017 Elsevier B.V. All rights reserved.
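    One hedged reading of how such a variable could be derived is to regress the recorded osmolality on the other logged variables and keep the residual as the new feature. The sketch below assumes a plain ordinary-least-squares fit; the variable names are illustrative.

```python
# Sketch: create an "Osmo_Res"-style variable as the residual of osmolality
# after regressing it on other recorded process variables (OLS via numpy).
import numpy as np

def residual_feature(osmolality, predictors):
    # predictors: (n_samples, n_variables) array of other logged measurements.
    X = np.column_stack([np.ones(len(osmolality)), predictors])  # add intercept
    coeffs, *_ = np.linalg.lstsq(X, osmolality, rcond=None)
    fitted = X @ coeffs
    return osmolality - fitted   # residual = information not explained by predictors
```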

  10. Residual and Destroyed Accessible Information after Measurements

    NASA Astrophysics Data System (ADS)

    Han, Rui; Leuchs, Gerd; Grassl, Markus

    2018-04-01

    When quantum states are used to send classical information, the receiver performs a measurement on the signal states. The amount of information extracted is often not optimal due to the receiver's measurement scheme and experimental apparatus. For quantum nondemolition measurements, there is potentially some residual information in the postmeasurement state, while part of the information has been extracted and the rest is destroyed. Here, we propose a framework to characterize a quantum measurement by how much information it extracts and destroys, and how much information it leaves in the residual postmeasurement state. The concept is illustrated for several receivers discriminating coherent states.

  11. Question analysis for Indonesian comparative question

    NASA Astrophysics Data System (ADS)

    Saelan, A.; Purwarianti, A.; Widyantoro, D. H.

    2017-01-01

    Information seeking is one of today's basic human needs. Comparing things using a search engine surely takes more time than searching for only one thing. In this paper, we analyzed comparative questions for a comparative question answering system. A comparative question is a question that compares two or more entities. We grouped comparative questions into 5 types: selection between mentioned entities, selection between unmentioned entities, selection between any entity, comparison, and yes or no question. Then we extracted 4 types of information from comparative questions: entity, aspect, comparison, and constraint. We built classifiers for the classification task and the information extraction task. The features used for the classification task are bag of words, whereas for information extraction we used the lexical form of the word, the lexical forms of the 2 previous and following words, and the previous label as features. We tried 2 scenarios: classification first and extraction first. For classification first, we used the classification result as a feature for extraction. Otherwise, for extraction first, we used the extraction results as features for classification. We found that the result would be better if we did extraction first, before classification. For the extraction task, classification using SMO gave the best result (88.78%), while for classification it was better to use naïve Bayes (82.35%).
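    As a minimal sketch of the classification side (bag-of-words features with a naïve Bayes classifier, as named in the abstract), the pipeline below shows one way the classifier could be wired; the label names and the pipeline wiring itself are assumptions.

```python
# Sketch: bag-of-words classification of comparative question types with
# multinomial naive Bayes, mirroring the "classification first" scenario.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# The five question types described in the abstract (label strings are assumed).
QUESTION_TYPES = ["selection_mentioned", "selection_unmentioned",
                  "selection_any", "comparison", "yes_no"]

def train_question_classifier(questions, labels):
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(questions, labels)
    return clf

# Usage: train_question_classifier(train_texts, train_labels).predict(["which phone is cheaper, A or B?"])
```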

  12. Subject-based feature extraction by using fisher WPD-CSP in brain-computer interfaces.

    PubMed

    Yang, Banghua; Li, Huarong; Wang, Qian; Zhang, Yunyuan

    2016-06-01

    Feature extraction of electroencephalogram (EEG) signals plays a vital role in brain-computer interfaces (BCIs). In recent years, the common spatial pattern (CSP) method has been proven to be an effective feature extraction method. However, traditional CSP has the disadvantages of requiring many input channels and lacking frequency information. In order to remedy these defects of CSP, wavelet packet decomposition (WPD) and CSP are combined to extract effective features. However, the WPD-CSP method pays little attention to extracting features fitted to the specific subject. So a subject-based feature extraction method using fisher WPD-CSP is proposed in this paper. The idea of the proposed method is to adapt fisher WPD-CSP to each subject separately. It mainly includes the following six steps: (1) original EEG signals from all channels are decomposed into a series of sub-bands using WPD; (2) average power values of the obtained sub-bands are computed; (3) the sub-bands with larger values of fisher distance according to average power are selected for that particular subject; (4) each selected sub-band is reconstructed and regarded as a new EEG channel; (5) all new EEG channels are used as input to CSP and a six-dimensional feature vector is obtained by CSP, which forms the subject-based feature extraction model; (6) a probabilistic neural network (PNN) is used as the classifier and the classification accuracy is obtained. Data from six subjects were processed by the subject-based fisher WPD-CSP, the non-subject-based fisher WPD-CSP and WPD-CSP, respectively. Compared with non-subject-based fisher WPD-CSP and WPD-CSP, the results show that the proposed method yields better performance (sensitivity: 88.7±0.9%, and specificity: 91±1%), and the classification accuracy from subject-based fisher WPD-CSP is increased by 6-12% and 14%, respectively. The proposed subject-based fisher WPD-CSP method can not only remedy the disadvantages of CSP through WPD but also discard unhelpful sub-bands for each subject, so that the remaining, fewer sub-bands retain better separability according to the fisher distance, which leads to a higher classification accuracy than the WPD-CSP method. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
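    Steps (2)-(3), selecting the sub-bands whose average power best separates the two classes, can be sketched as below. The two-class fisher-score formula used here is the common form and is an assumption about the exact definition used in the paper.

```python
# Sketch: rank wavelet-packet sub-bands by a two-class Fisher score computed
# on their average power, and keep the top-k sub-bands for a given subject.
import numpy as np

def fisher_score(power_class1, power_class2):
    # power_classX: (n_trials,) average power of one sub-band for one class.
    m1, m2 = power_class1.mean(), power_class2.mean()
    v1, v2 = power_class1.var(), power_class2.var()
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)

def select_subbands(powers_class1, powers_class2, k=6):
    # powers_classX: (n_trials, n_subbands) average power per trial and sub-band.
    scores = np.array([fisher_score(powers_class1[:, j], powers_class2[:, j])
                       for j in range(powers_class1.shape[1])])
    return np.argsort(scores)[::-1][:k]   # indices of the k most separable sub-bands
```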

  13. Deconstructing thermodynamic parameters of a coupled system from site-specific observables.

    PubMed

    Chowdhury, Sandipan; Chanda, Baron

    2010-11-02

    Cooperative interactions mediate information transfer between structural domains of a protein molecule and are major determinants of protein function and modulation. The prevalent theories to understand the thermodynamic origins of cooperativity have been developed to reproduce the complex behavior of a global thermodynamic observable such as ligand binding or enzyme activity. However, in most cases the measurement of a single global observable cannot uniquely define all the terms that fully describe the energetics of the system. Here we establish a theoretical groundwork for analyzing protein thermodynamics using site-specific information. Our treatment involves extracting a site-specific parameter (defined as χ value) associated with a structural unit. We demonstrate that, under limiting conditions, the χ value is related to the direct interaction terms associated with the structural unit under observation and its intrinsic activation energy. We also introduce a site-specific interaction energy term (χ(diff)) that is a function of the direct interaction energy of that site with every other site in the system. When combined with site-directed mutagenesis and other molecular level perturbations, analyses of χ values of site-specific observables may provide valuable insights into protein thermodynamics and structure.

  14. Automated extraction of clinical traits of multiple sclerosis in electronic medical records

    PubMed Central

    Davis, Mary F; Sriram, Subramaniam; Bush, William S; Denny, Joshua C; Haines, Jonathan L

    2013-01-01

    Objectives The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course. Materials and methods We used four algorithms based on ICD-9 codes, text keywords, and medications to identify individuals with MS from a de-identified, research version of the EMR at Vanderbilt University. Using a training dataset of the records of 899 individuals, algorithms were constructed to identify and extract detailed information regarding the clinical course of MS from the text of the medical records, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale (EDSS) scores, timed 25-foot walk scores, and MS medications. Algorithms were evaluated on a test set validated by two independent reviewers. Results We identified 5789 individuals with MS. For all clinical traits extracted, precision was at least 87% and specificity was greater than 80%. Recall values for clinical subtype, EDSS scores, and timed 25-foot walk scores were greater than 80%. Discussion and conclusion This collection of clinical data represents one of the largest databases of detailed, clinical traits available for research on MS. This work demonstrates that detailed clinical information is recorded in the EMR and can be extracted for research purposes with high reliability. PMID:24148554
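
    As a rough illustration of the kind of rule patterns such a system combines (not the Vanderbilt algorithms themselves), the sketch below flags candidate MS records from ICD-9 codes or keywords and pulls EDSS scores out of note text with a regular expression; the patterns and record fields are assumptions for demonstration only.

        import re

        MS_ICD9 = {"340"}                       # ICD-9 code for multiple sclerosis
        MS_KEYWORDS = re.compile(r"\bmultiple sclerosis\b|\boligoclonal bands?\b", re.I)
        EDSS_PATTERN = re.compile(r"\bEDSS\s*(?:score)?\s*(?:of|:|=)?\s*(\d(?:\.\d)?)", re.I)

        def screen_record(record):
            """Return (is_candidate, edss_scores) for one de-identified record dict
            with hypothetical 'icd9_codes' and 'note_text' fields."""
            text = record.get("note_text", "")
            has_code = bool(MS_ICD9 & set(record.get("icd9_codes", [])))
            has_keyword = bool(MS_KEYWORDS.search(text))
            edss = [float(m) for m in EDSS_PATTERN.findall(text)]
            return has_code or has_keyword, edss

        example = {"icd9_codes": ["340"],
                   "note_text": "Relapsing-remitting MS. EDSS score of 3.5 today."}
        print(screen_record(example))           # (True, [3.5])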

  15. Semantic Preview Benefit in English: Individual Differences in the Extraction and Use of Parafoveal Semantic Information

    ERIC Educational Resources Information Center

    Veldre, Aaron; Andrews, Sally

    2016-01-01

    Although there is robust evidence that skilled readers of English extract and use orthographic and phonological information from the parafovea to facilitate word identification, semantic preview benefits have been elusive. We sought to establish whether individual differences in the extraction and/or use of parafoveal semantic information could…

  16. A novel sample preparation and on-line HPLC-DAD-MS/MS-BCD analysis for rapid screening and characterization of specific enzyme inhibitors in herbal extracts: case study of α-glucosidase.

    PubMed

    Li, D Q; Zhao, J; Xie, J; Li, S P

    2014-01-01

    Drug discovery from complex mixtures such as Chinese herbs is a challenge, and extensive false positives make it difficult to obtain specific bioactive compounds. In the present study, a novel sample preparation method was proposed to rapidly reveal the specific bioactive compounds in complex mixtures, using α-glucosidase as a case. Firstly, aqueous and methanol extraction of 500 traditional Chinese medicines was carried out with the aim of finding new sources of α-glucosidase inhibitors. As a result, the extracts of the fruit of Terminalia chebula (FTC), the flowers of Rosa rugosa (FRR) and Eugenia caryophyllata (FEC), as well as the husk of Punica granatum (HPG), showed high inhibition of α-glucosidase. On-line liquid chromatography-diode array detection-tandem mass spectrometry and biochemical detection (HPLC-DAD-MS/MS-BCD) was performed to rapidly screen and characterize α-glucosidase inhibitors in these four extracts. After tentative identification, most of the compounds with inhibitory activity in the investigated crude extracts were found to be tannins, which are commonly recognized as non-specific enzyme inhibitors in vitro. Subsequently, the four extracts were treated with gelatin to improve the specificity of the on-line system. Finally, two compounds with specific α-glucosidase inhibition were identified as corilagin and ellagic acid. The developed method can discover specific α-glucosidase inhibitors in complex mixtures such as plant extracts, and could also be used for the discovery of specific inhibitors of other enzymes. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. Identification of Proteins in Promastigote and Amastigote-like Leishmania Using an Immunoproteomic Approach

    PubMed Central

    Coelho, Vinicio T. S.; Oliveira, Jamil S.; Valadares, Diogo G.; Chávez-Fumagalli, Miguel A.; Duarte, Mariana C.; Lage, Paula S.; Soto, Manuel; Santoro, Marcelo M.; Tavares, Carlos A. P.; Fernandes, Ana Paula; Coelho, Eduardo A. F.

    2012-01-01

    Background The present study aims to identify antigens in protein extracts of promastigote and amastigote-like Leishmania (Leishmania) chagasi syn. L. (L.) infantum recognized by antibodies present in the sera of dogs with asymptomatic and symptomatic visceral leishmaniasis (VL). Methodology/Principal Findings Proteins recognized by the sera samples were separated by two-dimensional electrophoresis (2DE) and identified by mass spectrometry. A total of 550 spots were observed in the 2DE gels, and approximately 104 proteins were identified. Several stage-specific proteins could be identified by either or both classes of sera, including, as expected, previously known proteins described as diagnostic antigens, virulence factors, drug targets, or vaccine candidates. Three, seven, and five hypothetical proteins could be identified in promastigote antigenic extracts, while two, eleven, and three hypothetical proteins could be identified in amastigote-like antigenic extracts by asymptomatic and symptomatic sera, as well as a combination of both, respectively. Conclusions/Significance The present study represents a significant contribution not only in identifying stage-specific L. infantum molecules, but also in revealing the expression of a large number of hypothetical proteins. Moreover, when combined, the identified proteins constitute a significant source of information for the improvement of diagnostic tools and/or vaccine development for VL. PMID:22272364

  18. Model of experts for decision support in the diagnosis of leukemia patients.

    PubMed

    Corchado, Juan M; De Paz, Juan F; Rodríguez, Sara; Bajo, Javier

    2009-07-01

    Recent advances in the field of biomedicine, specifically in the field of genomics, have led to an increase in the information available for conducting expression analysis. Expression analysis is a technique used in transcriptomics, a branch of genomics that deals with the study of messenger ribonucleic acid (mRNA) and the extraction of information contained in the genes. This increase in information is reflected in the exon arrays, which require the use of new techniques in order to extract the information. The purpose of this study is to provide a tool based on a mixture of experts model that allows the analysis of the information contained in the exon arrays, from which automatic classifications for decision support in diagnoses of leukemia patients can be made. The proposed model integrates several cooperative algorithms characterized by their efficiency for data processing, filtering, classification and knowledge extraction. The Cancer Institute of the University of Salamanca is making an effort to develop tools to automate the evaluation of data and to facilitate the analysis of information. This proposal is a step forward in this direction and the first step toward the development of a mixture of experts tool that integrates different cognitive and statistical approaches to deal with the analysis of exon arrays. The mixture of experts model presented within this work provides great capacities for learning and adaptation to the characteristics of the problem in consideration, using novel algorithms in each of the stages of the analysis process that can be easily configured and combined, and provides results that notably improve those provided by the existing methods for exon arrays analysis. The material used consists of data from exon arrays provided by the Cancer Institute that contain samples from leukemia patients. The methodology used consists of a system based on a mixture of experts. Each one of the experts incorporates novel artificial intelligence techniques that improve the process of carrying out various tasks such as pre-processing, filtering, classification and extraction of knowledge. This article will detail the manner in which individual experts are combined so that together they generate a system capable of extracting knowledge, thus permitting patients to be classified in an automatic and efficient manner that is also comprehensible for medical personnel. The system has been tested in a real setting and has been used for classifying patients who suffer from different forms of leukemia at various stages. Personnel from the Cancer Institute supervised and participated throughout the testing period. Preliminary results are promising, notably improving the results obtained with previously used tools. The medical staff from the Cancer Institute considers the tools that have been developed to be positive and very useful in a supporting capacity for carrying out their daily tasks. Additionally, the mixture of experts supplies a tool for the extraction of necessary information in order to explain the associations that have been made in simple terms. That is, it permits the extraction of knowledge for each classification made, which can be generalized in order to be used in subsequent classifications. This allows for a large amount of learning and adaptation within the proposed system.

  19. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

    PubMed Central

    Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John

    2008-01-01

    Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948

  20. Using Airborne Remote Sensing to Increase Situational Awareness in Civil Protection and Humanitarian Relief - the Importance of User Involvement

    NASA Astrophysics Data System (ADS)

    Römer, H.; Kiefl, R.; Henkel, F.; Wenxi, C.; Nippold, R.; Kurz, F.; Kippnich, U.

    2016-06-01

    Enhancing situational awareness in real-time (RT) civil protection and emergency response scenarios requires the development of comprehensive monitoring concepts combining classical remote sensing disciplines with geospatial information science. In the VABENE++ project of the German Aerospace Center (DLR), monitoring tools are being developed in which innovative data acquisition approaches are combined with information extraction as well as the generation and dissemination of information products to a specific user. DLR's 3K and 4k camera systems, which allow for RT acquisition and pre-processing of high-resolution aerial imagery, are applied in two application examples conducted with end users: a civil protection exercise with humanitarian relief organisations and a large open-air music festival in cooperation with a festival organising company. This study discusses how airborne remote sensing can significantly contribute to both situational assessment and awareness, focusing on the downstream processes required for extracting information from imagery and for visualising and disseminating imagery in combination with other geospatial information. Valuable user feedback and impetus for further developments have been obtained from both applications, referring to innovations in thematic image analysis (supporting festival site management) and product dissemination (editable web services). Thus, this study emphasises the important role of user involvement in application-related research, i.e. by aligning it more closely with users' requirements.

  1. Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports.

    PubMed

    Qiu, John X; Yoon, Hong-Jun; Fearn, Paul A; Tourassi, Georgia D

    2018-01-01

    Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study, we investigated deep learning with a convolutional neural network (CNN) for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against the more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, resulting in micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
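
    A minimal sketch of a word-level CNN text classifier of the general kind described, written with tf.keras; the vocabulary size, sequence length, number of topography classes and all hyper-parameters are illustrative assumptions, not the settings used in the study.

        import tensorflow as tf

        VOCAB_SIZE, SEQ_LEN, NUM_CODES = 20000, 1500, 12   # assumed sizes

        def build_report_cnn():
            inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")        # token ids
            x = tf.keras.layers.Embedding(VOCAB_SIZE, 128)(inputs)          # word vectors
            x = tf.keras.layers.Conv1D(128, 5, activation="relu")(x)        # n-gram filters
            x = tf.keras.layers.GlobalMaxPooling1D()(x)                     # report-level feature
            x = tf.keras.layers.Dropout(0.5)(x)
            outputs = tf.keras.layers.Dense(NUM_CODES, activation="softmax")(x)
            model = tf.keras.Model(inputs, outputs)
            model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
            return model

        model = build_report_cnn()
        model.summary()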

  2. Implementation and Initial Testing of Advanced Processing and Analysis Algorithms for Correlated Neutron Counting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santi, Peter Angelo; Cutler, Theresa Elizabeth; Favalli, Andrea

    In order to improve the accuracy and capabilities of neutron multiplicity counting, additional quantifiable information is needed in order to address the assumptions that are present in the point model. Extracting and utilizing higher order moments (Quads and Pents) from the neutron pulse train represents the most direct way of extracting additional information from the measurement data to allow for an improved determination of the physical properties of the item of interest. The extraction of higher order moments from a neutron pulse train required the development of advanced dead time correction algorithms which could correct for dead time effects in all of the measurement moments in a self-consistent manner. In addition, advanced analysis algorithms have been developed to address specific assumptions that are made within the current analysis model, namely that all neutrons are created at a single point within the item of interest, and that all neutrons that are produced within an item are created with the same energy distribution. This report will discuss the current status of implementation and initial testing of the advanced dead time correction and analysis algorithms that have been developed in an attempt to utilize higher order moments to improve the capabilities of correlated neutron measurement techniques.

  3. An innovative approach for characteristic analysis and state-of-health diagnosis for a Li-ion cell based on the discrete wavelet transform

    NASA Astrophysics Data System (ADS)

    Kim, Jonghoon; Cho, B. H.

    2014-08-01

    This paper introduces an innovative approach to analyze the electrochemical characteristics and diagnose the state-of-health (SOH) of a Li-ion cell based on the discrete wavelet transform (DWT). In this approach, the DWT is applied as a powerful tool for analyzing the discharging/charging voltage signal (DCVS) of a Li-ion cell, a signal with non-stationary and transient phenomena. Specifically, DWT-based multi-resolution analysis (MRA) is used to extract information on the electrochemical characteristics in the time and frequency domains simultaneously. By using MRA with wavelet decomposition, information on the electrochemical characteristics of a Li-ion cell can be extracted from the DCVS over a wide frequency range. Wavelet decomposition is implemented with the order 3 Daubechies wavelet (dB3) and scale 5 selected as the best wavelet function and the optimal decomposition scale. In particular, the present approach develops these investigations one step further by showing low and high frequency components (approximation component An and detail component Dn, respectively) extracted from various Li-ion cells with different electrochemical characteristics caused by aging effects. Experimental results clearly demonstrate the suitability of the DWT-based approach for reliable SOH diagnosis of a Li-ion cell.
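
    As a rough sketch of the multi-resolution analysis described (db3 wavelet, scale 5), the Python fragment below splits a voltage signal into its approximation (A5) and detail (D5-D1) components by zeroing all other coefficient bands before inverse reconstruction; the synthetic signal is only a stand-in for a measured discharging/charging curve.

        import numpy as np
        import pywt

        def mra_components(signal, wavelet="db3", level=5):
            """Approximation A5 and details D5..D1 of a 1-D signal via DWT reconstruction."""
            coeffs = pywt.wavedec(signal, wavelet, level=level)
            names = ["A5"] + [f"D{level - k}" for k in range(level)]
            components = {}
            for i, name in enumerate(names):
                kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
                components[name] = pywt.waverec(kept, wavelet)[: len(signal)]
            return components

        t = np.linspace(0, 1, 4096)
        voltage = 3.7 - 0.5 * t + 0.01 * np.random.default_rng(1).standard_normal(t.size)
        parts = mra_components(voltage)
        print({name: comp.shape for name, comp in parts.items()})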

  4. Semantic and syntactic interoperability in online processing of big Earth observation data.

    PubMed

    Sudmanns, Martin; Tiede, Dirk; Lang, Stefan; Baraldi, Andrea

    2018-01-01

    The challenge of enabling syntactic and semantic interoperability for comprehensive and reproducible online processing of big Earth observation (EO) data is still unsolved. Supporting both types of interoperability is one of the requirements to efficiently extract valuable information from the large amount of available multi-temporal gridded data sets. The proposed system wraps world models (semantic interoperability) into OGC Web Processing Services (syntactic interoperability) for semantic online analyses. World models describe spatio-temporal entities and their relationships in a formal way. The proposed system serves as an enabler for (1) technical interoperability using a standardised interface to be used by all types of clients and (2) allowing experts from different domains to develop complex analyses together as a collaborative effort. Users connect the world models online to the data, which are maintained in centralised storage as 3D spatio-temporal data cubes. The system also allows non-experts to extract valuable information from EO data because data management, low-level interactions and specific software issues can be ignored. We discuss the concept of the proposed system, provide a technical implementation example, describe three use cases for extracting changes from EO images, and demonstrate the usability also for non-EO, gridded, multi-temporal data sets (CORINE land cover).

  5. Semantic and syntactic interoperability in online processing of big Earth observation data

    PubMed Central

    Sudmanns, Martin; Tiede, Dirk; Lang, Stefan; Baraldi, Andrea

    2018-01-01

    The challenge of enabling syntactic and semantic interoperability for comprehensive and reproducible online processing of big Earth observation (EO) data is still unsolved. Supporting both types of interoperability is one of the requirements to efficiently extract valuable information from the large amount of available multi-temporal gridded data sets. The proposed system wraps world models (semantic interoperability) into OGC Web Processing Services (syntactic interoperability) for semantic online analyses. World models describe spatio-temporal entities and their relationships in a formal way. The proposed system serves as an enabler for (1) technical interoperability using a standardised interface to be used by all types of clients and (2) allowing experts from different domains to develop complex analyses together as a collaborative effort. Users connect the world models online to the data, which are maintained in centralised storage as 3D spatio-temporal data cubes. The system also allows non-experts to extract valuable information from EO data because data management, low-level interactions and specific software issues can be ignored. We discuss the concept of the proposed system, provide a technical implementation example, describe three use cases for extracting changes from EO images, and demonstrate the usability also for non-EO, gridded, multi-temporal data sets (CORINE land cover). PMID:29387171

  6. [Extraction of buildings three-dimensional information from high-resolution satellite imagery based on Barista software].

    PubMed

    Zhang, Pei-feng; Hu, Yuan-man; He, Hong-shi

    2010-05-01

    The demand for accurate and up-to-date spatial information on urban buildings is becoming more and more important for urban planning, environmental protection, and other applications. Today's commercial high-resolution satellite imagery offers the potential to extract the three-dimensional information of urban buildings. This paper extracted the three-dimensional information of urban buildings from QuickBird imagery, and validated the precision of the extraction, based on the Barista software. It was shown that the extraction of three-dimensional building information from high-resolution satellite imagery based on the Barista software had the advantages of modest demands on operator expertise, broad applicability, simple operation, and high precision. Point positioning and height determination accuracy at the one-pixel level could be achieved, provided the digital elevation model (DEM) and the sensor orientation model were of sufficiently high precision and the off-nadir view angle was favourable.

  7. On the Preservation of Unitarity during Black Hole Evolution and Information Extraction from its Interior

    NASA Astrophysics Data System (ADS)

    Pappas, Nikolaos D.

    2012-06-01

    For more than 30 years the discovery that black holes radiate like black bodies of specific temperature has triggered a multitude of puzzling questions concerning their nature and the fate of information that goes down the black hole during its lifetime. The most tricky issue in what is known as information loss paradox is the apparent violation of unitarity during the formation/evaporation process of black holes. A new idea is proposed based on the combination of our knowledge on Hawking radiation as well as the Einstein-Podolsky-Rosen phenomenon, that could resolve the paradox and spare physicists from the unpalatable idea that unitarity can ultimately be irreversibly violated even under special conditions.

  8. Investigation related to multispectral imaging systems

    NASA Technical Reports Server (NTRS)

    Nalepka, R. F.; Erickson, J. D.

    1974-01-01

    A summary of technical progress made during a five year research program directed toward the development of operational information systems based on multispectral sensing and the use of these systems in earth-resource survey applications is presented. Efforts were undertaken during this program to: (1) improve the basic understanding of the many facets of multispectral remote sensing, (2) develop methods for improving the accuracy of information generated by remote sensing systems, (3) improve the efficiency of data processing and information extraction techniques to enhance the cost-effectiveness of remote sensing systems, (4) investigate additional problems having potential remote sensing solutions, and (5) apply the existing and developing technology for specific users and document and transfer that technology to the remote sensing community.

  9. Applying traditional signal processing techniques to social media exploitation for situational understanding

    NASA Astrophysics Data System (ADS)

    Abdelzaher, Tarek; Roy, Heather; Wang, Shiguang; Giridhar, Prasanna; Al Amin, Md. Tanvir; Bowman, Elizabeth K.; Kolodny, Michael A.

    2016-05-01

    Signal processing techniques such as filtering, detection, estimation and frequency domain analysis have long been applied to extract information from noisy sensor data. This paper describes the exploitation of these signal processing techniques to extract information from social networks, such as Twitter and Instagram. Specifically, we view social networks as noisy sensors that report events in the physical world. We then present a data processing stack for detection, localization, tracking, and veracity analysis of reported events using social network data. We show using a controlled experiment that the behavior of social sources as information relays varies dramatically depending on context. In benign contexts, there is general agreement on events, whereas in conflict scenarios, a significant amount of collective filtering is introduced by conflicted groups, creating a large data distortion. We describe signal processing techniques that mitigate such distortion, resulting in meaningful approximations of actual ground truth, given noisy reported observations. Finally, we briefly present an implementation of the aforementioned social network data processing stack in a sensor network analysis toolkit, called Apollo. Experiences with Apollo show that our techniques are successful at identifying and tracking credible events in the physical world.

  10. Validation and application of quantitative PCR assays using host-specific Bacteroidales genetic markers for swine fecal pollution tracking.

    PubMed

    Fan, Lihua; Shuai, Jiangbing; Zeng, Ruoxue; Mo, Hongfei; Wang, Suhua; Zhang, Xiaofeng; He, Yongqiang

    2017-12-01

    The genome fragment enrichment (GFE) method was applied to identify host-specific bacterial genetic markers that differ among different fecal metagenomes. To enrich for swine-specific DNA fragments, a swine fecal DNA composite (n = 34) was challenged against a DNA composite consisting of cow, human, goat, sheep, chicken, duck and goose fecal DNA extracts (n = 83). Bioinformatic analyses of 384 non-redundant swine-enriched metagenomic sequences indicated a preponderance of Bacteroidales-like regions predicted to encode metabolism-associated functions, cellular processes, and information storage and processing. After being challenged against fecal DNA extracted from different animal sources, four sequences from the clone libraries, targeting two Bacteroidales-like sequences (genes 1-38 and 3-53), a Clostridia-like sequence (gene 2-109) and a Bacilli-like sequence (gene 2-95), showed high specificity to swine feces based on PCR analysis. Host-specificity and host-sensitivity analysis confirmed that oligonucleotide primers and probes capable of annealing to select Bacteroidales-like sequences (1-38 and 3-53) exhibited high specificity (>90%) in quantitative PCR assays with 71 fecal DNAs from non-target animal sources. The two assays also demonstrated broad distributions of the corresponding genetic markers (>94% positive) among 72 swine feces. After evaluation with environmental water samples from different areas, the swine-targeted assays based on the two Bacteroidales-like GFE sequences appear to be suitable quantitative tracing tools for swine fecal pollution. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. A Statistical Texture Feature for Building Collapse Information Extraction of SAR Image

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, H.; Chen, Q.; Liu, X.

    2018-04-01

    Synthetic Aperture Radar (SAR) has become one of the most important means of extracting post-disaster collapsed building information, due to its versatility and almost all-weather, day-and-night working capability. In view of the fact that the inherent statistical distribution of speckle in SAR images has not been used to extract collapsed building information, this paper proposes a novel texture feature based on statistical models of SAR images to extract collapsed buildings. In the proposed feature, the texture parameter of the G0 distribution of SAR images is used to reflect the uniformity of the target and thereby extract collapsed buildings. This feature not only considers the statistical distribution of SAR images, providing a more accurate description of object texture, but can also be applied to extract collapsed building information from single-, dual- or full-polarization SAR data. RADARSAT-2 data of the Yushu earthquake, acquired on April 21, 2010, are used to present and analyze the performance of the proposed method. In addition, the applicability of this feature to SAR data with different polarizations is also analysed, which provides decision support for data selection in collapsed building information extraction.

  12. Extraction of Built-Up Areas Using Convolutional Neural Networks and Transfer Learning from SENTINEL-2 Satellite Images

    NASA Astrophysics Data System (ADS)

    Bramhe, V. S.; Ghosh, S. K.; Garg, P. K.

    2018-04-01

    With rapid globalization, the extent of built-up areas is continuously increasing. Extracting more robust and abstract features for classifying built-up areas has been a leading research topic for many years. Various studies have utilized spatial information along with spectral features to enhance classification accuracy; still, these feature extraction techniques require a large number of user-specific parameters and are generally application specific. On the other hand, recently introduced Deep Learning (DL) techniques require fewer manually specified parameters while representing more abstract aspects of the data. Because it is difficult to acquire high-resolution datasets for applications that require large-scale monitoring, Sentinel-2 imagery has been used for built-up area extraction in this study. Pre-trained Convolutional Neural Networks (ConvNets), namely Inception v3 and VGGNet, are employed for transfer learning. Because these networks are trained on generic ImageNet images, whose characteristics differ greatly from satellite images, their weights are fine-tuned using data derived from Sentinel-2 images. For comparison with existing shallow classifiers, two state-of-the-art classifiers, a Gaussian Support Vector Machine (SVM) and a Back-Propagation Neural Network (BP-NN), are also implemented, giving overall accuracies of 84.31 % and 82.86 %, respectively. The fine-tuned VGGNet achieves 89.43 % overall accuracy and the fine-tuned Inception v3 achieves 92.10 %. The results indicate the high accuracy of the proposed fine-tuned ConvNets on a 4-channel Sentinel-2 dataset for built-up area extraction.
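
    A minimal fine-tuning sketch in tf.keras along the lines described above, assuming Sentinel-2 patches have been resampled to 3-band 299x299 chips so that ImageNet weights can be reused; the layer choices, hyper-parameters and two-class (built-up / non-built-up) setup are illustrative assumptions rather than the study's configuration.

        import tensorflow as tf

        base = tf.keras.applications.InceptionV3(weights="imagenet",
                                                 include_top=False,
                                                 input_shape=(299, 299, 3))
        base.trainable = True                     # fine-tune the ImageNet weights

        inputs = tf.keras.Input(shape=(299, 299, 3))
        x = tf.keras.applications.inception_v3.preprocess_input(inputs)
        x = base(x)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
        x = tf.keras.layers.Dropout(0.3)(x)
        outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # built-up probability

        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
        # model.fit(train_patches, train_labels, validation_data=(val_patches, val_labels), epochs=10)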

  13. Hypoxia affects cellular responses to plant extracts.

    PubMed

    Liew, Sien-Yei; Stanbridge, Eric J; Yusoff, Khatijah; Shafee, Norazizah

    2012-11-21

    Microenvironmental conditions contribute towards varying cellular responses to plant extract treatments. Hypoxic cancer cells are known to be resistant to radio- and chemo-therapy. New therapeutic strategies specifically targeting these cells are needed. Plant extracts used in Traditional Chinese Medicine (TCM) can offer promising candidates. Despite their widespread usage, information on their effects in hypoxic conditions is still lacking. In this study, we examined the cytotoxicity of a series of known TCM plant extracts under normoxic versus hypoxic conditions. Pereskia grandifolia, Orthosiphon aristatus, Melastoma malabathricum, Carica papaya, Strobilanthes crispus, Gynura procumbens, Hydrocotyle sibthorpioides, Pereskia bleo and Clinacanthus nutans leaves were dried, blended into powder form, extracted in methanol and evaporated to produce crude extracts. Human Saos-2 osteosarcoma cells were treated with various concentrations of the plant extracts under normoxia or hypoxia (0.5% oxygen). Twenty-four hours after treatment, an MTT assay was performed and the IC(50) values were calculated. The effect of the extracts on hypoxia inducible factor (HIF) activity was evaluated using a hypoxia-driven firefly luciferase reporter assay. The relative cytotoxicity of each plant extract on Saos-2 cells was different in hypoxic versus normoxic conditions. Hypoxia increased the IC(50) values for Pereskia grandifolia and Orthosiphon aristatus extracts, but decreased the IC(50) values for Melastoma malabathricum and Carica papaya extracts. Extracts of Strobilanthes crispus, Gynura procumbens and Hydrocotyle sibthorpioides had equivalent cytotoxic effects under both conditions. Pereskia bleo and Clinacanthus nutans extracts were not toxic to cells within the concentration ranges tested. The most interesting result was noted for the Carica papaya extract, where its IC(50) in hypoxia was reduced by 3-fold when compared to the normoxic condition. This reduction was found to be associated with HIF inhibition. Hypoxia variably alters the cytotoxic effects of TCM plant extracts on cancer cells. Carica papaya showed an enhanced cytotoxic effect on hypoxic cancer cells by inhibiting HIF activities. These findings provide a plausible approach to killing hypoxic cancer cells in solid tumors. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  14. Single molecule optical measurements of orientation and rotations of biological macromolecules.

    PubMed

    Shroder, Deborah Y; Lippert, Lisa G; Goldman, Yale E

    2016-11-22

    Subdomains of macromolecules often undergo large orientation changes during their catalytic cycles that are essential for their activity. Tracking these rearrangements in real time opens a powerful window into the link between protein structure and functional output. Site-specific labeling of individual molecules with polarized optical probes and measurement of their spatial orientation can give insight into the crucial conformational changes, dynamics, and fluctuations of macromolecules. Here we describe the range of single molecule optical technologies that can extract orientation information from these probes, review the relevant types of probes and labeling techniques, and highlight the advantages and disadvantages of these technologies for addressing specific inquiries.

  15. Using the TIGR gene index databases for biological discovery.

    PubMed

    Lee, Yuandan; Quackenbush, John

    2003-11-01

    The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

  16. Diode laser soldering using a lead-free filler material for electronic packaging structures

    NASA Astrophysics Data System (ADS)

    Chaminade, C.; Fogarassy, E.; Boisselier, D.

    2006-04-01

    As of today, several lead-free soldering pastes have been qualified for currently used soldering processes. Regarding the new potential of laser-assisted soldering processes, however, the behaviour of the SnAgCu soldering paste requires new investigations. In the first part of this study, the specific temperature profile of a laser soldering process is investigated using a high power diode laser (HPDL). These experimental results are compared to a thermal simulation developed for this specific application. The second part of this work deals with the diffusion of the tin-based filler material through the nickel barrier, using the information extracted from the temperature simulations.

  17. Information Metacatalog for a Grid

    NASA Technical Reports Server (NTRS)

    Kolano, Paul

    2007-01-01

    SWIM is a Software Information Metacatalog that gathers detailed information about the software components and packages installed on a grid resource. Information is currently gathered for Executable and Linking Format (ELF) executables and shared libraries, Java classes, shell scripts, and Perl and Python modules. SWIM is built on top of the POUR framework, which is described in the preceding article. SWIM consists of a set of Perl modules for extracting software information from a system, an XML schema defining the format of data that can be added by users, and a POUR XML configuration file that describes how these elements are used to generate periodic, on-demand, and user-specified information. Periodic software information is derived mainly from the package managers used on each system. SWIM collects information from native package managers in FreeBSD, Solaris, and IRIX as well as the RPM, Perl, and Python package managers on multiple platforms. Because not all software is available or installed in package form, SWIM also crawls the set of relevant paths from the File System Hierarchy Standard that defines the standard file system structure used by all major UNIX distributions. Using these two techniques, the vast majority of software installed on a system can be located. SWIM computes the same information gathered by the periodic routines for specific files on specific hosts, and locates software on a system given only its name and type.

  18. MRI and unilateral NMR study of reindeer skin tanning processes.

    PubMed

    Zhu, Lizheng; Del Federico, Eleonora; Ilott, Andrew J; Klokkernes, Torunn; Kehlet, Cindie; Jerschow, Alexej

    2015-04-07

    The study of arctic or subarctic indigenous skin clothing material, known for its design and ability to keep the body warm, provides information about the tanning materials and techniques. The study also provides clues about the culture that created it, since tanning processes are often specific to certain indigenous groups. Untreated skin samples and samples treated with willow (Salix sp) bark extract and cod liver oil are compared in this study using both MRI and unilateral NMR techniques. The two types of samples show different proton spatial distributions and different relaxation times, which may also provide information about the tanning technique and aging behavior.

  19. Observation of the immune response of cells and tissue through multimodal label-free microscopy

    NASA Astrophysics Data System (ADS)

    Pavillon, Nicolas; Smith, Nicholas I.

    2017-02-01

    We present applications of a label-free approach to assess the immune response based on the combination of interferometric microscopy and Raman spectroscopy, which makes it possible to simultaneously acquire morphological and molecular information of live cells. We employ this approach to derive statistical models for predicting the activation state of macrophage cells based both on morphological parameters extracted from the high-throughput full-field quantitative phase imaging, and on the molecular content information acquired through Raman spectroscopy. We also employ a system for 3D imaging based on coherence gating, enabling specific targeting of the Raman channel to structures of interest within tissue.

  20. Extracting remaining information from an inconclusive result in optimal unambiguous state discrimination

    NASA Astrophysics Data System (ADS)

    Zhang, Gang; Yu, Long-Bao; Zhang, Wen-Hai; Cao, Zhuo-Liang

    2014-12-01

    In unambiguous state discrimination, the measurement results consist of the error-free results and an inconclusive result, and the inconclusive result is conventionally regarded as a useless remainder from which no information about the initial states is extracted. In this paper, we investigate the problem of extracting the remaining information from an inconclusive result, provided that the optimal total success probability is determined. We present three simple examples. In the first two examples, partial information can be extracted from an inconclusive answer, while in the third it cannot. The initial states in the third example are defined as highly symmetric states.

  1. Automatic information extraction from unstructured mammography reports using distributed semantics.

    PubMed

    Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L

    2018-02-01

    To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and, therefore, require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities from narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines a dependency-based parse tree with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.
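
    As a rough illustration of the dependency-parse half of such a hybrid extractor (the distributed-semantics component is not reproduced here), the spaCy sketch below builds simple finding frames from a report sentence; the finding lexicon, frame fields and the en_core_web_md model are assumptions made for the example.

        import spacy

        nlp = spacy.load("en_core_web_md")        # model must be downloaded beforehand
        FINDING_TERMS = {"mass", "calcification", "asymmetry", "distortion"}

        def finding_frames(report_text):
            """Build simple {finding, modifiers, location} frames from one report."""
            frames = []
            for token in nlp(report_text):
                if token.lemma_.lower() in FINDING_TERMS:
                    modifiers = [c.text for c in token.children if c.dep_ in ("amod", "compound")]
                    location = [t.text for t in token.subtree if t.dep_ == "pobj"]
                    frames.append({"finding": token.text,
                                   "modifiers": modifiers,
                                   "location": location})
            return frames

        print(finding_frames("There is an irregular spiculated mass in the left breast."))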

  2. Gender differences in the impact of population-level alcohol policy interventions: evidence synthesis of systematic reviews.

    PubMed

    Fitzgerald, Niamh; Angus, Kathryn; Emslie, Carol; Shipton, Deborah; Bauld, Linda

    2016-10-01

    Consistent review-level evidence supports the effectiveness of population-level alcohol policies in reducing alcohol-related harms. Such policies interact with well-established social, cultural and biological differences in how men and women perceive, relate to and use alcohol, and with wider inequalities, in ways which may give rise to gender differences in policy effectiveness. This paper aimed to examine the extent to which gender-specific data and analyses were considered in, and are available from, systematic reviews of population-level alcohol policy interventions, and where possible, to conduct a narrative synthesis of relevant data. A prior systematic 'review of reviews' of population level alcohol interventions 2002-2012 was updated to May 2014, all gender-relevant data extracted, and the level and quality of gender reporting assessed. A narrative synthesis of extracted findings was conducted. Sixty-three systematic reviews, covering ten policy areas, were included. Five reviews (8%) consistently provided information on baseline participation by gender for each individual study in the review and twenty-nine (46%) reported some gender-specific information on the impact of the policies under consideration. Specific findings include evidence of possible gender differences in the impact of and exposure to alcohol marketing, and a failure to consider potential unintended consequences and harm to others in most reviews. Gender is poorly reported in systematic reviews of population-level interventions to reduce alcohol-related harm, hindering assessment of the intended and unintended effects of such policies on women and men. © 2016 Society for the Study of Addiction.

  3. Scan-To Output Validation: Towards a Standardized Geometric Quality Assessment of Building Information Models Based on Point Clouds

    NASA Astrophysics Data System (ADS)

    Bonduel, M.; Bassier, M.; Vergauwen, M.; Pauwels, P.; Klein, R.

    2017-11-01

    The use of Building Information Modeling (BIM) for existing buildings based on point clouds is increasing. Standardized geometric quality assessment of the BIMs is needed to make them more reliable and thus reusable for future users. First, available literature on the subject is studied. Next, an initial proposal for a standardized geometric quality assessment is presented. Finally, this method is tested and evaluated with a case study. The number of specifications on BIM relating to existing buildings is limited. The Levels of Accuracy (LOA) specification of the USIBD provides definitions and suggestions regarding geometric model accuracy, but lacks a standardized assessment method. A deviation analysis is found to be dependent on (1) the used mathematical model, (2) the density of the point clouds and (3) the order of comparison. Results of the analysis can be graphical and numerical. An analysis on macro (building) and micro (BIM object) scale is necessary. On macro scale, the complete model is compared to the original point cloud and vice versa to get an overview of the general model quality. The graphical results show occluded zones and non-modeled objects respectively. Colored point clouds are derived from this analysis and integrated in the BIM. On micro scale, the relevant surface parts are extracted per BIM object and compared to the complete point cloud. Occluded zones are extracted based on a maximum deviation. What remains is classified according to the LOA specification. The numerical results are integrated in the BIM with the use of object parameters.
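
    As a rough sketch of the macro-scale deviation analysis outlined above (not the authors' workflow), the fragment below measures nearest-neighbour distances from scan points to points sampled from the BIM surfaces and bins them into accuracy bands; the file handling, BIM sampling and the numeric LOA thresholds are assumptions for illustration.

        import numpy as np
        from scipy.spatial import cKDTree

        def cloud_to_model_deviations(scan_points, model_points):
            """Nearest-neighbour distance from every scan point to the BIM surface samples."""
            tree = cKDTree(model_points)
            distances, _ = tree.query(scan_points)
            return distances

        def classify_loa(distances, bands=((0.001, "LOA50"), (0.005, "LOA40"),
                                           (0.015, "LOA30"), (0.05, "LOA20"))):
            """Bin deviations (metres) into accuracy bands; thresholds are example values only."""
            labels = np.full(distances.shape, "LOA10", dtype=object)
            for upper, name in reversed(bands):
                labels[distances <= upper] = name
            return labels

        rng = np.random.default_rng(2)
        scan = rng.uniform(0, 10, size=(10000, 3))              # stand-in scan points
        model = scan + rng.normal(0, 0.003, size=scan.shape)    # stand-in BIM surface samples
        deviations = cloud_to_model_deviations(scan, model)
        print(np.unique(classify_loa(deviations), return_counts=True))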

  4. Recent progress of task-specific ionic liquids in chiral resolution and extraction of biological samples and metal ions.

    PubMed

    Wu, Datong; Cai, Pengfei; Zhao, Xiaoyong; Kong, Yong; Pan, Yuanjiang

    2018-01-01

    Ionic liquids have been functionalized for modern applications. The functional ionic liquids are also called task-specific ionic liquids. Various task-specific ionic liquids with certain groups have been constructed and exploited widely in the field of separation. To take advantage of their properties in separation science, task-specific ionic liquids are generally used in techniques such as liquid-liquid extraction, solid-phase extraction, gas chromatography, high-performance liquid chromatography, and capillary electrophoresis. This review mainly covers original research papers published in the last five years, and we will focus on task-specific ionic liquids as the chiral selectors in chiral resolution and as extractant or sensor for biological samples and metal ion purification. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Semantic Information Extraction of Lanes Based on Onboard Camera Videos

    NASA Astrophysics Data System (ADS)

    Tang, L.; Deng, T.; Ren, C.

    2018-04-01

    In the field of autonomous driving, semantic information of lanes is very important. This paper proposes a method of automatic detection of lanes and extraction of semantic information from onboard camera videos. The proposed method firstly detects the edges of lanes by the grayscale gradient direction, and improves the Probabilistic Hough transform to fit them; then, it uses the vanishing point principle to calculate the lane geometrical position, and uses lane characteristics to extract lane semantic information by the classification of decision trees. In the experiment, 216 road video images captured by a camera mounted onboard a moving vehicle were used to detect lanes and extract lane semantic information. The results show that the proposed method can accurately identify lane semantics from video images.
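
    As a rough illustration of the edge-detection and line-fitting stages described above, the OpenCV sketch below applies a Canny edge detector and the probabilistic Hough transform to one onboard-camera frame; the thresholds, the simple lower-half region of interest and the file name are assumptions, and the vanishing-point and decision-tree steps are omitted.

        import cv2
        import numpy as np

        def detect_lane_segments(frame_bgr):
            gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            edges = cv2.Canny(gray, 80, 160)                     # gradient-based lane edges
            h, w = edges.shape
            mask = np.zeros_like(edges)
            mask[h // 2:, :] = 255                               # keep the road half of the image
            edges = cv2.bitwise_and(edges, mask)
            lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                                    minLineLength=40, maxLineGap=20)
            return [] if lines is None else [line[0] for line in lines]   # (x1, y1, x2, y2)

        frame = cv2.imread("frame_0001.jpg")                     # hypothetical video frame
        if frame is not None:
            for x1, y1, x2, y2 in detect_lane_segments(frame):
                cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.imwrite("frame_0001_lanes.jpg", frame)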

  6. Weak characteristic information extraction from early fault of wind turbine generator gearbox

    NASA Astrophysics Data System (ADS)

    Xu, Xiaoli; Liu, Xiuli

    2017-09-01

    Given the weak degradation characteristic information during early fault evolution in the gearbox of a wind turbine generator, traditional singular value decomposition (SVD)-based denoising may result in the loss of useful information. A weak characteristic information extraction method based on μ-SVD and local mean decomposition (LMD) is developed to address this problem. The basic principle of the method is as follows: determine the denoising order based on the cumulative contribution rate, perform signal reconstruction, extract the noisy part of the signal and subject it to LMD and μ-SVD denoising, and obtain the denoised signal through superposition. Experimental results show that this method can significantly suppress signal noise, effectively extract the weak characteristic information of early faults, and facilitate early fault warning and dynamic predictive maintenance.
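
    As a rough sketch of an SVD denoising step driven by a cumulative contribution rate, in the spirit of the μ-SVD stage described above (the LMD stage is omitted), the fragment below Hankel-embeds a signal, keeps the leading singular values up to a chosen contribution threshold, and rebuilds the signal by anti-diagonal averaging; the embedding dimension and the 90% threshold are assumptions for the example.

        import numpy as np   # requires numpy >= 1.20 for sliding_window_view

        def svd_denoise(signal, embed=128, rate=0.90):
            """SVD denoising of a 1-D signal via Hankel embedding and truncation."""
            n = len(signal)
            rows = n - embed + 1
            hankel = np.lib.stride_tricks.sliding_window_view(signal, embed)   # (rows, embed)
            u, s, vt = np.linalg.svd(hankel, full_matrices=False)
            k = int(np.searchsorted(np.cumsum(s) / s.sum(), rate)) + 1          # denoising order
            approx = (u[:, :k] * s[:k]) @ vt[:k]
            out = np.zeros(n)
            counts = np.zeros(n)
            for i in range(rows):                                               # anti-diagonal averaging
                out[i:i + embed] += approx[i]
                counts[i:i + embed] += 1
            return out / counts, k

        t = np.linspace(0, 1, 2048)
        clean = np.sin(2 * np.pi * 30 * t) + 0.3 * np.sin(2 * np.pi * 180 * t)
        noisy = clean + 0.5 * np.random.default_rng(3).standard_normal(t.size)
        denoised, order = svd_denoise(noisy)
        print("denoising order:", order,
              "residual RMS:", np.sqrt(np.mean((denoised - clean) ** 2)))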

  7. Subject-based discriminative sparse representation model for detection of concealed information.

    PubMed

    Akhavan, Amir; Moradi, Mohammad Hassan; Vand, Safa Rafiei

    2017-05-01

    The use of machine learning approaches in concealed information test (CIT) plays a key role in the progress of this neurophysiological field. In this paper, we presented a new machine learning method for CIT in which each subject is considered independent of the others. The main goal of this study is to adapt the discriminative sparse models to be applicable for subject-based concealed information test. In order to provide sufficient discriminability between guilty and innocent subjects, we introduced a novel discriminative sparse representation model and its appropriate learning methods. For evaluation of the method forty-four subjects participated in a mock crime scenario and their EEG data were recorded. As the model input, in this study the recurrence plot features were extracted from single trial data of different stimuli. Then the extracted feature vectors were reduced using statistical dependency method. The reduced feature vector went through the proposed subject-based sparse model in which the discrimination power of sparse code and reconstruction error were applied simultaneously. Experimental results showed that the proposed approach achieved better performance than other competing discriminative sparse models. The classification accuracy, sensitivity and specificity of the presented sparsity-based method were about 93%, 91% and 95% respectively. Using the EEG data of a single subject in response to different stimuli types and with the aid of the proposed discriminative sparse representation model, one can distinguish guilty subjects from innocent ones. Indeed, this property eliminates the necessity of several subject EEG data in model learning and decision making for a specific subject. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Photo-CIDNP NMR spectroscopy of amino acids and proteins.

    PubMed

    Kuhn, Lars T

    2013-01-01

    Photo-chemically induced dynamic nuclear polarization (CIDNP) is a nuclear magnetic resonance (NMR) phenomenon which, among other things, is exploited to extract information on biomolecular structure via probing solvent-accessibilities of tryptophan (Trp), tyrosine (Tyr), and histidine (His) amino acid side chains both in polypeptides and proteins in solution. The effect, normally triggered by a (laser) light-induced photochemical reaction in situ, yields both positive and/or negative signal enhancements in the resulting NMR spectra which reflect the solvent exposure of these residues both in equilibrium and during structural transformations in "real time". As such, the method can offer - qualitatively and, to a certain extent, quantitatively - residue-specific structural and kinetic information on both the native and, in particular, the non-native states of proteins which, often, is not readily available from more routine NMR techniques. In this review, basic experimental procedures of the photo-CIDNP technique as applied to amino acids and proteins are discussed, recent improvements to the method highlighted, and future perspectives presented. First, the basic principles of the phenomenon based on the theory of the radical pair mechanism (RPM) are outlined. Second, a description of standard photo-CIDNP applications is given and it is shown how the effect can be exploited to extract residue-specific structural information on the conformational space sampled by unfolded or partially folded proteins on their "path" to the natively folded form. Last, recent methodological advances in the field are highlighted, modern applications of photo-CIDNP in the context of biological NMR evaluated, and an outlook into future perspectives of the method is given.

  9. Building a diabetes screening population data repository using electronic medical records.

    PubMed

    Tuan, Wen-Jan; Sheehy, Ann M; Smith, Maureen A

    2011-05-01

    There has been a rapid advancement of information technology in the area of clinical and population health data management since 2000. However, with the fast growth of electronic medical records (EMRs) and the increasing complexity of information systems, it has become challenging for researchers to effectively access, locate, extract, and analyze information critical to their research. This article introduces an outpatient encounter data framework designed to construct an EMR-based population data repository for diabetes screening research. The outpatient encounter data framework is developed on a hybrid data structure of entity-attribute-value models, dimensional models, and relational models. This design preserves a small number of subject-specific tables essential to key clinical constructs in the data repository. It enables atomic information to be maintained in a transparent and meaningful way to researchers and health care practitioners who need to access data and still achieve the same performance level as conventional data warehouse models. A six-layer information processing strategy is developed to extract and transform EMRs to the research data repository. The data structure also complies with both Health Insurance Portability and Accountability Act regulations and the institutional review board's requirements. Although developed for diabetes screening research, the design of the outpatient encounter data framework is suitable for other types of health service research. It may also provide organizations a tool to improve health care quality and efficiency, consistent with the "meaningful use" objectives of the Health Information Technology for Economic and Clinical Health Act. © 2011 Diabetes Technology Society.
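
    As a rough illustration of the entity-attribute-value core that such a hybrid repository preserves (not the actual schema used in the study), the sqlite3 sketch below stores one atomic clinical fact per row and shows a simple researcher-facing extraction query; table and column names are assumptions for the example.

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
        CREATE TABLE patient   (patient_id INTEGER PRIMARY KEY, birth_year INTEGER, sex TEXT);
        CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY,
                                patient_id INTEGER REFERENCES patient(patient_id),
                                encounter_date TEXT, clinic TEXT);
        -- EAV table: one row per atomic clinical fact observed at an encounter
        CREATE TABLE observation (encounter_id INTEGER REFERENCES encounter(encounter_id),
                                  attribute TEXT, value TEXT);
        """)
        conn.execute("INSERT INTO patient VALUES (1, 1960, 'F')")
        conn.execute("INSERT INTO encounter VALUES (10, 1, '2009-03-02', 'primary care')")
        conn.executemany("INSERT INTO observation VALUES (?, ?, ?)",
                         [(10, "hba1c_percent", "6.4"), (10, "bmi", "31.2"),
                          (10, "diabetes_screen_ordered", "yes")])
        # Pull every fact recorded for one encounter
        for row in conn.execute("SELECT attribute, value FROM observation WHERE encounter_id = 10"):
            print(row)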

  10. A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.

    PubMed

    Zarei, Roozbeh; He, Jing; Siuly, Siuly; Zhang, Yanchun

    2017-07-01

    Feature extraction of EEG signals plays a significant role in Brain-computer interface (BCI) as it can significantly affect the performance and the computational time of the system. The main aim of the current work is to introduce an innovative algorithm for acquiring reliable discriminating features from EEG signals to improve classification performance and to reduce the time complexity. This study develops a robust feature extraction method combining the principal component analysis (PCA) and the cross-covariance technique (CCOV) for the extraction of discriminatory information from the mental states based on EEG signals in BCI applications. We apply the correlation based variable selection method with the best first search on the extracted features to identify the best feature set for characterizing the distribution of mental state signals. To verify the robustness of the proposed feature extraction method, three machine learning techniques: multilayer perceptron neural networks (MLP), least square support vector machine (LS-SVM), and logistic regression (LR) are employed on the obtained features. The proposed methods are evaluated on two publicly available datasets. Furthermore, we evaluate the performance of the proposed methods by comparing it with some recently reported algorithms. The experimental results show that all three classifiers achieve high performance (above 99% overall classification accuracy) for the proposed feature set. Among these classifiers, the MLP and LS-SVM methods yield the best performance for the obtained feature set. The average sensitivity, specificity and classification accuracy for these two classifiers are the same: 99.32%, 100%, and 99.66%, respectively, for the BCI competition dataset IVa and 100%, 100%, and 100% for the BCI competition dataset IVb. The results also indicate that the proposed methods outperform the most recently reported methods by at least a 0.25% average accuracy improvement in dataset IVa. The execution time results show that the proposed method has lower time complexity after feature selection. The proposed feature extraction method is very effective for extracting representative information from mental-state EEG signals in BCI applications and for reducing the computational complexity of classifiers by reducing the number of extracted features. Copyright © 2017 Elsevier B.V. All rights reserved.
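
    As a rough sketch of how PCA scores and cross-covariance features can be stacked for EEG trials (the exact pairing of signals used for the cross-covariance in the original method is not reproduced, so pairing each trial with a class-template trial is an assumption), consider the following Python fragment.

        import numpy as np
        from sklearn.decomposition import PCA

        def ccov_features(trial, template, lags=8):
            """Cross-covariance between one trial and a template at a few lags."""
            trial = trial - trial.mean()
            template = template - template.mean()
            full = np.correlate(trial, template, mode="full") / len(trial)
            centre = len(full) // 2
            return full[centre - lags: centre + lags + 1]

        def extract_features(trials, template, n_components=10):
            """Stack PCA scores of each trial with its cross-covariance features."""
            scores = PCA(n_components=n_components).fit_transform(trials)
            ccov = np.vstack([ccov_features(t, template) for t in trials])
            return np.hstack([scores, ccov])

        rng = np.random.default_rng(4)
        trials = rng.standard_normal((60, 256))        # hypothetical single-channel trials
        template = trials[:30].mean(axis=0)            # hypothetical class template
        print(extract_features(trials, template).shape)   # (60, 10 + 17)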

  11. The Extraction of Post-Earthquake Building Damage Information Based on Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Chen, M.; Wang, X.; Dou, A.; Wu, X.

    2018-04-01

    The seismic damage information of buildings extracted from remote sensing (RS) imagery is meaningful for supporting relief efforts and for effectively reducing losses caused by earthquakes. Both traditional pixel-based and object-oriented methods have shortcomings in extracting object information. The pixel-based method cannot make full use of the contextual information of objects, while the object-oriented method faces the problems that image segmentation is rarely ideal and that choosing a feature space is difficult. In this paper, a new strategy is proposed that combines a convolutional neural network (CNN) with image segmentation to extract building damage information from remote sensing imagery. The key idea of this method includes two steps: first, the CNN is used to predict the damage probability of each pixel; then, the probabilities are integrated within each segmentation spot. The method is tested by extracting collapsed and uncollapsed buildings from an aerial image acquired over Longtoushan Town after the Ms 6.5 Ludian earthquake in Yunnan Province. The results show that the proposed method is effective in extracting building damage information after an earthquake.
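
    A minimal sketch of the second step, the integration of per-pixel CNN probabilities within segmentation spots, is given below. The use of the mean probability and a 0.5 threshold are assumptions, since the abstract does not state the integration rule, and the toy arrays stand in for real CNN output and segmentation.

      import numpy as np

      def label_segments(pixel_prob, segments, threshold=0.5):
          """Integrate per-pixel CNN probabilities over segmentation spots.

          pixel_prob: (H, W) array of P(collapsed) from a pixel-wise CNN
          segments:   (H, W) integer label map from image segmentation
          Returns {segment_id: 'collapsed' | 'uncollapsed'} based on the
          mean probability inside each spot.
          """
          labels = {}
          for seg_id in np.unique(segments):
              mean_p = pixel_prob[segments == seg_id].mean()
              labels[int(seg_id)] = "collapsed" if mean_p >= threshold else "uncollapsed"
          return labels

      # toy example: 4x4 image with two segments
      prob = np.array([[0.9, 0.8, 0.2, 0.1],
                       [0.9, 0.7, 0.3, 0.2],
                       [0.8, 0.9, 0.1, 0.1],
                       [0.9, 0.8, 0.2, 0.3]])
      segs = np.array([[0, 0, 1, 1],
                       [0, 0, 1, 1],
                       [0, 0, 1, 1],
                       [0, 0, 1, 1]])
      print(label_segments(prob, segs))  # segment 0 collapsed, segment 1 uncollapsed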

  12. The Patient-Reported Information Multidimensional Exploration (PRIME) Framework for Investigating Emotions and Other Factors of Prostate Cancer Patients with Low Intermediate Risk Based on Online Cancer Support Group Discussions.

    PubMed

    Bandaragoda, Tharindu; Ranasinghe, Weranja; Adikari, Achini; de Silva, Daswin; Lawrentschuk, Nathan; Alahakoon, Damminda; Persad, Raj; Bolton, Damien

    2018-06-01

    This study aimed to use the Patient Reported Information Multidimensional Exploration (PRIME) framework, a novel ensemble of machine-learning and deep-learning algorithms, to extract, analyze, and correlate self-reported information from Online Cancer Support Groups (OCSG) by patients (and partners of patients) with low intermediate-risk prostate cancer (PCa) undergoing radical prostatectomy (RP), external beam radiotherapy (EBRT), and active surveillance (AS), and to investigate its efficacy in quality-of-life (QoL) and emotion measures. From patient-reported information on 10 OCSG, the PRIME framework automatically filtered and extracted conversations on low intermediate-risk PCa with active user participation. Side effects as well as emotional and QoL outcomes for 6084 patients were analyzed. Side-effect profiles differed between the methods analyzed, with men after RP having more urinary and sexual side effects and men after EBRT having more bowel symptoms. Key findings from the analysis of emotional expressions showed that PCa patients younger than 40 years expressed significantly higher positive and negative emotions compared with other age groups, that partners of patients expressed more negative emotions than the patients, and that selected cohorts (< 40 years, > 70 years, partners of patients) frequently used the same terms to express their emotions, which is indicative of QoL issues specific to those cohorts. Despite recent advances in patient-centered care, patient emotions are largely overlooked, especially in younger men with a diagnosis of PCa and their partners. The authors present a novel approach, the PRIME framework, to extract, analyze, and correlate key patient factors. This framework improves understanding of QoL and identifies low intermediate-risk PCa patients who require additional support.

  13. Multiple kernel learning in protein-protein interaction extraction from biomedical literature.

    PubMed

    Yang, Zhihao; Tang, Nan; Zhang, Xiao; Lin, Hongfei; Li, Yanpeng; Yang, Zhiwei

    2011-03-01

    Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions is expanding rapidly, making it increasingly difficult for interaction database administrators, responsible for content input and maintenance, to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to automatic extraction of PPI information from biomedical literature. We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph, and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and dependency path tree to capture richer contextual information. Our experimental results show that the combination of SPT and dependency path tree extensions contributes to an improvement in performance of almost 0.7 percentage units in F-score and 2 percentage units in area under the receiver operating characteristics curve (AUC). Combining two or more appropriately weighted individual kernels further improves the performance. In both individual-corpus and cross-corpus evaluations, our combined kernel achieves state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the AImed corpus. Because different kernels calculate the similarity between two sentences from different aspects, our combined kernel can reduce the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, thus allowing each kernel to contribute incrementally to the performance improvement. In addition, the SPT and dependency path tree extensions improve performance by including richer contextual information. Copyright © 2010 Elsevier B.V. All rights reserved.
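
    The core idea of a weighted linear combination of kernels can be sketched as follows. The four random positive semi-definite matrices stand in for the feature-based, tree, graph, and POS path kernels computed over sentence pairs, and the weights are illustrative rather than those learned in the paper.

      import numpy as np
      from sklearn.svm import SVC

      def combine_kernels(kernels, weights):
          """Weighted linear combination of precomputed kernel matrices."""
          weights = np.asarray(weights, dtype=float)
          weights = weights / weights.sum()          # keep the combination convex
          return sum(w * K for w, K in zip(weights, kernels))

      # toy precomputed kernels over 6 sentences (stand-ins for the real kernels)
      rng = np.random.default_rng(1)
      def random_psd(n):
          a = rng.standard_normal((n, n))
          return a @ a.T                             # positive semi-definite

      kernels = [random_psd(6) for _ in range(4)]
      weights = [0.4, 0.3, 0.2, 0.1]                 # assumed weights, not the paper's
      K = combine_kernels(kernels, weights)

      y = np.array([0, 1, 0, 1, 0, 1])               # toy interaction labels
      clf = SVC(kernel="precomputed").fit(K, y)      # train on the combined kernel
      print(clf.predict(K[:2]))                      # rows = kernel vs. training sentences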

  14. Accessing and Visualizing Satellite Data for Fisheries Managers in the Northeast Large Marine Ecosystem

    NASA Astrophysics Data System (ADS)

    Young Morse, R.; Mecray, E. L.; Pershing, A. J.

    2015-12-01

    As interest in global changes in temperature and precipitation patterns grows, federal, state, and local agencies are turning to the delivery of 'actionable science and information' or 'information for decision-makers.' NOAA/National Centers for Environmental Information's Regional Climate Services program builds these bridges between the users and the producers of information. Drawing on the Climate Data Records program, this study presents the extraction and use of sea-surface temperature datasets specifically for access and use by fisheries managers in the North Atlantic. The work demonstrates the staged approach of accessing the records, converting their initial data formats into maps and charts, and delivering the data as a value-added information dashboard for use by managers. The questions to be reviewed include the ease of access, the delivery of open source software for visualizing the information, and a discussion of the roles of government and the private sector in the provision of climate information at different scales.

  15. Applying natural language processing techniques to develop a task-specific EMR interface for timely stroke thrombolysis: A feasibility study.

    PubMed

    Sung, Sheng-Feng; Chen, Kuanchin; Wu, Darren Philbert; Hung, Ling-Chien; Su, Yu-Hsiang; Hu, Ya-Han

    2018-04-01

    To reduce errors in determining eligibility for intravenous thrombolytic therapy (IVT) in stroke patients through use of an enhanced task-specific electronic medical record (EMR) interface powered by natural language processing (NLP) techniques. The information processing algorithm utilized MetaMap to extract medical concepts from IVT eligibility criteria and expanded the concepts using the Unified Medical Language System Metathesaurus. Concepts identified from clinical notes by MetaMap were compared to those from IVT eligibility criteria. The task-specific EMR interface displays IVT-relevant information by highlighting phrases that contain matched concepts. Clinical usability was assessed with clinicians staffing the acute stroke team by comparing user performance while using the task-specific and the current EMR interfaces. The algorithm identified IVT-relevant concepts with micro-averaged precisions, recalls, and F1 measures of 0.998, 0.812, and 0.895 at the phrase level and of 1, 0.972, and 0.986 at the document level. Users using the task-specific interface achieved a higher accuracy score than those using the current interface (91% versus 80%, p = 0.016) in assessing the IVT eligibility criteria. The completion time between the interfaces was statistically similar (2.46 min versus 1.70 min, p = 0.754). Although the information processing algorithm had room for improvement, the task-specific EMR interface significantly reduced errors in assessing IVT eligibility criteria. The study findings provide evidence to support an NLP enhanced EMR system to facilitate IVT decision-making by presenting meaningful and timely information to clinicians, thereby offering a new avenue for improvements in acute stroke care. Copyright © 2018 Elsevier B.V. All rights reserved.
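
    The micro-averaged precision, recall, and F1 measures quoted above pool counts across all documents before forming the ratios, as in the sketch below; the per-note counts shown are invented for illustration and are not the study's data.

      def micro_prf(counts):
          """Micro-averaged precision, recall and F1 from per-document counts.

          counts: iterable of (true_positives, false_positives, false_negatives),
                  e.g. one tuple per clinical note at the phrase level.
          """
          tp = sum(c[0] for c in counts)
          fp = sum(c[1] for c in counts)
          fn = sum(c[2] for c in counts)
          precision = tp / (tp + fp) if tp + fp else 0.0
          recall = tp / (tp + fn) if tp + fn else 0.0
          f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
          return precision, recall, f1

      # illustrative counts for three notes
      print(micro_prf([(42, 0, 9), (35, 1, 8), (50, 0, 12)]))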

  16. 21 CFR 73.1100 - Cochineal extract; carmine.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... LISTING OF COLOR ADDITIVES EXEMPT FROM CERTIFICATION Drugs § 73.1100 Cochineal extract; carmine. (a) Identity and specifications. (1) The color additives cochineal extract and carmine shall conform in identity and specifications to the requirements of § 73.100(a) (1) and (2) and (b). (2) Color additive...

  17. Agricultural land cover mapping in the context of a geographically referenced digital information system. [Carroll, Macon, and Gentry Counties, Missouri

    NASA Technical Reports Server (NTRS)

    Stoner, E. R.

    1982-01-01

    The introduction of soil map information to the land cover mapping process can improve discrimination of land cover types and reduce confusion among crop types that may be caused by soil-specific management practices and background reflectance characteristics. Multiple dates of LANDSAT MSS digital data were analyzed for three study areas in northern Missouri to produce cover type maps for major agricultural land cover classes. Digital data bases were then developed by adding ancillary data such as digitized soil and transportation network information to the LANDSAT-derived cover type map. Procedures were developed to manipulate the data base parameters to extract information applicable to user requirements. An agricultural information system combining such data can be used to determine the productive capacity of land to grow crops, fertilizer needs, chemical weed control rates, irrigation suitability, and trafficability of soil for planting.

  18. Mapping surface disturbance of energy-related infrastructure in southwest Wyoming--An assessment of methods

    USGS Publications Warehouse

    Germaine, Stephen S.; O'Donnell, Michael S.; Aldridge, Cameron L.; Baer, Lori; Fancher, Tammy; McBeth, Jamie; McDougal, Robert R.; Waltermire, Robert; Bowen, Zachary H.; Diffendorfer, James; Garman, Steven; Hanson, Leanne

    2012-01-01

    We evaluated how well three leading information-extraction software programs (eCognition, Feature Analyst, Feature Extraction) and manual hand digitization interpreted information from remotely sensed imagery of a visually complex gas field in Wyoming. Specifically, we compared how each mapped the area of and classified the disturbance features present on each of three remotely sensed images, including 30-meter-resolution Landsat, 10-meter-resolution SPOT (Satellite Pour l'Observation de la Terre), and 0.6-meter resolution pan-sharpened QuickBird scenes. Feature Extraction mapped the spatial area of disturbance features most accurately on the Landsat and QuickBird imagery, while hand digitization was most accurate on the SPOT imagery. Footprint non-overlap error was smallest on the Feature Analyst map of the Landsat imagery, the hand digitization map of the SPOT imagery, and the Feature Extraction map of the QuickBird imagery. When evaluating feature classification success against a set of ground-truthed control points, Feature Analyst, Feature Extraction, and hand digitization classified features with similar success on the QuickBird and SPOT imagery, while eCognition classified features poorly relative to the other methods. All maps derived from Landsat imagery classified disturbance features poorly. Using the hand digitized QuickBird data as a reference and making pixel-by-pixel comparisons, Feature Extraction classified features best overall on the QuickBird imagery, and Feature Analyst classified features best overall on the SPOT and Landsat imagery. Based on the entire suite of tasks we evaluated, Feature Extraction performed best overall on the Landsat and QuickBird imagery, while hand digitization performed best overall on the SPOT imagery, and eCognition performed worst overall on all three images. Error rates for both area measurements and feature classification were prohibitively high on Landsat imagery, while QuickBird was time and cost prohibitive for mapping large spatial extents. The SPOT imagery produced map products that were far more accurate than Landsat and did so at a far lower cost than QuickBird imagery. Consideration of degree of map accuracy required, costs associated with image acquisition, software, operator and computation time, and tradeoffs in the form of spatial extent versus resolution should all be considered when evaluating which combination of imagery and information-extraction method might best serve any given land use mapping project. When resources permit, attaining imagery that supports the highest classification and measurement accuracy possible is recommended.

  19. Tissues segmentation based on multi spectral medical images

    NASA Astrophysics Data System (ADS)

    Li, Ya; Wang, Ying

    2017-11-01

    In multispectral medical images, each band image highlights the tissue that is most evident in that band, according to the optical characteristics of different tissues at specific wavelengths. In this paper, tissues were segmented using their spectral information in each band of the multispectral medical images. Four Local Binary Pattern descriptors were constructed to extract blood vessels based on the gray difference between the blood vessels and their neighbors. The tissues segmented in each band image were then merged into a single clear image.
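
    A minimal sketch of the basic 8-neighbour Local Binary Pattern operator is shown below; the four specialized LBP descriptors constructed in the paper are not reproduced, and the toy image is only meant to show how a dark, vessel-like row yields distinctive codes.

      import numpy as np

      def lbp8(image):
          """Basic 8-neighbour Local Binary Pattern codes for a grayscale image.

          Each pixel is compared with its 8 neighbours; neighbours that are
          >= the centre contribute one bit to an 8-bit code.  Dark, elongated
          structures such as vessels tend to produce characteristic codes
          because the centre is darker than its surroundings.
          """
          img = image.astype(float)
          offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]
          codes = np.zeros(img.shape, dtype=int)
          for bit, (dy, dx) in enumerate(offsets):
              # neighbour[y, x] == img[y + dy, x + dx] (borders wrap around)
              neighbour = np.roll(img, (-dy, -dx), axis=(0, 1))
              codes += (neighbour >= img).astype(int) << bit
          return codes  # wrap-around border codes are usually discarded

      demo = np.array([[200, 200, 200],
                       [ 50,  40,  60],    # a dark "vessel" row
                       [200, 200, 200]], dtype=float)
      print(lbp8(demo))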

  20. Project X: competitive intelligence data mining and analysis

    NASA Astrophysics Data System (ADS)

    Gilmore, John F.; Pagels, Michael A.; Palk, Justin

    2001-03-01

    Competitive Intelligence (CI) is a systematic and ethical program for gathering and analyzing information about your competitors' activities and general business trends to further your own company's goals. CI allows companies to gather extensive information on their competitors and to analyze what the competition is doing in order to maintain or gain a competitive edge. In commercial business this potentially translates into millions of dollars in annual savings or losses. The Internet provides an overwhelming portal of information for CI analysis. The problem is how a company can automate the translation of voluminous information into valuable and actionable knowledge. This paper describes Project X, an agent-based data mining system specifically developed for extracting and analyzing competitive information from the Internet. Project X gathers CI information from a variety of sources including online newspapers, corporate websites, industry sector reporting sites, speech archiving sites, video news casts, stock news sites, weather sites, and rumor sites. It uses individual industry specific (e.g., pharmaceutical, financial, aerospace, etc.) commercial sector ontologies to form the knowledge filtering and discovery structures/content required to filter and identify valuable competitive knowledge. Project X is described in detail and an example competitive intelligence case is shown demonstrating the system's performance and utility for business intelligence.

  1. The information content of cosmic microwave background anisotropies

    NASA Astrophysics Data System (ADS)

    Scott, Douglas; Contreras, Dagoberto; Narimani, Ali; Ma, Yin-Zhe

    2016-06-01

    The cosmic microwave background (CMB) contains perturbations that are close to Gaussian and isotropic. This means that its information content, in the sense of the ability to constrain cosmological models, is closely related to the number of modes probed in CMB power spectra. Rather than making forecasts for specific experimental setups, here we take a more pedagogical approach and ask how much information we can extract from the CMB if we are only limited by sample variance. We show that, compared with temperature measurements, the addition of E-mode polarization doubles the number of modes available out to a fixed maximum multipole, provided that all of the TT, TE, and EE power spectra are measured. However, the situation in terms of constraints on particular parameters is more complicated, as we explain and illustrate graphically. We also discuss the enhancements in information that can come from adding B-mode polarization and gravitational lensing. We show how well one could ever determine the basic cosmological parameters from CMB data compared with what has been achieved with Planck, which has already probed a substantial fraction of the TT information. Lastly, we look at constraints on neutrino mass as a specific example of how lensing information improves future prospects beyond the current 6-parameter model.

  2. The information content of cosmic microwave background anisotropies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scott, Douglas; Contreras, Dagoberto; Narimani, Ali

    The cosmic microwave background (CMB) contains perturbations that are close to Gaussian and isotropic. This means that its information content, in the sense of the ability to constrain cosmological models, is closely related to the number of modes probed in CMB power spectra. Rather than making forecasts for specific experimental setups, here we take a more pedagogical approach and ask how much information we can extract from the CMB if we are only limited by sample variance. We show that, compared with temperature measurements, the addition of E-mode polarization doubles the number of modes available out to a fixed maximum multipole, provided that all of the TT, TE, and EE power spectra are measured. However, the situation in terms of constraints on particular parameters is more complicated, as we explain and illustrate graphically. We also discuss the enhancements in information that can come from adding B-mode polarization and gravitational lensing. We show how well one could ever determine the basic cosmological parameters from CMB data compared with what has been achieved with Planck, which has already probed a substantial fraction of the TT information. Lastly, we look at constraints on neutrino mass as a specific example of how lensing information improves future prospects beyond the current 6-parameter model.
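
    The mode-counting argument can be made concrete with a short calculation: the number of spherical-harmonic modes out to a maximum multipole is the sum of (2l + 1) over l, and the sample-variance limit on each multipole of an auto-spectrum scales as sqrt(2/(2l + 1)). The sketch below illustrates both quantities; the choice of l_max = 2500 is illustrative, and the quoted doubling for polarization simply reflects that T and E each contribute (2l + 1) modes per multipole.

      import numpy as np

      def n_modes(l_max, l_min=2):
          """Number of spherical-harmonic modes between l_min and l_max:
          the sum over l of (2l + 1)."""
          ls = np.arange(l_min, l_max + 1)
          return int(np.sum(2 * ls + 1))

      def sample_variance(l):
          """Fractional sample-variance limit on an auto power spectrum C_l
          from a full-sky, noise-free measurement: sqrt(2 / (2l + 1))."""
          return np.sqrt(2.0 / (2 * l + 1))

      print(n_modes(2500))          # ~6.25 million temperature modes out to l = 2500
      print(sample_variance(1000))  # ~3% per-multipole limit at l = 1000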

  3. A contour-based shape descriptor for biomedical image classification and retrieval

    NASA Astrophysics Data System (ADS)

    You, Daekeun; Antani, Sameer; Demner-Fushman, Dina; Thoma, George R.

    2013-12-01

    Contours, object blobs, and specific feature points are utilized to represent object shapes and extract shape descriptors that can then be used for object detection or image classification. In this research we develop a shape descriptor for biomedical image type (or, modality) classification. We adapt a feature extraction method used in optical character recognition (OCR) for character shape representation, and apply various image preprocessing methods to successfully adapt the method to our application. The proposed shape descriptor is applied to radiology images (e.g., MRI, CT, ultrasound, X-ray, etc.) to assess its usefulness for modality classification. In our experiment we compare our method with other visual descriptors such as CEDD, CLD, Tamura, and PHOG that extract color, texture, or shape information from images. The proposed method achieved the highest classification accuracy of 74.1% among all other individual descriptors in the test, and when combined with CSD (color structure descriptor) showed better performance (78.9%) than using the shape descriptor alone.

  4. A mobile unit for memory retrieval in daily life based on image and sensor processing

    NASA Astrophysics Data System (ADS)

    Takesumi, Ryuji; Ueda, Yasuhiro; Nakanishi, Hidenobu; Nakamura, Atsuyoshi; Kakimori, Nobuaki

    2003-10-01

    We developed a mobile unit whose purpose is to support memory retrieval in daily life. In this paper, we describe two characteristic features of this unit: (1) behavior classification with an acceleration sensor, and (2) extraction of environmental differences with image processing technology. In (1), by analyzing the power and frequency of an acceleration sensor oriented along the direction of gravity, the user's activities can be classified into categories such as walking and staying. In (2), by extracting the difference between the beginning and ending scenes of a stay with image processing, the change produced by the user is recognized as a difference in the environment. Using these two techniques, specific scenes of daily life can be extracted, and important information at scene changes can be recorded. In particular, we describe the effectiveness of this approach for supporting the retrieval of important things, such as an item left behind or the state of work left unfinished.

  5. An effective hand vein feature extraction method.

    PubMed

    Li, Haigang; Zhang, Qian; Li, Chengdong

    2015-01-01

    As an authentication method developed in recent years, vein recognition technology offers the unique advantages of a biometric. This paper studies the specific procedure for extracting the characteristics of veins on the back of the hand. Because the hand may be placed in different positions during image collection, a vein region orientation method is put forward so that the positioning area is the same for all hand positions. In addition, to eliminate pseudo-vein areas, the valley regional shape extraction operator is improved and combined with multiple segmentation algorithms. The images are segmented step by step, making the vein texture appear clear and accurate. Lastly, the segmented images are filtered, eroded, and refined, a process that removes most of the pseudo-vein information. Finally, a clear vein skeleton diagram is obtained, demonstrating the effectiveness of the algorithm. The paper also presents a method for locating the vein region on the back of the hand, which makes it possible to rotate and correct the image by estimating the inclination of the contour at the side of the hand.

  6. Photovoltaic panel extraction from very high-resolution aerial imagery using region-line primitive association analysis and template matching

    NASA Astrophysics Data System (ADS)

    Wang, Min; Cui, Qi; Sun, Yujie; Wang, Qiao

    2018-07-01

    In object-based image analysis (OBIA), object classification performance is jointly determined by image segmentation, sample or rule setting, and classifiers. Typically, as a crucial step to obtain object primitives, image segmentation quality significantly influences subsequent feature extraction and analyses. By contrast, template matching extracts specific objects from images and prevents shape defects caused by image segmentation. However, creating or editing templates is tedious and sometimes results in incomplete or inaccurate templates. In this study, we combine OBIA and template matching techniques to address these problems and aim for accurate photovoltaic panel (PVP) extraction from very high-resolution (VHR) aerial imagery. The proposed method is based on the previously proposed region-line primitive association framework, in which complementary information between region (segment) and line (straight line) primitives is utilized to achieve a more powerful performance than routine OBIA. Several novel concepts, including the mutual fitting ratio and best-fitting template based on region-line primitive association analyses, are proposed. Automatic template generation and matching method for PVP extraction from VHR imagery are designed for concept and model validation. Results show that the proposed method can successfully extract PVPs without any user-specified matching template or training sample. High user independency and accuracy are the main characteristics of the proposed method in comparison with routine OBIA and template matching techniques.

  7. The research of road and vehicle information extraction algorithm based on high resolution remote sensing image

    NASA Astrophysics Data System (ADS)

    Zhou, Tingting; Gu, Lingjia; Ren, Ruizhi; Cao, Qiong

    2016-09-01

    With the rapid development of remote sensing technology, the spatial and temporal resolution of satellite imagery has increased greatly. Meanwhile, high-spatial-resolution images are becoming increasingly popular for commercial applications, and remote sensing image technology has broad application prospects in intelligent transportation. Compared with traditional traffic information collection methods, vehicle information extraction using high-resolution remote sensing images has the advantages of high resolution and wide coverage. This has great guiding significance for urban planning, transportation management, travel route choice, and so on. First, this paper preprocesses the acquired high-resolution multi-spectral and panchromatic remote sensing images. Then, on the one hand, histogram equalization and linear enhancement are applied to the preprocessed results in order to obtain the optimal threshold for image segmentation. On the other hand, considering the distribution characteristics of roads, the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI) are used to suppress water and vegetation information in the preprocessed results. The above two processing results are then combined, and geometric characteristics are finally used to complete the road information extraction. The extracted road vector is used to limit the target vehicle area. Target vehicle extraction is divided into bright vehicle extraction and dark vehicle extraction, and the results for the two kinds of vehicles are eventually combined to give the final result. The experimental results demonstrate that the proposed algorithm achieves high precision in vehicle information extraction for different high-resolution remote sensing images: the average fault detection rate was about 5.36%, the average residual rate was about 13.60%, and the average accuracy was approximately 91.26%.
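
    The vegetation and water suppression step can be sketched with the standard index definitions, NDVI = (NIR - Red)/(NIR + Red) and NDWI = (Green - NIR)/(Green + NIR); the thresholds below are illustrative assumptions, not the values used in the paper.

      import numpy as np

      def road_candidate_mask(nir, red, green, ndvi_max=0.2, ndwi_max=0.0):
          """Suppress vegetation and water before road extraction.

          nir, red, green: float arrays of the corresponding spectral bands.
          Pixels with high NDVI (vegetation) or high NDWI (water) are masked
          out, leaving road and other built-surface candidates.
          """
          eps = 1e-6
          ndvi = (nir - red) / (nir + red + eps)
          ndwi = (green - nir) / (green + nir + eps)
          return (ndvi < ndvi_max) & (ndwi < ndwi_max)

      # toy 2x2 scene: vegetation, water, asphalt-like surface, bare soil
      nir   = np.array([[0.60, 0.05], [0.20, 0.35]])
      red   = np.array([[0.10, 0.04], [0.18, 0.30]])
      green = np.array([[0.20, 0.30], [0.15, 0.25]])
      print(road_candidate_mask(nir, red, green))  # only the bottom row survives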

  8. Shape information from glucose curves: Functional data analysis compared with traditional summary measures

    PubMed Central

    2013-01-01

    Background Plasma glucose levels are important measures in medical care and research, and are often obtained from oral glucose tolerance tests (OGTT) with repeated measurements over 2–3 hours. It is common practice to use simple summary measures of OGTT curves. However, different OGTT curves can yield similar summary measures, and information of physiological or clinical interest may be lost. Our main aim was to extract information inherent in the shape of OGTT glucose curves, compare it with the information from simple summary measures, and explore the clinical usefulness of such information. Methods OGTTs with five glucose measurements over two hours were recorded for 974 healthy pregnant women in their first trimester. For each woman, the five measurements were transformed into smooth OGTT glucose curves by functional data analysis (FDA), a collection of statistical methods developed specifically to analyse curve data. The essential modes of temporal variation between OGTT glucose curves were extracted by functional principal component analysis. The resultant functional principal component (FPC) scores were compared with commonly used simple summary measures: fasting and two-hour (2-h) values, area under the curve (AUC) and simple shape index (2-h minus 90-min values, or 90-min minus 60-min values). Clinical usefulness of FDA was explored by regression analyses of glucose tolerance later in pregnancy. Results Over 99% of the variation between individually fitted curves was expressed in the first three FPCs, interpreted physiologically as “general level” (FPC1), “time to peak” (FPC2) and “oscillations” (FPC3). FPC1 scores correlated strongly with AUC (r=0.999), but less with the other simple summary measures (−0.42≤r≤0.79). FPC2 scores gave shape information not captured by simple summary measures (−0.12≤r≤0.40). FPC2 scores, but not FPC1 nor the simple summary measures, discriminated between women who did and did not develop gestational diabetes later in pregnancy. Conclusions FDA of OGTT glucose curves in early pregnancy extracted shape information that was not identified by commonly used simple summary measures. This information discriminated between women with and without gestational diabetes later in pregnancy. PMID:23327294

  9. Shape information from glucose curves: functional data analysis compared with traditional summary measures.

    PubMed

    Frøslie, Kathrine Frey; Røislien, Jo; Qvigstad, Elisabeth; Godang, Kristin; Bollerslev, Jens; Voldner, Nanna; Henriksen, Tore; Veierød, Marit B

    2013-01-17

    Plasma glucose levels are important measures in medical care and research, and are often obtained from oral glucose tolerance tests (OGTT) with repeated measurements over 2-3 hours. It is common practice to use simple summary measures of OGTT curves. However, different OGTT curves can yield similar summary measures, and information of physiological or clinical interest may be lost. Our main aim was to extract information inherent in the shape of OGTT glucose curves, compare it with the information from simple summary measures, and explore the clinical usefulness of such information. OGTTs with five glucose measurements over two hours were recorded for 974 healthy pregnant women in their first trimester. For each woman, the five measurements were transformed into smooth OGTT glucose curves by functional data analysis (FDA), a collection of statistical methods developed specifically to analyse curve data. The essential modes of temporal variation between OGTT glucose curves were extracted by functional principal component analysis. The resultant functional principal component (FPC) scores were compared with commonly used simple summary measures: fasting and two-hour (2-h) values, area under the curve (AUC) and simple shape index (2-h minus 90-min values, or 90-min minus 60-min values). Clinical usefulness of FDA was explored by regression analyses of glucose tolerance later in pregnancy. Over 99% of the variation between individually fitted curves was expressed in the first three FPCs, interpreted physiologically as "general level" (FPC1), "time to peak" (FPC2) and "oscillations" (FPC3). FPC1 scores correlated strongly with AUC (r=0.999), but less with the other simple summary measures (-0.42≤r≤0.79). FPC2 scores gave shape information not captured by simple summary measures (-0.12≤r≤0.40). FPC2 scores, but not FPC1 nor the simple summary measures, discriminated between women who did and did not develop gestational diabetes later in pregnancy. FDA of OGTT glucose curves in early pregnancy extracted shape information that was not identified by commonly used simple summary measures. This information discriminated between women with and without gestational diabetes later in pregnancy.
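
    A discretized stand-in for the functional PCA step used in these two records is sketched below: each five-point OGTT record is smoothed onto a dense time grid with a cubic spline, and ordinary PCA is then applied to the sampled curves. True FDA would use basis expansions and functional principal component analysis, and the toy glucose values are simulated, not the study's data.

      import numpy as np
      from scipy.interpolate import CubicSpline
      from sklearn.decomposition import PCA

      # OGTT sampling times in minutes (five measurements over two hours)
      times = np.array([0, 30, 60, 90, 120])
      grid = np.linspace(0, 120, 121)          # dense grid for the smoothed curves

      rng = np.random.default_rng(2)
      # toy data: 100 women, five glucose values each (mmol/L)
      base = np.array([4.5, 7.5, 8.0, 7.0, 5.5])
      raw = base + rng.normal(0, 0.8, size=(100, 5))

      # smooth each 5-point record into a curve, then run PCA on the curves
      curves = np.array([CubicSpline(times, row)(grid) for row in raw])
      pca = PCA(n_components=3).fit(curves)
      scores = pca.transform(curves)           # FPC-like scores (level, timing, shape)
      print(pca.explained_variance_ratio_)
      print(scores.shape)                      # (100, 3)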

  10. Research on Crowdsourcing Emergency Information Extraction of Based on Events' Frame

    NASA Astrophysics Data System (ADS)

    Yang, Bo; Wang, Jizhou; Ma, Weijun; Mao, Xi

    2018-01-01

    At present, common information extraction methods cannot accurately extract structured emergency event information, general information retrieval tools cannot completely identify emergency geographic information, and neither approach provides an accurate assessment of the extracted results. This paper therefore proposes an emergency information extraction technique based on an event frame, intended to solve the problem of collecting emergency information. It mainly includes an emergency information extraction model (EIEM), a complete address recognition method (CARM), and an accuracy evaluation model of emergency information (AEMEI). EIEM extracts emergency information in a structured form and compensates for the lack of network data acquisition in emergency mapping. CARM uses a hierarchical model and a shortest-path algorithm to join toponym pieces into a full address. AEMEI analyzes the results for an emergency event and summarizes the advantages and disadvantages of the event frame. Experiments show that the event frame technique can solve the problem of emergency information extraction and provides reference cases for other applications. When a disaster is about to occur, the relevant departments can query data on similar emergencies that occurred in the past and make arrangements in advance for defense and disaster reduction. Such technology can reduce casualties and property damage, which is of great significance to the state and society.

  11. Nucleic Acid Extraction from Synthetic Mars Analog Soils for in situ Life Detection

    NASA Astrophysics Data System (ADS)

    Mojarro, Angel; Ruvkun, Gary; Zuber, Maria T.; Carr, Christopher E.

    2017-08-01

    Biological informational polymers such as nucleic acids have the potential to provide unambiguous evidence of life beyond Earth. To this end, we are developing an automated in situ life-detection instrument that integrates nucleic acid extraction and nanopore sequencing: the Search for Extra-Terrestrial Genomes (SETG) instrument. Our goal is to isolate and determine the sequence of nucleic acids from extant or preserved life on Mars, if, for example, there is common ancestry to life on Mars and Earth. As is true of metagenomic analysis of terrestrial environmental samples, the SETG instrument must isolate nucleic acids from crude samples and then determine the DNA sequence of the unknown nucleic acids. Our initial DNA extraction experiments resulted in low to undetectable amounts of DNA due to soil chemistry-dependent soil-DNA interactions, namely adsorption to mineral surfaces, binding to divalent/trivalent cations, destruction by iron redox cycling, and acidic conditions. Subsequently, we developed soil-specific extraction protocols that increase DNA yields through a combination of desalting, utilization of competitive binders, and promotion of anaerobic conditions. Our results suggest that a combination of desalting and utilizing competitive binders may establish a "universal" nucleic acid extraction protocol suitable for analyzing samples from diverse soils on Mars.

  12. Characterization of Natural Dyes and Traditional Korean Silk Fabric by Surface Analytical Techniques.

    PubMed

    Lee, Jihye; Kang, Min Hwa; Lee, Kang-Bong; Lee, Yeonhee

    2013-05-15

    Time-of-flight secondary ion mass spectrometry (TOF-SIMS) and X-ray photoelectron spectroscopy (XPS) are well established surface techniques that provide both elemental and organic information from several monolayers of a sample surface, while also allowing depth profiling or image mapping to be carried out. Static TOF-SIMS with improved performance has expanded the application of TOF-SIMS to the study of a variety of organic, polymeric, and biological materials. In this work, TOF-SIMS, XPS, and Fourier transform infrared (FTIR) measurements were used to characterize commercial natural dyes and traditional silk fabric dyed with plant-extract dyes, avoiding the time-consuming and destructive extraction procedures necessary for the spectrophotometric and chromatographic methods previously used. Silk textiles dyed with plant extracts were then analyzed for chemical and functional group identification of their dye components and mordants. TOF-SIMS spectra for the dyed silk fabric showed element ions from metallic mordants as well as specific fragment ions and molecular ions from plant-extracted dyes. The results of TOF-SIMS, XPS, and FTIR are very useful as a reference database for comparison with data on traditional Korean silk fabric and for understanding traditional dyeing materials. This study therefore shows that surface techniques are useful for micro-destructive analysis of plant-extracted dyes and Korean dyed silk fabric.

  13. Nucleic Acid Extraction from Synthetic Mars Analog Soils for in situ Life Detection.

    PubMed

    Mojarro, Angel; Ruvkun, Gary; Zuber, Maria T; Carr, Christopher E

    2017-08-01

    Biological informational polymers such as nucleic acids have the potential to provide unambiguous evidence of life beyond Earth. To this end, we are developing an automated in situ life-detection instrument that integrates nucleic acid extraction and nanopore sequencing: the Search for Extra-Terrestrial Genomes (SETG) instrument. Our goal is to isolate and determine the sequence of nucleic acids from extant or preserved life on Mars, if, for example, there is common ancestry to life on Mars and Earth. As is true of metagenomic analysis of terrestrial environmental samples, the SETG instrument must isolate nucleic acids from crude samples and then determine the DNA sequence of the unknown nucleic acids. Our initial DNA extraction experiments resulted in low to undetectable amounts of DNA due to soil chemistry-dependent soil-DNA interactions, namely adsorption to mineral surfaces, binding to divalent/trivalent cations, destruction by iron redox cycling, and acidic conditions. Subsequently, we developed soil-specific extraction protocols that increase DNA yields through a combination of desalting, utilization of competitive binders, and promotion of anaerobic conditions. Our results suggest that a combination of desalting and utilizing competitive binders may establish a "universal" nucleic acid extraction protocol suitable for analyzing samples from diverse soils on Mars. Key Words: Life-detection instruments-Nucleic acids-Mars-Panspermia. Astrobiology 17, 747-760.

  14. Immunogenicity of Phleum pratense depigmented allergoid vaccines: experimental study in rabbits.

    PubMed

    Iraola, V; Gallego, M T; López-Matas, M A; Morales, M; Bel, I; García, N; Carnés, J

    2012-01-01

    Immunogenicity studies are based on accurate preclinical and clinical assessment of pharmaceutical products. The immunogenicity of modified allergen vaccines has not been fully elucidated, and the mechanisms involved are not well understood. Animal and human models have recently shown that depigmented allergoids induce specific immunoglobulin (Ig) G against individual allergens, thus supporting the clinical efficacy of these vaccines. The aim of this study was to investigate the production of specific IgG against individual antigens and their isoforms in rabbits injected with depigmented allergoid extracts of Phleum pratense pollen. Two New Zealand rabbits were immunized with depigmented-polymerized extracts adsorbed onto aluminum hydroxide (Depigoid) of P pratense. Rabbits were injected 3 times (35 microg Phl p 5). Specific IgG titers against native, depigmented, and depigmented-polymerized extracts and individual allergens (rPhl p 1 and rPhl p 5a) were analyzed by direct enzyme-linked immunosorbent assay. The capacity of these synthesized antibodies to recognize individual native and depigmented allergens and different isoforms was evaluated by immunoblot and 2-D analysis. All rabbits produced high titers of specific IgG against the 3 extracts. Rabbits injected with depigmented allergoids produced similar specific antibody titers against native, depigmented, and depigmented-polymerized extracts. Serum samples recognized individual allergens and their isoforms in the nonmodified extracts. Vaccines containing depigmented allergoid extracts of P pratense induce immunogenicity in vivo. The antibodies produced after injection of these extracts clearly recognized allergens and different isoforms in their native configuration.

  15. SPECIFICITY OF THE PRECIPITIN REACTION IN TOBACCO MOSAIC DISEASE.

    PubMed

    Beale, H P

    1931-09-30

    1. Leaf extracts of Sudan grass, Hippeastrum equestre Herb., lily, and Abutilon striatum Dicks. (A. Thompsoni hort.), each affected with its respective mosaic disease, and peach affected with yellows disease, were tested for their ability to precipitate antiserum for virus extract of tobacco mosaic disease. No precipitate occurred. 2. Nicotiana glutinosa L., N. rustica L., and Martynia louisiana Mill. were added to the list of hosts of tobacco mosaic virus which have been tested with antiserum for the same virus in N. tabacum L. var. Turkish. The object was to determine the presence or absence of material reacting with the specific precipitins such as that already demonstrated in extracts of tomato, pepper, and petunia affected with the same virus. The presence of specific substances was demonstrated in every case. 3. The viruses of ringspot and cucumber mosaic diseases were multiplied in Turkish tobacco and leaf extracts of the affected plants were used in turn as antigens in precipitin tests with antiserum for tobacco mosaic virus extract of Turkish tobacco. A slight precipitation resulted in the tubes containing undiluted antiserum and virus extract such as occurs when juice from normal tobacco is used with undiluted antiserum. No precipitate was demonstrable that was specific for virus extracts of tobacco affected with either ringspot or cucumber mosaic disease. 4. The results favor the interpretation that the specific antigenic substance in virus extract of tobacco mosaic disease is foreign antigenic material, possibly virus itself, not altered host protein.

  16. Fast fringe pattern phase demodulation using FIR Hilbert transformers

    NASA Astrophysics Data System (ADS)

    Gdeisat, Munther; Burton, David; Lilley, Francis; Arevalillo-Herráez, Miguel

    2016-01-01

    This paper suggests the use of FIR Hilbert transformers to extract the phase of fringe patterns. This method is computationally faster than any known spatial method that produces wrapped phase maps. Also, the algorithm does not require any parameters to be adjusted which are dependent upon the specific fringe pattern that is being processed, or upon the particular setup of the optical fringe projection system that is being used. It is therefore particularly suitable for full algorithmic automation. The accuracy and validity of the suggested method has been tested using both computer-generated and real fringe patterns. This novel algorithm has been proposed for its advantages in terms of computational processing speed as it is the fastest available method to extract the wrapped phase information from a fringe pattern.
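
    A minimal sketch of phase extraction with an FIR Hilbert transformer is given below, using SciPy's remez design (type="hilbert") and an FIR filter; the tap count, pass-band edges, delay matching, and synthetic fringe row are illustrative choices rather than the authors' design.

      import numpy as np
      from scipy.signal import remez, lfilter

      def wrapped_phase(row, numtaps=63):
          """Wrapped phase of one fringe-pattern row via an FIR Hilbert transformer.

          The zero-mean intensity is filtered with a type-III FIR Hilbert
          transformer designed by the Remez exchange algorithm; the filter
          output approximates the quadrature signal, and atan2(quadrature,
          in-phase) gives the wrapped phase.
          """
          x = row - row.mean()                               # remove the DC term
          taps = remez(numtaps, [0.05, 0.45], [1.0], type="hilbert", fs=1.0)
          q = lfilter(taps, 1.0, x)                          # quadrature component
          delay = (numtaps - 1) // 2                         # FIR group delay
          i = np.concatenate([np.zeros(delay), x])[:len(q)]  # delay-matched in-phase
          return np.arctan2(q, i)

      # synthetic fringe row: cosine carrier plus a slowly varying phase term
      n = np.arange(1024)
      true_phase = 2 * np.pi * 0.1 * n + 0.002 * (n - 512) ** 2 / 512
      row = 128 + 100 * np.cos(true_phase)
      phi = wrapped_phase(row)
      print(phi[100:105])    # wrapped values in (-pi, pi]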

  17. Playing biology's name game: identifying protein names in scientific text.

    PubMed

    Hanisch, Daniel; Fluck, Juliane; Mevissen, Heinz-Theodor; Zimmer, Ralf

    2003-01-01

    A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet, the basis for most extraction algorithms, i.e. the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general purpose name dictionary and an accompanying automatic curation procedure based on a simple token model of protein names. We designed an efficient search algorithm to analyze all abstracts in MEDLINE in a reasonable amount of time on standard computers. The parameters of our method are optimized using machine learning techniques. Used in conjunction, these ingredients lead to good search performance. A supplementary web page is available at http://cartan.gmd.de/ProMiner/.

  18. [A customized method for information extraction from unstructured text data in the electronic medical records].

    PubMed

    Bao, X Y; Huang, W J; Zhang, K; Jin, M; Li, Y; Niu, C Z

    2018-04-18

    There is a huge amount of diagnostic and treatment information in electronic medical records (EMRs), which is a concrete manifestation of clinicians' actual diagnosis and treatment details. Many episodes in EMRs, such as chief complaints, present illness, past history, differential diagnosis, diagnostic imaging, and surgical records, reflect details of diagnosis and treatment in the clinical process and are written as Chinese natural-language narrative. How to extract effective information from these Chinese narrative text data and organize it into tabular form for medical research analysis, so that real-world clinical data can be put to practical use, is a difficult problem in Chinese medical data processing. Based on the narrative EMR text data of a tertiary hospital in China, a customized method for learning information extraction rules and performing rule-based information extraction is proposed. The overall method consists of three steps. (1) A random sample of 600 records (including history of present illness, past history, personal history, family history, etc.) was extracted from the EMR data as the raw corpus. Using our Chinese clinical narrative text annotation platform, trained clinicians and nurses marked the tokens and phrases in the corpus to be extracted (with a history of diabetes as the example). (2) Based on the annotated clinical text corpus, extraction templates were first summarized and induced; these templates were then rewritten as extraction rules using Perl regular expressions. Using these rules as the basic knowledge base, we developed Perl extraction packages for extracting data from the EMR text. Finally, the extracted data items were organized in tabular format for later use in clinical research or hospital surveillance. (3) As the final step, the proposed method was evaluated and validated on the National Clinical Service Data Integration Platform, and the extraction results were checked by a combination of manual and automated verification, demonstrating the effectiveness of the method. For all patients diagnosed with diabetes in the Department of Endocrinology of the hospital and discharged in 2015 (1 436 patients in total), extraction of the history of diabetes from the medical history episodes achieved a recall of 87.6%, a precision of 99.5%, and an F-score of 0.93. For the 10% of patients with diabetes (1 223 patients in total) discharged from the same department by August 2017, the extracted diabetes history achieved a recall of 89.2%, a precision of 99.2%, and an F-score of 0.94. This study adopts a combination of natural language processing and rule-based information extraction, and designs and implements an algorithm for extracting customized information from unstructured Chinese EMR text data, with better results than existing work.
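
    The rule-based step can be illustrated with a small Python stand-in for the Perl regular-expression rules described above; the patterns below, including the negation check, are simplified examples for a "history of diabetes" field and are not the published rule set.

      import re

      # Simplified, hypothetical patterns: assert diabetes vs. denied history.
      HAS_DIABETES = re.compile(r"(糖尿病|2型糖尿病|1型糖尿病)(病史|史)?")
      DENIES = re.compile(r"否认[^。；]*糖尿病")

      def diabetes_history(past_history_text):
          """Return '1' if the past-history text asserts diabetes, '0' if it is
          denied or not mentioned, producing one tabular field per patient."""
          if DENIES.search(past_history_text):
              return "0"
          return "1" if HAS_DIABETES.search(past_history_text) else "0"

      print(diabetes_history("既往2型糖尿病病史10年，口服二甲双胍治疗。"))  # 1
      print(diabetes_history("否认高血压、糖尿病及冠心病病史。"))          # 0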

  19. [The application of spectral geological profile in the alteration mapping].

    PubMed

    Li, Qing-Ting; Lin, Qi-Zhong; Zhang, Bing; Lu, Lin-Lin

    2012-07-01

    Geological sections can help validate and interpret the alteration information extracted from remote sensing images. In this paper, the concept of a spectral geological profile is introduced, based on the principle of the geological section and on methods of spectral information extraction. A spectral profile records and visualizes spectra along a geological section, but the spectral geological profile contains more information than the spectral profile alone. Its main objective is to obtain the distribution of alteration types and mineral content along the profile, extracted from spectra measured with a field spectrometer, and in particular the spatial distribution and mode of occurrence of alteration associations. The technical method and workflow of alteration information extraction were studied for the spectral geological profile. The spectral geological profile was built from ground reflectance spectra, and the alteration information was then extracted from the remote sensing image with the help of a typical spectral geological profile. Finally, the meaning and effect of the spectral geological profile are discussed.

  20. Quantifying hydrogen-deuterium exchange of meteoritic dicarboxylic acids during aqueous extraction

    NASA Astrophysics Data System (ADS)

    Fuller, M.; Huang, Y.

    2003-03-01

    Hydrogen isotope ratios of organic compounds in carbonaceous chondrites provide critical information about their origins and evolutionary history. However, because many of these compounds are obtained by aqueous extraction, the degree of hydrogen-deuterium (H/D) exchange that occurs during the process needs to be quantitatively evaluated. This study uses compound-specific hydrogen isotopic analysis to quantify the H/D exchange during aqueous extraction. Three common meteoritic dicarboxylic acids (succinic, glutaric, and 2-methyl glutaric acids) were refluxed under conditions simulating the extraction process. Changes in δD values of the dicarboxylic acids were measured following the reflux experiments. A pseudo-first order rate law was used to model the H/D exchange rates, which were then used to calculate the isotope exchange resulting from aqueous extraction. The degree of H/D exchange varies as a result of differences in molecular structure, the alkalinity of the extraction solution and presence/absence of meteorite powder. However, our model indicates that succinic, glutaric, and 2-methyl glutaric acids with a δD of 1800‰ would experience isotope changes of 38‰, 10‰, and 6‰, respectively, during the extraction process. Therefore, the overall change in δD values of the dicarboxylic acids during the aqueous extraction process is negligible. We also demonstrate that H/D exchange occurs on the chiral α-carbon in 2-methyl glutaric acid. The results suggest that the racemic mixture of 2-methyl glutaric acid in the Tagish Lake meteorite could result from post-synthesis aqueous alteration. The approach employed in this study can also be used to quantify H/D exchange for other important meteoritic compounds such as amino acids.
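
    The pseudo-first-order exchange model used to estimate such changes can be sketched as δD(t) = δD_eq + (δD_0 - δD_eq)·exp(-kt), i.e. an exponential approach of the compound's δD toward the water value; the rate constants, end-member values, and reflux time in the example below are placeholders, not the values measured in the study.

      import numpy as np

      def delta_d_after_reflux(delta0, delta_eq, k_per_hour, hours):
          """Pseudo-first-order approach of a compound's deltaD toward the water value.

          delta0:     initial deltaD of the acid (per mil)
          delta_eq:   deltaD reached at full exchange with the water (per mil)
          k_per_hour: pseudo-first-order exchange rate constant (1/h)
          hours:      duration of the aqueous extraction / reflux
          """
          return delta_eq + (delta0 - delta_eq) * np.exp(-k_per_hour * hours)

      # illustrative numbers only: heavy acid (1800 per mil) refluxed in light water
      d0, deq = 1800.0, -100.0
      for k in (1e-4, 5e-4, 2e-3):
          d_final = delta_d_after_reflux(d0, deq, k, hours=24)
          print(k, round(d0 - d_final, 1))   # change in deltaD after a 24 h extraction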

  1. Automated DNA extraction platforms offer solutions to challenges of assessing microbial biofouling in oil production facilities.

    PubMed

    Oldham, Athenia L; Drilling, Heather S; Stamps, Blake W; Stevenson, Bradley S; Duncan, Kathleen E

    2012-11-20

    The analysis of microbial assemblages in industrial, marine, and medical systems can inform decisions regarding quality control or mitigation. Modern molecular approaches to detect, characterize, and quantify microorganisms provide rapid and thorough measures unbiased by the need for cultivation. The requirement of timely extraction of high quality nucleic acids for molecular analysis is faced with specific challenges when used to study the influence of microorganisms on oil production. Production facilities are often ill equipped for nucleic acid extraction techniques, making the preservation and transportation of samples off-site a priority. As a potential solution, the possibility of extracting nucleic acids on-site using automated platforms was tested. The performance of two such platforms, the Fujifilm QuickGene-Mini80™ and Promega Maxwell®16 was compared to a widely used manual extraction kit, MOBIO PowerBiofilm™ DNA Isolation Kit, in terms of ease of operation, DNA quality, and microbial community composition. Three pipeline biofilm samples were chosen for these comparisons; two contained crude oil and corrosion products and the third transported seawater. Overall, the two more automated extraction platforms produced higher DNA yields than the manual approach. DNA quality was evaluated for amplification by quantitative PCR (qPCR) and end-point PCR to generate 454 pyrosequencing libraries for 16S rRNA microbial community analysis. Microbial community structure, as assessed by DGGE analysis and pyrosequencing, was comparable among the three extraction methods. Therefore, the use of automated extraction platforms should enhance the feasibility of rapidly evaluating microbial biofouling at remote locations or those with limited resources.

  2. Automated DNA extraction platforms offer solutions to challenges of assessing microbial biofouling in oil production facilities

    PubMed Central

    2012-01-01

    The analysis of microbial assemblages in industrial, marine, and medical systems can inform decisions regarding quality control or mitigation. Modern molecular approaches to detect, characterize, and quantify microorganisms provide rapid and thorough measures unbiased by the need for cultivation. The requirement of timely extraction of high quality nucleic acids for molecular analysis is faced with specific challenges when used to study the influence of microorganisms on oil production. Production facilities are often ill equipped for nucleic acid extraction techniques, making the preservation and transportation of samples off-site a priority. As a potential solution, the possibility of extracting nucleic acids on-site using automated platforms was tested. The performance of two such platforms, the Fujifilm QuickGene-Mini80™ and Promega Maxwell®16 was compared to a widely used manual extraction kit, MOBIO PowerBiofilm™ DNA Isolation Kit, in terms of ease of operation, DNA quality, and microbial community composition. Three pipeline biofilm samples were chosen for these comparisons; two contained crude oil and corrosion products and the third transported seawater. Overall, the two more automated extraction platforms produced higher DNA yields than the manual approach. DNA quality was evaluated for amplification by quantitative PCR (qPCR) and end-point PCR to generate 454 pyrosequencing libraries for 16S rRNA microbial community analysis. Microbial community structure, as assessed by DGGE analysis and pyrosequencing, was comparable among the three extraction methods. Therefore, the use of automated extraction platforms should enhance the feasibility of rapidly evaluating microbial biofouling at remote locations or those with limited resources. PMID:23168231

  3. Full-field optical coherence tomography used for security and document identity

    NASA Astrophysics Data System (ADS)

    Chang, Shoude; Mao, Youxin; Sherif, Sherif; Flueraru, Costel

    2006-09-01

    Optical coherence tomography (OCT) is an emerging technology for high-resolution cross-sectional imaging of 3D structures. In past years, OCT systems have been used mainly for medical, especially ophthalmological, diagnostics. Because an OCT system can explore the internal features of an object, we apply OCT technology to directly retrieve 2D information pre-stored in a multiple-layer information carrier. The standard depth resolution of an OCT system is at the micrometer level. If a 20 mm by 20 mm sampling area with a 1024 x 1024 CCD array is used in an OCT system with 10 μm depth resolution, an information carrier with a volume of 20 mm x 20 mm x 2 mm could contain about 200 megapixels of image data (roughly 200 layers, each imaged by the full CCD frame). Because of its tiny size and large information volume, the information carrier, together with its OCT retrieval system, has potential applications in document security and object identification. In addition, as the information carrier can be made of low-scattering transparent material, the signal-to-noise ratio improves dramatically; as a consequence, the specific hardware and complicated software can also be greatly simplified. Because no scanning along the X-Y axes is required, full-field OCT could be the simplest and most economical imaging system for extracting information from such a multilayer information carrier. In this paper, the design and implementation of a full-field OCT system is described and the related algorithms are introduced. In our experiments, a four-layer information carrier is used, which contains four image patterns: two text images and two fingerprint images. The extracted tomography images of each layer are also provided.
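
    The capacity figure quoted above follows from simple arithmetic, reproduced below; the only assumption is that one readable image layer is stored per 10 μm of depth resolution through the 2 mm thick carrier.

      # Rough capacity estimate for the multilayer information carrier
      depth_resolution_um = 10
      carrier_thickness_mm = 2
      pixels_per_layer = 1024 * 1024           # one CCD frame per layer

      layers = int(carrier_thickness_mm * 1000 / depth_resolution_um)   # 200 layers
      total_pixels = layers * pixels_per_layer
      print(layers, total_pixels / 1e6)        # 200 layers, ~210 megapixels in total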

  4. Considering context: reliable entity networks through contextual relationship extraction

    NASA Astrophysics Data System (ADS)

    David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.

    2016-05-01

    Existing information extraction techniques can only partially address the problem of exploiting unreadably large amounts of text. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on whether it is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables Warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) more precise event and relationship interpretation, 2) more detailed information about extracted events and relationships, and 3) more reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.
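
    The paper's attribute labeler is not reproduced here; purely as an illustration of attribute labeling framed as text classification, a small sketch with an off-the-shelf classifier (all example sentences and labels are hypothetical):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Tiny illustrative training set (invented examples, not the paper's data)
        texts = [
            "Rebels attacked the convoy yesterday.",            # factual, past tense
            "The group plans to strike the depot next week.",   # planned
            "Patrols are conducted every morning.",             # recurring
            "An attack on the bridge could occur.",             # potential
        ]
        labels = ["factual", "planned", "recurring", "potential"]

        classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                   LogisticRegression(max_iter=1000))
        classifier.fit(texts, labels)
        print(classifier.predict(["The militia intends to seize the port."]))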

  5. [The selective participation of brain-specific non-histone proteins of chromatin Np-3,5 during the reproduction of a defensive habit to food in edible snails].

    PubMed

    Kozyrev, S A; Nikitin, V P; Sherstnev, V V

    1991-01-01

    The role of the brain-specific nonhistone chromatin proteins Np-3.5 in the reproduction of an elaborated defensive food-aversion habit was studied in previously trained snails. It was found that gamma-globulins against Np-3.5 inhibited, within tens of minutes, behavioural and neuronal reactions elicited by a specific conditioned stimulus (carrot juice), without changing reactions to another conditioned stimulus (apple juice). Gamma-globulins against other nonhistone chromatin proteins did not influence the reproduction of food-rejection habits. It is suggested that the brain-specific nonhistone chromatin proteins Np-3.5 are selectively involved in the molecular processes providing for the neurophysiological mechanisms of information extraction from long-term memory.

  6. 98 Specific IGE and IGG Binding to Allergoids of Phleum pratense

    PubMed Central

    Cases, Barbara; Fernandez-Caldas, Enrique; Tudela, Jose Ignacio; Fernandez, Eva Abel; Sanchez-Garcia, Silvia; Ibañez, M. Dolores; Escudero, Carmelo; Casanovas, Miguel

    2012-01-01

    Background Allergoids were first used in the 1960s and 1970s as an effective treatment for allergic respiratory diseases. Allergoids can be modified with formaldehyde or glutaraldehyde. Modified allergens, or allergoids, decrease the risk of adverse reactions while allowing higher allergen doses to be administered. The objective of this study was to analyse specific IgE and IgG binding to glutaraldehyde-modified and non-modified allergen extracts of Phleum pratense. Methods The sera of 69 patients sensitized to P. pratense were tested. All these patients had signs and symptoms of rhinoconjunctivitis with, or without, asthma in May and June of 2011. All these patients had positive skin prick tests to a standardized extract of P. pratense, and other grass species. Most patients were also sensitized to olive pollen. Specific IgE and IgG binding were analysed by direct ELISA against P. pratense native (non-modified) and allergoid extracts. Relative potencies were evaluated through ELISA inhibition assays, and the protein composition of non-modified and allergoid samples was determined by mass spectrometry (MS/MS). Results Mean specific IgE levels were 16.68 ± 11.65 Units (U) against the native extract and 7.26 ± 8.24 U against the allergoid (P < 0.0001; Mann-Whitney). On the other hand, mean specific IgG binding was 90.34 ± 75.57 U against the non-modified extract versus 76.19 ± 70.31 U against the allergoid (P = 0.16; Mann-Whitney). Linear regression coefficients obtained between immunoglobulin reactivity against both extracts were r2 = 0.51 for specific IgE and r2 = 0.83 for specific IgG. An important decrease in allergenic activity, measured by inhibition ELISA, was clearly observed. The MS/MS assay revealed the presence of the major allergen, and some isoforms, in non-modified and allergoid extracts. Conclusions The results obtained demonstrate that the glutaraldehyde polymerization process induces an important decrease in specific IgE binding to allergoids of P. pratense, while there are no significant differences in specific IgG binding. The allergenic composition of the P. pratense allergoid was equivalent to that of the non-modified pollen extract.

  7. Automated Information Extraction on Treatment and Prognosis for Non-Small Cell Lung Cancer Radiotherapy Patients: Clinical Study.

    PubMed

    Zheng, Shuai; Jabbour, Salma K; O'Reilly, Shannon E; Lu, James J; Dong, Lihua; Ding, Lijuan; Xiao, Ying; Yue, Ning; Wang, Fusheng; Zou, Wei

    2018-02-01

    In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis and to develop evidence-based radiation therapy paradigms. These data are available mainly in the form of narrative texts or tables with heterogeneous vocabularies. Manual extraction of the related information from these data can be time consuming and labor intensive, which is not ideal for large studies. The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non-small cell lung cancer (NSCLC). We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. Adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data and to generate structured output. In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall above 93% and 83%, respectively. For toxicity information extraction, which is important for studying patients' posttreatment side effects and quality of life, precision and recall reached 95.7% and 94.5%, respectively. The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies. ©Shuai Zheng, Salma K Jabbour, Shannon E O'Reilly, James J Lu, Lihua Dong, Lijuan Ding, Ying Xiao, Ning Yue, Fusheng Wang, Wei Zou. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.02.2018.
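
    For reference, the precision and recall figures reported here follow the standard definitions; a minimal helper in Python (the counts below are illustrative only, not the study's data):

        def precision_recall_f1(tp, fp, fn):
            """Standard extraction metrics from true-positive, false-positive, and false-negative counts."""
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            f1 = 2 * precision * recall / (precision + recall)
            return precision, recall, f1

        # Illustrative counts only (not from the study)
        print(precision_recall_f1(tp=90, fp=4, fn=5))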

  8. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus

    PubMed Central

    2015-01-01

    Background Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. Methods To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Results Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. Conclusions PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus. PMID:26099853

  9. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    PubMed

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.
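
    Since the best-performing systems reported here are conditional random fields with rich feature sets, a minimal sketch of CRF-based phenotype mention tagging using the sklearn-crfsuite package (the features, sentence, and BIO labels are illustrative, not the PhenoCHF annotation scheme):

        import sklearn_crfsuite

        def token_features(sentence, i):
            """Simple per-token features: surface form, shape, and neighbouring words."""
            word = sentence[i]
            feats = {
                "lower": word.lower(),
                "is_title": word.istitle(),
                "is_digit": word.isdigit(),
                "suffix3": word[-3:],
            }
            if i > 0:
                feats["prev_lower"] = sentence[i - 1].lower()
            if i < len(sentence) - 1:
                feats["next_lower"] = sentence[i + 1].lower()
            return feats

        # Tiny illustrative example in BIO format (not PhenoCHF data)
        sentences = [["Patient", "has", "congestive", "heart", "failure", "."]]
        labels = [["O", "O", "B-PHENOTYPE", "I-PHENOTYPE", "I-PHENOTYPE", "O"]]

        X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, labels)
        print(crf.predict(X))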

  10. Reevaluation of microplastics extraction efficiency with the aid of the Munich Plastic Sediment Separator.

    NASA Astrophysics Data System (ADS)

    Zobkov, Mikhail; Esiukova, Elena; Grave, Aleksei; Khatmullina, Liliya

    2017-04-01

    The invasion of microplastics into the marine environment is recognized as a global ecological threat. The specific density of microplastics can vary significantly depending on the polymer type, the technological processes of its production, additives, weathering, and biofouling. Plastic particles can sink or float on the sea surface, but with time most drifting plastics become negatively buoyant and sink to the sea floor due to biofouling or the adherence of denser particles. As a result, the seabed becomes the ultimate repository for microplastic particles and fibres. Studying the microplastics content of aquatic sediments is therefore an important source of information about their migration pathways, sinks, and accumulation zones. The Munich Plastic Sediment Separator (MPSS), proposed by Imhof et al. (2012), is considered the most effective tool for microplastic extraction. However, we observed that the numbers of marine microplastics extracted with this tool from different kinds of bottom sediments were significantly underestimated. We examined the extraction efficiency of the MPSS by adding artificial reference particles (ARPs) to marine sediment samples before the extraction procedure. Extraction was performed by two different methods: the modified NOAA method and the MPSS. A separation solution with a specific density of 1.5 g/ml was used. Subsequent cleaning, drying, and microscope detection procedures were identical. The microplastics content was determined in the supernatant fraction, in the bulk of the extraction solution, in the spoil dump fraction of the MPSS, and in the instrument wash-out. While the extraction efficiency of ARPs from natural sediments by the MPSS was high (100% in most cases), the extraction efficiency of marine microplastics was up to 10 times lower than that obtained with the modified NOAA method for the same samples. Less than 40% of the total marine microplastics content was successfully extracted with the MPSS. Large amounts of marine microplastics were found in the spoil dump and in the bulk solution fractions of the MPSS. Changes in stirring and separation periods had a weak impact on the extraction efficiency of ARPs and marine microplastics. So far, we have been unable to find effective working procedures for adequate extraction of marine microplastics with the MPSS. The MPSS was found to be a useful tool for extracting microplastics from large sediment samples for qualitative analysis and for obtaining examination specimens. Applying the MPSS to quantitative microplastics analysis requires further testing and the elaboration of standardized extraction procedures. The research is supported by the Russian Science Foundation, grant number 15-17-10020 (project MARBLE). Imhof, H. K., Schmid, J., Niessner, R., Ivleva, N. P., Laforsch, C. 2012. A novel, highly efficient method for the separation and quantification of plastic particles in sediments of aquatic environments. Limnology and Oceanography: Methods, 10(7), 524-537. DOI 10.4319/lom.2012.10.524

  11. Fine-grained information extraction from German transthoracic echocardiography reports.

    PubMed

    Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank

    2015-11-12

    Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that enables the recognition of almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated on documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restrained concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and on documents with uncommon layouts. The developed terminology and the proposed information extraction system allow fine-grained information to be extracted from German semi-structured transthoracic echocardiography reports with very high precision and high recall for the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
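
    A minimal sketch of terminology-driven attribute-value extraction of the kind described (the mini-terminology, patterns, and sample sentence below are hypothetical, not the system's actual resources):

        import re

        # Hypothetical mini-terminology mapping report patterns to attributes (illustrative only)
        TERMINOLOGY = {
            r"LVEF\s*[:=]?\s*(\d+)\s*%": "left_ventricular_ejection_fraction",
            r"Aortenklappe\s+(unauffaellig|unauffällig|sklerosiert)": "aortic_valve_finding",
        }

        def extract_attributes(report_text):
            """Return attribute-value pairs found by simple pattern matching."""
            results = {}
            for pattern, attribute in TERMINOLOGY.items():
                match = re.search(pattern, report_text, flags=re.IGNORECASE)
                if match:
                    results[attribute] = match.group(1)
            return results

        print(extract_attributes("Aortenklappe unauffällig. LVEF: 55 %."))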

  12. Expressed information needs of patients with osteoporosis and/or fragility fractures: a systematic review.

    PubMed

    Raybould, Grace; Babatunde, Opeyemi; Evans, Amy L; Jordan, Joanne L; Paskins, Zoe

    2018-05-08

    This systematic review identified that patients have unmet information needs about the nature of osteoporosis, medication, self-management and follow-up. Clinician knowledge and attitudes appear to be of key importance in determining whether these needs are met. Unmet information needs appear to have psychosocial consequences and result in poor treatment adherence. Patient education is an integral component of the management of osteoporosis, yet patients are dissatisfied with the information they receive and see this as an area of research priority. This study aimed to describe and summarise the specific expressed information needs of patients in previously published qualitative research. Using terms relating to osteoporosis, fragility fracture and information needs, seven databases were searched. Articles were screened using predefined inclusion and exclusion criteria. Full-text articles selected for inclusion underwent data extraction and quality appraisal. Findings were drawn together using narrative synthesis. The search identified 11,024 articles. Sixteen empirical studies were included in the review. Thematic analysis revealed three overarching themes relating to specific information needs, factors influencing whether information needs are met, and the impact of unmet information needs. Specific information needs identified included the nature of osteoporosis/fracture risk; medication; self-management; and understanding the role of dual energy x-ray absorptiometry and follow-up. Perceived physician knowledge and attitudes, and the attitudes, beliefs and behaviours of patients, were important factors influencing whether information needs were met, in addition to contextual factors and the format of educational resources. Failure to elicit and address information needs appears to be associated with poor treatment adherence, deterioration of the doctor-patient relationship and important psychosocial consequences. This is the first study to describe the information needs of patients with osteoporosis and fracture, the impact of this information gap and possible solutions. Further research is needed to co-design and evaluate educational interventions with patients.

  13. Evaluation of certain food additives and contaminants.

    PubMed

    2004-01-01

    This report represents the conclusions of a Joint FAO/WHO Expert Committee convened to evaluate the safety of various food additives, with a view to recommending acceptable daily intakes (ADIs) and to prepare specifications for the identity and purity of food additives. The first part of the report contains a general discussion of the principles governing the toxicological evaluation of food additives (including flavouring agents) and contaminants, assessments of intake, and the establishment and revision of specifications for food additives. A summary follows of the Committee's evaluations of toxicological and intake data on various specific food additives (alpha-amylase from Bacillus licheniformis containing a genetically engineered alpha-amylase gene from B. licheniformis, annatto extracts, curcumin, diacetyl and fatty acid esters of glycerol, D-tagatose, laccase from Myceliophthora thermophila expressed in Aspergillus oryzae, mixed xylanase, beta-glucanase enzyme preparation produced by a strain of Humicola insolens, neotame, polyvinyl alcohol, quillaia extracts and xylanase from Thermomyces lanuginosus expressed in Fusarium venenatum), flavouring agents, a nutritional source of iron (ferrous glycinate, processed with citric acid), a disinfectant for drinking-water (sodium dichloroisocyanurate) and contaminants (cadmium and methylmercury). Annexed to the report are tables summarizing the Committee's recommendations for ADIs of the food additives, recommendations on the flavouring agents considered, and tolerable intakes of the contaminants considered, changes in the status of specifications and further information requested or desired.

  14. Statistical Design Model (SDM) of satellite thermal control subsystem

    NASA Astrophysics Data System (ADS)

    Mirshams, Mehran; Zabihian, Ehsan; Aarabi Chamalishahi, Mahdi

    2016-07-01

    Thermal control is the satellite subsystem whose main task is to keep the satellite components within their survival and operating temperature ranges. The thermal control capability plays a key role in satisfying a satellite's operational requirements, and designing this subsystem is part of the overall satellite design. On the other hand, owing to the lack of information provided by companies and designers, this subsystem still lacks a well-defined design process, even though it is one of the fundamental subsystems. The aim of this paper is to identify and extract statistical design models of the spacecraft thermal control subsystem using the SDM design method. This method analyses statistical data with a particular procedure. To implement the SDM method, a complete database is required. Therefore, we first collect spacecraft data and create a database, then extract statistical graphs using Microsoft Excel, from which we further extract mathematical models. The input parameters of the method are the mass, mission, and lifetime of the satellite. To this end, the thermal control subsystem is first introduced, and the hardware used in this subsystem and its variants is reviewed. Next, different statistical models are described and briefly compared. Finally, a particular statistical model is extracted from the collected statistical data. The accuracy of the method is tested and verified through a case study: by comparing the specifications of the thermal control subsystem of a fabricated satellite with the analysis results, the methodology in this paper is shown to be effective. Key Words: Thermal control subsystem design, Statistical design model (SDM), Satellite conceptual design, Thermal hardware
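
    As an illustration of the kind of statistical design model described here, a small sketch that fits a power-law relation between satellite mass and thermal-control subsystem mass (the sample values and the fitted coefficients are invented, not taken from the paper's database):

        import numpy as np

        # Hypothetical database: total satellite mass [kg] vs. thermal-control subsystem mass [kg]
        satellite_mass = np.array([150.0, 400.0, 900.0, 1500.0, 3000.0])
        thermal_mass = np.array([6.0, 14.0, 30.0, 48.0, 95.0])

        # Fit a power-law design model  m_thermal = a * m_sat**b  in log-log space
        b, log_a = np.polyfit(np.log(satellite_mass), np.log(thermal_mass), 1)
        a = np.exp(log_a)

        def predict_thermal_mass(m_sat):
            return a * m_sat ** b

        print(f"m_thermal ~ {a:.3f} * m_sat**{b:.2f}; prediction for 1000 kg: {predict_thermal_mass(1000):.1f} kg")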

  15. Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing

    PubMed Central

    2013-01-01

    Background A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indications as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs to new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from the published literature. Results In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, and therefore may have high potential in drug discovery. Conclusions We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of the biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base resulting from our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks. PMID:23742147
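
    The learned patterns themselves are not reproduced here; as a simplified illustration of pattern-based pair extraction from sentences, a sketch with two invented patterns and invented example sentences:

        import re

        # Illustrative treatment patterns (hypothetical, far simpler than learned patterns)
        PATTERNS = [
            re.compile(r"(?P<drug>[A-Z][a-z]+) (?:is|was) used to treat (?P<disease>[a-z ]+)"),
            re.compile(r"treatment of (?P<disease>[a-z ]+) with (?P<drug>[A-Z][a-z]+)"),
        ]

        def extract_pairs(sentence):
            """Return (drug, disease) pairs matched by any pattern in the sentence."""
            pairs = set()
            for pattern in PATTERNS:
                for m in pattern.finditer(sentence):
                    pairs.add((m.group("drug"), m.group("disease").strip()))
            return pairs

        print(extract_pairs("Lisinopril is used to treat hypertension."))
        print(extract_pairs("Successful treatment of rheumatoid arthritis with Methotrexate was reported."))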

  16. Two-dimensional distributed-phase-reference protocol for quantum key distribution

    NASA Astrophysics Data System (ADS)

    Bacco, Davide; Christensen, Jesper Bjerge; Castaneda, Mario A. Usuga; Ding, Yunhong; Forchhammer, Søren; Rottwitt, Karsten; Oxenløwe, Leif Katsuo

    2016-12-01

    Quantum key distribution (QKD) and quantum communication enable the secure exchange of information between remote parties. Currently, the distributed-phase-reference (DPR) protocols, which are based on weak coherent pulses, are among the most practical solutions for long-range QKD. During the last 10 years, long-distance fiber-based DPR systems have been successfully demonstrated, although fundamental obstacles such as intrinsic channel losses limit their performance. Here, we introduce the first two-dimensional DPR-QKD protocol, in which information is encoded in the time and phase of weak coherent pulses. The ability to extract two bits of information per detection event enables a higher secret key rate in specific realistic network scenarios. Moreover, despite the use of more dimensions, the proposed protocol remains simple, practical, and fully integrable.
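
    The two bits per detection follow from encoding in two degrees of freedom; a toy calculation (the number of time-bin and phase states per symbol is assumed here for illustration, not stated in the abstract):

        import math

        time_bin_states = 2     # assumed time-of-arrival states per symbol
        phase_states = 2        # assumed relative-phase states per symbol

        bits_per_detection = math.log2(time_bin_states * phase_states)
        print(f"{bits_per_detection:.0f} bits of raw information per detection event")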

  17. Integrating GPCR-specific information with full text articles

    PubMed Central

    2011-01-01

    Background With the continued growth in the volume both of experimental G protein-coupled receptor (GPCR) data and of the related peer-reviewed literature, the ability of GPCR researchers to keep up-to-date is becoming increasingly curtailed. Results We present work that integrates the biological data and annotations in the GPCR information system (GPCRDB) with next-generation methods for intelligently exploring, visualising and interacting with the scientific articles used to disseminate them. This solution automatically retrieves relevant information from GPCRDB and displays it both within and as an adjunct to an article. Conclusions This approach allows researchers to extract more knowledge more swiftly from literature. Importantly, it allows reinterpretation of data in articles published before GPCR structure data became widely available, thereby rescuing these valuable data from long-dormant sources. PMID:21910883

  18. Two-dimensional distributed-phase-reference protocol for quantum key distribution.

    PubMed

    Bacco, Davide; Christensen, Jesper Bjerge; Castaneda, Mario A Usuga; Ding, Yunhong; Forchhammer, Søren; Rottwitt, Karsten; Oxenløwe, Leif Katsuo

    2016-12-22

    Quantum key distribution (QKD) and quantum communication enable the secure exchange of information between remote parties. Currently, the distributed-phase-reference (DPR) protocols, which are based on weak coherent pulses, are among the most practical solutions for long-range QKD. During the last 10 years, long-distance fiber-based DPR systems have been successfully demonstrated, although fundamental obstacles such as intrinsic channel losses limit their performance. Here, we introduce the first two-dimensional DPR-QKD protocol, in which information is encoded in the time and phase of weak coherent pulses. The ability to extract two bits of information per detection event enables a higher secret key rate in specific realistic network scenarios. Moreover, despite the use of more dimensions, the proposed protocol remains simple, practical, and fully integrable.

  19. Two-dimensional distributed-phase-reference protocol for quantum key distribution

    PubMed Central

    Bacco, Davide; Christensen, Jesper Bjerge; Castaneda, Mario A. Usuga; Ding, Yunhong; Forchhammer, Søren; Rottwitt, Karsten; Oxenløwe, Leif Katsuo

    2016-01-01

    Quantum key distribution (QKD) and quantum communication enable the secure exchange of information between remote parties. Currently, the distributed-phase-reference (DPR) protocols, which are based on weak coherent pulses, are among the most practical solutions for long-range QKD. During the last 10 years, long-distance fiber-based DPR systems have been successfully demonstrated, although fundamental obstacles such as intrinsic channel losses limit their performance. Here, we introduce the first two-dimensional DPR-QKD protocol, in which information is encoded in the time and phase of weak coherent pulses. The ability to extract two bits of information per detection event enables a higher secret key rate in specific realistic network scenarios. Moreover, despite the use of more dimensions, the proposed protocol remains simple, practical, and fully integrable. PMID:28004821

  20. Global 21 cm Signal Extraction from Foreground and Instrumental Effects. I. Pattern Recognition Framework for Separation Using Training Sets

    NASA Astrophysics Data System (ADS)

    Tauscher, Keith; Rapetti, David; Burns, Jack O.; Switzer, Eric

    2018-02-01

    The sky-averaged (global) highly redshifted 21 cm spectrum from neutral hydrogen is expected to appear in the VHF range of ∼20–200 MHz and its spectral shape and strength are determined by the heating properties of the first stars and black holes, by the nature and duration of reionization, and by the presence or absence of exotic physics. Measurements of the global signal would therefore provide us with a wealth of astrophysical and cosmological knowledge. However, the signal has not yet been detected because it must be seen through strong foregrounds weighted by a large beam, instrumental calibration errors, and ionospheric, ground, and radio-frequency-interference effects, which we collectively refer to as “systematics.” Here, we present a signal extraction method for global signal experiments which uses Singular Value Decomposition of “training sets” to produce systematics basis functions specifically suited to each observation. Instead of requiring precise absolute knowledge of the systematics, our method effectively requires precise knowledge of how the systematics can vary. After calculating eigenmodes for the signal and systematics, we perform a weighted least square fit of the corresponding coefficients and select the number of modes to include by minimizing an information criterion. We compare the performance of the signal extraction when minimizing various information criteria and find that minimizing the Deviance Information Criterion most consistently yields unbiased fits. The methods used here are built into our widely applicable, publicly available Python package, pylinex, which analytically calculates constraints on signals and systematics from given data, errors, and training sets.
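
    A minimal sketch of the SVD-plus-weighted-least-squares idea (this is not the pylinex implementation; the training set, data, and noise level below are synthetic):

        import numpy as np

        rng = np.random.default_rng(0)
        n_channels, n_train, n_modes = 100, 50, 5

        # Synthetic "training set" of systematics realizations (rows = realizations)
        training_set = rng.normal(size=(n_train, n_channels)).cumsum(axis=1)

        # SVD of the training set gives systematics basis functions (eigenmodes)
        _, _, vt = np.linalg.svd(training_set - training_set.mean(axis=0), full_matrices=False)
        basis = vt[:n_modes].T                       # (n_channels, n_modes) design matrix

        # Synthetic observed data: one systematics realization plus radiometer-like noise
        noise_sigma = 0.1
        data = training_set[0] + rng.normal(scale=noise_sigma, size=n_channels)

        # Weighted least-squares fit of the mode coefficients
        weights = np.full(n_channels, 1.0 / noise_sigma**2)
        lhs = basis.T @ (weights[:, None] * basis)
        rhs = basis.T @ (weights * data)
        coefficients = np.linalg.solve(lhs, rhs)

        residual = data - basis @ coefficients
        print("rms residual:", np.sqrt(np.mean(residual**2)))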

  1. Potency of Purple Sweet Potato’s Anthocyanin as Biosensor for Detection of Chemicals in Food Products

    NASA Astrophysics Data System (ADS)

    Wulandari, A.; Sunarti, TC; Fahma, F.; Noor, E.

    2018-05-01

    Bioactive compounds such as anthocyanins are natural ingredients that produce color with characteristic specificity. Anthocyanin from Ayamurasaki purple sweet potato (Ipomoea batatas L.) was extracted in ethanol and used as a crude anthocyanin extract. The color of bioactive anthocyanin can be used as a biosensor to detect chemicals in food products because it provides a distinctive color change. However, each bioactive compound has a particular sensitivity and selectivity towards specific chemicals, so its selectivity must be screened and tested. Six chemicals, namely sodium nitrite, sodium benzoate, sodium cyclamate (food additives), formalin, borax (illegal food preservatives), and fertilizer residue (urea), were tested and observed for color change. The results showed that the bioactive anthocyanin of purple sweet potato, at a concentration of approximately 42.65 ppm, had the best selectivity and sensitivity towards sodium nitrite, with a detection limit of 100 ppm and a color-change response time of 15-20 minutes. The selectivity and sensitivity of this bioactive compound can be used as basic information for the development of a biosensor.

  2. Classification of clinically useful sentences in clinical evidence resources.

    PubMed

    Morid, Mohammad Amin; Fiszman, Marcelo; Raja, Kalpana; Jonnalagadda, Siddhartha R; Del Fiol, Guilherme

    2016-04-01

    Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p=0.62). The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure.

    PubMed

    Garvin, Jennifer H; DuVall, Scott L; South, Brett R; Bray, Bruce E; Bolton, Daniel; Heavirland, Julia; Pickard, Steve; Heidenreich, Paul; Shen, Shuying; Weir, Charlene; Samore, Matthew; Goldstein, Mary K

    2012-01-01

    Left ventricular ejection fraction (EF) is a key component of heart failure quality measures used within the Department of Veteran Affairs (VA). Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting and to validate the accuracy of the system using a comparison reference standard developed through human review. This project was a Translational Use Case Project within the VA Consortium for Healthcare Informatics. We created a set of regular expressions and rules to capture the EF using a random sample of 765 echocardiograms from seven VA medical centers. The documents were randomly assigned to two sets: a set of 275 used for training and a second set of 490 used for testing and validation. To establish the reference standard, two independent reviewers annotated all documents in both sets; a third reviewer adjudicated disagreements. System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9% (95% CI 87.7% to 90.0%), a positive predictive value of 95% (95% CI 94.2% to 95.9%), and an F measure of 91.9% (95% CI 91.2% to 92.7%). An EF value of <40% can be accurately identified in VA echocardiogram reports. An automated information extraction system can be used to accurately extract EF for quality measurement.
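
    The VA system's actual regular expressions and rules are not reproduced here; a minimal sketch of regex-based EF capture and document-level classification (the pattern, the midpoint rule for ranges, and the sample sentences are illustrative assumptions):

        import re

        # Illustrative ejection fraction pattern (not the VA rule set)
        EF_PATTERN = re.compile(
            r"(?:ejection fraction|LVEF|EF)\s*(?:is|of|:|=)?\s*(\d{1,2})\s*(?:-|to)?\s*(\d{1,2})?\s*%",
            re.IGNORECASE,
        )

        def classify_ef(report_text, threshold=40):
            """Return (ef_value, below_threshold) for the first EF mention, or None."""
            match = EF_PATTERN.search(report_text)
            if not match:
                return None
            low = int(match.group(1))
            high = int(match.group(2)) if match.group(2) else low
            ef = (low + high) / 2          # use the midpoint when a range is given
            return ef, ef < threshold

        print(classify_ef("Left ventricular ejection fraction is 30-35%."))
        print(classify_ef("LVEF: 60 %. No wall motion abnormality."))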

  4. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

    PubMed

    Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin

    2016-11-01

    Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating protein functions, but also critical for structure-based drug design. Predicting the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which were used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by 15% from 65.40% to 80.82%. We showed that the stacked autoencoders could extract, from the Fisher score features, novel features that can be utilized by deep neural networks and other classifiers to enhance learning. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.
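
    A minimal stacked-autoencoder sketch in Python with PyTorch (the paper trains autoencoders layer-wise on Fisher score features; here the layers are trained end-to-end on random stand-in data, and the layer sizes are assumptions):

        import torch
        import torch.nn as nn

        input_dim, hidden1, hidden2 = 200, 64, 16

        class StackedAutoencoder(nn.Module):
            def __init__(self):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(input_dim, hidden1), nn.ReLU(),
                    nn.Linear(hidden1, hidden2), nn.ReLU(),
                )
                self.decoder = nn.Sequential(
                    nn.Linear(hidden2, hidden1), nn.ReLU(),
                    nn.Linear(hidden1, input_dim),
                )

            def forward(self, x):
                return self.decoder(self.encoder(x))

        # Random stand-ins for Fisher score feature vectors (not real protein data)
        features = torch.randn(512, input_dim)

        model = StackedAutoencoder()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        for epoch in range(20):
            optimizer.zero_grad()
            loss = loss_fn(model(features), features)
            loss.backward()
            optimizer.step()

        # The learned hidden representation can feed a downstream contact classifier
        hidden_features = model.encoder(features).detach()
        print(hidden_features.shape)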

  5. RadSearch: a RIS/PACS integrated query tool

    NASA Astrophysics Data System (ADS)

    Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

    2008-03-01

    Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings; administrators can easily obtain complex practice management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; teaching institutions can quickly and effectively create thorough teaching files.

  6. The uncertainty processing theory of motivation.

    PubMed

    Anselme, Patrick

    2010-04-02

    Most theories describe motivation using basic terminology (drive, 'wanting', goal, pleasure, etc.) that fails to inform well about the psychological mechanisms controlling its expression. This leads to a conception of motivation as a mere psychological state 'emerging' from neurophysiological substrates. However, the involvement of motivation in a large number of behavioural parameters (triggering, intensity, duration, and directedness) and cognitive abilities (learning, memory, decision, etc.) suggest that it should be viewed as an information processing system. The uncertainty processing theory (UPT) presented here suggests that motivation is the set of cognitive processes allowing organisms to extract information from the environment by reducing uncertainty about the occurrence of psychologically significant events. This processing of information is shown to naturally result in the highlighting of specific stimuli. The UPT attempts to solve three major problems: (i) how motivations can affect behaviour and cognition so widely, (ii) how motivational specificity for objects and events can result from nonspecific neuropharmacological causal factors (such as mesolimbic dopamine), and (iii) how motivational interactions can be conceived in psychological terms, irrespective of their biological correlates. The UPT is in keeping with the conceptual tradition of the incentive salience hypothesis while trying to overcome the shortcomings inherent to this view. Copyright 2009 Elsevier B.V. All rights reserved.

  7. Mining residential water and electricity demand data in Southern California to inform demand management strategies

    NASA Astrophysics Data System (ADS)

    Cominola, A.; Spang, E. S.; Giuliani, M.; Castelletti, A.; Loge, F. J.; Lund, J. R.

    2016-12-01

    Demand-side management strategies are key to meeting future water and energy demands in urban contexts, promoting water and energy efficiency in the residential sector, providing customized services and communications to consumers, and reducing utilities' costs. Smart metering technologies allow gathering water and energy consumption data at high temporal and spatial resolution and support the development of data-driven models of consumers' behavior. Modelling and predicting resource consumption behavior is essential to inform demand management. Yet analyzing large smart-metered databases requires proper data mining and modelling techniques in order to extract useful information that supports decision makers in spotting the end uses towards which water and energy efficiency or conservation efforts should be prioritized. In this study, we consider the following research questions: (i) how can representative consumer personalities be extracted from big smart-metered water and energy data? (ii) are residential water and energy consumption profiles interconnected? (iii) can we design customized water and energy demand management strategies based on knowledge of water-energy demand profiles and other user-specific psychographic information? To address the above research questions, we contribute a data-driven approach to identify and model routines in water and energy consumers' behavior. We propose a novel customer segmentation procedure based on data-mining techniques. Our procedure consists of three steps: (i) extraction of typical water-energy consumption profiles for each household, (ii) clustering of profiles based on their similarity, and (iii) evaluation of the influence of candidate explanatory variables on the identified clusters. The approach is tested on a dataset of smart-metered water and energy consumption data from over 1000 households in Southern California. Our methodology allows identifying heterogeneous groups of consumers from the studied sample, as well as characterizing them with respect to consumption profile features and socio-demographic information. Results show how this improved understanding of the user community allows spotting potentially interesting areas for water and energy demand management interventions.
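
    A minimal sketch of the profile-clustering step (step ii) using synthetic daily profiles (the number of clusters, the features, and the data are assumptions, not the study's):

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        hours = np.arange(24)

        # Synthetic 24-hour water-use profiles for 100 households (litres per hour)
        morning_peak = np.exp(-0.5 * ((hours - 7) / 2.0) ** 2)
        evening_peak = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
        profiles = np.vstack([
            10 * (morning_peak if i % 2 else evening_peak) + rng.normal(0, 0.5, 24)
            for i in range(100)
        ])

        # Cluster households by profile similarity
        kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
        print(np.bincount(kmeans.labels_))             # households per cluster
        print(kmeans.cluster_centers_.argmax(axis=1))  # peak hour of each cluster centroid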

  8. Cancer patients on Twitter: a novel patient community on social media.

    PubMed

    Sugawara, Yuya; Narimatsu, Hiroto; Hozawa, Atsushi; Shao, Li; Otani, Katsumi; Fukao, Akira

    2012-12-27

    Patients increasingly turn to the Internet for information on medical conditions, including clinical news and treatment options. In recent years, an online patient community has arisen alongside the rapidly expanding world of social media, or "Web 2.0." Twitter provides real-time dissemination of news, information, personal accounts and other details via a highly interactive form of social media, and has become an important online tool for patients. This medium is now considered to play an important role in the modern social community of online, "wired" cancer patients. Fifty-one highly influential "power accounts" belonging to cancer patients were extracted from a dataset of 731 Twitter accounts with cancer terminology in their profiles. In accordance with previously established methodology, "power accounts" were defined as those Twitter accounts with 500 or more followers. We extracted data on the cancer patient (female) with the most followers to study the specific relationships that existed between the user and her followers, and found that the majority of the examined tweets focused on greetings, treatment discussions, and other instances of psychological support. These findings went against our hypothesis that cancer patients' tweets would be centered on the dissemination of medical information and similar "newsy" details. At present, there exists a rapidly evolving network of cancer patients engaged in information exchange via Twitter. This network is valuable in the sharing of psychological support among the cancer community.

  9. The Roles for Prior Visual Experience and Age on the Extraction of Egocentric Distance.

    PubMed

    Wallin, Courtney P; Gajewski, Daniel A; Teplitz, Rebeca W; Mihelic Jaidzeka, Sandra; Philbeck, John W

    2017-01-01

    In a well-lit room, observers can generate well-constrained estimates of the distance to an object on the floor even with just a fleeting glimpse. Performance under these conditions is typically characterized by some underestimation but improves when observers have previewed the room. Such evidence suggests that information extracted from longer durations may be stored to contribute to the perception of distance at limited time frames. Here, we examined the possibility that this stored information is used differentially across age. Specifically, we posited that older adults would rely more than younger adults on information gathered and stored at longer glimpses to judge the distance of briefly glimpsed objects. We collected distance judgments from younger and older adults after brief target glimpses. Half of the participants were provided 20-s previews of the testing room in advance; the other half received no preview. Performance benefits were observed for all individuals with prior visual experience, and these were moderately more pronounced for the older adults. The results suggest that observers store contextual information gained from longer viewing durations to aid in the perception of distance at brief glimpses, and that this memory becomes more important with age. © The Author 2016. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. Tuberculosis diagnosis support analysis for precarious health information systems.

    PubMed

    Orjuela-Cañón, Alvaro David; Camargo Mendoza, Jorge Eliécer; Awad García, Carlos Enrique; Vergara Vela, Erika Paola

    2018-04-01

    Pulmonary tuberculosis is a global health emergency according to the World Health Organization. Techniques and new diagnostic tools are important in the battle against this bacterial infection. There have been many advances in all these fields, but in developing countries such as Colombia, where resources and infrastructure are limited, new, fast, and less expensive strategies are increasingly needed. Artificial neural networks are computational intelligence techniques that can be applied to this kind of problem and offer additional support in the tuberculosis diagnosis process, providing medical staff with a tool for making decisions about the management of subjects suspected of having tuberculosis. A database of 105 subjects suspected of pulmonary tuberculosis, containing only precarious (sparse) information, was used in this study. Data on sex, age, diabetes, homelessness, and AIDS status, together with a variable encoding clinical judgement from the medical personnel, were used. Models based on artificial neural networks were built, using supervised learning to detect the disease and unsupervised learning to create three risk groups from the available information. The results obtained are comparable with those of traditional techniques for the detection of tuberculosis, with advantages such as speed and low implementation cost. A sensitivity of 97% and a specificity of 71% were achieved. The techniques used provided valuable information that can support physicians' decision making when treating the disease, especially under limited infrastructure and data. Copyright © 2018 Elsevier B.V. All rights reserved.
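
    A minimal sketch of the two modelling steps described above, with synthetic stand-in data (the feature encoding, network architecture, and labels are assumptions, not the study's data):

        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(2)
        n = 105

        # Synthetic stand-in features: sex, age, diabetes, homelessness, AIDS status, clinical judgement
        X = np.column_stack([
            rng.integers(0, 2, n),       # sex
            rng.integers(18, 80, n),     # age
            rng.integers(0, 2, n),       # diabetes
            rng.integers(0, 2, n),       # homelessness
            rng.integers(0, 2, n),       # AIDS status
            rng.uniform(0, 1, n),        # clinical-judgement variable
        ])
        y = rng.integers(0, 2, n)        # synthetic tuberculosis labels

        # Supervised detection model (architecture is an assumption)
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)

        # Unsupervised grouping into three risk groups, as described in the abstract
        risk_groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
        print(clf.score(X, y), np.bincount(risk_groups))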

  11. Geographical Information Extraction With Remote Sensing. Part III - Appendices.

    DTIC Science & Technology

    1998-08-01

    [Appendix excerpt, garbled in this extract: fragments of FACC feature-code definitions (e.g., DA020 "barren ground": land surface void of vegetation), lake/pond, dam, and inundation codes, and vegetation/soil category listings; the underlying tables are not recoverable.]

  12. Direct profiling of environmental microbial populations by thermal dissociation analysis of native rRNAs hybridized to oligonucleotide microarrays

    NASA Technical Reports Server (NTRS)

    El Fantroussi, Said; Urakawa, Hidetoshi; Bernhard, Anne E.; Kelly, John J.; Noble, Peter A.; Smidt, H.; Yershov, G. M.; Stahl, David A.

    2003-01-01

    Oligonucleotide microarrays were used to profile directly extracted rRNA from environmental microbial populations without PCR amplification. In our initial inspection of two distinct estuarine study sites, the hybridization patterns were reproducible and varied between estuarine sediments of differing salinities. The determination of a thermal dissociation curve (i.e., melting profile) for each probe-target duplex provided information on hybridization specificity, which is essential for confirming adequate discrimination between target and nontarget sequences.

  13. A BPM calibration procedure using TBT data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, M.J.; Crisp, J.; Prieto, P.

    2007-06-01

    Accurate BPM calibration is crucial for lattice analysis. It is also reassuring when the calibration can be independently verified. This paper outlines a procedure that can extract BPM calibration information from TBT orbit data. The procedure is developed as an extension to the Turn-By-Turn lattice analysis [1]. Its application to data from both the Recycler Ring and the Main Injector (MI) at Fermilab has produced very encouraging results. Some specifics of the hardware design are mentioned to contrast with the analysis results.

  14. Natural Language Processing as a Discipline at LLNL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Firpo, M A

    The field of Natural Language Processing (NLP) is described as it applies to the needs of LLNL in handling free-text. The state of the practice is outlined with the emphasis placed on two specific aspects of NLP: Information Extraction and Discourse Integration. A brief description is included of the NLP applications currently being used at LLNL. A gap analysis provides a look at where the technology needs work in order to meet the needs of LLNL. Finally, recommendations are made to meet these needs.

  15. Final Report on the Creation of the Wind Integration National Dataset (WIND) Toolkit and API: October 1, 2013 - September 30, 2015

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hodge, Bri-Mathias

    2016-04-08

    The primary objective of this work was to create a state-of-the-art national wind resource data set and to provide detailed wind plant output data for specific sites based on that data set. Corresponding retrospective wind forecasts were also included at all selected locations. The combined information from these activities was used to create the Wind Integration National Dataset (WIND), and an extraction tool was developed to allow web-based data access.

  16. Whole-Transcriptome Analysis of Verocytotoxigenic Escherichia coli O157:H7 (Sakai) Suggests Plant-Species-Specific Metabolic Responses on Exposure to Spinach and Lettuce Extracts

    PubMed Central

    Crozier, Louise; Hedley, Pete E.; Morris, Jenny; Wagstaff, Carol; Andrews, Simon C.; Toth, Ian; Jackson, Robert W.; Holden, Nicola J.

    2016-01-01

    Verocytotoxigenic Escherichia coli (VTEC) can contaminate crop plants, potentially using them as secondary hosts, which can lead to food-borne infection. Currently, little is known about the influence of the specific plant species on the success of bacterial colonization. As such, we compared the ability of the VTEC strain, E. coli O157:H7 ‘Sakai,’ to colonize the roots and leaves of four leafy vegetables: spinach (Spinacia oleracea), lettuce (Lactuca sativa), vining green pea (Pisum sativum), and prickly lettuce (Lactuca serriola), a wild relative of domesticated lettuce. Also, to determine the drivers of the initial response on interaction with plant tissue, the whole transcriptome of E. coli O157:H7 Sakai was analyzed following exposure to plant extracts of varying complexity (spinach leaf lysates or root exudates, and leaf cell wall polysaccharides from spinach or lettuce). Plant extracts were used to reduce heterogeneity inherent in plant–microbe interactions and remove the effect of plant immunity. This dual approach provided information on the initial adaptive response of E. coli O157:H7 Sakai to the plant environment together with the influence of the living plant during bacterial establishment and colonization. Results showed that both the plant tissue type and the plant species strongly influence the short-term (1 h) transcriptional response to extracts as well as longer-term (10 days) plant colonization or persistence. We show that propagation temperature (37 vs. 18°C) has a major impact on the expression profile and therefore pre-adaptation of bacteria to a plant-relevant temperature is necessary to avoid misleading temperature-dependent wholescale gene-expression changes in response to plant material. For each of the plant extracts tested, the largest group of (annotated) differentially regulated genes were associated with metabolism. However, large-scale differences in the metabolic and biosynthetic pathways between treatment types indicate specificity in substrate utilization. Induction of stress-response genes reflected the apparent physiological status of the bacterial genes in each extract, as a result of glutamate-dependent acid resistance, nutrient stress, or translational stalling. A large proportion of differentially regulated genes are uncharacterized (annotated as hypothetical), which could indicate yet to be described functional roles associated with plant interaction for E. coli O157:H7 Sakai. PMID:27462311

  17. Ultrasonic power measurement system based on acousto-optic interaction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Liping; Zhu, Fulong, E-mail: zhufulong@hust.edu.cn; Duan, Ke

    Ultrasonic waves are widely used, with applications including the medical, military, and chemical fields. However, there are currently no effective methods for ultrasonic power measurement. Previously, ultrasonic power measurement has been reliant on mechanical methods such as hydrophones and radiation force balances. This paper deals with ultrasonic power measurement based on an unconventional method: acousto-optic interaction. Compared with mechanical methods, the optical method has a greater ability to resist interference and also has reduced environmental requirements. Therefore, this paper begins with an experimental determination of the acoustic power in water contained in a glass tank using a set of optical devices. Because the light intensity of the diffraction image generated by acousto-optic interaction contains the required ultrasonic power information, specific software was written to extract the light intensity information from the image through a combination of filtering, binarization, contour extraction, and other image processing operations. The power value can then be obtained rapidly by processing the diffraction image using a computer. The results of this work show that the optical method offers advantages that include accuracy, speed, and a noncontact measurement method.
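
    The image-processing chain named in the abstract (filtering, binarization, contour extraction, intensity summation) can be sketched as follows; this is a minimal illustration, not the authors' software, and the file name, blur kernel and threshold choice are assumptions.

      # Minimal sketch of the described pipeline: filter, binarize, find contours,
      # integrate grey levels inside each diffraction spot. Parameters are illustrative.
      import cv2
      import numpy as np

      img = cv2.imread("diffraction_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

      # 1. Filtering: suppress sensor noise before segmentation.
      smoothed = cv2.GaussianBlur(img, (5, 5), 0)

      # 2. Binarization: Otsu's method separates diffraction orders from background.
      _, binary = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

      # 3. Contour extraction: locate each diffraction spot.
      contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

      # 4. Light-intensity extraction: integrate grey levels inside each contour.
      spot_intensities = []
      for c in contours:
          mask = np.zeros_like(img)
          cv2.drawContours(mask, [c], -1, 255, thickness=-1)
          spot_intensities.append(float(img[mask == 255].sum()))

      # The relative intensities of the diffraction orders would then be mapped to
      # ultrasonic power via the acousto-optic relation used by the authors.
      print(sorted(spot_intensities, reverse=True)[:3])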

  18. Low-Dimensional Feature Representation for Instrument Identification

    NASA Astrophysics Data System (ADS)

    Ihara, Mizuki; Maeda, Shin-Ichi; Ikeda, Kazushi; Ishii, Shin

    For monophonic music instrument identification, various feature extraction and selection methods have been proposed. One issue in instrument identification is that the same spectrum is not always observed even for the same instrument, owing to differences in recording conditions. It is therefore important to find non-redundant, instrument-specific features that maintain the information essential for high-quality instrument identification, so that they can be applied to various instrumental music analyses. As such a dimensionality reduction method, the authors propose the use of linear projection methods: local Fisher discriminant analysis (LFDA) and LFDA combined with principal component analysis (PCA). After experimentally confirming that raw power spectra are actually good for instrument classification, the authors reduced the feature dimensionality by LFDA or by PCA followed by LFDA (PCA-LFDA). The reduced features achieved reasonably high identification performance, comparable to or higher than that obtained with the raw power spectra and that reported by other existing studies. These results demonstrate that LFDA and PCA-LFDA can successfully extract low-dimensional instrument features that maintain the characteristic information of the instruments.
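
    The two-stage projection (PCA followed by a supervised discriminant step) can be sketched as below. Note the hedge: scikit-learn has no LFDA implementation, so ordinary linear discriminant analysis (LDA) stands in for LFDA purely to illustrate the pipeline shape, and the spectra and labels are synthetic placeholders.

      # Sketch of a PCA-then-discriminant-projection pipeline (LDA substitutes LFDA here).
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 1024))   # 300 notes x 1024-bin power spectra (synthetic)
      y = rng.integers(0, 5, size=300)   # 5 instrument classes (synthetic labels)

      # PCA removes redundant spectral dimensions; the discriminant step then finds a
      # low-dimensional projection separating the classes; k-NN classifies in that space.
      pipe = make_pipeline(PCA(n_components=40),
                           LinearDiscriminantAnalysis(n_components=4),
                           KNeighborsClassifier(n_neighbors=5))
      print(cross_val_score(pipe, X, y, cv=5).mean())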

  19. Measures of Microbial Biomass for Soil Carbon Decomposition Models

    NASA Astrophysics Data System (ADS)

    Mayes, M. A.; Dabbs, J.; Steinweg, J. M.; Schadt, C. W.; Kluber, L. A.; Wang, G.; Jagadamma, S.

    2014-12-01

    Explicit parameterization of the decomposition of plant inputs and soil organic matter by microbes is becoming more widely accepted in models of various complexity, ranging from detailed process models to global-scale earth system models. While there are multiple ways to measure microbial biomass, chloroform fumigation-extraction (CFE) is commonly used to parameterize models. However, CFE is labor- and time-intensive, requires toxic chemicals, and provides no specific information about the composition or function of the microbial community. We investigated correlations between measures of CFE, DNA extraction yield, qPCR-based gene copy numbers for Bacteria, Fungi and Archaea, phospholipid fatty acid analysis, and direct cell counts to determine their potential for use as proxies for microbial biomass. As our ultimate goal is to develop reliable, more informative, and faster methods to predict microbial biomass for use in models, we also examined basic soil physicochemical characteristics, including texture, organic matter content, and pH, to identify multi-factor predictive correlations with one or more measures of the microbial community. Our work will have application to both microbial ecology studies and the next generation of process and earth system models.
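
    The proxy-correlation analysis described here can be sketched as a simple correlation and multi-factor regression exercise; the CSV file and column names below are hypothetical placeholders, not the study's data.

      # Sketch: correlate candidate proxies with CFE biomass, then try a multi-factor
      # prediction from cheap physicochemical variables. All names are hypothetical.
      import pandas as pd
      import statsmodels.formula.api as smf

      soils = pd.read_csv("soil_biomass_measures.csv")  # one row per soil sample

      proxies = ["dna_yield_ug_g", "qpcr_16s_copies", "qpcr_its_copies",
                 "plfa_total_nmol_g", "direct_cell_count"]

      # Pearson correlation of each proxy with CFE microbial biomass carbon.
      print(soils[proxies + ["cfe_biomass_c_ug_g"]].corr()["cfe_biomass_c_ug_g"])

      # Multi-factor predictive model from texture, pH and organic matter content.
      model = smf.ols("cfe_biomass_c_ug_g ~ clay_pct + ph + organic_matter_pct",
                      data=soils).fit()
      print(model.summary())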

  20. Ultrasonic power measurement system based on acousto-optic interaction.

    PubMed

    He, Liping; Zhu, Fulong; Chen, Yanming; Duan, Ke; Lin, Xinxin; Pan, Yongjun; Tao, Jiaquan

    2016-05-01

    Ultrasonic waves are widely used, with applications including the medical, military, and chemical fields. However, there are currently no effective methods for ultrasonic power measurement. Previously, ultrasonic power measurement has been reliant on mechanical methods such as hydrophones and radiation force balances. This paper deals with ultrasonic power measurement based on an unconventional method: acousto-optic interaction. Compared with mechanical methods, the optical method has a greater ability to resist interference and also has reduced environmental requirements. Therefore, this paper begins with an experimental determination of the acoustic power in water contained in a glass tank using a set of optical devices. Because the light intensity of the diffraction image generated by acousto-optic interaction contains the required ultrasonic power information, specific software was written to extract the light intensity information from the image through a combination of filtering, binarization, contour extraction, and other image processing operations. The power value can then be obtained rapidly by processing the diffraction image using a computer. The results of this work show that the optical method offers advantages that include accuracy, speed, and a noncontact measurement method.

  1. Ultrasonic power measurement system based on acousto-optic interaction

    NASA Astrophysics Data System (ADS)

    He, Liping; Zhu, Fulong; Chen, Yanming; Duan, Ke; Lin, Xinxin; Pan, Yongjun; Tao, Jiaquan

    2016-05-01

    Ultrasonic waves are widely used, with applications including the medical, military, and chemical fields. However, there are currently no effective methods for ultrasonic power measurement. Previously, ultrasonic power measurement has been reliant on mechanical methods such as hydrophones and radiation force balances. This paper deals with ultrasonic power measurement based on an unconventional method: acousto-optic interaction. Compared with mechanical methods, the optical method has a greater ability to resist interference and also has reduced environmental requirements. Therefore, this paper begins with an experimental determination of the acoustic power in water contained in a glass tank using a set of optical devices. Because the light intensity of the diffraction image generated by acousto-optic interaction contains the required ultrasonic power information, specific software was written to extract the light intensity information from the image through a combination of filtering, binarization, contour extraction, and other image processing operations. The power value can then be obtained rapidly by processing the diffraction image using a computer. The results of this work show that the optical method offers advantages that include accuracy, speed, and a noncontact measurement method.

  2. A torque balance measurement of anisotropy of the magnetic susceptibility in white matter.

    PubMed

    van Gelderen, Peter; Mandelkow, Hendrik; de Zwart, Jacco A; Duyn, Jeff H

    2015-11-01

    Recent MRI studies have suggested that the magnetic susceptibility of white matter (WM) in the human brain is anisotropic, providing a new contrast mechanism for the visualization of fiber bundles and allowing the extraction of cellular compartment-specific information. This study provides an independent confirmation and quantification of this anisotropy. Anisotropic magnetic susceptibility results in a torque exerted on WM when placed in a uniform magnetic field, tending to align the WM fibers with the field. To quantify the effect, excised spinal cord samples were placed in a torque balance inside the magnet of a 7 T MRI system and the magnetic torque was measured as a function of orientation. All tissue samples (n = 5) showed orienting effects, confirming the presence of anisotropic susceptibility. Analysis of the magnetic torque resulted in reproducible values for the WM volume anisotropy that ranged from 13.6 to 19.2 ppb. The independently determined anisotropy values confirm estimates inferred from MRI experiments and validate the use of anisotropy to extract novel information about brain fiber structure and myelination. © 2014 Wiley Periodicals, Inc.
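
    For context, a standard expression (not quoted from the paper) for the orientation-dependent magnetic energy and torque on a tissue volume V with uniaxial susceptibility anisotropy Δχ = χ∥ − χ⊥, where θ is the angle between the fiber axis and the field B0, is:

      U(\theta) = -\frac{V B_0^{2}}{2\mu_0}\left(\chi_\perp + \Delta\chi\,\cos^{2}\theta\right),
      \qquad
      \tau(\theta) = -\frac{\partial U}{\partial \theta} = -\frac{V\,\Delta\chi\,B_0^{2}}{2\mu_0}\,\sin 2\theta .

    The sin 2θ dependence is what a torque balance can fit to the measured orientation curve in order to extract Δχ; the study's exact formulation may differ in detail.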

  3. Energy availabilities for state and local development: projected energy patterns for 1980 and 1985

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vogt, D. P.; Rice, P. L.; Pai, V. P.

    1978-06-01

    This report presents projections of the supply, demand, and net imports of seven fuel types and four final consuming sectors for BEAs, states, census regions, and the nation for 1980 and 1985. The data are formatted to present regional energy availability from primary extraction, as well as from regional transformation processes. As constructed, the tables depict energy balances between availability and use for each of the specific fuels. The objective of the program is to provide a consistent base of historic and projected energy information within a standard format. Such a framework should aid regional policymakers in their consideration of regional growth issues that may be influenced by the regional energy system. This basic data must be supplemented by region-specific information which only the local policy analyst can bring to bear in his assessment of the energy conditions which characterize each region. The energy data, coupled with specific knowledge of projected economic growth and employment patterns, can assist EDA in developing its grant-in-aid investment strategy.

  4. CRL/Brandeis: Description of the DIDEROT System as Used for MUC-5

    DTIC Science & Technology

    1993-01-01

    been evaluated in the 4th Message Understanding Conference (MUC-4) where it was required to extract information from 200 texts on South American... Email: jamesp@cs.brandeis.edu Abstract This report describes the major developments over the last six months in completing the Diderot information ...extraction system for the MUC-5 evaluation. Diderot is an information extraction system built at CRL and Brandeis University over the past two

  5. Extracting important information from Chinese Operation Notes with natural language processing methods.

    PubMed

    Wang, Hui; Zhang, Weide; Zeng, Qiang; Li, Zuofeng; Feng, Kaiyan; Liu, Lei

    2014-04-01

    Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although natural language processing (NLP) methods have been studied extensively for electronic medical records (EMR), few studies have explored NLP for extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of methods for extracting tumor-related information from operation notes of hepatic carcinomas written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluated on 29 unseen operation notes, our best approach yielded 69.6% precision, 58.3% recall and a 63.5% F-score. Copyright © 2014 Elsevier Inc. All rights reserved.
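
    As a quick consistency check, the reported F-score follows from the standard harmonic mean of the quoted precision and recall: F1 = 2PR / (P + R) = (2 × 0.696 × 0.583) / (0.696 + 0.583) ≈ 0.635.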

  6. TRENCADIS--a WSRF grid MiddleWare for managing DICOM structured reporting objects.

    PubMed

    Blanquer, Ignacio; Hernandez, Vicente; Segrelles, Damià

    2006-01-01

    The adoption of digital processing of medical data, especially in radiology, has led to the availability of millions of records (images and reports). However, this information is mainly used at the patient level and is organised according to administrative criteria, which makes the extraction of knowledge difficult. Moreover, legal constraints make the direct integration of information systems complex or even impossible. On the other hand, the widespread adoption of the DICOM format has led to the inclusion of information beyond just radiological images. The possibility of coding radiology reports in a structured form, adding semantic information about the data contained in the DICOM objects, eases the process of structuring images according to content. DICOM Structured Reporting (DICOM-SR) is a specification of tags and sections to code and integrate radiology reports, with seamless references to findings and regions of interest in the associated images, movies, waveforms, signals, etc. The work presented in this paper aims at developing a framework to efficiently and securely share medical images and radiology reports, as well as to provide high-throughput processing services. This system is based on an architecture previously developed in the framework of the TRENCADIS project, and uses other components, such as the security system and the Grid processing service, developed in previous activities. The work presented here introduces a semantic structuring and ontology framework to organise medical images according to standard terminology and disease coding formats (SNOMED, ICD9, LOINC, etc.).
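
    To illustrate the kind of semantic content DICOM-SR carries, a minimal sketch of walking an SR content tree with pydicom is shown below; the file name is a placeholder and this is not TRENCADIS middleware code.

      # Minimal sketch: traverse coded content items in a DICOM Structured Report.
      import pydicom

      ds = pydicom.dcmread("radiology_report_sr.dcm")  # placeholder SR file

      def walk(items, depth=0):
          """Recursively print concept names and values of SR content items."""
          for item in items:
              name = item.ConceptNameCodeSequence[0].CodeMeaning if "ConceptNameCodeSequence" in item else ""
              value = getattr(item, "TextValue", "")
              print("  " * depth + f"{item.ValueType}: {name} {value}")
              if "ContentSequence" in item:
                  walk(item.ContentSequence, depth + 1)

      walk(ds.ContentSequence)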

  7. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations

    PubMed Central

    2017-01-01

    Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessed in order to help dietitians keep up with the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER that consists of two phases. The first involves the detection and determination of entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations. PMID:28644863
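
    The two-phase idea (detect candidate mentions, then select and extract entities) can be illustrated with a toy rule set; the lexicon and patterns below are illustrative assumptions, not the published drNER rules.

      # Toy two-phase rule-based NER in the spirit of drNER.
      import re

      FOOD_LEXICON = {"whole grains", "red meat", "saturated fat", "dietary fibre", "sodium"}
      QUANTITY = re.compile(r"\b\d+(?:\.\d+)?\s*(?:g|mg|kcal|servings?)\b", re.I)

      def extract_dietary_entities(sentence):
          lowered = sentence.lower()
          candidates = []
          # Phase 1: detection of entity mentions (lexicon + pattern rules).
          for term in FOOD_LEXICON:
              if term in lowered:
                  candidates.append(("NUTRIENT/FOOD", term))
          candidates += [("QUANTITY", m.group()) for m in QUANTITY.finditer(sentence)]
          # Phase 2: selection -- keep quantities only when a food/nutrient co-occurs.
          if any(label == "NUTRIENT/FOOD" for label, _ in candidates):
              return candidates
          return [c for c in candidates if c[0] != "QUANTITY"]

      print(extract_dietary_entities("Adults should eat at least 30 g of dietary fibre and limit red meat."))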

  8. Cleavage Entropy as Quantitative Measure of Protease Specificity

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Margreiter, Michael A.; Spitzer, Gudrun M.; Wallnoefer, Hannes G.; Liedl, Klaus R.

    2013-01-01

    A purely information theory-guided approach to quantitatively characterize protease specificity is established. We calculate an entropy value for each protease subpocket based on sequences of cleaved substrates extracted from the MEROPS database. We compare our results with known subpocket specificity profiles for individual proteases and protease groups (e.g. serine proteases, metallo proteases) and reflect them quantitatively. Summation of subpocket-wise cleavage entropy contributions yields a measure for overall protease substrate specificity. This total cleavage entropy allows ranking of different proteases with respect to their specificity, separating unspecific digestive enzymes showing high total cleavage entropy from specific proteases involved in signaling cascades. The development of a quantitative cleavage entropy score allows an unbiased comparison of subpocket-wise and overall protease specificity. Thus, it enables assessment of relative importance of physicochemical and structural descriptors in protease recognition. We present an exemplary application of cleavage entropy in tracing substrate specificity in protease evolution. This highlights the wide range of substrate promiscuity within homologue proteases and hence the heavy impact of a limited number of mutations on individual substrate specificity. PMID:23637583
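
    The subpocket-wise calculation described above amounts to a Shannon entropy over residue frequencies at each substrate position, summed into a total score; the sketch below uses toy aligned cleavage windows, and the normalisation may differ from the published definition.

      # Sketch of subpocket-wise cleavage entropy and its sum (total cleavage entropy).
      import math
      from collections import Counter

      substrates = ["AVLQSGFR", "ATLQAIAS", "VKLQNNEL", "PMLQSADA"]  # toy aligned windows

      def position_entropy(residues):
          counts = Counter(residues)
          total = sum(counts.values())
          return -sum((n / total) * math.log2(n / total) for n in counts.values())

      entropies = [position_entropy([s[i] for s in substrates]) for i in range(len(substrates[0]))]
      total_cleavage_entropy = sum(entropies)

      for i, h in enumerate(entropies):
          print(f"subpocket {i + 1}: H = {h:.2f} bits")
      print(f"total cleavage entropy = {total_cleavage_entropy:.2f} bits")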

  9. Single molecule optical measurements of orientation and rotations of biological macromolecules

    PubMed Central

    Shroder, Deborah Y; Lippert, Lisa G; Goldman, Yale E

    2016-01-01

    The subdomains of macromolecules often undergo large orientation changes during their catalytic cycles that are essential for their activity. Tracking these rearrangements in real time opens a powerful window into the link between protein structure and functional output. Site-specific labeling of individual molecules with polarized optical probes and measuring their spatial orientation can give insight into the crucial conformational changes, dynamics, and fluctuations of macromolecules. Here we describe the range of single molecule optical technologies that can extract orientation information from these probes, we review the relevant types of probes and labeling techniques, and we highlight the advantages and disadvantages of these technologies for addressing specific inquiries. PMID:28192292

  10. [Possibilities of differentiation of antinuclear antibodies].

    PubMed

    Müller, W; Rosenthal, M; Stojan, B

    1975-10-15

    Antinuclear antibodies can give diagnostic information according to their titre values, their belonging to different classes of immunoglobulins, and the different patterns of their immunofluorescence. The determination of granulocyte-specific antibodies, which frequently appear in progressive chronic polyarthritis, further contributes to the differential-diagnostic classification of connective tissue diseases. An antibody against extractable nuclear antigen is specific for the so-called mixed connective tissue disease, and an antimitochondrial antibody for the pseudo-LE syndrome. Moreover, our own examinations showed a particularly high and frequent complement-fixing ability of the antinuclear factors in systemic lupus erythematosus and scleroderma. In contrast, in progressive chronic polyarthritis complement fixation was clearly less marked.

  11. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research

    PubMed Central

    Zhang, Hao; van Diepeningen, Anne D.; van der Lee, Theo A. J.; Waalwijk, Cees; de Hoog, G. Sybren

    2016-01-01

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives, but also single copy regions can be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated such as: exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/). PMID:27308864

  12. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research.

    PubMed

    Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren

    2016-06-01

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives, but also single copy regions can be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated such as: exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).

  13. Microbial metabolism in soil at low temperatures: Mechanisms unraveled by position-specific 13C labeling

    NASA Astrophysics Data System (ADS)

    Bore, Ezekiel

    2016-04-01

    Microbial transformation of organic substances in soil is the most important process of the C cycle. Most current studies base their information about the transformation of organic substances on incubation studies under laboratory conditions, and thus we have a profound knowledge of SOM transformations at ambient temperatures. However, metabolic pathway activities at low temperature are not well understood, despite the fact that these processes are relevant for many soils globally and seasonally. To analyze microbial metabolism at low soil temperatures, isotopomers of position-specifically 13C-labeled glucose were incubated at three temperatures: 5, -5 and -20 °C. Soils were sampled after 1, 3 and 10 days, and additionally after 30 days for samples at -20 °C. The 13C from individual molecule positions was quantified in respired CO2, bulk soil, extractable organic C, extractable microbial biomass by chloroform fumigation extraction (CFE), and cell membranes of microbial communities classified by 13C phospholipid fatty acid (PLFA) analysis. The 13CO2 released showed a dominance of the flux from the C-1 position at 5 °C. Consequently, at 5 °C the pentose phosphate pathway is a dominant metabolic pathway of glucose metabolization. At -5 °C and -20 °C, in contrast, metabolic behavior switched completely towards preferential respiration of the glucose C-4 position. With decreasing temperature, microorganisms shifted strongly towards metabolization of glucose via glycolysis, which indicates a switch to cellular maintenance. High recoveries of 13C in extractable microbial biomass at -5 °C indicate optimal growth conditions for the microorganisms. PLFA analysis showed high incorporation of 13C into Gram-negative bacteria at 5 °C, which decreased with temperature. Gram-positive bacteria out-competed Gram-negatives with decreasing temperature. This study revealed remarkable microbial activity at temperatures below 0 °C, differing significantly from that at ambient temperatures. These metabolic pathways can be unraveled based on position-specific labeling.

  14. Smart Extraction and Analysis System for Clinical Research.

    PubMed

    Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung

    2017-05-01

    With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issue of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for a descriptive and predictive analysis to compile the survival statistics and predict the future chance of survivability. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS is evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system for semistructured text is recorded at 99%, while that for unstructured text is 97%. Furthermore, the automated, unstructured information extraction has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients are found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one of the lifestyle risk factors was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic extraction of information and survival prediction methods. SEAS has reduced the time and energy of human resources spent unnecessarily on manual tasks.
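
    The survival-prediction component described (a CART model over fields extracted from clinical documents) can be sketched with a standard decision tree; the CSV file, feature names and parameters below are hypothetical placeholders, not the SAS subsystem itself.

      # Sketch of a CART-style survival classifier over extracted clinical fields.
      import pandas as pd
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.metrics import classification_report

      records = pd.read_csv("head_neck_cases.csv")  # one row per patient (hypothetical)
      X = pd.get_dummies(records[["clinical_stage", "age", "smoking", "alcohol"]])
      y = records["alive_at_followup"]

      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
      cart = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20).fit(X_train, y_train)
      print(classification_report(y_test, cart.predict(X_test)))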

  15. An efficient and scalable extraction and quantification method for algal derived biofuel.

    PubMed

    Lohman, Egan J; Gardner, Robert D; Halverson, Luke; Macur, Richard E; Peyton, Brent M; Gerlach, Robin

    2013-09-01

    Microalgae are capable of synthesizing a multitude of compounds including biofuel precursors and other high value products such as omega-3-fatty acids. However, accurate analysis of the specific compounds produced by microalgae is important since slight variations in saturation and carbon chain length can affect the quality, and thus the value, of the end product. We present a method that allows for fast and reliable extraction of lipids and similar compounds from a range of algae, followed by their characterization using gas chromatographic analysis with a focus on biodiesel-relevant compounds. This method determines which range of biologically synthesized compounds is likely responsible for each fatty acid methyl ester (FAME) produced; information that is fundamental for identifying preferred microalgae candidates as a biodiesel source. Traditional methods of analyzing these precursor molecules are time intensive and prone to high degrees of variation between species and experimental conditions. Here we detail a new method which uses microwave energy as a reliable, single-step cell disruption technique to extract lipids from live cultures of microalgae. After extractable lipid characterization (including lipid type (free fatty acids, mono-, di- or tri-acylglycerides) and carbon chain length determination) by GC-FID, the same lipid extracts are transesterified into FAMEs and directly compared to total biodiesel potential by GC-MS. This approach provides insight into the fraction of total FAMEs derived from extractable lipids compared to FAMEs derived from the residual fraction (i.e. membrane bound phospholipids, sterols, etc.). This approach can also indicate which extractable lipid compound, based on chain length and relative abundance, is responsible for each FAME. This method was tested on three species of microalgae; the marine diatom Phaeodactylum tricornutum, the model Chlorophyte Chlamydomonas reinhardtii, and the freshwater green alga Chlorella vulgaris. The method is shown to be robust, highly reproducible, and fast, allowing for multiple samples to be analyzed throughout the time course of culturing, thus providing time-resolved information regarding lipid quantity and quality. Total time from harvesting to obtaining analytical results is less than 2h. © 2013.

  16. Human face detection using motion and color information

    NASA Astrophysics Data System (ADS)

    Kim, Yang-Gyun; Bang, Man-Won; Park, Soon-Young; Choi, Kyoung-Ho; Hwang, Jeong-Hyun

    2008-02-01

    In this paper, we present a hardware implementation of a face detector for surveillance applications. To come up with a computationally cheap and fast algorithm with minimal memory requirement, motion and skin color information are fused successfully. More specifically, a newly appeared object is extracted first by comparing average Hue and Saturation values of background image and a current image. Then, the result of skin color filtering of the current image is combined with the result of a newly appeared object. Finally, labeling is performed to locate a true face region. The proposed system is implemented on Altera Cyclone2 using Quartus II 6.1 and ModelSim 6.1. For hardware description language (HDL), Verilog-HDL is used.
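
    The fusion described (background differencing in Hue/Saturation, skin-colour filtering, then labeling) can be sketched in software as below; the comparison is done pixel-wise here for simplicity, and the file names, skin-colour range and thresholds are illustrative assumptions rather than the implemented FPGA design.

      # Sketch: motion mask AND skin-colour mask, then connected-component labeling.
      import cv2
      import numpy as np

      background = cv2.imread("background.png")   # placeholder images
      frame = cv2.imread("current.png")

      bg_hsv = cv2.cvtColor(background, cv2.COLOR_BGR2HSV)
      fr_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

      # Newly appeared object: large Hue/Saturation change with respect to the background.
      diff = cv2.absdiff(fr_hsv, bg_hsv)[:, :, :2].max(axis=2)
      _, motion_mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

      # Skin-colour filtering of the current frame (approximate HSV skin range).
      skin_mask = cv2.inRange(fr_hsv, (0, 40, 60), (25, 180, 255))

      # Fuse the two cues and label candidate face regions.
      fused = cv2.bitwise_and(motion_mask, skin_mask)
      n, labels, stats, _ = cv2.connectedComponentsWithStats(fused)
      faces = [stats[i] for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] > 500]
      print(f"{len(faces)} candidate face region(s)")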

  17. Search Analytics: Automated Learning, Analysis, and Search with Open Source

    NASA Astrophysics Data System (ADS)

    Hundman, K.; Mattmann, C. A.; Hyon, J.; Ramirez, P.

    2016-12-01

    The sheer volume of unstructured scientific data makes comprehensive human analysis impossible, resulting in missed opportunities to identify relationships, trends, gaps, and outliers. As the open source community continues to grow, tools like Apache Tika, Apache Solr, Stanford's DeepDive, and Data-Driven Documents (D3) can help address this challenge. With a focus on journal publications and conference abstracts often in the form of PDF and Microsoft Office documents, we've initiated an exploratory NASA Advanced Concepts project aiming to use the aforementioned open source text analytics tools to build a data-driven justification for the HyspIRI Decadal Survey mission. We call this capability Search Analytics, and it fuses and augments these open source tools to enable the automatic discovery and extraction of salient information. In the case of HyspIRI, a hyperspectral infrared imager mission, key findings resulted from the extractions and visualizations of relationships from thousands of unstructured scientific documents. The relationships include links between satellites (e.g. Landsat 8), domain-specific measurements (e.g. spectral coverage) and subjects (e.g. invasive species). Using the above open source tools, Search Analytics mined and characterized a corpus of information that would be infeasible for a human to process. More broadly, Search Analytics offers insights into various scientific and commercial applications enabled through missions and instrumentation with specific technical capabilities. For example, the following phrases were extracted in close proximity within a publication: "In this study, hyperspectral images…with high spatial resolution (1 m) were analyzed to detect cutleaf teasel in two areas. …Classification of cutleaf teasel reached a users accuracy of 82 to 84%." Without reading a single paper we can use Search Analytics to automatically identify that a 1 m spatial resolution provides a cutleaf teasel detection users accuracy of 82-84%, which could have tangible, direct downstream implications for crop protection. Automatically assimilating this information expedites and supplements human analysis, and, ultimately, Search Analytics and its foundation of open source tools will result in more efficient scientific investment and research.
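
    The first step of such a pipeline, text extraction from PDFs, can be sketched with the Apache Tika Python bindings followed by a naive pattern match; the file name and regex are illustrative, and the real system layers Solr indexing, DeepDive relation extraction and D3 visualisation on top.

      # Minimal sketch: pull text from a PDF with Tika, then grep measurement phrases.
      import re
      from tika import parser  # pip install tika (runs a Java-based Tika server)

      parsed = parser.from_file("hyperspectral_paper.pdf")  # placeholder document
      text = parsed.get("content") or ""

      # Naive pattern for "<number> m spatial resolution"-style statements.
      for match in re.finditer(r"\d+(?:\.\d+)?\s*m\s+spatial resolution", text, re.I):
          print("found spatial-resolution mention:", match.group())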

  18. Deep Learning for Automated Extraction of Primary Sites from Cancer Pathology Reports

    DOE PAGES

    Qiu, John; Yoon, Hong-Jun; Fearn, Paul A.; ...

    2017-05-03

    Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study we investigated deep learning, specifically a convolutional neural network (CNN), for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against a more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, resulting in micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on the CNN method and cancer site. Finally, these encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
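
    For orientation, a word-level CNN text classifier of the kind described (tokenised reports mapped to one of 12 topography codes) can be sketched as below; the vocabulary size, sequence length and filter settings are illustrative assumptions, not the published configuration.

      # Minimal sketch of a word-level CNN text classifier for report coding.
      import tensorflow as tf

      VOCAB, MAXLEN, N_CODES = 20000, 1500, 12  # 12 topography codes as in the study

      model = tf.keras.Sequential([
          tf.keras.layers.Embedding(VOCAB, 128),            # token ids -> dense vectors
          tf.keras.layers.Conv1D(100, 5, activation="relu"), # n-gram style filters
          tf.keras.layers.GlobalMaxPooling1D(),
          tf.keras.layers.Dropout(0.5),
          tf.keras.layers.Dense(N_CODES, activation="softmax"),
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      model.build(input_shape=(None, MAXLEN))
      model.summary()
      # model.fit(token_ids_train, code_labels_train, validation_split=0.1, epochs=10)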

  19. Deep Learning for Automated Extraction of Primary Sites from Cancer Pathology Reports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qiu, John; Yoon, Hong-Jun; Fearn, Paul A.

    Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study we investigated deep learning, specifically a convolutional neural network (CNN), for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against a more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, resulting in micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on the CNN method and cancer site. Finally, these encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.

  20. Infrared moving small target detection based on saliency extraction and image sparse representation

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaomin; Ren, Kan; Gao, Jin; Li, Chaowei; Gu, Guohua; Wan, Minjie

    2016-10-01

    Moving small target detection in infrared images is a crucial technique for infrared search and tracking systems. This paper presents a novel small target detection technique based on frequency-domain saliency extraction and image sparse representation. First, we exploit features of the Fourier spectrum image and the magnitude spectrum of the Fourier transform to roughly extract saliency regions, and use threshold segmentation to separate the regions that look salient from the background, which gives us a binary image as a result. Second, a new patch-image model and an over-complete dictionary are introduced into the detection system, converting infrared small target detection into an optimization problem of patch-image information reconstruction based on sparse representation. More specifically, the test image and the binary image are decomposed into image patches following certain rules. We select the potential target areas according to the binary patch-image, which contains salient region information, and then exploit the over-complete infrared small target dictionary to reconstruct the test image blocks that may contain targets. The coefficients of target image patches are sparse. Finally, for image sequences, the Euclidean distance between frames is used to reduce the false alarm ratio and increase the detection accuracy of moving small targets in infrared images, exploiting the target position correlation between frames.
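
    One common frequency-domain saliency formulation consistent with this description is the spectral-residual method, sketched below with a threshold segmentation step producing the binary map; this is an illustrative reconstruction, not necessarily the authors' exact pipeline, and the threshold rule is an assumption.

      # Sketch: spectral-residual saliency followed by threshold segmentation.
      import numpy as np
      from scipy.ndimage import uniform_filter, gaussian_filter

      def spectral_residual_saliency(img):
          f = np.fft.fft2(img.astype(float))
          log_amp = np.log1p(np.abs(f))
          phase = np.angle(f)
          residual = log_amp - uniform_filter(log_amp, size=3)   # spectral residual
          sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
          return gaussian_filter(sal, sigma=2)

      def binary_saliency_mask(img, k=3.0):
          sal = spectral_residual_saliency(img)
          return sal > (sal.mean() + k * sal.std())              # threshold segmentation

      # Example on a synthetic IR frame containing one dim point target.
      frame = np.random.rand(128, 128) * 0.1
      frame[64, 64] = 1.0
      print(binary_saliency_mask(frame).sum(), "salient pixel(s) flagged")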

  1. Structural health monitoring feature design by genetic programming

    NASA Astrophysics Data System (ADS)

    Harvey, Dustin Y.; Todd, Michael D.

    2014-09-01

    Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and other high-capital or life-safety critical structures. Conventional data processing involves pre-processing and extraction of low-dimensional features from in situ time series measurements. The features are then input to a statistical pattern recognition algorithm to perform the relevant classification or regression task necessary to facilitate decisions by the SHM system. Traditional design of signal processing and feature extraction algorithms can be an expensive and time-consuming process requiring extensive system knowledge and domain expertise. Genetic programming, a heuristic program search method from evolutionary computation, was recently adapted by the authors to perform automated, data-driven design of signal processing and feature extraction algorithms for statistical pattern recognition applications. The proposed method, called Autofead, is particularly suitable to handle the challenges inherent in algorithm design for SHM problems where the manifestation of damage in structural response measurements is often unclear or unknown. Autofead mines a training database of response measurements to discover information-rich features specific to the problem at hand. This study provides experimental validation on three SHM applications including ultrasonic damage detection, bearing damage classification for rotating machinery, and vibration-based structural health monitoring. Performance comparisons with common feature choices for each problem area are provided demonstrating the versatility of Autofead to produce significant algorithm improvements on a wide range of problems.

  2. Information transfer and information modification to identify the structure of cardiovascular and cardiorespiratory networks.

    PubMed

    Faes, Luca; Nollo, Giandomenico; Krohova, Jana; Czippelova, Barbora; Turianikova, Zuzana; Javorka, Michal

    2017-07-01

    To fully elucidate the complex physiological mechanisms underlying the short-term autonomic regulation of heart period (H), systolic and diastolic arterial pressure (S, D) and respiratory (R) variability, the joint dynamics of these variables need to be explored using multivariate time series analysis. This study proposes the utilization of information-theoretic measures to measure causal interactions between nodes of the cardiovascular/cardiorespiratory network and to assess the nature (synergistic or redundant) of these directed interactions. Indexes of information transfer and information modification are extracted from the H, S, D and R series measured from healthy subjects in a resting state and during postural stress. Computations are performed in the framework of multivariate linear regression, using bootstrap techniques to assess on a single-subject basis the statistical significance of each measure and of its transitions across conditions. We find patterns of information transfer and modification which are related to specific cardiovascular and cardiorespiratory mechanisms in resting conditions and to their modification induced by the orthostatic stress.
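
    For reference, one common formulation of the information-transfer measure used in such analyses is the transfer entropy from a source series X to a target series Y (the study estimates such quantities within a multivariate linear-regression framework rather than by direct probability estimation):

      TE_{X \to Y} = \sum p\left(y_{t+1}, y_t^{-}, x_t^{-}\right)\,
                     \log \frac{p\left(y_{t+1} \mid y_t^{-}, x_t^{-}\right)}{p\left(y_{t+1} \mid y_t^{-}\right)}

    where y_t^- and x_t^- denote the past states of the target and the source; information modification is then assessed by decomposing such transfers into redundant and synergistic contributions.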

  3. Applications of aerospace technology to petroleum extraction and reservoir engineering

    NASA Technical Reports Server (NTRS)

    Jaffe, L. D.; Back, L. H.; Berdahl, C. M.; Collins, E. E., Jr.; Gordon, P. G.; Houseman, J.; Humphrey, M. F.; Hsu, G. C.; Ham, J. D.; Marte, J. E.; hide

    1977-01-01

    Through contacts with the petroleum industry, the petroleum service industry, universities and government agencies, important petroleum extraction problems were identified. For each problem, areas of aerospace technology that might aid in its solution were also identified, where possible. Some of the problems were selected for further consideration. Work on these problems led to the formulation of specific concepts as candidate for development. Each concept is addressed to the solution of specific extraction problems and makes use of specific areas of aerospace technology.

  4. 30 CFR 702.10 - Information collection.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 30 Mineral Resources 3 2012-07-01 2012-07-01 false Information collection. 702.10 Section 702.10... EXEMPTION FOR COAL EXTRACTION INCIDENTAL TO THE EXTRACTION OF OTHER MINERALS § 702.10 Information collection. The collections of information contained in §§ 702.11, 702.12, 702.13, 702.15 and 702.18 of this part...

  5. 30 CFR 702.10 - Information collection.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 30 Mineral Resources 3 2011-07-01 2011-07-01 false Information collection. 702.10 Section 702.10... EXEMPTION FOR COAL EXTRACTION INCIDENTAL TO THE EXTRACTION OF OTHER MINERALS § 702.10 Information collection. The collections of information contained in §§ 702.11, 702.12, 702.13, 702.15 and 702.18 of this part...

  6. 30 CFR 702.10 - Information collection.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Information collection. 702.10 Section 702.10... EXEMPTION FOR COAL EXTRACTION INCIDENTAL TO THE EXTRACTION OF OTHER MINERALS § 702.10 Information collection. The collections of information contained in §§ 702.11, 702.12, 702.13, 702.15 and 702.18 of this part...

  7. Integrating Information Extraction Agents into a Tourism Recommender System

    NASA Astrophysics Data System (ADS)

    Esparcia, Sergio; Sánchez-Anguix, Víctor; Argente, Estefanía; García-Fornes, Ana; Julián, Vicente

    Recommender systems face some problems. On the one hand, information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be interesting to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques in order to automatically extract and classify information from the Web. Its goal is to keep the system updated and obtain information about third-party services that are not offered by service providers inside the system.

  8. The use of experimental structures to model protein dynamics.

    PubMed

    Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L

    2015-01-01

    The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high-for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods-Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are well-established simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
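
    The PCA step described (aligned coordinates from many structures of the same protein, reduced to a few dominant collective motions) can be sketched as follows; the coordinate array below is a random placeholder standing in for superposed C-alpha coordinates, and loading/superposition from PDB files is omitted.

      # Sketch: PCA over flattened, superposed structure coordinates.
      import numpy as np
      from sklearn.decomposition import PCA

      # coords: (n_structures, n_atoms, 3), already aligned to a common reference.
      n_structures, n_atoms = 329, 198      # e.g. 329 HIV-1 protease dimer structures
      coords = np.random.default_rng(1).normal(size=(n_structures, n_atoms, 3))  # placeholder

      X = coords.reshape(n_structures, -1)  # each structure -> a 3*n_atoms vector
      X = X - X.mean(axis=0)                # remove the mean structure

      pca = PCA(n_components=10).fit(X)
      print("variance captured by PC1-PC3:", pca.explained_variance_ratio_[:3].sum())

      # Project every structure onto PC1/PC2 to visualise the conformational spread.
      projections = pca.transform(X)[:, :2]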

  9. Multi-scale image segmentation method with visual saliency constraints and its application

    NASA Astrophysics Data System (ADS)

    Chen, Yan; Yu, Jie; Sun, Kaimin

    2018-03-01

    Object-based image analysis has many advantages over pixel-based methods, so it is one of the current research hotspots. It is very important to obtain image objects by multi-scale image segmentation in order to carry out object-based image analysis. The currently popular image segmentation methods mostly share the bottom-up segmentation principle, which is simple to realize and yields accurate object boundaries. However, the macro statistical characteristics of image areas are difficult to take into account, and fragmented (over-segmented) results are difficult to avoid. In addition, when it comes to information extraction, target recognition and other applications, image targets are not equally important, i.e., some specific targets or target groups with particular features deserve more attention than the others. To avoid the problem of over-segmentation and highlight the targets of interest, this paper proposes a multi-scale image segmentation method with visual saliency constraints. Visual saliency theory and typical feature extraction methods are adopted to obtain the visual saliency information, especially the macroscopic information to be analyzed. The visual saliency information is used as a distribution map of homogeneity weights, where each pixel is given a weight. This weight acts as one of the merging constraints in the multi-scale image segmentation. As a result, pixels that macroscopically belong to the same object but are locally different are more likely to be assigned to the same object. In addition, due to the constraint of the visual saliency model, the balance between local and macroscopic characteristics can be well controlled during the segmentation process for different objects. These controls improve the completeness of visually salient areas in the segmentation results while diluting the controlling effect for non-salient background areas. Experiments show that this method works better for texture image segmentation than traditional multi-scale image segmentation methods, and enables us to give priority control to the salient objects of interest. The method has been applied to image quality evaluation, scattered residential area extraction, sparse forest extraction and other applications to verify its validity. All applications showed good results.

  10. The Use of Experimental Structures to Model Protein Dynamics

    PubMed Central

    Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.

    2014-01-01

    Summary The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are well-established simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965

  11. Application of multispectral remote sensing techniques for dismissed mine sites monitoring and rehabilitation

    NASA Astrophysics Data System (ADS)

    Bonifazi, Giuseppe; Serranti, Silvia

    2007-09-01

    Mining activities, especially those operated in the open air (open pit), have a deep impact on their surroundings. Such an impact, and the related problems, are directly related to the correct operation of the activities, and usually interact strongly with the environment. The impact is mainly related to the following issues: i) the high volumes of handled material, ii) the generation of dust, noise and vibrations, iii) water pollution, iv) visual impact and, finally, v) mining area recovery at the end of exploitation activities. All these aspects are very important and must be properly evaluated and monitored. Environmental impact control is usually carried out during and after the end of the mining activities, adopting methods for the detection, collection and analysis of specific environmental indicators and their comparison with reference threshold values stated by official regulations. The aim of the study was to investigate, and critically evaluate, the problems related to the development of an integrated set of procedures based on the collection and analysis of remotely sensed data in order to evaluate the effect of rehabilitation of land contaminated by extractive industry activities. Starting from the results of these analyses, monitoring and registration of the environmental impact of such operations was performed through the application and integration of modern information technologies, such as the previously mentioned Earth Observation (EO), with Geographic Information Systems (GIS). The study was developed with reference to different dismissed mine sites in India, Thailand and China. The results of the study have been used as input for the construction of a knowledge-based decision support system intended to help in the identification of appropriate rehabilitation technologies for all those dismissed areas previously affected by extractive industry activities. The work was financially supported within the framework of the Project ASIA IT&C - CN/ASIA IT&C/006 (89870) Extract-It "Application of Information Technologies for the Sustainable Management of Extractive Industry Activities" of the European Union.

  12. Spatiotemporal conceptual platform for querying archaeological information systems

    NASA Astrophysics Data System (ADS)

    Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

    2015-04-01

    Spatial and temporal distribution of archaeological sites has been shown to associate with several attributes including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archaeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information of the habitation/usage period (pre Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including: temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, and spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above-mentioned attributes, they are classified according to their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database and provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but can instead graphically select an area of interest. In addition, according to the application at hand, novel spatiotemporal attributes and relations can be supported, towards the understanding of historical settlement patterns.

  13. Conception of Self-Construction Production Scheduling System

    NASA Astrophysics Data System (ADS)

    Xue, Hai; Zhang, Xuerui; Shimizu, Yasuhiro; Fujimura, Shigeru

    With the high-speed innovation of information technology, many production scheduling systems have been developed. However, a lot of customization according to the individual production environment is required, and a large investment for development and maintenance is then indispensable. Therefore the way scheduling systems are constructed should now change. The final objective of this research is to develop a system that builds itself by automatically extracting the scheduling technique from the daily production scheduling work, so that the required investment is reduced. This extraction mechanism should be applicable to various production processes for the sake of interoperability. Using the master information extracted by the system, production scheduling operators can be supported in carrying out the production scheduling work easily and accurately, without any restriction on scheduling operations. By installing this extraction mechanism, it is easy to introduce a scheduling system without a lot of expense for customization. In this paper, a model for expressing a scheduling problem is first proposed. Then the guidelines for extracting the scheduling information and using the extracted information are presented, and some applied functions based on them are also proposed.
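
    As a rough illustration of what "a model for expressing a scheduling problem" can look like (a generic sketch, not the authors' model), jobs can be represented as ordered operations that each occupy one machine for a given duration, with a naive dispatcher producing a plan:

        # Hypothetical sketch of a minimal scheduling-problem model: jobs made of
        # ordered operations, each occupying one machine for a given duration.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class Operation:
            machine: str        # resource required
            duration: int       # processing time, e.g. in minutes

        @dataclass
        class Job:
            name: str
            operations: List[Operation]    # must run in this order
            due_date: int

        def schedule_fifo(jobs):
            """Naive dispatcher: take jobs in the given order, earliest feasible start."""
            machine_free = {}                                  # machine -> time it frees up
            plan = []
            for job in jobs:
                job_ready = 0
                for op in job.operations:
                    start = max(machine_free.get(op.machine, 0), job_ready)
                    end = start + op.duration
                    machine_free[op.machine] = end
                    job_ready = end
                    plan.append((job.name, op.machine, start, end))
            return plan

        jobs = [Job("A", [Operation("cut", 30), Operation("paint", 20)], due_date=120),
                Job("B", [Operation("cut", 15), Operation("paint", 25)], due_date=100)]
        for row in schedule_fifo(jobs):
            print(row)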

  14. Information Gain Based Dimensionality Selection for Classifying Text Documents

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dumidu Wijayasekara; Milos Manic; Miles McQueen

    2013-06-01

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
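
    A minimal sketch of the central idea — scaling per-gene mutation probability by a pre-computed information measure — using scikit-learn's mutual_info_classif as a stand-in for information gain and an invented weighting scheme; it is not the authors' implementation.

        # Sketch: a GA chromosome is a binary mask over dimensions; each bit's mutation
        # probability is scaled by that dimension's (pre-computed) information measure.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 50))              # toy document-feature matrix
        y = (X[:, 3] + X[:, 7] > 0).astype(int)     # labels driven by dimensions 3 and 7

        gain = mutual_info_classif(X, y)            # stand-in for information gain, computed once
        base_rate = 0.02
        # One plausible weighting: informative dimensions mutate less, so once selected
        # they tend to stay in the mask; uninformative dimensions churn more.
        mutation_p = base_rate * (1.0 - gain / (gain.max() + 1e-12)) + 0.001

        def mutate(chromosome):
            flips = rng.random(chromosome.size) < mutation_p
            return np.where(flips, 1 - chromosome, chromosome)

        population = rng.integers(0, 2, size=(20, X.shape[1]))
        population = np.array([mutate(c) for c in population])
        print(population.shape)                     # (20, 50)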

  15. Data Fusion for Enhanced Aircraft Engine Prognostics and Health Management

    NASA Technical Reports Server (NTRS)

    Volponi, Al

    2005-01-01

    Aircraft gas-turbine engine data is available from a variety of sources, including on-board sensor measurements, maintenance histories, and component models. An ultimate goal of Propulsion Health Management (PHM) is to maximize the amount of meaningful information that can be extracted from disparate data sources to obtain comprehensive diagnostic and prognostic knowledge regarding the health of the engine. Data fusion is the integration of data or information from multiple sources for the achievement of improved accuracy and more specific inferences than can be obtained from the use of a single sensor alone. The basic tenet underlying the data/information fusion concept is to leverage all available information to enhance diagnostic visibility, increase diagnostic reliability and reduce the number of diagnostic false alarms. This report describes a basic PHM data fusion architecture being developed in alignment with the NASA C-17 PHM Flight Test program. The challenge of how to maximize the meaningful information extracted from disparate data sources to obtain enhanced diagnostic and prognostic information regarding the health and condition of the engine is the primary goal of this endeavor. To address this challenge, NASA Glenn Research Center, NASA Dryden Flight Research Center, and Pratt & Whitney have formed a team with several small innovative technology companies to plan and conduct a research project in the area of data fusion, as it applies to PHM. Methodologies being developed and evaluated have been drawn from a wide range of areas including artificial intelligence, pattern recognition, statistical estimation, and fuzzy logic. This report will provide a chronology and summary of the work accomplished under this research contract.

  16. Evidence-based information needs of public health workers: a systematized review.

    PubMed

    Barr-Walker, Jill

    2017-01-01

    This study assessed public health workers' evidence-based information needs, based on a review of the literature using a systematic search strategy. This study is based on a thesis project conducted as part of the author's master's in public health coursework and is considered a systematized review. Four databases were searched for English-language articles published between 2005 and 2015: PubMed, Web of Science, Library Literature & Information Science Index, and Library, Information Science & Technology Abstracts (LISTA). Studies were excluded if there was no primary data collection, the population in the study was not identified as public health workers, "information" was not defined according to specific criteria, or evidence-based information and public health workers were not the major focus. Studies included in the final analysis underwent data extraction, critical appraisal using CASP and STROBE checklists, and thematic analysis. Thirty-three research studies were identified in the search, including twenty-one using quantitative methods and twelve using qualitative methods. Critical appraisal revealed many potential biases, particularly in the validity of research. Thematic analysis revealed five common themes: (1) definition of information needs, (2) current information-seeking behavior and use, (3) definition of evidence-based information, (4) barriers to information needs, and (5) public health-specific issues. Recommendations are given for how librarians can increase the use of evidence-based information in public health research, practice, and policy making. Further research using rigorous methodologies and transparent reporting practices in a wider variety of settings is needed to further evaluate public health workers' information needs.

  17. Landmark-based deep multi-instance learning for brain disease diagnosis.

    PubMed

    Liu, Mingxia; Zhang, Jun; Adeli, Ehsan; Shen, Dinggang

    2018-01-01

    In conventional Magnetic Resonance (MR) image based methods, two stages are often involved to capture brain structural information for disease diagnosis, i.e., 1) manually partitioning each MR image into a number of regions-of-interest (ROIs), and 2) extracting pre-defined features from each ROI for diagnosis with a certain classifier. However, these pre-defined features often limit the performance of the diagnosis, due to challenges in 1) defining the ROIs and 2) extracting effective disease-related features. In this paper, we propose a landmark-based deep multi-instance learning (LDMIL) framework for brain disease diagnosis. Specifically, we first adopt a data-driven learning approach to discover disease-related anatomical landmarks in the brain MR images, along with their nearby image patches. Then, our LDMIL framework learns an end-to-end MR image classifier for capturing both the local structural information conveyed by image patches located by landmarks and the global structural information derived from all detected landmarks. We have evaluated our proposed framework on 1526 subjects from three public datasets (i.e., ADNI-1, ADNI-2, and MIRIAD), and the experimental results show that our framework can achieve superior performance over state-of-the-art approaches. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. BioCreative V track 4: a shared task for the extraction of causal network information using the Biological Expression Language.

    PubMed

    Rinaldi, Fabio; Ellendorff, Tilia Renate; Madan, Sumit; Clematide, Simon; van der Lek, Adrian; Mevissen, Theo; Fluck, Juliane

    2016-01-01

    Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The specific goal of track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology which gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions, relations, and introduced an evaluation framework which rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text. © The Author(s) 2016. Published by Oxford University Press.
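
    A toy sketch of level-wise partial credit in that spirit, using a simplified triple-like representation of a statement rather than full BEL syntax; the field names and weights are assumptions made for illustration.

        # Toy sketch of multi-level partial credit: a statement is reduced to entities,
        # their functions, and the relation; each level earns a (made-up) share of credit.
        def partial_credit(pred, gold, weights=(0.4, 0.2, 0.4)):
            """weights = credit for (entities, functions, relation) -- assumed values."""
            ent = ((pred["subj"] == gold["subj"]) + (pred["obj"] == gold["obj"])) / 2.0
            fun = ((pred["subj_fn"] == gold["subj_fn"]) + (pred["obj_fn"] == gold["obj_fn"])) / 2.0
            rel = float(pred["rel"] == gold["rel"])
            w_ent, w_fun, w_rel = weights
            return w_ent * ent + w_fun * fun + w_rel * rel

        gold = {"subj": "AKT1", "subj_fn": "kin", "rel": "increases", "obj": "MDM2", "obj_fn": "p"}
        pred = {"subj": "AKT1", "subj_fn": "p",   "rel": "increases", "obj": "MDM2", "obj_fn": "p"}
        print(partial_credit(pred, gold))   # 0.9: full entity and relation credit, half function credit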

  19. User-centered evaluation of Arizona BioPathway: an information extraction, integration, and visualization system.

    PubMed

    Quiñones, Karin D; Su, Hua; Marshall, Byron; Eggers, Shauna; Chen, Hsinchun

    2007-09-01

    Explosive growth in biomedical research has made automated information extraction, knowledge integration, and visualization increasingly important and critically needed. The Arizona BioPathway (ABP) system extracts and displays biological regulatory pathway information from the abstracts of journal articles. This study uses relations extracted from more than 200 PubMed abstracts presented in a tabular and graphical user interface with built-in search and aggregation functionality. This paper presents a task-centered assessment of the usefulness and usability of the ABP system focusing on its relation aggregation and visualization functionalities. Results suggest that our graph-based visualization is more efficient in supporting pathway analysis tasks and is perceived as more useful and easier to use as compared to a text-based literature-viewing method. Relation aggregation significantly contributes to knowledge-acquisition efficiency. Together, the graphic and tabular views in the ABP Visualizer provide a flexible and effective interface for pathway relation browsing and analysis. Our study contributes to pathway-related research and biological information extraction by assessing the value of a multiview, relation-based interface that supports user-controlled exploration of pathway information across multiple granularities.

  20. Acquiring 3-D information about thick objects from differential interference contrast images using texture extraction

    NASA Astrophysics Data System (ADS)

    Sierra, Heidy; Brooks, Dana; Dimarzio, Charles

    2010-07-01

    The extraction of 3-D morphological information about thick objects is explored in this work. We extract this information from 3-D differential interference contrast (DIC) images by applying a texture detection method. Texture extraction methods have been successfully used in different applications to study biological samples. A 3-D texture image is obtained by applying a local entropy-based texture extraction method. The use of this method to detect regions of blastocyst mouse embryos that are used in assisted reproduction techniques such as in vitro fertilization is presented as an example. Results demonstrate the potential of using texture detection methods to improve morphological analysis of thick samples, which is relevant to many biomedical and biological studies. Fluorescence and optical quadrature microscope phase images are used for validation.
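
    A minimal sketch of a local-entropy texture filter applied slice by slice to a 3-D stack with scikit-image; the synthetic volume and window size are assumptions, not the parameters used in the study.

        # Sketch: local-entropy texture map of a 3-D image stack, computed slice by slice.
        import numpy as np
        from skimage.filters.rank import entropy
        from skimage.morphology import disk
        from skimage.util import img_as_ubyte

        rng = np.random.default_rng(1)
        volume = rng.random((16, 128, 128))           # stand-in for a 3-D DIC stack in [0, 1)

        texture = np.empty_like(volume)
        footprint = disk(5)                           # local window; the size is an assumption
        for z in range(volume.shape[0]):
            slice_u8 = img_as_ubyte(volume[z])        # rank filters expect integer images
            texture[z] = entropy(slice_u8, footprint) # local intensity variability, in bits

        # High-entropy voxels mark textured regions (e.g. cell boundaries); low-entropy
        # voxels mark smooth background, so thresholding `texture` separates the two.
        print(texture.min(), texture.max())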

  1. Study on identifying deciduous forest by the method of feature space transformation

    NASA Astrophysics Data System (ADS)

    Zhang, Xuexia; Wu, Pengfei

    2009-10-01

    Thematic information extraction from remotely sensed data remains one of the puzzling problems facing remote sensing science, so many remote sensing scientists devote themselves diligently to research in this domain. The methods of thematic information extraction comprise two kinds, visual interpretation and computer interpretation, whose development direction is intellectualization and comprehensive modularization. The paper develops an intelligent extraction method based on feature space transformation for deciduous forest thematic information extraction in Changping district of Beijing city. China-Brazil Earth Resources Satellite images received in 2005 covering the whole district are used to extract the deciduous forest coverage area by the feature space transformation method and the linear spectral decomposition method, and the result from remote sensing is similar to the woodland resource census data published by the Chinese forestry bureau in 2004.
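
    Since the abstract mentions a linear spectral decomposition step, here is a generic sketch of linear spectral unmixing — solving for endmember fractions per pixel by least squares — with invented endmember spectra, not the processing chain of the paper.

        # Sketch of linear spectral unmixing: a pixel spectrum is modelled as a mixture of
        # endmember spectra (columns), and the mixing fractions are solved by least squares.
        import numpy as np

        # Invented endmember reflectances over 4 bands: deciduous forest, soil, water.
        endmembers = np.array([[0.05, 0.10, 0.04],
                               [0.08, 0.15, 0.03],
                               [0.35, 0.25, 0.02],
                               [0.30, 0.30, 0.01]])

        pixel = np.array([0.07, 0.10, 0.29, 0.29])    # observed spectrum of one pixel

        fractions, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
        fractions = np.clip(fractions, 0.0, None)
        fractions /= fractions.sum()                  # crude sum-to-one normalisation
        print(dict(zip(["forest", "soil", "water"], fractions.round(2))))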

  2. Deaths and cardiovascular injuries due to device-assisted implantable cardioverter–defibrillator and pacemaker lead extraction

    PubMed Central

    Hauser, Robert G.; Katsiyiannis, William T.; Gornick, Charles C.; Almquist, Adrian K.; Kallinen, Linda M.

    2010-01-01

    Aims An estimated 10 000–15 000 pacemaker and implantable cardioverter–defibrillator (ICD) leads are extracted annually worldwide using specialized tools that disrupt encapsulating fibrous tissue. Additional information is needed regarding the safety of the devices that have been approved for lead extraction. The aim of this study was to determine whether complications due to device-assisted lead extraction might be more hazardous than published data suggest, and whether procedural safety precautions are effective. Methods and results We searched the US Food and Drug Administration's (FDA) Manufacturer and User Facility Device Experience (MAUDE) database from 1995 to 2008 using the search terms ‘lead extraction and death’ and ‘lead extraction and injury’. Additional product specific searches were performed for the terms ‘death’ and ‘injury’. Between 1995 and 2008, 57 deaths and 48 serious cardiovascular injuries associated with device-assisted lead extraction were reported to the FDA. Owing to underreporting, the FDA database does not contain all adverse events that occurred during this period. Of the 105 events, 27 deaths and 13 injuries occurred in 2007–2008. During these 2 years, 23 deaths were linked with excimer laser or mechanical dilator sheath extractions. The majority of deaths and injuries involved ICD leads, and most were caused by lacerations of the right atrium, superior vena cava, or innominate vein. Overall, 62 patients underwent emergency surgical repair of myocardial perforations and venous lacerations and 35 (56%) survived. Conclusion These findings suggest that device-assisted lead extraction is a high-risk procedure and that serious complications including death may not be mitigated by emergency surgery. However, skilled standby cardiothoracic surgery is essential when performing pacemaker and ICD lead extractions. Although the incidence of these complications is unknown, the results of our study imply that device-assisted lead extractions should be performed by highly qualified physicians and their teams in specialized centres. PMID:19946113

  3. [Specific immunotherapy with depigmented allergoids].

    PubMed

    Klimek, L; Thorn, C; Pfaar, O

    2010-01-01

    Specific immunotherapy is the only available causative treatment for IgE-mediated allergic conditions. The state of the art is treatment via the subcutaneous route with crude extracts in a water solution, with physically linked (semidepot) extracts or chemically modified semidepot extracts (allergoids). A relatively new purification method combines depigmentation followed by polymerization with glutaraldehyde. This modification results in increased tolerance with a reduction in both local and systemic adverse effects. As controlled clinical trials have shown, the effectiveness is comparable to that of specific immunotherapy with crude allergen extracts. Recent data suggest that the modified polymerized allergoids allow a safe rush titration in a few days or even in 1 day (ultra-rush titration).

  4. [Study on infrared spectrum change of Ganoderma lucidum and its extracts].

    PubMed

    Chen, Zao-Xin; Xu, Yong-Qun; Chen, Xiao-Kang; Huang, Dong-Lan; Lu, Wen-Guan

    2013-05-01

    From the determination of the infrared spectra of four substances (original Ganoderma lucidum and its water extract, 95% ethanol extract and petroleum ether extract), it was found that the infrared spectrum carries systematic chemical information and basically reflects the distribution of each component of the analyte. Ganoderma lucidum and its extracts can be distinguished according to the ratios of the absorption peak areas at 3 416-3 279, 1 541 and 723 cm(-1) to that at 2 935-2 852 cm(-1). A method of calculating the information entropy of a sample set from Euclidean distances was proposed, the relationship between the information entropy and the amount of chemical information carried by the sample set was discussed, and the authors conclude that the sample set of original Ganoderma lucidum carries the most abundant chemical information. The infrared spectrum set of original Ganoderma lucidum gives a better clustering of Ganoderma atrum, cyan Ganoderma, Ganoderma multiplicatum and Ganoderma lucidum when hierarchical cluster analysis is performed on the four sample sets. The results show that the infrared spectrum carries chemical information on the material structure and is closely related to the chemical composition of the system. The higher the information entropy, the richer the chemical information and the greater the benefit for pattern recognition. This study provides guidance for the construction of sample sets in pattern recognition.
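
    One plausible reading of "information entropy of the sample set with Euclidean distance" (not necessarily the authors' exact formula) is to normalise the pairwise Euclidean distances within a spectrum set into a distribution and take its Shannon entropy:

        # Sketch (one plausible reading, not the paper's exact formula): normalise the
        # pairwise Euclidean distances within a spectrum set and take the Shannon entropy.
        import numpy as np

        def sample_set_entropy(spectra):
            """spectra: (n, m) array, one preprocessed IR spectrum per row."""
            n = spectra.shape[0]
            d = np.linalg.norm(spectra[:, None, :] - spectra[None, :, :], axis=-1)
            d = d[np.triu_indices(n, k=1)]            # unique pairwise distances
            p = d / d.sum()                           # turn distances into a distribution
            return float(-(p * np.log2(p + 1e-12)).sum())

        rng = np.random.default_rng(2)
        toy_spectra = rng.normal(size=(10, 400))      # 10 toy spectra, 400 wavenumber points
        print(sample_set_entropy(toy_spectra))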

  5. Identification of functional groups of Opuntia ficus-indica involved in coagulation process after its active part extraction.

    PubMed

    Bouaouine, Omar; Bourven, Isabelle; Khalil, Fouad; Baudu, Michel

    2018-04-01

    Opuntia ficus-indica, which belongs to the Cactaceae family and is a member of the Opuntia genus, has received increasing research interest for wastewater treatment by flocculation. The objectives of this study were (i) to provide more information regarding the active constituents of Opuntia spp. and (ii) to improve the conditions for extracting and using the flocculant molecules for water treatment. A classic approach based on jar test experiments was used with raw material and with material extracted by solubilization and precipitation. The surface properties of the solid material were characterized by FTIR, SEM, zeta potential measurement, and surface titration. Splitting based on the solubility of the material as a function of pH, together with titration of functional groups, completed the method. The optimal pH value for a coagulation-flocculation process using cactus solid material (CSM) was 10.0, at a dose of 35 mg L-1. The alkaline pH of flocculation suggests an adsorption mechanism with a bridging effect between particles by water-soluble extracted molecules. To validate this mechanism, a water extraction was carried out at pH = 10 (optimum of flocculation) and the solution was acidified (pH = 7) to allow precipitation of the presumed active flocculant molecules. The strong flocculant property of this extract was verified, and titration of this solution showed at least one specific pKa of 9.0 ± 0.6. This pKa corresponds to phenol groups, which could be assigned to lignin and tannin.

  6. A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.

    PubMed

    Segura-Bedmar, Isabel; Martínez, Paloma; de Pablo-Sánchez, César

    2011-03-29

    A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up to date with all published studies on DDI. This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched with the generated sentences in order to extract DDIs. We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%), but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no approach has been carried out to extract DDI from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDI from biomedical texts.
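
    A toy sketch of the lexical-pattern idea — regular expressions applied to sentences in which the two drug names are already known — with invented patterns rather than the pharmacist-defined set used in the paper:

        # Toy sketch: match simple interaction patterns between two already-identified drugs.
        import re

        # Invented patterns; DRUG1/DRUG2 placeholders are substituted before matching.
        PATTERNS = [
            r"DRUG1 (?:may )?increases? the (?:plasma )?(?:level|concentration)s? of DRUG2",
            r"DRUG1 should not be (?:co-)?administered with DRUG2",
            r"DRUG1 inhibits the metabolism of DRUG2",
        ]

        def find_ddi(sentence, drug1, drug2):
            for pattern in PATTERNS:
                regex = pattern.replace("DRUG1", re.escape(drug1)).replace("DRUG2", re.escape(drug2))
                if re.search(regex, sentence, flags=re.IGNORECASE):
                    return True
            return False

        s = "Ketoconazole inhibits the metabolism of midazolam."
        print(find_ddi(s, "ketoconazole", "midazolam"))   # True
        print(find_ddi(s, "midazolam", "ketoconazole"))   # False: direction matters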

  7. Dual origin of gut proteases in Formosan subterranean termites (Coptotermes formosanus Shiraki) (Isoptera: Rhinotermitidae).

    PubMed

    Sethi, Amit; Xue, Qing-Gang; La Peyre, Jerome F; Delatte, Jennifer; Husseneder, Claudia

    2011-07-01

    Cellulose digestion in lower termites, mediated by carbohydrases originating from both termite and endosymbionts, is well characterized. In contrast, limited information exists on gut proteases of lower termites, their origins and roles in termite nutrition. The objective of this study was to characterize gut proteases of the Formosan subterranean termite (Coptotermes formosanus Shiraki) (Isoptera: Rhinotermitidae). The protease activity of extracts from gut tissues (fore-, mid- and hindgut) and protozoa isolated from hindguts of termite workers was quantified using hide powder azure as a substrate and further characterized by zymography with gelatin SDS-PAGE. Midgut extracts showed the highest protease activity followed by the protozoa extracts. High level of protease activity was also detected in protozoa culture supernatants after 24 h incubation. Incubation of gut and protozoa extracts with class-specific protease inhibitors revealed that most of the proteases were serine proteases. All proteolytic bands identified after gelatin SDS-PAGE were also inhibited by serine protease inhibitors. Finally, incubation with chromogenic substrates indicated that extracts from fore- and hindgut tissues possessed proteases with almost exclusively trypsin-like activity while both midgut and protozoa extracts possessed proteases with trypsin-like and subtilisin/chymotrypsin-like activities. However, protozoa proteases were distinct from midgut proteases (with different molecular mass). Our results suggest that the Formosan subterranean termite not only produces endogenous proteases in its gut tissues, but also possesses proteases originating from its protozoan symbionts. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. Research on Remote Sensing Geological Information Extraction Based on Object Oriented Classification

    NASA Astrophysics Data System (ADS)

    Gao, Hui

    2018-04-01

    Northern Tibet belongs to the sub-cold arid climate zone of the plateau and is rarely visited by people. The geological working conditions are very poor, but the stratum exposures are good and human interference is very small. Therefore, research on the automatic classification and extraction of remote sensing geological information there has typical significance and good application prospects. Based on object-oriented classification in northern Tibet, using WorldView-2 high-resolution remote sensing data combined with tectonic information and image enhancement, the lithological spectral features, shape features, spatial locations and topological relations of various kinds of geological information are mined. By setting thresholds within a hierarchical classification, eight kinds of geological information were classified and extracted. Compared with existing geological maps, the accuracy analysis shows that the overall accuracy reached 87.8561 %, indicating that the object-oriented classification method is effective and feasible for this study area and provides a new idea for the automatic extraction of remote sensing geological information.
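
    A schematic sketch of threshold-based hierarchical rules over per-object features; the band names, indices and thresholds are invented for illustration and are not the rule set of the study.

        # Schematic sketch of hierarchical, threshold-based rules applied to image objects
        # (segments) described by per-object features. All thresholds are invented.
        def classify_object(obj):
            """obj: dict of per-object features, e.g. mean band reflectances."""
            ndvi = (obj["nir"] - obj["red"]) / (obj["nir"] + obj["red"] + 1e-9)
            if ndvi > 0.4:
                return "vegetation"
            if obj["nir"] < 0.05 and obj["blue"] > obj["red"]:
                return "water"
            # remaining objects treated as exposed rock, split by a brightness threshold
            brightness = (obj["red"] + obj["green"] + obj["blue"]) / 3.0
            if brightness > 0.35:
                return "bright lithology"
            return "dark lithology"

        segments = [
            {"blue": 0.06, "green": 0.08, "red": 0.10, "nir": 0.45},
            {"blue": 0.08, "green": 0.06, "red": 0.05, "nir": 0.02},
            {"blue": 0.30, "green": 0.38, "red": 0.42, "nir": 0.44},
        ]
        print([classify_object(s) for s in segments])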

  9. Novel method for the rapid and specific extraction of multiple β2-agonist residues in food by tailor-made Monolith-MIPs extraction disks and detection by gas chromatography with mass spectrometry.

    PubMed

    Liu, Haibo; Gan, Ning; Chen, Yinji; Ding, Qingqing; Huang, Jie; Lin, Saichai; Cao, Yuting; Li, Tianhua

    2016-09-01

    A quick and specific pretreatment method based on a series of extraction clean-up disks, consisting of molecularly imprinted polymer monoliths and C18 adsorbent, was developed for the specific enrichment of salbutamol and clenbuterol residues in food. The molecularly imprinted monolithic polymer disk was synthesized using salbutamol as a template through a one-step synthesis process. It can simultaneously and specifically recognize salbutamol and clenbuterol. The monolithic polymer disk and series of C18 disks were assembled with a syringe to form a set of tailor-made devices for the extraction of target molecules. In a single run, salbutamol and clenbuterol can be specifically extracted, cleaned, and eluted by methanol/acetic acid/H2O. The target molecules, after a silylation derivatization reaction, were detected by gas chromatography-mass spectrometry. The parameters including solvent desorption, sample pH, and the cycles of reloading were investigated and discussed. Under the optimized extraction and clean-up conditions, the limits of detection and quantitation were determined as 0.018-0.022 and 0.042-0.049 ng/g for salbutamol and clenbuterol, respectively. The assay described was convenient, rapid, and specific, and thereby potentially efficient in the high-throughput analysis of β2-agonist residues in real food samples. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Ultra-sensitive detection of leukemia by graphene

    NASA Astrophysics Data System (ADS)

    Akhavan, Omid; Ghaderi, Elham; Hashemi, Ehsan; Rahighi, Reza

    2014-11-01

    Graphene oxide nanoplatelets (GONPs) with extremely sharp edges (lateral dimensions ~20-200 nm and thicknesses <2 nm) were applied in extraction of the overexpressed guanine synthesized in the cytoplasm of leukemia cells. The blood serums containing the extracted guanine were used in differential pulse voltammetry (DPV) with reduced graphene oxide nanowall (rGONW) electrodes to develop fast and ultra-sensitive electrochemical detection of leukemia cells at leukemia fractions (LFs) of ~10-11 (as the lower detection limit). The stability of the DPV signals obtained by oxidation of the extracted guanine on the rGONWs was studied after 20 cycles. Without the guanine extraction, the DPV peaks relating to guanine oxidation of normal and abnormal cells overlapped at LFs <10-9, and consequently, the performance of rGONWs alone was limited at this level. As a benchmark, the DPV using glassy carbon electrodes was able to detect only LFs ~ 10-2. The ultra-sensitivity obtained by this combination method (guanine extraction by GONPs and then guanine oxidation by rGONWs) is five orders of magnitude better than the sensitivity of the best current technologies (e.g., specific mutations by polymerase chain reaction) which not only are expensive, but also require a few days for diagnosis.

  11. Sensitivity of different Trypanosoma vivax specific primers for the diagnosis of livestock trypanosomosis using different DNA extraction methods.

    PubMed

    Gonzales, J L; Loza, A; Chacon, E

    2006-03-15

    There are several T. vivax specific primers developed for PCR diagnosis. Most of these primers were validated under different DNA extraction methods and study designs, leading to heterogeneity of results. The objective of the present study was to validate PCR as a diagnostic test for T. vivax trypanosomosis by means of determining the test sensitivity of different published specific primers with different sample preparations. Four different DNA extraction methods were used to test the sensitivity of PCR with four different primer sets. DNA was extracted directly from whole blood samples, blood dried on filter papers or blood dried on FTA cards. The results showed that the sensitivity of PCR with each primer set was highly dependent on the sample preparation and DNA extraction method. The highest sensitivities for all the primers tested were determined using DNA extracted from whole blood samples, while the lowest sensitivities were obtained when DNA was extracted from filter paper preparations. To conclude, the obtained results are discussed and a protocol for diagnosis and surveillance for T. vivax trypanosomosis is recommended.

  12. Key Relation Extraction from Biomedical Publications.

    PubMed

    Huang, Lan; Wang, Ye; Gong, Leiguang; Kulikowski, Casimir; Bai, Tian

    2017-01-01

    Within the large body of biomedical knowledge, recent findings and discoveries are most often presented as research articles. Their number has been increasing sharply since the turn of the century, presenting ever-growing challenges for search and discovery of knowledge and information related to specific topics of interest, even with the help of advanced online search tools. This is especially true when the goal of a search is to find or discover key relations between important concepts or topic words. We have developed an innovative method for extracting key relations between concepts from abstracts of articles. The method focuses on relations between keywords or topic words in the articles. Early experiments with the method on PubMed publications have shown promising results in searching and discovering keywords and their relationships that are strongly related to the main topic of an article.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cavanaugh, J.E.; McQuarrie, A.D.; Shumway, R.H.

    Conventional methods for discriminating between earthquakes and explosions at regional distances have concentrated on extracting specific features such as amplitude and spectral ratios from the waveforms of the P and S phases. We consider here an optimum nonparametric classification procedure derived from the classical approach to discriminating between two Gaussian processes with unequal spectra. Two robust variations based on the minimum discrimination information statistic and Renyi's entropy are also considered. We compare the optimum classification procedure with various amplitude and spectral ratio discriminants and show that its performance is superior when applied to a small population of 8 land-based earthquakes and 8 mining explosions recorded in Scandinavia. Several parametric characterizations of the notion of complexity based on modeling earthquakes and explosions as autoregressive or modulated autoregressive processes are also proposed and their performance compared with the nonparametric and feature extraction approaches.
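
    A compact sketch of the underlying idea — classifying a record by comparing its periodogram with the two class spectra under a Gaussian (Whittle) likelihood — using synthetic surrogate signals; it is a generic illustration, not the authors' procedure.

        # Sketch: discriminate two Gaussian processes with unequal spectra by comparing the
        # Whittle log-likelihood of a record's periodogram under each class spectrum.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 512

        def periodogram(x):
            x = x - x.mean()
            return (np.abs(np.fft.rfft(x)) ** 2) / len(x)

        def whittle_loglik(I, S):
            S = np.maximum(S, 1e-12)
            return float(-(np.log(S) + I / S).sum())

        def ar1(phi, size):
            e = rng.normal(size=size)
            x = np.zeros(size)
            for t in range(1, size):
                x[t] = phi * x[t - 1] + e[t]
            return x

        # Class spectra estimated from surrogate training records: an AR(1)-like process
        # stands in for earthquakes, white noise for explosions.
        S_eq = np.mean([periodogram(ar1(0.9, n)) for _ in range(50)], axis=0)
        S_ex = np.mean([periodogram(rng.normal(size=n)) for _ in range(50)], axis=0)

        record = ar1(0.9, n)                      # unknown record (here, earthquake-like)
        I = periodogram(record)
        label = "earthquake" if whittle_loglik(I, S_eq) > whittle_loglik(I, S_ex) else "explosion"
        print(label)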

  14. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

    PubMed

    Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I

    2015-02-21

    Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.

  15. Supercritical fluid extraction of fat from ground beef: effects of water on gravimetric and GC-FAME fat determinations.

    PubMed

    Eller, F J; King, J W

    2001-10-01

    This study investigated the supercritical carbon dioxide (SC-CO(2)) extraction of fat from ground beef and the effects of several factors on the gravimetric determination of fat. The use of ethanol modifier with the SC-CO(2) was not necessary for efficient fat extraction; however, the ethanol did increase the coextraction of water. This coextraction of water caused a significant overestimation of gravimetric fat. Oven-drying ground beef samples prior to extraction inhibited the subsequent extraction of fat, whereas oven-drying the extract after collection decreased the subsequent gas chromatographic fatty acid methyl ester (GC-FAME) fat determination. None of the drying agents tested were able to completely prevent the coextraction of water, and silica gel and molecular sieves inhibited the complete extraction of fat. Measurements of collection vial mass indicated that CO(2) extraction/collection causes an initial increase in mass due to the density of CO(2) (relative to displaced air) followed by a decrease in vial mass due to the removal of adsorbed water from the collection vial. Microwave-drying of the empty collection vials removes approximately 3 mg of adsorbed water; approximately 15-20 min is required for readsorption of the displaced water. For collection vials containing collected fat, microwave-drying effectively removed coextracted water, and the vials reached equilibration after approximately 10-15 min. Silanizing collection vials did not significantly affect weight loss during microwave-drying. SC-CO(2) can be used to accurately determine fat gravimetrically for ground beef, and the presented method can also be followed by GC-FAME analysis to provide specific fatty acid information as well.

  16. Extraction of Graph Information Based on Image Contents and the Use of Ontology

    ERIC Educational Resources Information Center

    Kanjanawattana, Sarunya; Kimura, Masaomi

    2016-01-01

    A graph is an effective form of data representation used to summarize complex information. Explicit information such as the relationship between the X- and Y-axes can be easily extracted from a graph by applying human intelligence. However, implicit knowledge such as information obtained from other related concepts in an ontology also resides in…

  17. Ontology-Based Information Extraction for Business Intelligence

    NASA Astrophysics Data System (ADS)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  18. CMS-2 Reverse Engineering and ENCORE/MODEL Integration

    DTIC Science & Technology

    1992-05-01

    Automated extraction of design information from an existing software system written in CMS-2 can be used to document that system as-built. The extracted information is provided by a commercially available CASE tool. Information describing the software system design is automatically extracted, as shown in the displays in Figures 1, 2, and 3.

  19. DTIC (Defense Technical Information Center) Model Action Plan for Incorporating DGIS (DOD Gateway Information System) Capabilities.

    DTIC Science & Technology

    1986-05-01

    The DOD Gateway Information System (DGIS) is being developed to provide the DoD community with a modern tool to access diverse databases and extract information products... this community with a modern tool for accessing these databases and extracting information products from them. Since the Defense Technical Information... adjunct to DROLS results. The study, therefore, centered around obtaining background information inside the unit on that unit's users who request DROLS

  20. The Quickest, Lowest-cost Lunar Resource Assessment Program: Integrated High-tech Earth-based Astronomy

    NASA Technical Reports Server (NTRS)

    Pieters, Carle M.

    1992-01-01

    Science and technology applications for the Moon have not fully kept pace with technical advancements in sensor development and analytical information extraction capabilities. Appropriate unanswered questions for the Moon abound, but until recently there has been little motivation to link sophisticated technical capabilities with specific measurement and analysis projects. Over the last decade enormous technical progress has been made in the development of (1) CCD photometric array detectors; (2) visible to near-infrared imaging spectrometers; (3) infrared spectroscopy; (4) high-resolution dual-polarization radar imaging at 3.5, 12, and 70 cm; and equally important (5) data analysis and information extraction techniques using compact powerful computers. Parts of each of these have been tested separately, but there has been no programmatic effort to develop and optimize instruments to meet lunar science and resource assessment needs (e.g., specific wavelength range, resolution, etc.) nor to coordinate activities so that the symbiotic relation between different kinds of data can be fully realized. No single type of remotely acquired data completely characterizes the lunar environment, but there has been little opportunity for integration of diverse advanced sensor data for the Moon. Two examples of technology concepts for lunar measurements are given. Using VIS/near-IR spectroscopy, the mineral composition of surface material can be derived from visible and near-infrared radiation reflected from the surface. The surface and subsurface scattering properties of the Moon can be analyzed using radar backscattering imaging.
