Sample records for information extraction tools

  1. Can we replace curation with information extraction software?

    PubMed

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.

  2. Tool Wear Feature Extraction Based on Hilbert Marginal Spectrum

    NASA Astrophysics Data System (ADS)

    Guan, Shan; Song, Weijie; Pang, Hongyang

    2017-09-01

    In the metal cutting process, the signal contains a wealth of tool wear state information. A tool wear signal analysis and feature extraction method based on the Hilbert marginal spectrum is proposed. Firstly, the tool wear signal was decomposed by the empirical mode decomposition algorithm, and the intrinsic mode functions containing the main information were screened out by the correlation coefficient and the variance contribution rate. Secondly, the Hilbert transform was performed on the main intrinsic mode functions to obtain the Hilbert time-frequency spectrum and the Hilbert marginal spectrum. Finally, amplitude-domain indexes were extracted on the basis of the Hilbert marginal spectrum to form the recognition feature vector of the tool wear state. The research results show that the extracted features can effectively characterize the different wear states of the tool, which provides a basis for monitoring tool wear condition.
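
    The screening step described above, keeping only the intrinsic mode functions with a high correlation coefficient and variance contribution rate, can be sketched as follows. The thresholds and the synthetic "IMFs" are illustrative assumptions, not values from the paper:

```python
import numpy as np

def screen_imfs(signal, imfs, corr_thresh=0.3, var_thresh=0.1):
    """Keep the intrinsic mode functions that both correlate with the
    original signal and carry a meaningful share of the total variance.
    Thresholds are illustrative, not the paper's."""
    total_var = sum(np.var(imf) for imf in imfs)
    kept = []
    for imf in imfs:
        corr = abs(np.corrcoef(signal, imf)[0, 1])
        var_contrib = np.var(imf) / total_var
        if corr >= corr_thresh and var_contrib >= var_thresh:
            kept.append(imf)
    return kept

# Synthetic stand-ins for EMD output: a dominant oscillation and weak noise
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
main = np.sin(2 * np.pi * 50 * t)
noise = 0.05 * np.random.default_rng(0).standard_normal(1000)
signal = main + noise
kept = screen_imfs(signal, [main, noise])  # only the dominant IMF survives
```

    The Hilbert transform and marginal-spectrum computation would then be applied to the surviving IMFs (e.g. via scipy.signal.hilbert).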

  3. DTIC (Defense Technical Information Center) Model Action Plan for Incorporating DGIS (DOD Gateway Information System) Capabilities.

    DTIC Science & Technology

    1986-05-01

    Information System (DGIS) is being developed to provide the DoD community with a modern tool to access diverse databases and extract information products...this community with a modern tool for accessing these databases and extracting information products from them. Since the Defense Technical Information...adjunct to DROLS results. The study, therefore, centered around obtaining background information inside the unit on that unit's users who request DROLS

  4. Systematically Extracting Metal- and Solvent-Related Occupational Information from Free-Text Responses to Lifetime Occupational History Questionnaires

    PubMed Central

    Friesen, Melissa C.; Locke, Sarah J.; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A.; Purdue, Mark; Colt, Joanne S.

    2014-01-01

    Objectives: Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants’ jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Methods: Our study population comprised 2408 subjects, reporting 11991 jobs, from a case–control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. 
Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert’s independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Results: Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44–51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9–14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Conclusions: Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. PMID:24590110
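
    The keyword-matching step described above can be illustrated with a minimal sketch. The keyword list and job records below are invented for illustration; the study's actual a priori lists and SAS macro are not reproduced here:

```python
# Hypothetical keyword list standing in for the study's a priori lists of
# tasks/tools/chemicals associated with possible TCE exposure.
TCE_KEYWORDS = ["trichloroethylene", "tce", "degreas", "solvent"]

def flag_tce_scenario(job):
    """Flag a job record if any free-text response mentions a keyword
    associated with a possibly TCE-exposed scenario."""
    text = " ".join(job.get(field, "") for field in ("task", "tools", "chemicals"))
    text = text.lower()
    return any(keyword in text for keyword in TCE_KEYWORDS)

jobs = [
    {"task": "degreasing metal parts", "tools": "vapor degreaser", "chemicals": ""},
    {"task": "filing patient records", "tools": "typewriter", "chemicals": ""},
]
flags = [flag_tce_scenario(job) for job in jobs]  # [True, False]
```

    In the study, flags like these become exposure variables added to each work history record, alongside variables derived from the standardized occupation and industry codes.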

  5. Ontology-Based Information Extraction for Business Intelligence

    NASA Astrophysics Data System (ADS)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  6. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction.

    PubMed

    Jonnalagadda, Siddhartha; Gonzalez, Graciela

    2010-11-13

    BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested our tool's impact on the task of protein-protein interaction (PPI) extraction: it improved the f-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net.
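
    A minimal illustration of the "shot-gun" idea: generate simpler sentence variants by deleting every combination of optional constituents. This toy version removes only parentheticals and is not BioSimplify's actual model:

```python
import re
from itertools import combinations

def simplify_variants(sentence):
    """Generate simpler variants of a sentence by deleting every
    combination of optional constituents (here: only parentheticals).
    A crude stand-in for BioSimplify's richer constituent handling."""
    optional = [m.span() for m in re.finditer(r"\([^)]*\)", sentence)]
    variants = set()
    for r in range(len(optional) + 1):
        for combo in combinations(optional, r):
            s = sentence
            for start, end in sorted(combo, reverse=True):
                s = s[:start] + s[end:]
            s = re.sub(r"\s+", " ", s)          # collapse doubled spaces
            s = re.sub(r"\s+([.,])", r"\1", s)  # no space before punctuation
            variants.add(s.strip())
    return sorted(variants)

variants = simplify_variants("ProteinA (a kinase) binds ProteinB (a receptor).")
```

    Each variant can then be fed to a downstream extractor, improving recall because at least one simplified version is more likely to match the extractor's patterns.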

  7. Systematically extracting metal- and solvent-related occupational information from free-text responses to lifetime occupational history questionnaires.

    PubMed

    Friesen, Melissa C; Locke, Sarah J; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A; Purdue, Mark; Colt, Joanne S

    2014-06-01

    Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants' jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Our study population comprised 2408 subjects, reporting 11991 jobs, from a case-control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. 
Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert's independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44-51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9-14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.

  8. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper).

    PubMed

    Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S

    2016-10-01

    Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of biomedical domain and lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe). The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata as compared to existing NLP pipelines such as MetaMap.

  9. IPAT: a freely accessible software tool for analyzing multiple patent documents with inbuilt landscape visualizer.

    PubMed

    Ajay, Dara; Gangwal, Rahul P; Sangamwar, Abhay T

    2015-01-01

    Intelligent Patent Analysis Tool (IPAT) is an online data retrieval tool, operated based on a text mining algorithm, that extracts specific patent information in a predetermined pattern into an Excel sheet. The software is designed and developed to retrieve and analyze technology information from multiple patent documents and generate various patent landscape graphs and charts. The software is coded in C# in Visual Studio 2010; it extracts publicly available patent information from web pages such as Google Patents and simultaneously studies various technology trends based on user-defined parameters. In other words, IPAT combined with manual categorization will act as an excellent technology assessment tool in competitive intelligence and due diligence for predicting the future R&D forecast.

  10. HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records

    PubMed Central

    Aggarwal, Anshul; Garhwal, Sunita

    2018-01-01

    Objectives One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, the lack of structure means that the data are unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. Methods A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. Results The HEDEA system works across a large set of record formats to extract and analyse health information. Conclusions This tool can be used to generate analysis reports and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes. PMID:29770248

  11. HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records.

    PubMed

    Aggarwal, Anshul; Garhwal, Sunita; Kumar, Ajay

    2018-04-01

    One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, the lack of structure means that the data are unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. The HEDEA system works across a large set of record formats to extract and analyse health information. This tool can be used to generate analysis reports and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes.
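
    The regular-expression-based extraction approach can be sketched as follows. The field patterns and the sample report are assumptions for illustration; HEDEA's real patterns are not given in the abstract:

```python
import re

# Illustrative field patterns for a lab-report-like record; HEDEA's real
# regular expressions are not published in the abstract.
PATTERNS = {
    "name": re.compile(r"Patient Name:\s*(.+)"),
    "age": re.compile(r"Age:\s*(\d+)"),
    "hemoglobin": re.compile(r"Hemoglobin:\s*([\d.]+)\s*g/dL"),
}

def extract_record(text):
    """Pull structured fields out of a semi-structured medical report."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[field] = match.group(1).strip()
    return record

report = "Patient Name: A. Sharma\nAge: 42\nHemoglobin: 13.5 g/dL\nNotes: stable"
record = extract_record(report)
```

    Records structured this way can be stored in a central database and queried for the statistical evaluation the abstract mentions.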

  12. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records.
Our investigation leads us to extend the automatic knowledge extraction process of cTAKES for biomedical research domain by improving the ontology guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org

  13. Electronic processing of informed consents in a global pharmaceutical company environment.

    PubMed

    Vishnyakova, Dina; Gobeill, Julien; Oezdemir-Zaech, Fatma; Kreim, Olivier; Vachon, Therese; Clade, Thierry; Haenning, Xavier; Mikhailov, Dmitri; Ruch, Patrick

    2014-01-01

    We present an electronic capture tool to process informed consents, which must be recorded when running a clinical trial. This tool aims at the extraction of information expressing the duration of the consent given by the patient to authorize the exploitation of biomarker-related information collected during clinical trials. The system integrates a language detection module (LDM) to route a document into the appropriate information extraction module (IEM). The IEM is based on language-specific sets of linguistic rules for the identification of relevant textual facts. The achieved accuracy of both the LDM and the IEM is 99%. The architecture of the system is described in detail.
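
    The LDM-to-IEM routing architecture can be sketched with a toy stopword-based language detector. The stopword profiles and extractor stubs below are illustrative assumptions, not the system's actual rules:

```python
# Toy stopword-profile language detector routing a document to a
# language-specific extraction module (LDM -> IEM). The stopword sets
# and extractor stubs are illustrative assumptions.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to"},
    "fr": {"le", "la", "et", "de", "est"},
    "de": {"der", "die", "und", "ist", "von"},
}

def detect_language(text):
    """Pick the language whose stopword set overlaps the document most."""
    tokens = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))

def route(text, extractors):
    """LDM step: dispatch the document to the matching IEM."""
    return extractors[detect_language(text)](text)

extractors = {
    "en": lambda t: ("en", "extracted with English rules"),
    "fr": lambda t: ("fr", "extracted with French rules"),
    "de": lambda t: ("de", "extracted with German rules"),
}
lang, result = route("the consent of the patient is valid", extractors)
```

    A production LDM would use character n-gram profiles or a trained classifier, but the routing pattern is the same.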

  14. Apache Clinical Text and Knowledge Extraction System (cTAKES) | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    The tool extracts deep phenotypic information from the clinical narrative at the document-, episode-, and patient-level. The final output is FHIR compliant patient-level phenotypic summary which can be consumed by research warehouses or the DeepPhe native visualization tool.

  15. Information extraction for enhanced access to disease outbreak reports.

    PubMed

    Grishman, Ralph; Huttunen, Silja; Yangarber, Roman

    2002-08-01

    Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
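
    A minimal sketch of the extraction engine's idea: patterns convert free-text outbreak reports into table rows that support database operations such as selection and sorting. The single pattern and sample reports are invented; Proteus-BIO uses learned pattern sets and word classes:

```python
import re

# One illustrative pattern; the real engine uses sets of learned
# patterns and word classes.
EVENT = re.compile(
    r"(?P<count>\d+) cases of (?P<disease>[a-z ]+?) "
    r"(?:were )?reported in (?P<location>[A-Z][a-z]+)"
)

def extract_events(documents):
    """Convert free-text outbreak reports into sortable table rows."""
    rows = []
    for doc_id, text in enumerate(documents):
        for match in EVENT.finditer(text):
            rows.append({
                "doc": doc_id,
                "disease": match.group("disease"),
                "location": match.group("location"),
                "count": int(match.group("count")),
            })
    # Database-style operation on the extracted table: sort by case count.
    return sorted(rows, key=lambda row: row["count"], reverse=True)

documents = [
    "Last week 12 cases of cholera were reported in Dhaka.",
    "Officials said 3 cases of measles reported in Oslo this month.",
]
table = extract_events(documents)
```

    Each row keeps its source document id, so selecting or sorting the table links the user back to the underlying report, as in the Proteus-BIO browser.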

  16. Information Extraction for System-Software Safety Analysis: Calendar Year 2008 Year-End Report

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.

    2009-01-01

    This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
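
    Task 3 above, finding possible paths from hazard sources to vulnerable entities, amounts to path enumeration over a component-connection graph. A minimal sketch, with a made-up model rather than the Orion-derived one:

```python
def hazard_paths(edges, source, target, path=None):
    """Depth-first enumeration of all simple paths from a hazard source
    to a vulnerable entity in a component-connection graph."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for nxt in edges.get(source, []):
        if nxt not in path:  # skip cycles
            found.extend(hazard_paths(edges, nxt, target, path))
    return found

# Made-up component-connection model (not the Orion-derived one)
model = {
    "battery": ["power_bus"],
    "power_bus": ["avionics", "heater"],
    "heater": ["avionics"],
}
paths = hazard_paths(model, "battery", "avionics")
```

    Each enumerated path is a candidate propagation route to evaluate under nominal and anomalous configurations, and a candidate scenario for integration testing.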

  17. CMS-2 Reverse Engineering and ENCORE/MODEL Integration

    DTIC Science & Technology

    1992-05-01

    Automated extraction of design information from an existing software system written in CMS-2 can be used to document that system as-built...extracted information is provided by a commercially available CASE tool. Information describing software system design is automatically extracted...the displays in Figures 1, 2, and 3.

  18. GuidosToolbox: universal digital image object analysis

    Treesearch

    Peter Vogt; Kurt Riitters

    2017-01-01

    The increased availability of mapped environmental data calls for better tools to analyze the spatial characteristics and information contained in those maps. Publicly available, user-friendly and universal tools are needed to foster the interdisciplinary development and application of methodologies for the extraction of image object information properties contained in...

  19. A mask quality control tool for the OSIRIS multi-object spectrograph

    NASA Astrophysics Data System (ADS)

    López-Ruiz, J. C.; Vaz Cedillo, Jacinto Javier; Ederoclite, Alessandro; Bongiovanni, Ángel; González Escalera, Víctor

    2012-09-01

    The OSIRIS multi-object spectrograph uses a set of user-customised masks, which are manufactured on demand. The manufacturing process consists of drilling the specified slits in the mask with the required accuracy. Ensuring that slits are in the right place when observing is of vital importance. We present a tool for checking the quality of the mask manufacturing process, based on analyzing instrument images obtained with the manufactured masks in place. The tool extracts the slit information from these images, relates the specifications to the extracted slit information, and finally reports to the operator whether the manufactured mask fulfils the expectations of the mask designer. The proposed tool has been built using scripting languages and standard libraries such as OpenCV, PyRAF and SciPy. The software architecture, advantages and limits of this tool in the lifecycle of a multi-object acquisition are presented.
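
    The slit-checking step can be sketched in simplified form: locate slit centres in a mask image and compare them against the design specification within a tolerance. This uses plain thresholding on a synthetic image rather than the tool's actual OpenCV-based pipeline:

```python
import numpy as np

def measure_slits(image, threshold=0.5):
    """Locate vertical slits by thresholding column means and taking
    the centre of each contiguous run of bright columns."""
    bright = image.mean(axis=0) > threshold
    centres, start = [], None
    for x, is_bright in enumerate(bright):
        if is_bright and start is None:
            start = x
        elif not is_bright and start is not None:
            centres.append((start + x - 1) / 2.0)
            start = None
    if start is not None:
        centres.append((start + len(bright) - 1) / 2.0)
    return centres

def check_mask(measured, specified, tol=1.0):
    """Pass only if every measured slit centre is within tolerance of spec."""
    return (len(measured) == len(specified) and
            all(abs(m - s) <= tol for m, s in zip(measured, specified)))

# Synthetic mask image with slits centred at columns 20 and 70
image = np.zeros((100, 100))
image[:, 19:22] = 1.0
image[:, 69:72] = 1.0
ok = check_mask(measure_slits(image), [20.0, 70.0])  # True
```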

  20. Development of the major trauma case review tool.

    PubMed

    Curtis, Kate; Mitchell, Rebecca; McCarthy, Amy; Wilson, Kellie; Van, Connie; Kennedy, Belinda; Tall, Gary; Holland, Andrew; Foster, Kim; Dickinson, Stuart; Stelfox, Henry T

    2017-02-28

    As many as half of all patients with major traumatic injuries do not receive the recommended care, with variance in preventable mortality reported across the globe. This variance highlights the need for a comprehensive process for monitoring and reviewing patient care, central to which is a consistent peer-review process that includes trauma system safety and human factors. There is no published, evidence-informed standardised tool that considers these factors for use in adult or paediatric trauma case peer-review. The aim of this research was to develop and validate a trauma case review tool to facilitate clinical review of paediatric trauma patient care in extracting information to facilitate monitoring, inform change and enable loop closure. Development of the trauma case review tool was multi-faceted, beginning with a review of the trauma audit tool literature. Data were extracted from the literature to inform iterative tool development using a consensus approach. Inter-rater agreement was assessed for both the pilot and finalised versions of the tool. The final trauma case review tool contained ten sections, including patient factors (such as pre-existing conditions), presenting problem, a timeline of events, factors contributing to the care delivery problem (including equipment, work environment, staff action, organizational factors), positive aspects of care and the outcome of panel discussion. After refinement, the inter-rater reliability of the human factors and outcome components of the tool improved with an average 86% agreement between raters. This research developed an evidence-informed tool for use in paediatric trauma case review that considers both system safety and human factors to facilitate clinical review of trauma patient care. This tool can be used to identify opportunities for improvement in trauma care and guide quality assurance activities. Validation is required in the adult population.
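
    The reported 86% figure is an average pairwise agreement between raters; the underlying computation can be sketched as follows (the ratings shown are invented, not the study's data):

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same rating."""
    assert len(rater_a) == len(rater_b)
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Invented ratings of 10 case-review items by two panel members
rater_a = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
rater_b = ["yes", "yes", "no", "no",  "no", "yes", "yes", "no", "yes", "yes"]
agreement = percent_agreement(rater_a, rater_b)  # 0.9
```

    Chance-corrected statistics such as Cohen's kappa are often reported alongside raw agreement, since raw agreement can be inflated when one rating dominates.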

  21. Automation for System Safety Analysis

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Fleming, Land; Throop, David; Thronesbery, Carroll; Flores, Joshua; Bennett, Ted; Wennberg, Paul

    2009-01-01

    This presentation describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.

  22. QTLTableMiner++: semantic mining of QTL tables in scientific articles.

    PubMed

    Singh, Gurnoor; Kuzniar, Arnold; van Mulligen, Erik M; Gavai, Anand; Bachem, Christian W; Visser, Richard G F; Finkers, Richard

    2018-05-25

    A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications. Traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. This tool takes scientific articles from the Europe PMC repository as input, extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Furthermore, table columns are classified into three categories namely column descriptors, properties and values based on column headers and data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform and the results are stored in a relational database and a text file. The performance of the QTM tool was assessed by precision and recall based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall and in potato with 82.82% precision and 98.94% recall. QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.
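
    The abbreviation-expansion step can be illustrated with a simplified version of the Schwartz and Hearst algorithm, which scans a candidate long form right-to-left for the characters of the short form. This is a reduced re-implementation for illustration, not QTM's code:

```python
import re

def find_abbreviation(long_form, short_form):
    """Simplified Schwartz-Hearst matcher: scan the candidate long form
    right-to-left, requiring each short-form character in order, with
    the first character starting a word."""
    s, l = short_form.lower(), long_form.lower()
    i, j = len(s) - 1, len(l) - 1
    while i >= 0:
        while j >= 0 and (l[j] != s[i] or
                          (i == 0 and j > 0 and l[j - 1].isalnum())):
            j -= 1
        if j < 0:
            return None  # short form cannot be aligned with the long form
        i -= 1
        j -= 1
    return long_form[j + 1:]

def expand_abbreviations(text):
    """Collect short-form/long-form pairs from 'long form (ABBR)' patterns."""
    pairs = {}
    for match in re.finditer(r"([\w\- ]+?)\s*\((\w{2,10})\)", text):
        expansion = find_abbreviation(match.group(1), match.group(2))
        if expansion:
            pairs[match.group(2)] = expansion
    return pairs

pairs = expand_abbreviations("A quantitative trait locus (QTL) is a genomic region.")
```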

  3. Model of experts for decision support in the diagnosis of leukemia patients.

    PubMed

    Corchado, Juan M; De Paz, Juan F; Rodríguez, Sara; Bajo, Javier

    2009-07-01

    Recent advances in the field of biomedicine, specifically in the field of genomics, have led to an increase in the information available for conducting expression analysis. Expression analysis is a technique used in transcriptomics, a branch of genomics that deals with the study of messenger ribonucleic acid (mRNA) and the extraction of information contained in the genes. This increase in information is reflected in the exon arrays, which require the use of new techniques in order to extract the information. The purpose of this study is to provide a tool based on a mixture of experts model that allows the analysis of the information contained in the exon arrays, from which automatic classifications for decision support in diagnoses of leukemia patients can be made. The proposed model integrates several cooperative algorithms characterized by their efficiency for data processing, filtering, classification and knowledge extraction. The Cancer Institute of the University of Salamanca is making an effort to develop tools to automate the evaluation of data and to facilitate the analysis of information. This proposal is a step forward in this direction and the first step toward the development of a mixture of experts tool that integrates different cognitive and statistical approaches to deal with the analysis of exon arrays. The mixture of experts model presented within this work provides great capacities for learning and adaptation to the characteristics of the problem in consideration, using novel algorithms in each of the stages of the analysis process that can be easily configured and combined, and provides results that notably improve those provided by the existing methods for exon array analysis. The material used consists of data from exon arrays provided by the Cancer Institute that contain samples from leukemia patients. The methodology used consists of a system based on a mixture of experts.
Each one of the experts incorporates novel artificial intelligence techniques that improve the process of carrying out various tasks such as pre-processing, filtering, classification and extraction of knowledge. This article will detail the manner in which individual experts are combined so that together they generate a system capable of extracting knowledge, thus permitting patients to be classified in an automatic and efficient manner that is also comprehensible for medical personnel. The system has been tested in a real setting and has been used for classifying patients who suffer from different forms of leukemia at various stages. Personnel from the Cancer Institute supervised and participated throughout the testing period. Preliminary results are promising, notably improving the results obtained with previously used tools. The medical staff from the Cancer Institute consider the tools that have been developed to be positive and very useful in a supporting capacity for carrying out their daily tasks. Additionally, the mixture of experts supplies a tool for extracting the information necessary to explain, in simple terms, the associations that have been made. That is, it permits the extraction of knowledge for each classification made, generalized so that it can be used in subsequent classifications. This allows for a large amount of learning and adaptation within the proposed system.
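
    The combination idea at the heart of a mixture of experts can be illustrated with a toy weighted vote. The expert functions, weights, gene names and class labels below are all invented for illustration; the paper's actual experts (pre-processing, filtering, classification and knowledge extraction stages) are far richer:

```python
from collections import Counter

def mixture_of_experts(experts, sample):
    """Weighted vote over (predict_fn, weight) expert pairs."""
    votes = Counter()
    for predict, weight in experts:
        votes[predict(sample)] += weight
    return votes.most_common(1)[0][0]

# invented toy experts over a two-gene expression profile
def expert_threshold(s):
    return "ALL" if s["gene_a"] > 0.5 else "AML"

def expert_ratio(s):
    return "ALL" if s["gene_a"] > s["gene_b"] else "AML"

def expert_prior(s):
    return "AML"  # always predicts one class, regardless of input

experts = [(expert_threshold, 2.0), (expert_ratio, 1.5), (expert_prior, 1.0)]
print(mixture_of_experts(experts, {"gene_a": 0.8, "gene_b": 0.3}))  # "ALL", 3.5 to 1.0
```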

  4. Models Extracted from Text for System-Software Safety Analyses

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.

    2010-01-01

    This presentation describes extraction and integration of requirements information and safety information in visualizations to support early review of completeness, correctness, and consistency of lengthy and diverse system safety analyses. Software tools have been developed and extended to perform the following tasks: 1) extract model parts and safety information from text in interface requirements documents, failure modes and effects analyses and hazard reports; 2) map and integrate the information to develop system architecture models and visualizations for safety analysts; and 3) provide model output to support virtual system integration testing. This presentation illustrates the methods and products with a rocket motor initiation case.

  5. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature

    PubMed Central

    Jimeno Yepes, Antonio; Verspoor, Karin

    2014-01-01

    As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature. PMID:25285203
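
    The union-based combination evaluated here is straightforward to sketch. The variant mentions below are invented, not drawn from the Variome, COSMIC or InSiGHT data:

```python
def combine_tools(tool_outputs):
    """Union of the variant mentions reported by several tools."""
    combined = set()
    for mentions in tool_outputs:
        combined |= set(mentions)
    return combined

def precision_recall_f1(predicted, gold):
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# invented mentions; tool A and tool B have complementary coverage
gold = {"V600E", "G12D", "R175H", "Q61K"}
tool_a = ["V600E", "G12D"]
tool_b = ["R175H", "V600E", "T790M"]
combined = combine_tools([tool_a, tool_b])
p, r, f1 = precision_recall_f1(combined, gold)
print(p, r, f1)  # 0.75 0.75 0.75: higher recall than either tool alone
```

    The union raises recall at a possible cost in precision (tool B's false positive survives), which mirrors the trade-off discussed in the study.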

  6. Automated generation of individually customized visualizations of diagnosis-specific medical information using novel techniques of information extraction

    NASA Astrophysics Data System (ADS)

    Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang

    2005-04-01

    Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient's medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and ultimately, outcomes.

  7. Oxygen octahedra picker: A software tool to extract quantitative information from STEM images.

    PubMed

    Wang, Yi; Salzberger, Ute; Sigle, Wilfried; Eren Suyolcu, Y; van Aken, Peter A

    2016-09-01

    In perovskite oxide based materials and hetero-structures there are often strong correlations between oxygen octahedral distortions and functionality. Thus, atomistic understanding of the octahedral distortion, which requires accurate measurements of atomic column positions, will greatly help to engineer their properties. Here, we report the development of a software tool to extract quantitative information of the lattice and of BO6 octahedral distortions from STEM images. Center-of-mass and 2D Gaussian fitting methods are implemented to locate positions of individual atom columns. The precision of atomic column distance measurements is evaluated on both simulated and experimental images. The application of the software tool is demonstrated using practical examples. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
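
    Of the two column-locating methods implemented in the tool, center-of-mass is the simpler. A minimal sketch on a toy intensity patch (the tool itself also offers 2D Gaussian fitting for higher precision; the patch values are invented):

```python
def center_of_mass(patch):
    """Intensity-weighted centroid (row, col) of a small 2D image patch,
    a toy stand-in for locating one atom column in a STEM image."""
    total = float(sum(v for row in patch for v in row))
    r_cm = sum(r * v for r, row in enumerate(patch) for v in row) / total
    c_cm = sum(c * v for row in patch for c, v in enumerate(row)) / total
    return r_cm, c_cm

peak = [[0, 1, 0],
        [1, 4, 1],
        [0, 1, 2]]
print(center_of_mass(peak))  # pulled toward the bright corner: (1.2, 1.2)
```

    In practice each patch would be cropped around a coarse peak estimate, and sub-pixel precision of the centroid is what enables the octahedral-distortion measurements.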

  8. Analysis of the electromagnetic wave resistivity tool in deviated well drilling

    NASA Astrophysics Data System (ADS)

    Zhang, Yumei; Xu, Lijun; Cao, Zhang

    2014-04-01

    Electromagnetic wave resistivity (EWR) tools are used to provide real-time measurements of resistivity in the formation around the tool in Logging While Drilling (LWD). In this paper, the acquired formation resistivity information is analyzed to extract more information, including the dipping angle and azimuth direction of the drill. A finite element (FE) model of an EWR tool working in layered earth formations is established. Numerical analysis and FE simulations are employed to analyze the amplitude ratio and phase difference between the voltages measured at the two receivers of the EWR tool in deviated well drilling.

  9. Natural Language Processing.

    ERIC Educational Resources Information Center

    Chowdhury, Gobinda G.

    2003-01-01

    Discusses issues related to natural language processing, including theoretical developments; natural language understanding; tools and techniques; natural language text processing systems; abstracting; information extraction; information retrieval; interfaces; software; Internet, Web, and digital library applications; machine translation for…

  10. Novel texture-based descriptors for tool wear condition monitoring

    NASA Astrophysics Data System (ADS)

    Antić, Aco; Popović, Branislav; Krstanović, Lidija; Obradović, Ratko; Milošević, Mijodrag

    2018-01-01

    All state-of-the-art tool condition monitoring (TCM) systems for the tool wear recognition task, especially those that use vibration sensors, depend heavily on the choice of descriptors extracted from the sensor signals that carry information about the tool wear state. No post-processing technique can increase the recognition precision if those descriptors are not discriminative enough. In this work, we propose a tool wear monitoring strategy that relies on novel texture-based descriptors. We treat the modulus of the Short-Term Discrete Fourier Transform (STDFT) spectrum obtained from a vibration sensor signal utterance as a 2D textured image, identifying the time scale of the STDFT as the first dimension and the frequency scale as the second dimension. The obtained textured image is then divided into 2D texture patches, each covering a part of the frequency range of interest. After applying an appropriate filter bank, 2D textons are extracted for each predefined frequency band. By averaging in time, we extract from the textons of each band of interest information about the underlying Probability Density Function (PDF) in the form of lower-order moments, thus obtaining robust tool wear state descriptors. We validate the proposed features in experiments conducted on a real TCM system, obtaining high recognition accuracy.
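
    The front end of this feature pipeline (a short-term DFT magnitude "image" summarized by per-band lower-order moments) can be sketched as follows. This stdlib-only toy omits the filter-bank and texton stages and uses a synthetic tone rather than real vibration data:

```python
import cmath
import math

def stdft_magnitude(signal, frame, hop):
    """Magnitude spectra of overlapping short frames (a minimal
    short-term DFT; real tools would use a windowed FFT)."""
    spectra = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = signal[start:start + frame]
        mags = []
        for k in range(frame // 2):
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame)
                    for n in range(frame))
            mags.append(abs(s))
        spectra.append(mags)
    return spectra  # rows: time frames, cols: frequency bins

def band_moments(spectra, n_bands):
    """Time-averaged lower-order moments (mean, variance) per band."""
    width = len(spectra[0]) // n_bands
    feats = []
    for b in range(n_bands):
        vals = [row[b * width + i] for row in spectra for i in range(width)]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        feats.append((mean, var))
    return feats

# synthetic "vibration" tone at one quarter of the sampling rate
sig = [math.sin(2 * math.pi * 0.25 * n) for n in range(64)]
spec = stdft_magnitude(sig, frame=16, hop=8)
feats = band_moments(spec, n_bands=2)
print(feats[1][0] > feats[0][0])  # the tone's energy sits in the upper band
```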

  11. A rapid extraction of landslide disaster information research based on GF-1 image

    NASA Astrophysics Data System (ADS)

    Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na

    2015-08-01

    In recent years, landslide disasters have occurred frequently because of seismic activity, bringing great harm to people's lives and attracting close attention from the state and extensive concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing images can effectively improve the accuracy of information extraction with their rich texture and geometry information. Therefore, it is feasible to extract information on earthquake-triggered landslides with serious surface damage and large scale. Taking Wenchuan County as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the Estimation of Scale Parameter (ESP) tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution images and selecting spectral, texture, geometric and landform features of the image, we establish extraction rules to extract landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.

  12. DEXTER: Disease-Expression Relation Extraction from Text.

    PubMed

    Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K

    2018-01-01

    Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. 
Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER.
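
    A heavily simplified picture of the kind of statement DEXTER targets can be given with a single regular expression. DEXTER itself uses much richer linguistic analysis; the pattern and example sentences below are illustrative only:

```python
import re

# Toy pattern for statements like "GENE was overexpressed in DISEASE".
PATTERN = re.compile(
    r"(?P<gene>[A-Z][A-Z0-9-]+) (?:was|is) "
    r"(?P<dir>overexpressed|underexpressed|upregulated|downregulated) "
    r"in (?P<disease>[a-z ]+?)(?:\.|,| compared)")

def extract_relations(text):
    """Return (gene, direction, disease) triples found in the text."""
    return [(m.group("gene"), m.group("dir"), m.group("disease").strip())
            for m in PATTERN.finditer(text)]

print(extract_relations(
    "EGFR was overexpressed in lung cancer. "
    "TP53 was downregulated in breast cancer."))
```

    Real systems must also handle negation, speculation, coordination and anaphora, which is why DEXTER's evaluation reports F-scores rather than assuming patterns alone suffice.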

  13. Enhancing biomedical text summarization using semantic relation extraction.

    PubMed

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  14. Research on Optimal Observation Scale for Damaged Buildings after Earthquake Based on Optimal Feature Space

    NASA Astrophysics Data System (ADS)

    Chen, J.; Chen, W.; Dou, A.; Li, W.; Sun, Y.

    2018-04-01

    A new information extraction method for damaged buildings rooted in an optimal feature space is put forward on the basis of the traditional object-oriented method. In this new method, the ESP (estimate of scale parameter) tool is used to optimize the segmentation of the image. Then the distance matrix and minimum separation distance of all kinds of surface features are calculated through sample selection to find the optimal feature space, which is finally applied to extract the image of damaged buildings after an earthquake. The overall extraction accuracy reaches 83.1% with a kappa coefficient of 0.813. The new method greatly improves extraction accuracy and efficiency compared with the traditional object-oriented method, and shows good potential for wider use in the information extraction of damaged buildings. In addition, the new method can be applied to images of damaged buildings at different resolutions in order to seek the optimal observation scale through accuracy evaluation. The results suggest that the optimal observation scale of damaged buildings is between 1 m and 1.2 m, which provides a reference for future information extraction of damaged buildings.
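
    Choosing features by how well they separate classes can be sketched with a simple separation-distance score. The classes, features and sample values below are invented, and this score is only an illustrative analogue of the distance-matrix computation described above:

```python
def separation(a, b):
    """Gap between two class means along one feature, normalized by the
    average within-class spread (a rough separability score)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    spread = (max(a) - min(a) + max(b) - min(b)) / 2 or 1.0
    return abs(mean_a - mean_b) / spread

# invented per-feature samples for two classes of image objects
samples = {
    "brightness": {"collapsed": [0.8, 0.9, 0.85], "intact": [0.4, 0.45, 0.5]},
    "texture":    {"collapsed": [0.5, 0.6, 0.55], "intact": [0.45, 0.5, 0.6]},
}
scores = {f: separation(v["collapsed"], v["intact"])
          for f, v in samples.items()}
best = max(scores, key=scores.get)
print(best)  # brightness separates the two classes far better than texture
```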

  15. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    PubMed

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
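
    The core idea, walking a relational table and emitting structured XML, can be sketched in a few lines. YAdumper itself is a Java application driven by a DTD-based template; this stdlib Python analogue uses an in-memory database with invented table and tag names:

```python
import sqlite3
import xml.etree.ElementTree as ET

# toy relational source (names and rows are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id TEXT, name TEXT)")
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("P1", "kinase A"), ("P2", "ligase B")])

def dump_table(conn, table, root_tag, row_tag):
    """Serialize every row of `table` as XML elements under `root_tag`."""
    root = ET.Element(root_tag)
    cur = conn.execute(f"SELECT * FROM {table}")  # fine for a sketch only
    cols = [d[0] for d in cur.description]
    for row in cur:
        el = ET.SubElement(root, row_tag)
        for col, val in zip(cols, row):
            ET.SubElement(el, col).text = str(val)
    return ET.tostring(root, encoding="unicode")

xml_out = dump_table(conn, "protein", "proteins", "protein")
print(xml_out)
```

    Streaming the output row by row instead of building the whole tree in memory is the kind of design choice that gives YAdumper its low memory footprint on large dumps.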

  16. Concept maps: A tool for knowledge management and synthesis in web-based conversational learning.

    PubMed

    Joshi, Ankur; Singh, Satendra; Jaswal, Shivani; Badyal, Dinesh Kumar; Singh, Tejinder

    2016-01-01

    Web-based conversational learning provides an opportunity for shared knowledge base creation through collaboration and collective wisdom extraction. Usually, the amount of generated information in such forums is huge and multidimensional (in alignment with the desirable preconditions for constructivist knowledge creation), and sometimes, the nature of expected new information may not be anticipated in advance. Thus, concept maps (crafted from constructed data) as "process summary" tools may be a solution to improve critical thinking and learning by making connections between the facts or knowledge shared by the participants during online discussion. This exploratory paper begins with the description of this innovation tried on a web-based interaction platform (email list management software), FAIMER-Listserv, and generated qualitative evidence through peer feedback. This process description is further supported by a theoretical construct which shows how social constructivism (inclusive of autonomy and complexity) affects conversational learning. The paper rationalizes the use of concept maps as mid-summary tools for extracting information and making further sense of this apparent complexity.

  17. Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources.

    PubMed

    Mao, Jin; Moore, Lisa R; Blank, Carrine E; Wu, Elvis Hsin-Hui; Ackerman, Marcia; Ranade, Sonali; Cui, Hong

    2016-12-13

    The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students.
MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.
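
    The known-term matching stage (as distinct from the SVM sentence classifier) can be illustrated with a toy character/state lookup. The term lists below are invented, not MicroPIE's actual vocabularies:

```python
# Illustrative character -> known-state vocabularies.
KNOWN_TERMS = {
    "cell shape": ["rod-shaped", "coccoid", "spiral"],
    "gram stain": ["gram-positive", "gram-negative"],
    "motility": ["non-motile", "motile"],
}

def extract_characters(description):
    """Map a taxonomic description to {character: [states found]}."""
    found = {}
    text = description.lower()
    for character, states in KNOWN_TERMS.items():
        for state in states:
            if state in text:
                found.setdefault(character, []).append(state)
                break  # keep the first (most specific) matching state
    return found

desc = "Cells are rod-shaped, Gram-negative and motile."
print(extract_characters(desc))
```

    Ordering "non-motile" before "motile" in the vocabulary (with the `break`) is a crude way to prefer the more specific term; real systems resolve this with linguistic rules and negation handling.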

  18. Acquiring geographical data with web harvesting

    NASA Astrophysics Data System (ADS)

    Dramowicz, K.

    2016-04-01

    Many websites contain very attractive and up-to-date geographical information. This information can be extracted, stored, analyzed and mapped using web harvesting techniques. Poorly organized data from websites are transformed with web harvesting into a more structured format, which can be stored in a database and analyzed. Almost 25% of web traffic is related to web harvesting, mostly while using search engines. This paper presents how to harvest geographic information from web documents using the free tool Beautiful Soup, one of the most commonly used Python libraries for pulling data from HTML and XML files. It is a relatively easy task to process one static HTML table. The more challenging task is to extract and save information from tables located in multiple and poorly organized websites. Legal and ethical aspects of web harvesting are discussed as well. The paper demonstrates two case studies. The first one shows how to extract various types of information about the Good Country Index from multiple web pages, load it into one attribute table and map the results. The second case study shows how script tools and GIS can be used to extract information from 136 websites about Nova Scotia wines. In a little more than three minutes a database containing 106 liquor stores selling these wines is created. Then the availability and spatial distribution of various types of wines (by grape types, by wineries, and by liquor stores) are mapped and analyzed.
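
    The single-static-table case mentioned above can even be handled with Python's standard library alone (Beautiful Soup offers a friendlier interface for the messier multi-site cases). The table content below is invented:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

html = """<table>
<tr><th>Country</th><th>Rank</th></tr>
<tr><td>Ireland</td><td>1</td></tr>
<tr><td>Finland</td><td>2</td></tr>
</table>"""
parser = TableExtractor()
parser.feed(html)
print(parser.rows)
```

    The extracted rows can then be loaded into a database or a GIS attribute table, as in the paper's two case studies.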

  19. Simulation Tools for Forest Health Analysis: An Application in the Red River Watershed, Idaho

    Treesearch

    Andrew J. McMahan; Eric L. Smith

    2006-01-01

    Software tools for landscape analyses--including FVS model extensions, and a number of FVS-related pre- and post-processing “tools”--are presented, using an analysis in the Red River Watershed, Nez Perce National Forest as an example. We present (1) a discussion of pre-simulation data analysis; (2) the Physiographic Information Extraction System (PIES), a tool that can...

  20. Information Extraction for System-Software Safety Analysis: Calendar Year 2007 Year-End Report

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.

    2008-01-01

    This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis on the models to identify possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations; 4) perform discrete-time-based simulation on the models to investigate scenarios where these paths may play a role in failures and mishaps; and 5) identify resulting candidate scenarios for software integration testing. This paper describes new challenges in a NASA abort system case, and enhancements made to develop the integrated tool set.
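
    Task 3, finding paths from hazard sources to vulnerable entities, is essentially graph search. A minimal breadth-first sketch over an invented model fragment (not NASA's actual architecture models):

```python
from collections import deque

def find_paths(graph, source, targets):
    """Breadth-first enumeration of cycle-free paths from a hazard
    source to any vulnerable entity (graph is an adjacency list)."""
    paths, queue = [], deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting nodes (cycles)
                queue.append(path + [nxt])
    return paths

# illustrative propagation model, invented for this sketch
graph = {
    "valve_fault": ["pressure_loss"],
    "pressure_loss": ["engine_controller", "telemetry"],
    "engine_controller": ["abort_logic"],
}
print(find_paths(graph, "valve_fault", {"abort_logic", "telemetry"}))
```

    Each returned path is a candidate scenario for the discrete-time simulation and integration-testing steps described above.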

  1. Experimental resource pulses influence social-network dynamics and the potential for information flow in tool-using crows

    PubMed Central

    St Clair, James J. H.; Burns, Zackory T.; Bettaney, Elaine M.; Morrissey, Michael B.; Otis, Brian; Ryder, Thomas B.; Fleischer, Robert C.; James, Richard; Rutz, Christian

    2015-01-01

    Social-network dynamics have profound consequences for biological processes such as information flow, but are notoriously difficult to measure in the wild. We used novel transceiver technology to chart association patterns across 19 days in a wild population of the New Caledonian crow—a tool-using species that may socially learn, and culturally accumulate, tool-related information. To examine the causes and consequences of changing network topology, we manipulated the environmental availability of the crows' preferred tool-extracted prey, and simulated, in silico, the diffusion of information across field-recorded time-ordered networks. Here we show that network structure responds quickly to environmental change and that novel information can potentially spread rapidly within multi-family communities, especially when tool-use opportunities are plentiful. At the same time, we report surprisingly limited social contact between neighbouring crow communities. Such scale dependence in information-flow dynamics is likely to influence the evolution and maintenance of material cultures. PMID:26529116

  2. Grasping the Affordances, Understanding the Reasoning: Toward a Dialectical Theory of Human Tool Use

    ERIC Educational Resources Information Center

    Osiurak, Francois; Jarry, Christophe; Le Gall, Didier

    2010-01-01

    One of the most exciting issues in psychology is, What are the psychological mechanisms underlying human tool use? The computational approach assumes that the use of a tool (e.g., a hammer) requires the extraction of sensory information about object properties (heavy, rigid), which can then be translated into appropriate motor outputs (grasping,…

  3. Everglades Depth Estimation Network (EDEN) Applications: Tools to View, Extract, Plot, and Manipulate EDEN Data

    USGS Publications Warehouse

    Telis, Pamela A.; Henkel, Heather

    2009-01-01

    The Everglades Depth Estimation Network (EDEN) is an integrated system of real-time water-level monitoring, ground-elevation data, and water-surface elevation modeling to provide scientists and water managers with current on-line water-depth information for the entire freshwater part of the greater Everglades. To assist users in applying the EDEN data to their particular needs, a series of five EDEN tools, or applications (EDENapps), were developed. Using EDEN's tools, scientists can view the EDEN datasets of daily water-level and ground elevations, compute and view daily water depth and hydroperiod surfaces, extract data for user-specified locations, plot transects of water level, and animate water-level transects over time. Also, users can retrieve data from the EDEN datasets for analysis and display in other analysis software programs. As scientists and managers attempt to restore the natural volume, timing, and distribution of sheetflow in the wetlands, such information is invaluable. Information analyzed and presented with these tools is used to advise policy makers, planners, and decision makers of the potential effects of water management and restoration scenarios on the natural resources of the Everglades.

  4. v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text

    PubMed Central

    Divita, Guy; Carter, Marjorie E.; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H.; Gundlapalli, Adi V.

    2016-01-01

    Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and to provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While the v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records. PMID:27683667
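
    The framework's central pattern, placing annotators into pipelines, can be sketched with plain functions. v3NLP itself is a Java framework; the annotators and lexicon below are invented:

```python
def tokenize(doc):
    """Annotator 1: split the note text into tokens."""
    doc["tokens"] = doc["text"].split()
    return doc

def flag_concepts(doc, lexicon=("catheter", "indwelling")):
    """Annotator 2: flag tokens found in a tiny illustrative lexicon."""
    doc["concepts"] = [t.strip(".,").lower() for t in doc["tokens"]
                       if t.strip(".,").lower() in lexicon]
    return doc

def run_pipeline(doc, annotators):
    """Thread one document record through a sequence of annotators."""
    for annotate in annotators:
        doc = annotate(doc)
    return doc

doc = {"text": "Patient has an indwelling urinary catheter."}
out = run_pipeline(doc, [tokenize, flag_concepts])
print(out["concepts"])
```

    Scaling out, in this picture, amounts to running `run_pipeline` over many records in parallel while keeping each annotator stateless.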

  5. MOLECULAR MODELING AS A TOOL FOR UNDERSTANDING HUMAN HEALTH RISKS

    EPA Science Inventory

    A generic step in many mechanisms for chemical toxicity is the interaction between a small molecule and a biological macromolecule. The information that is gathered from this study will then be used to extract relationships among the information domains.

  6. Enhancing Biomedical Text Summarization Using Semantic Relation Extraction

    PubMed Central

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate the text summary using an information-retrieval-based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization. PMID:21887336
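    Stage 2 of the approach above, relation-level retrieval, can be sketched as filtering extracted (subject, predicate, object) triples by the query concept. The triples and the matching rule below are invented for illustration; the paper's system works on SemRep output with a more refined relevance measure.

```python
# Toy relation-level retrieval: keep triples whose subject or object
# matches the query concept. Relations are invented examples.
relations = [
    ("H1N1", "CAUSES", "fever"),
    ("H1N1", "TREATED_BY", "oseltamivir"),
    ("aspirin", "TREATS", "headache"),
]

def relevant_relations(query, rels):
    q = query.lower()
    return [r for r in rels if q in (r[0].lower(), r[2].lower())]

h1n1_relations = relevant_relations("h1n1", relations)
```

    Stage 3 would then select, for each retained triple, sentences from the collection that mention both of its arguments.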

  7. Array data extractor (ADE): a LabVIEW program to extract and merge gene array data.

    PubMed

    Kurtenbach, Stefan; Kurtenbach, Sarah; Zoidl, Georg

    2013-12-01

    Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Although existing software allows for complex data analyses, the LabVIEW based program presented here, "Array Data Extractor (ADE)", provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need for advanced programming knowledge.
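    The extract-and-merge step that ADE automates can be sketched as pulling values for a gene list out of several normalized datasets and collecting them into one table. The gene names and expression values below are invented for illustration; ADE itself is a LabVIEW program.

```python
# Sketch of merging expression values for a gene list across datasets.
# Missing genes are recorded as None so gaps stay visible in the table.
def merge_expression(gene_list, datasets):
    table = {}
    for gene in gene_list:
        table[gene] = {name: data.get(gene) for name, data in datasets.items()}
    return table

datasets = {
    "study1": {"ADRB1": 2.1, "ADRB2": 0.9},
    "study2": {"ADRB1": 1.8},
}
merged = merge_expression(["ADRB1", "ADRB2"], datasets)
```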

  8. Proactive Response to Potential Material Shortages Arising from Environmental Restrictions Using Automatic Discovery and Extraction of Information from Technical Documents

    DTIC Science & Technology

    2012-12-21

    material data and other key information in a UIMA environment. In the course of this project, the tools and methods developed were used to extract and...Architecture (UIMA) library from the Apache Software Foundation. Using this architecture, a given document is run through several “annotators” to...material taxonomy developed for the XSB, Inc. Coherent View™ database. In order to integrate this technology into the Java-based UIMA annotation

  9. The Collaborative Lecture Annotation System (CLAS): A New TOOL for Distributed Learning

    ERIC Educational Resources Information Center

    Risko, E. F.; Foulsham, T.; Dawson, S.; Kingstone, A.

    2013-01-01

    In the context of a lecture, the capacity to readily recognize and synthesize key concepts is crucial for comprehension and overall educational performance. In this paper, we introduce a tool, the Collaborative Lecture Annotation System (CLAS), which has been developed to make the extraction of important information a more collaborative and…

  10. The LifeWatch approach to the exploration of distributed species information

    PubMed Central

    Fuentes, Daniel; Fiore, Nicola

    2014-01-01

    Abstract This paper introduces a new method of automatically extracting, integrating and presenting information regarding species from the most relevant online taxonomic resources. First, the information is extracted and joined using data wrappers and integration solutions. Then, an analytical tool is used to provide a visual representation of the data. The information is then integrated into a user-friendly content management system. The proposal has been implemented using data from the Global Biodiversity Information Facility (GBIF), the Catalogue of Life (CoL), the World Register of Marine Species (WoRMS), the Integrated Taxonomic Information System (ITIS) and the Global Names Index (GNI). The approach improves data quality, avoiding taxonomic and nomenclature errors whilst increasing the availability and accessibility of the information. PMID:25589865

  11. TextHunter – A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research

    PubMed Central

    Jackson MSc, Richard G.; Ball, Michael; Patel, Rashmi; Hayes, Richard D.; Dobson, Richard J.B.; Stewart, Robert

    2014-01-01

    Observational research using data from electronic health records (EHR) is a rapidly growing area, which promises both increased sample size and data richness - therefore unprecedented study power. However, in many medical domains, large amounts of potentially valuable data are contained within the free text clinical narrative. Manually reviewing free text to obtain desired information is an inefficient use of researcher time and skill. Previous work has demonstrated the feasibility of applying Natural Language Processing (NLP) to extract information. However, in real world research environments, the demand for NLP skills outweighs supply, creating a bottleneck in the secondary exploitation of the EHR. To address this, we present TextHunter, a tool for the creation of training data, construction of concept extraction machine learning models and their application to documents. Using confidence thresholds to ensure high precision (>90%), we achieved recall measurements as high as 99% in real world use cases. PMID:25954379
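    The confidence-threshold idea the record reports, accepting only predictions above a cutoff so that precision stays high at the cost of recall, can be sketched directly. The predictions and the 0.9 cutoff below are illustrative assumptions, not TextHunter's actual interface.

```python
# Sketch of precision-oriented filtering: keep only classifier outputs
# whose confidence clears the threshold. Predictions are invented.
def accept(predictions, threshold=0.9):
    return [p for p in predictions if p["conf"] > threshold]

preds = [
    {"text": "mention A", "conf": 0.97},
    {"text": "mention B", "conf": 0.55},
    {"text": "mention C", "conf": 0.92},
]
high_precision = accept(preds)
```

    Raising the threshold discards borderline mentions, which is why the tool can guarantee >90% precision while recall varies by use case.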

  12. Research on Crowdsourcing Emergency Information Extraction Based on Events' Frame

    NASA Astrophysics Data System (ADS)

    Yang, Bo; Wang, Jizhou; Ma, Weijun; Mao, Xi

    2018-01-01

    At present, common information extraction methods cannot accurately extract structured emergency event information, and general information retrieval tools cannot completely identify emergency geographic information; nor do these approaches provide an accurate assessment of their extraction results. This paper therefore proposes an emergency information collection technique based on an event framework, designed to solve the problem of emergency information extraction. It mainly includes an emergency information extraction model (EIEM), a complete address recognition method (CARM) and an accuracy evaluation model of emergency information (AEMEI). EIEM extracts emergency information in structured form and compensates for the lack of network data acquisition in emergency mapping. CARM uses a hierarchical model and the shortest-path algorithm, allowing toponym pieces to be joined into a full address. AEMEI analyzes the results for an emergency event and summarizes the advantages and disadvantages of the event framework. Experiments show that the event-framework technique can solve the problem of emergency information extraction and provides reference cases for other applications. When an emergency disaster is about to occur, the relevant departments can query data on emergencies that occurred in the past and make arrangements in advance for defense and disaster reduction. The technique can reduce casualties and property damage, which is of great significance to the state and society.
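    The address-joining step attributed to CARM, assembling toponym pieces into a complete address via a hierarchy, can be illustrated with a toy gazetteer. The place names, the child-to-parent table, and the walk-up strategy are assumptions for illustration only; the paper's method uses a hierarchical model with a shortest-path algorithm.

```python
# Toy sketch of joining toponym pieces into a full address by walking a
# child -> parent gazetteer from the most specific piece up to the root.
parent = {"Chaoyang District": "Beijing", "Beijing": "China"}

def full_address(piece):
    chain = [piece]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return ", ".join(chain)

addr = full_address("Chaoyang District")
```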

  13. BilKristal 2.0: A tool for pattern information extraction from crystal structures

    NASA Astrophysics Data System (ADS)

    Okuyan, Erhan; Güdükbay, Uğur

    2014-01-01

    We present a revised version of the BilKristal tool of Okuyan et al. (2007). We converted the development environment to Microsoft Visual Studio 2005 in order to resolve compatibility issues. We added multi-core CPU support and made improvements to graphics functions in order to improve performance. Discovered bugs were fixed, and exporting functionality to a material visualization tool was added.

  14. Quality tools and resources to support organisational improvement integral to high-quality primary care: a systematic review of published and grey literature.

    PubMed

    Janamian, Tina; Upham, Susan J; Crossland, Lisa; Jackson, Claire L

    2016-04-18

    To conduct a systematic review of the literature to identify existing online primary care quality improvement tools and resources to support organisational improvement related to the seven elements in the Primary Care Practice Improvement Tool (PC-PIT), with the identified tools and resources to progress to a Delphi study for further assessment of relevance and utility. Systematic review of the international published and grey literature. CINAHL, Embase and PubMed databases were searched in March 2014 for articles published between January 2004 and December 2013. GreyNet International and other relevant websites and repositories were also searched in March-April 2014 for documents dated between 1992 and 2012. All citations were imported into a bibliographic database. Published and unpublished tools and resources were included in the review if they were in English, related to primary care quality improvement and addressed any of the seven PC-PIT elements of a high-performing practice. Tools and resources that met the eligibility criteria were then evaluated for their accessibility, relevance, utility and comprehensiveness using a four-criteria appraisal framework. We used a data extraction template to systematically extract information from eligible tools and resources. A content analysis approach was used to explore the tools and resources and collate relevant information: name of the tool or resource, year and country of development, author, name of the organisation that provided access and its URL, accessibility information or problems, overview of each tool or resource and the quality improvement element(s) it addresses. If available, a copy of the tool or resource was downloaded into the bibliographic database, along with supporting evidence (published or unpublished) on its use in primary care. 
This systematic review identified 53 tools and resources that can potentially be provided as part of a suite of tools and resources to support primary care practices in improving the quality of their practice, to achieve improved health outcomes.

  15. Considering context: reliable entity networks through contextual relationship extraction

    NASA Astrophysics Data System (ADS)

    David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.

    2016-05-01

    Existing information extraction techniques can only partially address the problem of exploiting very large amounts of text. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on what is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables Warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) more precise event and relationship interpretation, 2) more detailed information about extracted events and relationships, and 3) more reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.
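    The distinction the record draws, separating planned, potential, recurring, and past events, can be illustrated with a deliberately naive rule-based labeler keyed on surface cues. This is only a sketch of the labeling task: the real system uses machine-learned models and richer linguistic features, and the cue words below are assumptions.

```python
# Naive event-attribute labeler based on surface cues. Real systems use
# learned models; these rules only illustrate the label set.
def label_event(sentence):
    s = sentence.lower()
    if "will " in s or "plans to" in s:
        return "planned"
    if "could " in s or "may " in s:
        return "potential"
    if "every " in s or "weekly" in s:
        return "recurring"
    return "past"
```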

  16. Array data extractor (ADE): a LabVIEW program to extract and merge gene array data

    PubMed Central

    2013-01-01

    Background Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Findings Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Conclusions Although existing software allows for complex data analyses, the LabVIEW based program presented here, “Array Data Extractor (ADE)”, provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need for advanced programming knowledge. PMID:24289243

  17. XID+: Next generation XID development

    NASA Astrophysics Data System (ADS)

    Hurley, Peter

    2017-04-01

    XID+ is a prior-based source extraction tool which carries out photometry in the Herschel SPIRE (Spectral and Photometric Imaging Receiver) maps at the positions of known sources. It uses a probabilistic Bayesian framework, which provides a natural way to include prior information, and uses the Bayesian inference tool Stan to obtain the full posterior probability distribution on flux estimates.
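    The benefit of prior-based photometry can be shown with a one-dimensional toy version: a Gaussian prior on a source's flux combined with a Gaussian measurement yields the familiar conjugate-normal posterior. The numbers are invented; XID+ itself fits many blended sources jointly with Stan rather than using this closed form.

```python
# Conjugate-normal update: posterior mean is the precision-weighted average
# of the prior mean and the observation. A 1-D illustration only.
def posterior_flux(prior_mean, prior_sd, obs, obs_sd):
    w_prior = 1.0 / prior_sd**2   # prior precision
    w_obs = 1.0 / obs_sd**2       # measurement precision
    mean = (prior_mean * w_prior + obs * w_obs) / (w_prior + w_obs)
    sd = (w_prior + w_obs) ** -0.5
    return mean, sd

mean, sd = posterior_flux(prior_mean=10.0, prior_sd=5.0, obs=12.0, obs_sd=2.0)
```

    Because the measurement is more precise than the prior, the posterior mean sits close to the observation, and the posterior spread is tighter than either input.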

  18. An Overview of Biomolecular Event Extraction from Scientific Documents

    PubMed Central

    Vanegas, Jorge A.; Matos, Sérgio; González, Fabio; Oliveira, José L.

    2015-01-01

    This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed. PMID:26587051

  19. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  20. Speech information retrieval: a review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hafen, Ryan P.; Henry, Michael J.

    Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new in the field to quickly gain high-level understanding and to provide direction for further study.

  1. Positioning matrix of economic efficiency and complexity: a case study in a university hospital.

    PubMed

    Ippolito, Adelaide; Viggiani, Vincenzo

    2014-01-01

    At the end of 2010, the Federico II University Hospital in Naples, Italy, initiated a series of discussions aimed at designing and applying a positioning matrix to its departments. This analysis was developed to create a tool able to extract meaningful information both to increase knowledge about individual departments and to inform the choices of general management during strategic planning. The name given to this tool was the positioning matrix of economic efficiency and complexity. In the matrix, the x-axis measures the ratio between revenues and costs, whereas the y-axis measures the index of complexity, thus showing "profitability" while bearing in mind the complexity of activities. By using the positioning matrix, it was possible to conduct a critical analysis of the characteristics of the Federico II University Hospital and to extract useful information for general management to use during strategic planning at the end of 2010 when defining medium-term objectives. Copyright © 2013 John Wiley & Sons, Ltd.
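    The matrix described above places each department by two coordinates: the revenue-to-cost ratio on the x-axis and the complexity index on the y-axis. A minimal sketch of the resulting quadrant classification follows; the break-even cutoff of 1.0 on each axis is an illustrative assumption, not a figure from the case study.

```python
# Sketch of the positioning matrix: classify a department into one of four
# quadrants by economic efficiency (revenue/cost) and complexity index.
def quadrant(revenue, cost, complexity, complexity_cutoff=1.0):
    efficient = revenue / cost >= 1.0
    is_complex = complexity >= complexity_cutoff
    if efficient and is_complex:
        return "efficient, high complexity"
    if efficient:
        return "efficient, low complexity"
    if is_complex:
        return "inefficient, high complexity"
    return "inefficient, low complexity"
```

    Reading departments off such a grid is what lets general management weigh "profitability" against case-mix complexity during strategic planning.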

  2. Semantic Storyboard of Judicial Debates: A Novel Multimedia Summarization Environment

    ERIC Educational Resources Information Center

    Fersini, E.; Sartori, F.

    2012-01-01

    Purpose: The need of tools for content analysis, information extraction and retrieval of multimedia objects in their native form is strongly emphasized into the judicial domain: digital videos represent a fundamental informative source of events occurring during judicial proceedings that should be stored, organized and retrieved in short time and…

  3. The Registry of Knowledge Translation Methods and Tools: a resource to support evidence-informed public health.

    PubMed

    Peirson, Leslea; Catallo, Cristina; Chera, Sunita

    2013-08-01

    This paper examines the development of a globally accessible online Registry of Knowledge Translation Methods and Tools to support evidence-informed public health. A search strategy, screening and data extraction tools, and writing template were developed to find, assess, and summarize relevant methods and tools. An interactive website and searchable database were designed to house the registry. Formative evaluation was undertaken to inform refinements. Over 43,000 citations were screened; almost 700 were full-text reviewed, 140 of which were included. By November 2012, 133 summaries were available. Between January 1 and November 30, 2012 over 32,945 visitors from more than 190 countries accessed the registry. Results from 286 surveys and 19 interviews indicated the registry is valued and useful, but would benefit from a more intuitive indexing system and refinements to the summaries. User stories and promotional activities help expand the reach and uptake of knowledge translation methods and tools in public health contexts. The National Collaborating Centre for Methods and Tools' Registry of Methods and Tools is a unique and practical resource for public health decision makers worldwide.

  4. Figure Text Extraction in Biomedical Literature

    PubMed Central

    Kim, Daehyun; Yu, Hong

    2011-01-01

    Background Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. Methodology We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. Results/Conclusions The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. 
FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction. In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search. PMID:21249186
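    The F1-scores quoted in this record are the harmonic mean of precision and recall; the short helper below reproduces the text-localization figure (84% precision and 98% recall giving roughly 90% F1).

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

localization_f1 = f1(0.84, 0.98)
```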

  5. Figure text extraction in biomedical literature.

    PubMed

    Kim, Daehyun; Yu, Hong

    2011-01-13

    Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. 
FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction. In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search.

  6. Concept recognition for extracting protein interaction relations from biomedical text

    PubMed Central

    Baumgartner, William A; Lu, Zhiyong; Johnson, Helen L; Caporaso, J Gregory; Paquette, Jesse; Lindemann, Anna; White, Elizabeth K; Medvedeva, Olga; Cohen, K Bretonnel; Hunter, Lawrence

    2008-01-01

    Background: Reliable information extraction applications have been a long sought goal of the biomedical text mining community, a goal that if reached would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing. Results: Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist. Conclusion: Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the internet. PMID:18834500

  7. Knowledge representation and management: transforming textual information into useful knowledge.

    PubMed

    Rassinoux, A-M

    2010-01-01

    To summarize current outstanding research in the field of knowledge representation and management. Synopsis of the articles selected for the IMIA Yearbook 2010. Four interesting papers, dealing with structured knowledge, have been selected for the section knowledge representation and management. Combining the newest techniques in computational linguistics and natural language processing with the latest methods in statistical data analysis, machine learning and text mining has proved to be efficient for turning unstructured textual information into meaningful knowledge. Three of the four selected papers for the section knowledge representation and management corroborate this approach and depict various experiments conducted to extract meaningful knowledge from unstructured free texts such as extracting cancer disease characteristics from pathology reports, or extracting protein-protein interactions from biomedical papers, as well as extracting knowledge for the support of hypothesis generation in molecular biology from the Medline literature. Finally, the last paper addresses the level of formally representing and structuring information within clinical terminologies in order to render such information easily available and shareable among the health informatics community. Delivering common powerful tools able to automatically extract meaningful information from the huge amount of electronically unstructured free texts is an essential step towards promoting sharing and reusability across applications, domains, and institutions thus contributing to building capacities worldwide.

  8. Gstruct: a system for extracting schemas from GML documents

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Zhu, Fubao; Guan, Jihong; Zhou, Shuigeng

    2008-10-01

    Geography Markup Language (GML) becomes the de facto standard for geographic information representation on the internet. GML schema provides a way to define the structure, content, and semantic of GML documents. It contains useful structural information of GML documents and plays an important role in storing, querying and analyzing GML data. However, GML schema is not mandatory, and it is common that a GML document contains no schema. In this paper, we present Gstruct, a tool for GML schema extraction. Gstruct finds the features in the input GML documents, identifies geometry datatypes as well as simple datatypes, then integrates all these features and eliminates improper components to output the optimal schema. Experiments demonstrate that Gstruct is effective in extracting semantically meaningful schemas from GML documents.
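    The core of schema extraction as described above, scanning documents to learn which child elements occur under each element, can be sketched with Python's standard XML parser. The tiny GML-like document and the structure-as-dictionary representation are illustrative assumptions; Gstruct additionally infers geometry and simple datatypes and optimizes the resulting schema.

```python
# Sketch of structural schema extraction: record, for each element name,
# the set of child element names observed beneath it.
import xml.etree.ElementTree as ET

def extract_structure(xml_text):
    structure = {}
    def walk(elem):
        children = structure.setdefault(elem.tag, set())
        for child in elem:
            children.add(child.tag)
            walk(child)
    walk(ET.fromstring(xml_text))
    return structure

doc = "<City><Name>Shanghai</Name><Point><pos>121 31</pos></Point></City>"
schema = extract_structure(doc)
```

    Running the same walk over many documents and merging the dictionaries yields the document collection's overall structure, from which a formal schema can be generated.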

  9. Client-side Skype forensics: an overview

    NASA Astrophysics Data System (ADS)

    Meißner, Tina; Kröger, Knut; Creutzburg, Reiner

    2013-03-01

    IT security and computer forensics are important components of information technology. In the present study, a client-side Skype forensic analysis is performed. It is designed to explain which kinds of user data are stored on a computer and which tools allow the extraction of those data for a forensic investigation. Two methods are described: a manual analysis and an analysis with (mainly) open-source tools.

  10. An Expertise Recommender using Web Mining

    NASA Technical Reports Server (NTRS)

    Joshi, Anupam; Chandrasekaran, Purnima; ShuYang, Michelle; Ramakrishnan, Ramya

    2001-01-01

    This report explored techniques to mine the web pages of scientists to extract information regarding their expertise, build expertise chains and referral webs, and semi-automatically combine this information with directory information services to create a recommender system that permits query by expertise. The approach included experimenting with existing techniques reported in the research literature in the recent past, adapting them as needed. In addition, software tools were developed to capture and use this information.

  11. Compressive Information Extraction: A Dynamical Systems Approach

    DTIC Science & Technology

    2016-01-24

    sparsely encoded in very large data streams. (a) Target tracking in an urban canyon; (b) and (c) sample frames showing contextually abnormal events: onset... extraction to identify contextually abnormal sequences (see section 2.2.3). Formally, the problem of interest can be stated as establishing whether a noisy... relaxations with optimality guarantees can be obtained using tools from semi-algebraic geometry. 2.2 Application: Detecting Contextually Abnormal Events

  12. Retrieval of radiology reports citing critical findings with disease-specific customization.

    PubMed

    Lacson, Ronilda; Sugarbaker, Nathanael; Prevedello, Luciano M; Ivan, Ip; Mar, Wendy; Andriole, Katherine P; Khorasani, Ramin

    2012-01-01

    Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. This paper 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications - an open-source toolkit, A Nearly New Information Extraction system (ANNIE), and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) - to illustrate the varying levels of customization required for different disease entities; and 2) evaluates each application's performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases: pulmonary nodule, pneumothorax, and pulmonary embolus. Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90 and 0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks.
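The precision and recall figures quoted in this record compare each tool's retrieved reports against a gold standard. As a reminder of how such values are computed, here is a minimal sketch; the report IDs are invented for illustration and do not come from the study:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a gold-standard set."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives: retrieved AND relevant
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical report IDs: gold-standard reports citing a critical finding
gold = {"r1", "r2", "r3", "r4", "r5"}
hits = {"r1", "r2", "r3", "r9"}     # reports a tool retrieved
p, r = precision_recall(hits, gold)
print(round(p, 2), round(r, 2))     # 0.75 0.6
```

The trade-off reported above (ANNIE's higher precision at the cost of more customization) is exactly a trade-off between these two quantities.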

  13. Retrieval of Radiology Reports Citing Critical Findings with Disease-Specific Customization

    PubMed Central

    Lacson, Ronilda; Sugarbaker, Nathanael; Prevedello, Luciano M; Ivan, IP; Mar, Wendy; Andriole, Katherine P; Khorasani, Ramin

    2012-01-01

    Background: Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. Purpose: This paper 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications – an open-source toolkit, A Nearly New Information Extraction system (ANNIE), and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) – to illustrate the varying levels of customization required for different disease entities; and 2) evaluates each application’s performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases: pulmonary nodule, pneumothorax, and pulmonary embolus. Results: Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90 and 0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Conclusion: Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks. PMID:22934127

  14. Automated Modular Magnetic Resonance Imaging Clinical Decision Support System (MIROR): An Application in Pediatric Cancer Diagnosis

    PubMed Central

    Zarinabad, Niloufar; Meeus, Emma M; Manias, Karen; Foster, Katharine

    2018-01-01

    Background Advances in magnetic resonance imaging and the introduction of clinical decision support systems have underlined the need for an analysis tool to extract and analyze relevant information from magnetic resonance imaging data to aid decision making, prevent errors, and enhance health care. Objective The aim of this study was to design and develop a modular medical image region of interest analysis tool and repository (MIROR) for automatic processing, classification, evaluation, and representation of advanced magnetic resonance imaging data. Methods The clinical decision support system was developed and evaluated for diffusion-weighted imaging of body tumors in children (cohort of 48 children, with 37 malignant and 11 benign tumors). Mevislab software and Python were used for the development of MIROR. Regions of interest were drawn around benign and malignant body tumors on different diffusion parametric maps, and the extracted information was used to discriminate malignant tumors from benign tumors. Results Using MIROR, the various histogram parameters derived for each tumor case, when compared with the information in the repository, provided additional information for tumor characterization and facilitated the discrimination between benign and malignant tumors. Clinical decision support system cross-validation showed high sensitivity and specificity in discriminating between these tumor groups using histogram parameters. Conclusions MIROR, as a diagnostic tool and repository, made the interpretation and analysis of magnetic resonance imaging images more accessible and comprehensive for clinicians. It aims to increase clinicians’ skillset by introducing newer techniques and up-to-date findings to their repertoire and to make information from previous cases available to aid decision making. The modular format of the tool allows integration of analyses that are not readily available clinically and streamlines future developments. PMID:29720361

  15. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    PubMed

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. 
It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io .
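Plastid's central idiom, per the abstract, is representing data as arrays of values associated with genomic or transcriptomic positions. The toy function below illustrates that idea with plain Python lists; it is not Plastid's API, and real use would go through Plastid's own data structures and alignment readers.

```python
def coverage_array(chrom_length, alignments):
    """Per-nucleotide read coverage; alignments are (start, end) half-open intervals."""
    cov = [0] * chrom_length
    for start, end in alignments:
        for pos in range(start, min(end, chrom_length)):
            cov[pos] += 1
    return cov

# Hypothetical read alignments on a 10-nt reference
reads = [(0, 4), (2, 6), (2, 6)]
print(coverage_array(10, reads))  # [1, 1, 3, 3, 2, 2, 0, 0, 0, 0]
```

Assay-specific logic (e.g., mapping a ribosome-profiling read to its P-site rather than its full span) would replace the simple interval fill above.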

  16. Automated extraction of radiation dose information for CT examinations.

    PubMed

    Cook, Tessa S; Zimmerman, Stefan; Maidment, Andrew D A; Kim, Woojin; Boonn, William W

    2010-11-01

    Exposure to radiation as a result of medical imaging is currently in the spotlight, receiving attention from Congress as well as the lay press. Although scanner manufacturers are moving toward including effective dose information in the Digital Imaging and Communications in Medicine headers of imaging studies, there is a vast repository of retrospective CT data at every imaging center that stores dose information in an image-based dose sheet. As such, it is difficult for imaging centers to participate in the ACR's Dose Index Registry. The authors have designed an automated extraction system to query their PACS archive and parse CT examinations to extract the dose information stored in each dose sheet. First, an open-source optical character recognition program processes each dose sheet and converts the information to American Standard Code for Information Interchange (ASCII) text. Each text file is parsed, and radiation dose information is extracted and stored in a database which can be queried using an existing pathology and radiology enterprise search tool. Using this automated extraction pipeline, it is possible to perform dose analysis on the >800,000 CT examinations in the PACS archive and generate dose reports for all of these patients. It is also possible to more effectively educate technologists, radiologists, and referring physicians about exposure to radiation from CT by generating report cards for interpreted and performed studies. The automated extraction pipeline enables compliance with the ACR's reporting guidelines and greater awareness of radiation dose to patients, thus resulting in improved patient care and management. Copyright © 2010 American College of Radiology. Published by Elsevier Inc. All rights reserved.
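The pipeline described above (OCR of the image-based dose sheet, then parsing of the resulting ASCII text) reduces at its core to pattern matching on the recognized text. The sketch below assumes a hypothetical dose-sheet layout; real layouts vary by scanner vendor and model, so the field names and regular expression here are illustrative only.

```python
import re

# Hypothetical dose-sheet line layout; real scanners vary by vendor and model.
DOSE_LINE = re.compile(
    r"Series\s+(?P<series>\d+).*?CTDIvol\s*[:=]?\s*(?P<ctdivol>\d+\.?\d*)\s*mGy"
    r".*?DLP\s*[:=]?\s*(?P<dlp>\d+\.?\d*)", re.IGNORECASE)

def parse_dose_sheet(ocr_text):
    """Extract (series, CTDIvol in mGy, DLP in mGy-cm) tuples from OCR'd text."""
    records = []
    for line in ocr_text.splitlines():
        m = DOSE_LINE.search(line)
        if m:
            records.append((int(m.group("series")),
                            float(m.group("ctdivol")),
                            float(m.group("dlp"))))
    return records

text = ("Series 2  CTDIvol: 12.5 mGy  DLP: 437.8 mGy-cm\n"
        "Series 3  CTDIvol: 3.1 mGy  DLP: 98.0 mGy-cm")
print(parse_dose_sheet(text))  # [(2, 12.5, 437.8), (3, 3.1, 98.0)]
```

In practice such parsed records would be loaded into a database for cohort-level dose analysis, as the abstract describes.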

  17. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

    PubMed Central

    Roos, Marco; Marshall, M Scott; Gibson, Andrew P; Schuemie, Martijn; Meij, Edgar; Katrenko, Sophia; van Hage, Willem Robert; Krommydas, Konstantinos; Adriaans, Pieter W

    2009-01-01

    Background Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. Results We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence. Conclusion We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. 
Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation. PMID:19796406

  18. Criteria for assessing the quality of mHealth apps: a systematic review.

    PubMed

    Nouri, Rasool; R Niakan Kalhori, Sharareh; Ghazisaeedi, Marjan; Marchand, Guillaume; Yasini, Mobin

    2018-05-16

    To review existing studies that include a tool or method for assessing the quality of mHealth apps, extract their criteria, and provide a classification of the collected criteria. In accordance with the PRISMA statement, a literature search was conducted in MEDLINE, EMBase, ISI, and Scopus for English-language citations published from January 1, 2008 to December 22, 2016 for studies including tools or methods for quality assessment of mHealth apps. Two researchers screened the titles and abstracts of all retrieved citations against the inclusion and exclusion criteria. The full text of relevant papers was then individually examined by the same researchers. A senior researcher resolved any disagreements and confirmed the relevance of all included papers. The authors, date of publication, subject fields of target mHealth apps, development method, and assessment criteria were extracted from each paper. The extracted assessment criteria were then reviewed, compared, and classified by an expert panel of two medical informatics specialists and two health information management specialists. Twenty-three papers were included in the review. Thirty-eight main classes of assessment criteria were identified. These were reorganized by the expert panel into 7 main classes (Design, Information/Content, Usability, Functionality, Ethical Issues, Security and Privacy, and User-perceived value) with 37 sub-classes of criteria. There is wide heterogeneity in assessment criteria for mHealth apps. It is necessary to define the exact meaning and degree of distinctness of each criterion. This will help to improve existing tools and may lead to a more comprehensive mHealth app assessment tool.

  19. Unconventional Tools for an Unconventional Resource: Community and Landscape Planning for Shale in the Marcellus Region

    NASA Astrophysics Data System (ADS)

    Murtha, T., Jr.; Orland, B.; Goldberg, L.; Hammond, R.

    2014-12-01

    Deep shale natural gas deposits made accessible by new technologies are quickly becoming a considerable share of North America's energy portfolio. Unlike traditional deposits and extraction footprints, shale gas poses dispersed and complex landscape and community challenges, both cultural and environmental. This paper describes the development and application of creative geospatial tools as a means to engage communities in design and planning along the northern-tier counties of Pennsylvania that are experiencing Marcellus shale drilling. Uniquely combining physical landscape models with predictive models of exploration activities, including drilling, pipeline construction, and road reconstruction, the tools quantify the potential impacts of drilling activities on communities and landscapes in the Commonwealth of Pennsylvania. Dividing the state into 9836 watershed sub-basins, we first describe the current state of Marcellus-related activities through 2014. We then describe and report the results of three scaled predictive models designed to investigate the sub-basins where future activities will probably be focused. Finally, the core of the paper reports on the second level of tools we have now developed to engage communities in planning for unconventional gas extraction in Pennsylvania. Using a geodesign approach, we are working with communities to transfer information for comprehensive landscape planning and informed decision making. These tools quantify not only physical landscape impacts but also potential visual, aesthetic, and cultural resource implications.

  20. Fine-grained information extraction from German transthoracic echocardiography reports.

    PubMed

    Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank

    2015-11-12

    Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide the process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that can recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated on documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90 % of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with f1 = .989 (micro average) and f1 = .963 (macro average). As a result of keyword matching and restrained concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and on documents with uncommon layout. The developed terminology and the proposed information extraction system allow fine-grained information to be extracted from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
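The abstract reports both micro- and macro-averaged F1, a distinction that matters whenever concept frequencies are skewed: micro-averaging pools true/false positives and negatives over all concepts, while macro-averaging averages per-concept F1 scores, so rare concepts weigh more heavily in the macro figure. A small sketch with invented counts (not the study's data):

```python
def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(per_concept_counts):
    """per_concept_counts: {concept: (tp, fp, fn)} -> (micro F1, macro F1)."""
    macro = sum(f1(*c) for c in per_concept_counts.values()) / len(per_concept_counts)
    tp = sum(c[0] for c in per_concept_counts.values())
    fp = sum(c[1] for c in per_concept_counts.values())
    fn = sum(c[2] for c in per_concept_counts.values())
    return f1(tp, fp, fn), macro

# Hypothetical counts for two concepts; the rare one drags the macro average down
counts = {"LV function": (90, 5, 5), "rare finding": (3, 1, 2)}
micro, macro = micro_macro_f1(counts)
print(round(micro, 3), round(macro, 3))  # 0.935 0.807
```

The gap between the study's micro (.989) and macro (.963) values suggests the same effect: performance on infrequent concepts is slightly lower than on common ones.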

  1. NPIDB: Nucleic acid-Protein Interaction DataBase.

    PubMed

    Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

    2013-01-01

    The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for the calculation of intermolecular interactions, a classification of SCOP families that contain DNA-binding protein domains, and data on conserved water molecules at the DNA-protein interface.

  2. PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction.

    PubMed

    Krallinger, Martin; Rodriguez-Penagos, Carlos; Tendulkar, Ashish; Valencia, Alfonso

    2009-07-01

    There is increasing interest in using literature mining techniques to complement information extracted from annotation databases or generated by bioinformatics applications. Here we present PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to systematically access information useful for analyzing genetic, cellular, and molecular aspects of the plant model organism Arabidopsis thaliana. Our system facilitates more efficient retrieval of information relevant to heterogeneous biological topics, from implications in biological relationships at the level of protein interactions and gene regulation, to sub-cellular locations of gene products and associations with cellular and developmental processes, i.e. cell cycle, flowering, root, leaf, and seed development. Beyond single entities, predefined pairs of entities can also be provided as queries, for which literature-derived relations together with textual evidence are returned. PLAN2L does not require registration and is freely accessible at http://zope.bioinfo.cnio.es/plan2l.

  3. MedXN: an open source medication extraction and normalization tool for clinical text

    PubMed Central

    Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang

    2014-01-01

    Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
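MedXN's decomposition step, dictionary lookup for the medication name plus regular expressions for attributes, can be illustrated as follows. The toy drug dictionary stands in for the RxNorm lookup, and the attribute patterns are invented for illustration; they are not MedXN's actual rules.

```python
import re

DRUGS = {"metformin", "lisinopril", "aspirin"}  # toy stand-in for RxNorm dictionary lookup

ATTRS = {
    "strength": re.compile(r"(\d+\.?\d*)\s*(mg|mcg|g)\b", re.IGNORECASE),
    "frequency": re.compile(r"\b(daily|b\.?i\.?d\.?|t\.?i\.?d\.?|weekly)\b", re.IGNORECASE),
}

def extract_medication(mention):
    """Split a free-text medication mention into name and attribute-value pairs."""
    tokens = mention.lower().split()
    name = next((t for t in tokens if t in DRUGS), None)
    attrs = {}
    for attr, pattern in ATTRS.items():
        m = pattern.search(mention)
        if m:
            attrs[attr] = m.group(0)
    return {"name": name, **attrs}

print(extract_medication("Metformin 500 mg daily"))
# {'name': 'metformin', 'strength': '500 mg', 'frequency': 'daily'}
```

MedXN's additional normalization step would then recombine these pieces according to RxNorm conventions to select the most specific matching RxCUI.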

  4. Road and Roadside Feature Extraction Using Imagery and LIDAR Data for Transportation Operation

    NASA Astrophysics Data System (ADS)

    Ural, S.; Shan, J.; Romero, M. A.; Tarko, A.

    2015-03-01

    Transportation agencies require up-to-date, reliable, and feasibly acquired information on road geometry and features within proximity to the roads as input for evaluating and prioritizing new or improved road projects. The information needed for a robust evaluation of road projects includes road centerline, width, and extent together with the average grade, cross-sections, and obstructions near the travelled way. Remote sensing offers a large collection of data and well-established tools for acquiring this information and extracting the aforementioned road features at various levels and scopes. Even with many remote sensing data sources and methods available for road extraction, transportation operation requires more than the centerlines. Acquiring information that is spatially coherent at the operational level for the entire road system is challenging and requires multiple data sources to be integrated. In the presented study, we established a framework that used data from multiple sources, including one-foot-resolution color infrared orthophotos, airborne LiDAR point clouds, and existing spatially non-accurate ancillary road networks. We were able to extract 90.25% of a total of 23.6 miles of road networks together with estimated road width, average grade along the road, and cross-sections at specified intervals. We also extracted buildings and vegetation within a predetermined proximity to the extracted road extent; 90.6% of 107 existing buildings were correctly identified, with a 31% false detection rate.

  5. Impact of translation on named-entity recognition in radiology texts

    PubMed Central

    Pedro, Vasco

    2017-01-01

    Abstract Radiology reports describe the results of radiography procedures and have the potential to be a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using text mining tools. The problem is that these tools are mostly developed for English, while reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of radiology information between different communities. This work explores the solution of translating the reports to English before applying the text mining tools, probing the question of which translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to radiology and a number of alternative translations (human, automatic, and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the types of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455

  6. Data warehousing as a tool for quality management in oncology.

    PubMed

    Hölzer, S; Tafazzoli, A G; Altmann, U; Wächter, W; Dudeck, J

    1999-01-01

    At present, physicians are constrained by their limited capacity to integrate and understand the growing amount of electronic medical information. To handle, extract, integrate, analyse, and take advantage of the information gathered regarding the quality of patient care, the concept of a data warehouse seems especially interesting in medicine. Medical data warehousing allows physicians to take advantage of all the operational data they have been collecting over the years. Our purpose is to build a data warehouse in order to use all available information about cancer patients. We think that with the sensible use of this tool there are economic benefits for society and an improvement in the quality of medical care for patients.

  7. Characterizing rainfall in the Tenerife island

    NASA Astrophysics Data System (ADS)

    Díez-Sierra, Javier; del Jesus, Manuel; Losada Rodriguez, Inigo

    2017-04-01

    In many locations, rainfall data are collected through networks of meteorological stations. The data collection process is nowadays automated in many places, leading to the development of large databases of rainfall data covering extensive areas of territory. However, managers, decision makers, and engineering consultants tend not to extract most of the information contained in these databases due to the lack of specific software tools for their exploitation. Here we present the modeling and development effort put in place on the island of Tenerife to develop MENSEI-L, a software tool capable of automatically analyzing a complete rainfall database to simplify the extraction of information from observations. MENSEI-L makes use of weather-type information derived from atmospheric conditions to separate the complete time series into homogeneous groups to which statistical distributions are fitted. Normal and extreme regimes are obtained in this manner. MENSEI-L is also able to complete missing data in the time series and to generate synthetic stations by using Kriging techniques. These techniques also serve to generate the spatial regimes of precipitation, both normal and extreme. MENSEI-L additionally uses weather-type information to provide a stochastic three-day probability forecast for rainfall.
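The grouping step described above, splitting the rainfall series by weather type before fitting statistical distributions, can be sketched as follows. The records, weather-type labels, and the simple wet-day summary below are all hypothetical illustrations; MENSEI-L's actual distribution fitting and Kriging machinery are far richer.

```python
from statistics import mean

def regimes_by_weather_type(records):
    """records: (weather_type, rainfall_mm) pairs.
    Summarize each weather type by its wet-day fraction and mean wet-day depth
    (the latter is the method-of-moments fit of an exponential wet-day model)."""
    groups = {}
    for wtype, mm in records:
        groups.setdefault(wtype, []).append(mm)
    out = {}
    for wtype, values in groups.items():
        wet = [v for v in values if v > 0]
        out[wtype] = {
            "wet_fraction": len(wet) / len(values),
            "mean_wet_mm": mean(wet) if wet else 0.0,
        }
    return out

# Hypothetical daily records: (synoptic weather type, rainfall in mm)
data = [("NW", 12.0), ("NW", 8.0), ("NW", 0.0),
        ("anticyclonic", 0.0), ("anticyclonic", 2.0)]
print(regimes_by_weather_type(data))
```

Fitting per-group distributions in this way is what lets a tool separate normal regimes from extreme ones conditioned on atmospheric state.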

  8. PRECOG: a tool for automated extraction and visualization of fitness components in microbial growth phenomics.

    PubMed

    Fernandez-Ricaud, Luciano; Kourtchenko, Olga; Zackrisson, Martin; Warringer, Jonas; Blomberg, Anders

    2016-06-23

    Phenomics is a field in functional genomics that records variation in organismal phenotypes in the genetic, epigenetic or environmental context at a massive scale. For microbes, the key phenotype is the growth in population size because it contains information that is directly linked to fitness. Due to technical innovations and extensive automation our capacity to record complex and dynamic microbial growth data is rapidly outpacing our capacity to dissect and visualize this data and extract the fitness components it contains, hampering progress in all fields of microbiology. To automate visualization, analysis and exploration of complex and highly resolved microbial growth data as well as standardized extraction of the fitness components it contains, we developed the software PRECOG (PREsentation and Characterization Of Growth-data). PRECOG allows the user to quality control, interact with and evaluate microbial growth data with ease, speed and accuracy, also in cases of non-standard growth dynamics. Quality indices filter high- from low-quality growth experiments, reducing false positives. The pre-processing filters in PRECOG are computationally inexpensive and yet functionally comparable to more complex neural network procedures. We provide examples where data calibration, project design and feature extraction methodologies have a clear impact on the estimated growth traits, emphasising the need for proper standardization in data analysis. PRECOG is a tool that streamlines growth data pre-processing, phenotypic trait extraction, visualization, distribution and the creation of vast and informative phenomics databases.
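    The fitness-component extraction that PRECOG standardizes can be illustrated with a crude sketch. The readings below are synthetic, and PRECOG itself fits smoothed curves with quality control rather than raw consecutive slopes:

```python
import math

# Hypothetical optical-density readings at 1 h intervals for one well.
times = list(range(10))                     # hours
od = [0.05, 0.05, 0.06, 0.09, 0.17, 0.33, 0.60, 0.95, 1.20, 1.25]

log_od = [math.log(x) for x in od]

# Maximum specific growth rate: steepest slope of log(OD) between
# consecutive readings.
slopes = [(log_od[i + 1] - log_od[i]) / (times[i + 1] - times[i])
          for i in range(len(od) - 1)]
mu_max = max(slopes)

yield_ = max(od) - od[0]                    # growth yield (total OD gain)
doubling_time = math.log(2) / mu_max        # hours per doubling

print(f"mu_max={mu_max:.3f}/h, doubling={doubling_time:.2f} h, yield={yield_:.2f}")
```

    Rate, lag and yield extracted this way are the "fitness components" the abstract refers to; standardizing how they are computed is what makes phenomics databases comparable across labs.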

  9. SPECTRa-T: machine-based data extraction and semantic searching of chemistry e-theses.

    PubMed

    Downing, Jim; Harvey, Matt J; Morgan, Peter B; Murray-Rust, Peter; Rzepa, Henry S; Stewart, Diana C; Tonge, Alan P; Townsend, Joe A

    2010-02-22

    The SPECTRa-T project has developed text-mining tools to extract named chemical entities (NCEs), such as chemical names and terms, and chemical objects (COs), e.g., experimental spectral assignments and physical chemistry properties, from electronic theses (e-theses). Although NCEs were readily identified within the two major document formats studied, only the use of structured documents enabled identification of chemical objects and their association with the relevant chemical entity (e.g., systematic chemical name). A corpus of theses was analyzed and it is shown that a high degree of semantic information can be extracted from structured documents. This integrated information has been deposited in a persistent Resource Description Framework (RDF) triple-store that allows users to conduct semantic searches. The strength and weaknesses of several document formats are reviewed.
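    The association of chemical objects with named chemical entities in a triple store can be pictured with a toy example. The triples and predicate names below are illustrative, not the project's actual RDF vocabulary, and the matcher is a naive stand-in for SPARQL:

```python
# Toy triple store mirroring the RDF output: (subject, predicate, object).
triples = [
    ("compound:42", "hasName", "4-nitrophenol"),
    ("compound:42", "hasNMRShift", "8.2 ppm"),
    ("compound:17", "hasName", "benzoic acid"),
    ("compound:17", "hasMeltingPoint", "122 C"),
]

def query(s=None, p=None, o=None):
    """Return triples matching the non-None fields (a minimal SPARQL-like match)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# 'Which compounds have an NMR assignment?' style semantic search:
for subj, _, value in query(p="hasNMRShift"):
    name = query(s=subj, p="hasName")[0][2]
    print(name, "->", value)
```

    The point of the triple representation is exactly this join: a spectral assignment found in one part of a thesis can be linked back to the systematic name extracted elsewhere.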

  10. GeoDeepDive: Towards a Machine Reading-Ready Digital Library and Information Integration Resource

    NASA Astrophysics Data System (ADS)

    Husson, J. M.; Peters, S. E.; Livny, M.; Ross, I.

    2015-12-01

    Recent developments in machine reading and learning approaches to text and data mining hold considerable promise for accelerating the pace and quality of literature-based data synthesis, but these advances have outpaced even basic levels of access to the published literature. For many geoscience domains, particularly those based on physical samples and field-based descriptions, this limitation is significant. Here we describe a general infrastructure to support published literature-based machine reading and learning approaches to information integration and knowledge base creation. This infrastructure supports rate-controlled automated fetching of original documents, along with full bibliographic citation metadata, from remote servers, the secure storage of original documents, and the utilization of considerable high-throughput computing resources for the pre-processing of these documents by optical character recognition, natural language parsing, and other document annotation and parsing software tools. New tools and versions of existing tools can be automatically deployed against original documents when they are made available. The products of these tools (text/XML files) are managed by MongoDB and are available for use in data extraction applications. Basic search and discovery functionality is provided by ElasticSearch, which is used to identify documents of potential relevance to a given data extraction task. Relevant files derived from the original documents are then combined into basic starting points for application building; these starting points are kept up-to-date as new relevant documents are incorporated into the digital library. Currently, our digital library contains more than 360K documents supplied by Elsevier and the USGS, and we are actively seeking additional content providers.
By focusing on building a dependable infrastructure to support the retrieval, storage, and pre-processing of published content, we are establishing a foundation for complex, and continually improving, information integration and data extraction applications. We have developed one such application, which we present as an example, and invite new collaborations to develop other such applications.
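    The ElasticSearch shortlisting step described above might look like the following query body. The index field names (`body`, `publisher`, `docid`, `title`) and the example terms are hypothetical, not GeoDeepDive's actual schema:

```json
{
  "query": {
    "bool": {
      "must":   [ { "match": { "body": "stratigraphic section" } } ],
      "filter": [ { "term":  { "publisher": "USGS" } } ]
    }
  },
  "_source": ["docid", "title"],
  "size": 100
}
```

    A query like this narrows the 360K-document library to a small candidate set before the heavier natural-language-processing products are pulled in for a specific extraction task.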

  11. SS-mPMG and SS-GA: tools for finding pathways and dynamic simulation of metabolic networks.

    PubMed

    Katsuragi, Tetsuo; Ono, Naoaki; Yasumoto, Keiichi; Altaf-Ul-Amin, Md; Hirai, Masami Y; Sriyudthsak, Kansuporn; Sawada, Yuji; Yamashita, Yui; Chiba, Yukako; Onouchi, Hitoshi; Fujiwara, Toru; Naito, Satoshi; Shiraishi, Fumihide; Kanaya, Shigehiko

    2013-05-01

    Metabolomics analysis tools can provide quantitative information on the concentration of metabolites in an organism. In this paper, we propose the minimum pathway model generator tool for simulating the dynamics of metabolite concentrations (SS-mPMG) and a tool for parameter estimation by genetic algorithm (SS-GA). SS-mPMG can extract a subsystem of the metabolic network from the genome-scale pathway maps to reduce the complexity of the simulation model and automatically construct a dynamic simulator to evaluate the experimentally observed behavior of metabolites. Using this tool, we show that stochastic simulation can reproduce experimentally observed dynamics of amino acid biosynthesis in Arabidopsis thaliana. In this simulation, SS-mPMG extracts the metabolic network subsystem from published databases. The parameters needed for the simulation are determined using a genetic algorithm to fit the simulation results to the experimental data. We expect that SS-mPMG and SS-GA will help researchers to create relevant metabolic networks and carry out simulations of metabolic reactions derived from metabolomics data.
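    The dynamic simulation of an extracted subsystem can be sketched as a small mass-action model integrated with Euler steps. The two-step pathway and rate constants below are made-up placeholders, not values produced by SS-mPMG or SS-GA:

```python
# Euler integration of a two-step pathway A -> B -> out with mass-action
# kinetics, the kind of reduced subsystem SS-mPMG builds automatically.
k1, k2 = 0.8, 0.3          # rate constants (1/h), invented
a, b = 1.0, 0.0            # initial concentrations (mM)
dt, steps = 0.01, 1000     # 10 h of simulated time

for _ in range(steps):
    v1 = k1 * a            # A -> B
    v2 = k2 * b            # B -> out
    a += dt * (-v1)
    b += dt * (v1 - v2)

print(f"after 10 h: A={a:.4f} mM, B={b:.4f} mM")
```

    In the real workflow, a genetic algorithm such as SS-GA would adjust `k1` and `k2` until trajectories like these match measured metabolite concentrations.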

  12. Construction of a database for published phase II/III drug intervention clinical trials for the period 2009-2014 comprising 2,326 records, 90 disease categories, and 939 drug entities.

    PubMed

    Jeong, Sohyun; Han, Nayoung; Choi, Boyoon; Sohn, Minji; Song, Yun-Kyoung; Chung, Myeon-Woo; Na, Han-Sung; Ji, Eunhee; Kim, Hyunah; Rhew, Ki Yon; Kim, Therasa; Kim, In-Wha; Oh, Jung Mi

    2016-06-01

    To construct a database of published clinical drug trials suitable for use 1) as a research tool in accessing clinical trial information and 2) in evidence-based decision-making by regulatory professionals, clinical research investigators, and medical practitioners. Comprehensive information was obtained from a search of the design elements and results of clinical trials in peer-reviewed journals using PubMed (http://www.ncbi.nlm.nih.gov/pubmed). The methodology to develop a structured database was devised by a panel of experts in medicine, pharmacy, and information technology, together with members of the Ministry of Food and Drug Safety (MFDS), using a step-by-step approach. A double-sided system consisting of user mode and manager mode served as the framework for the database; elements of interest from each trial were entered via the secure manager mode, enabling the input information to be accessed in a user-friendly manner (user mode). Information regarding the methodology used and the results of drug treatment was extracted as detail elements of each data set and then entered into the web-based database system. Comprehensive information comprising 2,326 clinical trial records, 90 disease states, and 939 drug entities, and concerning study objectives, background, methods used, results, and conclusion, could be extracted from published information on phase II/III drug intervention clinical trials appearing in SCI journals within the last 10 years. The extracted data were successfully assembled into a clinical drug trial database with easy access suitable for use as a research tool. The clinically most important therapeutic categories, i.e., cancer, cardiovascular, respiratory, neurological, metabolic, urogenital, gastrointestinal, psychological, and infectious diseases, were covered by the database. Names of test and control drugs, details on primary and secondary outcomes, and indexed keywords could also be retrieved and built into the database.
The database's construction enables users to sort and download targeted information as a Microsoft Excel spreadsheet. Because of its comprehensive and standardized nature and its ease of access, the clinical drug trial database should serve as a valuable information repository and research tool for regulatory professionals, clinical research investigators, and medical practitioners accessing clinical trial information and making evidence-based decisions.

  13. Quantum algorithms for topological and geometric analysis of data

    PubMed Central

    Lloyd, Seth; Garnerone, Silvano; Zanardi, Paolo

    2016-01-01

    Extracting useful information from large data sets can be a daunting task. Topological methods for analysing data sets provide a powerful technique for extracting such information. Persistent homology is a sophisticated tool for identifying topological features and for determining how such features persist as the data is viewed at different scales. Here we present quantum machine learning algorithms for calculating Betti numbers—the numbers of connected components, holes and voids—in persistent homology, and for finding eigenvectors and eigenvalues of the combinatorial Laplacian. The algorithms provide an exponential speed-up over the best currently known classical algorithms for topological data analysis. PMID:26806491
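    The quantity the quantum algorithm speeds up, the Betti numbers, can be computed classically for a tiny example. The sketch below counts connected components (Betti-0) of an invented point cloud as the connectivity scale grows, which is the simplest instance of persistent homology:

```python
# Classical counterpart of the Betti-0 computation: count connected
# components of a point cloud as the scale epsilon grows (persistent H0).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]

def betti0(eps):
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    # Union points closer than eps (edges of the Vietoris-Rips complex).
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= eps * eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

for eps in (0.05, 0.2, 2.0, 10.0):
    print(f"eps={eps}: b0={betti0(eps)}")
```

    Features that survive over a wide range of `eps` are the "persistent" ones; the classical cost of building the complex grows steeply with the number of points, which is where the claimed exponential quantum speed-up matters.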

  14. TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.

    PubMed

    Fimereli, Danai; Detours, Vincent; Konopka, Tomasz

    2013-04-01

    High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
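    The read-extraction idea can be sketched with a k-mer pre-filter: keep only reads that share a short subsequence with the region of interest. The sequences and the value of `K` below are invented for illustration; TriageTools' actual matching is more sophisticated:

```python
# Minimal k-mer pre-filter in the spirit of TriageTools: keep only reads
# sharing a k-mer with the region of interest.
K = 5
region = "ACGTACGTGGAT"
region_kmers = {region[i:i + K] for i in range(len(region) - K + 1)}

reads = [
    "TTACGTACGTTT",   # shares a k-mer with the region -> kept
    "GGGGGGGGGGGG",   # no shared k-mer -> discarded
]

def is_candidate(read):
    return any(read[i:i + K] in region_kmers
               for i in range(len(read) - K + 1))

kept = [r for r in reads if is_candidate(r)]
print(len(kept), "of", len(reads), "reads retained")
```

    Because set membership tests are cheap, a filter like this can discard the bulk of a dataset before any expensive alignment is run, which is where the reported 2.6x to 96x speedups come from.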

  15. Search Analytics: Automated Learning, Analysis, and Search with Open Source

    NASA Astrophysics Data System (ADS)

    Hundman, K.; Mattmann, C. A.; Hyon, J.; Ramirez, P.

    2016-12-01

    The sheer volume of unstructured scientific data makes comprehensive human analysis impossible, resulting in missed opportunities to identify relationships, trends, gaps, and outliers. As the open source community continues to grow, tools like Apache Tika, Apache Solr, Stanford's DeepDive, and Data-Driven Documents (D3) can help address this challenge. With a focus on journal publications and conference abstracts often in the form of PDF and Microsoft Office documents, we've initiated an exploratory NASA Advanced Concepts project aiming to use the aforementioned open source text analytics tools to build a data-driven justification for the HyspIRI Decadal Survey mission. We call this capability Search Analytics, and it fuses and augments these open source tools to enable the automatic discovery and extraction of salient information. In the case of HyspIRI, a hyperspectral infrared imager mission, key findings resulted from the extractions and visualizations of relationships from thousands of unstructured scientific documents. The relationships include links between satellites (e.g. Landsat 8), domain-specific measurements (e.g. spectral coverage) and subjects (e.g. invasive species). Using the above open source tools, Search Analytics mined and characterized a corpus of information that would be infeasible for a human to process. More broadly, Search Analytics offers insights into various scientific and commercial applications enabled through missions and instrumentation with specific technical capabilities. For example, the following phrases were extracted in close proximity within a publication: "In this study, hyperspectral images…with high spatial resolution (1 m) were analyzed to detect cutleaf teasel in two areas. …Classification of cutleaf teasel reached a users accuracy of 82 to 84%." 
Without reading a single paper, we can use Search Analytics to automatically identify that a 1 m spatial resolution provides a cutleaf teasel detection user's accuracy of 82-84%, which could have tangible, direct downstream implications for crop protection. Automatically assimilating this information expedites and supplements human analysis, and, ultimately, Search Analytics and its foundation of open source tools will result in more efficient scientific investment and research.
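    A toy version of that extraction can be written with regular expressions over the quoted passage. The real system uses DeepDive-style statistical extraction rather than hand-written patterns, so this is only a sketch of the idea:

```python
import re

# Pull (resolution, accuracy) pairs from a sentence with regular expressions.
text = ("In this study, hyperspectral images with high spatial resolution "
        "(1 m) were analyzed to detect cutleaf teasel in two areas. "
        "Classification of cutleaf teasel reached a users accuracy of 82 to 84%.")

resolution = re.search(r"\((\d+(?:\.\d+)?)\s*m\)", text)
accuracy = re.search(r"accuracy of (\d+)\s*to\s*(\d+)%", text)

if resolution and accuracy:
    print(f"resolution={resolution.group(1)} m, "
          f"accuracy={accuracy.group(1)}-{accuracy.group(2)}%")
```

    Linking the two matches within one document is what turns unstructured prose into the satellite-measurement-subject relationships the abstract describes.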

  16. Automatic Extraction of JPF Options and Documentation

    NASA Technical Reports Server (NTRS)

    Luks, Wojciech; Tkachuk, Oksana; Buschnell, David

    2011-01-01

    Documenting existing Java PathFinder (JPF) projects or developing new extensions is a challenging task. JPF provides a platform for creating new extensions and relies on key-value properties for their configuration. Keeping track of all possible options and extension mechanisms in JPF can be difficult. This paper presents jpf-autodoc-options, a tool that automatically extracts JPF project options and other documentation-related information, which can greatly help both JPF users and developers of JPF extensions.

  17. Development of an information retrieval tool for biomedical patents.

    PubMed

    Alves, Tiago; Rodrigues, Rúben; Costa, Hugo; Rocha, Miguel

    2018-06-01

    The volume of biomedical literature has been increasing in recent years. Patent documents have also followed this trend, being important sources of biomedical knowledge, technical details and curated data, which are assembled during the granting process. The field of Biomedical text mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes the search for information a challenging task. Several BioTM techniques can be applied to patents. Among them, Information Retrieval (IR) includes processes where relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks. The pipeline was developed within @Note2, an open-source computational framework for BioTM, adding a number of modules to the core libraries, including patent metadata and full text retrieval, PDF to text conversion and optical character recognition. Also, user interfaces were developed for the main operations, materialized in a new @Note2 plug-in. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including tasks from Information Extraction, such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from the BioCreative challenges. Also, we show the use of the plug-in with a user query related to the production of vanillin. This work makes all the relevant content from patents available to the scientific community, drastically decreasing the time required for this task, and provides graphical interfaces to ease the use of these tools. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. An integrated, open-source set of tools for urban vulnerability monitoring from Earth observation data

    NASA Astrophysics Data System (ADS)

    De Vecchi, Daniele; Harb, Mostapha; Dell'Acqua, Fabio; Aurelio Galeazzo, Daniel

    2015-04-01

    Aim: The paper introduces an integrated set of open-source tools designed to process medium and high-resolution imagery with the aim of extracting vulnerability indicators [1]. Problem: In the context of risk monitoring [2], a series of vulnerability proxies can be defined, such as the extent of a built-up area or building regularity [3]. Different open-source C and Python libraries are already available for image processing and geospatial information (e.g. OrfeoToolbox, OpenCV and GDAL). They include basic processing tools but not vulnerability-oriented workflows. Therefore, it is of significant importance to provide end-users with a set of tools capable of returning information at a higher level. Solution: The proposed set of Python algorithms is a combination of low-level image processing and geospatial information handling tools along with high-level workflows. In particular, two main products are released under the GPL license: the source code, oriented to developers, and a QGIS plugin. These tools were produced within the SENSUM project framework (ended December 2014), where the main focus was on earthquake and landslide risk. Further development and maintenance are guaranteed by the decision to include them in the platform designed within the FP7 RASOR project. Conclusion: Given the lack of a unified software suite for vulnerability indicator extraction, the proposed solution can provide inputs for already available models like the Global Earthquake Model. The inclusion of the proposed set of algorithms within the RASOR platform can guarantee support and enlarge the community of end-users. Keywords: Vulnerability monitoring, remote sensing, optical imagery, open-source software tools
    References [1] M. Harb, D. De Vecchi, F. Dell'Acqua, "Remote sensing-based vulnerability proxies in the EU FP7 project SENSUM", Symposium on earthquake and landslide risk in Central Asia and Caucasus: exploiting remote sensing and geo-spatial information management, 29-30th January 2014, Bishkek, Kyrgyz Republic. [2] UNISDR, "Living with Risk", Geneva, Switzerland, 2004. [3] P. Bisch, E. Carvalho, H. Degree, P. Fajfar, M. Fardis, P. Franchin, M. Kreslin, A. Pecker, "Eurocode 8: Seismic Design of Buildings", Lisbon, 2011. (SENSUM: www.sensum-project.eu, grant number: 312972) (RASOR: www.rasor-project.eu, grant number: 606888)

  19. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  20. Informing Hospital Change Processes through Visualization and Simulation: A Case Study at a Children's Emergency Clinic.

    PubMed

    Persson, Johanna; Dalholm, Elisabeth Hornyánszky; Johansson, Gerd

    2014-01-01

    To demonstrate the use of visualization and simulation tools in order to involve stakeholders and inform the process in hospital change processes, illustrated by an empirical study from a children's emergency clinic. Reorganization and redevelopment of a hospital is a complex activity that involves many stakeholders and demands. Visualization and simulation tools have proven useful for involving practitioners and eliciting relevant knowledge. More knowledge is desired about how these tools can be implemented in practice for hospital planning processes. A participatory planning process including practitioners and researchers was executed over a 3-year period to evaluate a combination of visualization and simulation tools to involve stakeholders in the planning process and to elicit knowledge about needs and requirements. The initial clinic proposal from the architect was discarded as a result of the empirical study. Much general knowledge about the needs of the organization was extracted by means of the adopted tools. Some of the tools proved to be more accessible than others for the practitioners participating in the study. The combination of tools added value to the process by presenting information in alternative ways and eliciting questions from different angles. Visualization and simulation tools inform a planning process (or other types of change processes) by providing the means to see beyond present demands and current work structures. Long-term involvement in combination with accessible tools is central for creating a participatory setting where the practitioners' knowledge guides the process. © 2014 Vendome Group, LLC.

  1. Medical document anonymization with a semantic lexicon.

    PubMed Central

    Ruch, P.; Baud, R. H.; Rassinoux, A. M.; Bouillon, P.; Robert, G.

    2000-01-01

    We present an original system for locating and removing personally-identifying information in patient records. In this experiment, anonymization is seen as a particular case of knowledge extraction. We use natural language processing tools provided by the MEDTAG framework: a semantic lexicon specialized in medicine, and a toolkit for word-sense and morpho-syntactic tagging. The system finds 98-99% of all personally-identifying information. PMID:11079980
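    The lexicon-plus-pattern approach can be pictured with a minimal scrubber. The name list, patterns and sample note below are invented; MEDTAG itself combines a full semantic lexicon with word-sense and morpho-syntactic tagging rather than bare regular expressions:

```python
import re

# Toy anonymizer: a name lexicon plus surface patterns for dates and phones.
name_lexicon = {"Dupont", "Martin"}
patterns = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "<DATE>"),
    (re.compile(r"\b\d{3}-\d{4}\b"), "<PHONE>"),
]

def anonymize(text):
    for word in name_lexicon:
        text = re.sub(rf"\b{word}\b", "<NAME>", text)
    for pat, tag in patterns:
        text = pat.sub(tag, text)
    return text

note = "Mr Dupont, admitted 03/12/1999, callback 555-0199."
print(anonymize(note))
```

    Treating anonymization as knowledge extraction, as the abstract does, means the tagger decides from context whether a token is a person name, which is how the reported 98-99% recall becomes reachable where plain patterns fail.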

  2. HELP: XID+, the probabilistic de-blender for Herschel SPIRE maps

    NASA Astrophysics Data System (ADS)

    Hurley, P. D.; Oliver, S.; Betancourt, M.; Clarke, C.; Cowley, W. I.; Duivenvoorden, S.; Farrah, D.; Griffin, M.; Lacey, C.; Le Floc'h, E.; Papadopoulos, A.; Sargent, M.; Scudder, J. M.; Vaccari, M.; Valtchanov, I.; Wang, L.

    2017-01-01

    We have developed a new prior-based source extraction tool, XID+, to carry out photometry in the Herschel SPIRE (Spectral and Photometric Imaging Receiver) maps at the positions of known sources. XID+ is developed within a probabilistic Bayesian framework, which provides a natural way to include prior information, and uses the Bayesian inference tool Stan to obtain the full posterior probability distribution on flux estimates. In this paper, we discuss the details of XID+ and demonstrate the basic capabilities and performance by running it on simulated SPIRE maps resembling the COSMOS field, and comparing to the current prior-based source extraction tool DESPHOT. Not only do we show that XID+ performs better on metrics such as flux accuracy and flux uncertainty accuracy, but we also illustrate how obtaining the posterior probability distribution can help overcome some of the issues inherent with maximum-likelihood-based source extraction routines. We run XID+ on the COSMOS SPIRE maps from the Herschel Multi-Tiered Extragalactic Survey using a 24-μm catalogue as a positional prior, and a uniform flux prior ranging from 0.01 to 1000 mJy. We show the marginalized SPIRE colour-colour plot and marginalized contribution to the cosmic infrared background at the SPIRE wavelengths. XID+ is a core tool arising from the Herschel Extragalactic Legacy Project (HELP) and we discuss how additional work within HELP providing prior information on fluxes can and will be utilized. The software is available at https://github.com/H-E-L-P/XID_plus. We also provide the data product for COSMOS. We believe this is the first time that the full posterior probability of galaxy photometry has been provided as a data product.
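    The prior-based posterior idea can be reduced to a one-source toy: a pixel value modelled as flux times a point-spread-function weight plus noise, a uniform flux prior, and a Metropolis sampler standing in for Stan. All numbers are invented and XID+ itself fits full SPIRE maps, so this is only a sketch of the inference pattern:

```python
import math
import random

# One-source toy: observed pixel d = f * psf + Gaussian noise,
# uniform prior on flux f (mJy), posterior sampled with Metropolis.
random.seed(1)
psf, noise_sigma = 0.8, 2.0
d = 24.0                                   # observed pixel value (mJy)

def log_post(f):
    if not (0.01 <= f <= 1000.0):          # uniform flux prior bounds
        return float("-inf")
    r = d - f * psf
    return -0.5 * (r / noise_sigma) ** 2

f, samples = 10.0, []
for _ in range(20000):
    prop = f + random.gauss(0.0, 1.0)
    if math.log(random.random()) < log_post(prop) - log_post(f):
        f = prop
    samples.append(f)

post = samples[5000:]                      # drop burn-in
mean_f = sum(post) / len(post)
print(f"posterior mean flux ~ {mean_f:.1f} mJy")
```

    Having the whole sample set, rather than a single maximum-likelihood flux, is what lets XID+ report honest uncertainties and marginalized quantities such as the colour-colour distributions mentioned above.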

  3. Aerobraking Maneuver (ABM) Report Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forrest; Gladden, Roy; Khanampornpan, Teerapat

    2008-01-01

    abmREPORT Version 3.1 is a Perl script that extracts vital summarization information from the Mars Reconnaissance Orbiter (MRO) aerobraking ABM build process. This information facilitates sequence reviews and provides a high-level summarization of the sequence for mission management. The script extracts information from the ENV, SSF, FRF, SCMFmax, and OPTG files and from burn magnitude configuration files, and presents them in a single, easy-to-check report that provides the majority of the parameters necessary for cross-check and verification during the sequence review process. This means that needed information, formerly spread across a number of files, each in a different format, is all available in this one application. This program was built on the capabilities developed for dragReport; the two scripts subsequently evolved in parallel as both tools continued to be developed.

  4. Energy Survey of Machine Tools: Separating Power Information of the Main Transmission System During Machining Process

    NASA Astrophysics Data System (ADS)

    Liu, Shuang; Liu, Fei; Hu, Shaohua; Yin, Zhenbiao

    The major power information of the main transmission system in machine tools (MTSMT) during the machining process includes effective output power (i.e. cutting power), input power and power loss from the mechanical transmission system, and the main motor power loss. This information is easy to obtain in the lab but difficult to evaluate in a manufacturing process. To solve this problem, a separation method is proposed here to extract the MTSMT power information during the machining process. In this method, the energy flow and the mathematical models of the major power information of MTSMT during the machining process are set up first. Based on the mathematical models and the basic data tables obtained from experiments, the above-mentioned power information during the machining process can be separated just by measuring the real-time total input power of the spindle motor. An operating procedure for this method is also given.
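    The separation idea can be sketched in a few lines: cutting power is the measured input power minus the calibrated idle power at the same spindle speed, minus an additional load loss. The calibration table and loss coefficient below are invented placeholders for the paper's experimentally obtained data tables:

```python
# Idle (no-load) power measured per spindle speed during bench calibration.
idle_power = {1000: 850.0, 2000: 1320.0}   # W, invented values
a1 = 0.12                                  # additional-load-loss coefficient

def cutting_power(p_in, rpm):
    p_idle = idle_power[rpm]
    p_load = p_in - p_idle                 # power delivered beyond idling
    p_add_loss = a1 * p_load               # linearized additional load loss
    return p_load - p_add_loss             # effective cutting power (W)

print(f"P_cut = {cutting_power(2600.0, 2000):.0f} W")
```

    The appeal of the method is that only one quantity, the real-time spindle input power, has to be measured on the shop floor; everything else comes from the pre-built tables.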

  5. Air Traffic Complexity Measurement Environment (ACME): Software User's Guide

    NASA Technical Reports Server (NTRS)

    1996-01-01

    A user's guide for the Air Traffic Complexity Measurement Environment (ACME) software is presented. The ACME consists of two major components, a complexity analysis tool and user interface. The Complexity Analysis Tool (CAT) analyzes complexity off-line, producing data files which may be examined interactively via the Complexity Data Analysis Tool (CDAT). The Complexity Analysis Tool is composed of three independently executing processes that communicate via PVM (Parallel Virtual Machine) and Unix sockets. The Runtime Data Management and Control process (RUNDMC) extracts flight plan and track information from a SAR input file, and sends the information to GARP (Generate Aircraft Routes Process) and CAT (Complexity Analysis Task). GARP in turn generates aircraft trajectories, which are utilized by CAT to calculate sector complexity. CAT writes flight plan, track and complexity data to an output file, which can be examined interactively. The Complexity Data Analysis Tool (CDAT) provides an interactive graphic environment for examining the complexity data produced by the Complexity Analysis Tool (CAT). CDAT can also play back track data extracted from System Analysis Recording (SAR) tapes. The CDAT user interface consists of a primary window, a controls window, and miscellaneous pop-ups. Aircraft track and position data is displayed in the main viewing area of the primary window. The controls window contains miscellaneous control and display items. Complexity data is displayed in pop-up windows. CDAT plays back sector complexity and aircraft track and position data as a function of time. Controls are provided to start and stop playback, adjust the playback rate, and reposition the display to a specified time.

  6. Applying different independent component analysis algorithms and support vector regression for IT chain store sales forecasting.

    PubMed

    Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful way to construct an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to various forecasting problems. But, up to now, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods, spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in sales forecasting for an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting.
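    The ICA-then-SVR pipeline can be sketched with scikit-learn on synthetic data. The mixing matrix, series and parameters below are invented, and this mirrors only the temporal-ICA baseline, not the paper's stICA variant or its real branch data:

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVR

# Synthetic stand-in for two branch sales series: mixtures of two latent signals.
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
latent = np.c_[np.sin(t), np.sign(np.sin(3 * t))]
mixing = np.array([[1.0, 0.5], [0.4, 1.2]])
sales = latent @ mixing + 0.05 * rng.standard_normal((400, 2))

# ICA feature extraction, then SVR one-step-ahead forecast of branch 0.
features = FastICA(n_components=2, random_state=0).fit_transform(sales)
X, y = features[:-1], sales[1:, 0]

model = SVR(kernel="rbf", C=10.0).fit(X[:300], y[:300])
pred = model.predict(X[300:])
rmse = float(np.sqrt(np.mean((pred - y[300:]) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

    The rationale is that the unmixed components are less noisy and less redundant than the raw branch series, so the regressor sees cleaner inputs.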

  7. Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

    PubMed Central

    Dai, Wensheng

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful way to construct an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to various forecasting problems. But, up to now, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods, spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in sales forecasting for an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740

  8. Mitigating Information Overload: The Impact of Context-Based Approach to the Design of Tools for Intelligence Analysts

    DTIC Science & Technology

    2008-03-01

    ...amount of arriving data, extract actionable information, and integrate it with prior knowledge. Add to that the pressures of today’s fusion center climate and it becomes clear that analysts, police... fusion centers, including specifics about how these problems manifest at the Illinois State Police (ISP) Statewide Terrorism and Intelligence Center...

  9. FY10 Report on Multi-scale Simulation of Solvent Extraction Processes: Molecular-scale and Continuum-scale Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wardle, Kent E.; Frey, Kurt; Pereira, Candido

    2014-02-02

    This task is aimed at predictive modeling of solvent extraction processes in typical extraction equipment through multiple simulation methods at various scales of resolution. We have conducted detailed continuum fluid dynamics simulations on the process unit level as well as simulations of the molecular-level physical interactions which govern extraction chemistry. Through a combination of information gained through simulations at each of these two tiers, along with advanced techniques such as the Lattice Boltzmann Method (LBM) which can bridge these two scales, we can develop the tools to work towards predictive simulation for solvent extraction on the equipment scale (Figure 1). The goal of such a tool, along with enabling optimized design and operation of extraction units, would be to allow prediction of stage extraction efficiency under specified conditions. Simulation efforts on each of the two scales are described below. As the initial application of FELBM in the work performed during FY10 has been on annular mixing, it will be discussed in the context of the continuum scale. In the future, however, it is anticipated that the real value of FELBM will be as a tool for sub-grid model development through highly refined DNS-like multiphase simulations, facilitating exploration and development of droplet models, including breakup and coalescence, which will be needed for the large-scale simulations where droplet-level physics cannot be resolved. In this area, it has a significant advantage over traditional CFD methods, as its high computational efficiency allows exploration of significantly greater physical detail, especially as computational resources increase in the future.

  10. Novel Tool for Complete Digitization of Paper Electrocardiography Data.

    PubMed

    Ravichandran, Lakshminarayan; Harless, Chris; Shah, Amit J; Wick, Carson A; Mcclellan, James H; Tridandapani, Srini

    We present a Matlab-based tool to convert electrocardiography (ECG) information from paper charts into digital ECG signals. The tool can be used for long-term retrospective studies of cardiac patients to study evolving features with prognostic value. To perform the conversion, we: 1) detect the graphical grid on ECG charts using grayscale thresholding; 2) digitize the ECG signal based on its contour using a column-wise pixel scan; and 3) use template-based optical character recognition to extract patient demographic information from the paper ECG in order to interface the data with the patients' medical records. To validate the digitization technique: 1) correlations between the original digital signals and the signals digitized from paper ECG are computed and 2) clinically significant ECG parameters are measured and compared between the paper-based ECG signals and the digitized ECG. The validation demonstrates a correlation value of 0.85-0.9 between the digital ECG signal and the signal digitized from the paper ECG. There is a high correlation in the clinical parameters between the ECG information from the paper charts and the digitized signal, with intra-observer and inter-observer correlations of 0.8-0.9 (p < 0.05), and kappa statistics ranging from 0.85 (inter-observer) to 1.00 (intra-observer). The important features of the ECG signal, especially the QRST complex and the associated intervals, are preserved by obtaining the contour from the paper ECG. The differences between the measures of clinically important features extracted from the original signal and the reconstructed signal are insignificant, thus highlighting the accuracy of this technique. Using this type of ECG digitization tool, retrospective studies of emerging ECG features can be performed on large databases that rely on paper ECG records. In addition, this tool can potentially be used to integrate digitized ECG information with digital ECG analysis programs and with the patient's electronic medical record.
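
    The column-wise pixel scan in step 2 above can be illustrated as follows. This is a toy sketch on a tiny synthetic binary image, not the authors' Matlab code; the interpolation over empty columns is an added assumption:

```python
import numpy as np

def trace_from_binary(img):
    """Column-wise scan: for each column, average the row indices of
    'ink' pixels (True) to recover one amplitude sample per column.
    Columns with no ink yield NaN and are interpolated afterwards."""
    rows = np.arange(img.shape[0])
    sig = np.full(img.shape[1], np.nan)
    for col in range(img.shape[1]):
        ink = img[:, col]
        if ink.any():
            sig[col] = rows[ink].mean()
    # Linear interpolation over empty columns (assumes some ink exists).
    idx = np.arange(sig.size)
    good = ~np.isnan(sig)
    sig = np.interp(idx, idx[good], sig[good])
    # Invert: image row 0 is the top, but larger amplitude is 'up'.
    return img.shape[0] - 1 - sig

# Tiny synthetic 'chart': a rising line drawn in a 10x5 binary image.
img = np.zeros((10, 5), dtype=bool)
for c in range(5):
    img[9 - c, c] = True   # row 9 at col 0 up to row 5 at col 4

signal = trace_from_binary(img)   # recovers the rising amplitude
```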

  11. Induced lexico-syntactic patterns improve information extraction from online medical forums.

    PubMed

    Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D

    2014-01-01

    To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited.
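
    The dictionary-seeded pattern induction described above can be caricatured in a few lines. This toy sketch induces a single two-word lexical context from seed symptom terms and reapplies it to harvest new candidates; the pattern form and the example posts are invented for illustration and are far simpler than the paper's lexico-syntactic patterns:

```python
import re
from collections import Counter

seed_symptoms = {"headache", "nausea"}
posts = [
    "I had a headache after the new med",
    "she had a nausea episode last week",
    "I had a migraine after taking it",
    "he had a dizziness problem too",
]

# Induce two-word context patterns from occurrences of seed terms.
patterns = Counter()
for post in posts:
    for term in seed_symptoms:
        for m in re.finditer(r"(\w+ \w+) " + term, post):
            patterns[m.group(1)] += 1

# Reapply the most frequent context to harvest new candidate terms
# (the context is plain words here, so no regex escaping is needed).
best = patterns.most_common(1)[0][0]
candidates = set()
for post in posts:
    for m in re.finditer(best + r" (\w+)", post):
        if m.group(1) not in seed_symptoms:
            candidates.add(m.group(1))
```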

  12. Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱

    PubMed Central

    Marcum, Christopher Steven; Butts, Carter T.

    2015-01-01

    The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user-friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by rem() for model fitting in a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488

  13. The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis.

    PubMed

    Van Doorslaer, Koenraad; Tan, Qina; Xirasagar, Sandhya; Bandaru, Sandya; Gopalan, Vivek; Mohamoud, Yasmin; Huyen, Yentram; McBride, Alison A

    2013-01-01

    The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.

  14. SIDECACHE: Information access, management and dissemination framework for web services.

    PubMed

    Doderer, Mark S; Burkhardt, Cory; Robbins, Kay A

    2011-06-14

    Many bioinformatics algorithms and data sets are deployed using web services so that the results can be explored via the Internet and easily integrated into other tools and services. These services often include data from other sites that is accessed either dynamically or through file downloads. Developers of these services face several problems because of the dynamic nature of the information from the upstream services. Many publicly available repositories of bioinformatics data frequently update their information. When such an update occurs, the developers of the downstream service may also need to update. For file downloads, this process is typically performed manually followed by web service restart. Requests for information obtained by dynamic access of upstream sources is sometimes subject to rate restrictions. SideCache provides a framework for deploying web services that integrate information extracted from other databases and from web sources that are periodically updated. This situation occurs frequently in biotechnology where new information is being continuously generated and the latest information is important. SideCache provides several types of services including proxy access and rate control, local caching, and automatic web service updating. We have used the SideCache framework to automate the deployment and updating of a number of bioinformatics web services and tools that extract information from remote primary sources such as NCBI, NCIBI, and Ensembl. The SideCache framework also has been used to share research results through the use of a SideCache derived web service.
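
    The local-caching-with-refresh behavior described above can be sketched as a minimal cache in front of a slowly changing upstream source. The class name, constructor, and fetch callback here are hypothetical illustrations; SideCache's actual interface is not shown in the abstract:

```python
import time

class CachingProxy:
    """Minimal local cache with a refresh interval, in the spirit of a
    caching proxy for periodically updated upstream data (illustrative
    sketch only; rate control and persistence are omitted)."""

    def __init__(self, fetch, ttl_seconds=3600.0):
        self.fetch = fetch          # callable that hits the upstream source
        self.ttl = ttl_seconds
        self._value = None
        self._stamp = float("-inf")

    def get(self):
        now = time.monotonic()
        if now - self._stamp > self.ttl:   # stale: hit upstream once
            self._value = self.fetch()
            self._stamp = now
        return self._value                 # fresh: serve the local copy

# Count upstream hits with a fake data source.
calls = []
def fake_upstream():
    calls.append(1)
    return {"release": 42}

cache = CachingProxy(fake_upstream, ttl_seconds=3600.0)
first = cache.get()
second = cache.get()   # served from cache; upstream was hit only once
```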

  15. PPInterFinder--a mining tool for extracting causal relations on human proteins from literature.

    PubMed

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problems in biomedical text mining is to mine protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/
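
    The relation-keyword co-occurrence idea behind the first two phases can be illustrated with a deliberately tiny extractor. The dictionaries and the flanking heuristic below are invented for illustration; PPInterFinder's Tregex parsing and its 11 syntactic patterns are not reproduced here:

```python
import re

RELATION_WORDS = {"interacts", "binds", "phosphorylates", "activates"}
PROTEINS = {"MDM2", "TP53", "BRCA1", "RAD51"}

def extract_ppis(sentence):
    """Toy co-occurrence extractor: report (protein, relation, protein)
    triples when two known protein names flank a relation keyword."""
    tokens = re.findall(r"\w+", sentence)
    triples = []
    for i, tok in enumerate(tokens):
        if tok.lower() in RELATION_WORDS:
            left = [t for t in tokens[:i] if t in PROTEINS]
            right = [t for t in tokens[i + 1:] if t in PROTEINS]
            for a in left:
                for b in right:
                    triples.append((a, tok.lower(), b))
    return triples

ppis = extract_ppis("MDM2 binds and inhibits TP53 in vivo")
```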

  16. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature

    PubMed Central

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problems in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628

  17. Irrigation network extraction methodology from LiDAR DTM using Whitebox and ArcGIS

    NASA Astrophysics Data System (ADS)

    Mahor, M. A. P.; De La Cruz, R. M.; Olfindo, N. T.; Perez, A. M. C.

    2016-10-01

    Irrigation networks are important in distributing water resources to areas where rainfall is not enough to sustain agriculture. They are also crucial for redirecting vast amounts of water to decrease the risk of flooding in flat areas, especially near sources of water. Given the lack of studies on irrigation feature extraction, covering features that range from wide canals to small ditches, this study presents a method of extracting these features from LiDAR-derived digital terrain models (DTMs) using Geographic Information Systems (GIS) tools such as ArcGIS and Whitebox Geospatial Analysis Tools (Whitebox GAT). High-resolution LiDAR DTMs with 1-meter horizontal and 0.25-meter vertical accuracies were processed to generate a gully depth map. This map was then reclassified, converted to vector, and filtered according to segment length and sinuosity to isolate the irrigation features. Initial results in the test area show that extraction completeness is greater than 80% when compared with data obtained from the National Irrigation Administration (NIA).
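
    The segment-length and sinuosity filtering step can be sketched as follows; the thresholds are illustrative assumptions, not the values used in the study:

```python
import math

def sinuosity(points):
    """Path length divided by the straight-line distance between the
    segment's endpoints; 1.0 means perfectly straight."""
    path = sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))
    chord = math.dist(points[0], points[-1])
    return path / chord if chord else float("inf")

def keep_canal_like(segments, min_len=50.0, max_sin=1.2):
    """Keep segments that are long enough and nearly straight, as
    engineered irrigation canals tend to be (thresholds illustrative)."""
    kept = []
    for seg in segments:
        length = sum(math.dist(seg[i], seg[i + 1])
                     for i in range(len(seg) - 1))
        if length >= min_len and sinuosity(seg) <= max_sin:
            kept.append(seg)
    return kept

straight = [(0, 0), (60, 0)]           # long and straight: kept
wiggly = [(0, 0), (30, 40), (60, 0)]   # sinuosity 100/60 > 1.2: dropped
filtered = keep_canal_like([straight, wiggly])
```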

  18. WHATIF: an open-source desktop application for extraction and management of the incidental findings from next-generation sequencing variant data

    PubMed Central

    Ye, Zhan; Kadolph, Christopher; Strenn, Robert; Wall, Daniel; McPherson, Elizabeth; Lin, Simon

    2015-01-01

    Background Identification and evaluation of incidental findings in patients following whole exome sequencing (WES) or whole genome sequencing (WGS) is challenging for both practicing physicians and researchers. The American College of Medical Genetics and Genomics (ACMG) recently recommended a list of reportable incidental genetic findings. However, no informatics tools are currently available to support evaluation of incidental findings in next-generation sequencing data. Methods The Wisconsin Hierarchical Analysis Tool for Incidental Findings (WHATIF) was developed as a stand-alone Windows-based desktop executable to support the interactive analysis of incidental findings in the context of the ACMG recommendations. WHATIF integrates the European Bioinformatics Institute Variant Effect Predictor (VEP) tool for biological interpretation and the National Center for Biotechnology Information ClinVar tool for clinical interpretation. Results An open-source desktop program was created to annotate incidental findings and present the results with a user-friendly interface. Further, a meaningful index (WHATIF Index) was devised for each gene to facilitate ranking of the relative importance of the variants and estimate the potential workload associated with further evaluation of the variants. Our WHATIF application is available at: http://tinyurl.com/WHATIF-SOFTWARE Conclusions The WHATIF application offers a user-friendly interface and allows users to investigate the extracted variant information efficiently and intuitively while always accessing up-to-date information on variants via application programming interface (API) connections. WHATIF’s highly flexible design and straightforward implementation aid users in customizing the source code to meet their own special needs. PMID:25890833

  19. Automated Fluid Feature Extraction from Transient Simulations

    NASA Technical Reports Server (NTRS)

    Haimes, Robert; Lovely, David

    1999-01-01

    In the past, feature extraction and identification were interesting concepts, but not required to understand the underlying physics of a steady flow field. This is because the results of the more traditional tools like iso-surfaces, cuts and streamlines were more interactive and easily abstracted so they could be represented to the investigator. These tools worked and properly conveyed the collected information at the expense of much interaction. For unsteady flow-fields, the investigator does not have the luxury of spending time scanning only one "snap-shot" of the simulation. Automated assistance is required in pointing out areas of potential interest contained within the flow. This must not require a heavy compute burden (the visualization should not significantly slow down the solution procedure for co-processing environments like pV3), and methods must be developed to abstract the feature and display it in a manner that physically makes sense. The following is a list of the important physical phenomena found in transient (and steady-state) fluid flow: (1) shocks, (2) vortex cores, (3) regions of recirculation, (4) boundary layers, (5) wakes. Three papers and an initial specification for the FX (Fluid eXtraction tool kit) Programmer's Guide were included. The papers, submitted to the AIAA Computational Fluid Dynamics Conference, are entitled: (1) Using Residence Time for the Extraction of Recirculation Regions, (2) Shock Detection from Computational Fluid Dynamics Results and (3) On the Velocity Gradient Tensor and Fluid Feature Extraction.
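
    As a hedged illustration of feature extraction from the velocity gradient tensor (the subject of the third paper listed), the widely used Q-criterion flags rotation-dominated regions as candidate vortex cores. This sketch is a generic textbook formulation, not code from the FX tool kit:

```python
import numpy as np

def q_criterion(grad):
    """Q = 0.5 * (||Omega||^2 - ||S||^2), using Frobenius norms of the
    rotation tensor Omega and strain-rate tensor S split out of the
    velocity gradient tensor; Q > 0 marks rotation-dominated flow."""
    S = 0.5 * (grad + grad.T)   # symmetric part: strain rate
    O = 0.5 * (grad - grad.T)   # antisymmetric part: rotation
    return 0.5 * (np.sum(O * O) - np.sum(S * S))

# Solid-body rotation about z, u = (-y, x, 0): pure rotation, Q > 0.
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 0.0]])
# Pure shear, u = (y, 0, 0): strain and rotation balance, Q = 0.
shear = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
q_rot, q_shear = q_criterion(rotation), q_criterion(shear)
```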

  20. A sentence sliding window approach to extract protein annotations from biomedical articles

    PubMed Central

    Krallinger, Martin; Padron, Maria; Valencia, Alfonso

    2005-01-01

    Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need for comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
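
    The sentence sliding window idea can be sketched as scoring every window of consecutive sentences by its cue-term content and keeping the top-scoring passages. The cue list and scoring rule below are invented stand-ins for the paper's statistical estimators:

```python
def window_scores(sentences, window=3, cues=("binds", "kinase")):
    """Score every window of consecutive sentences by how many cue
    keywords it contains; top windows are candidate annotation passages.
    Returns (score, start_index) pairs sorted best-first."""
    scores = []
    for start in range(len(sentences) - window + 1):
        text = " ".join(sentences[start:start + window]).lower()
        scores.append((sum(text.count(c) for c in cues), start))
    return sorted(scores, reverse=True)

doc = [
    "The study describes cell morphology.",
    "ProtA binds ProtB in vitro.",
    "This kinase activity requires ATP.",
    "Further work is needed.",
]
best_score, best_start = window_scores(doc, window=2)[0]
```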

  1. Automated Modular Magnetic Resonance Imaging Clinical Decision Support System (MIROR): An Application in Pediatric Cancer Diagnosis.

    PubMed

    Zarinabad, Niloufar; Meeus, Emma M; Manias, Karen; Foster, Katharine; Peet, Andrew

    2018-05-02

    Advances in magnetic resonance imaging and the introduction of clinical decision support systems have underlined the need for an analysis tool to extract and analyze relevant information from magnetic resonance imaging data to aid decision making, prevent errors, and enhance health care. The aim of this study was to design and develop a modular medical image region of interest analysis tool and repository (MIROR) for automatic processing, classification, evaluation, and representation of advanced magnetic resonance imaging data. The clinical decision support system was developed and evaluated for diffusion-weighted imaging of body tumors in children (cohort of 48 children, with 37 malignant and 11 benign tumors). Mevislab software and Python have been used for the development of MIROR. Regions of interest were drawn around benign and malignant body tumors on different diffusion parametric maps, and extracted information was used to discriminate the malignant tumors from benign tumors. Using MIROR, the various histogram parameters derived for each tumor case, when compared with the information in the repository, provided additional information for tumor characterization and facilitated the discrimination between benign and malignant tumors. Clinical decision support system cross-validation showed high sensitivity and specificity in discriminating between these tumor groups using histogram parameters. MIROR, as a diagnostic tool and repository, allowed the interpretation and analysis of magnetic resonance imaging images to be more accessible and comprehensive for clinicians. It aims to increase clinicians' skillset by introducing newer techniques and up-to-date findings to their repertoire and make information from previous cases available to aid decision making. The modular-based format of the tool allows integration of analyses that are not readily available clinically and streamlines future development.
©Niloufar Zarinabad, Emma M Meeus, Karen Manias, Katharine Foster, Andrew Peet. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 02.05.2018.
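
    The histogram-parameter step can be illustrated with a generic feature extractor over ROI voxel values. The parameter set and the example ADC values below are illustrative assumptions, not MIROR's actual outputs:

```python
import numpy as np

def histogram_features(roi_values):
    """Summary statistics of the kind a DWI ROI analysis might report
    (an illustrative set, not MIROR's actual parameter list)."""
    v = np.asarray(roi_values, dtype=float)
    mean, sd = v.mean(), v.std()
    skew = ((v - mean) ** 3).mean() / sd ** 3 if sd else 0.0
    return {
        "mean": float(mean),
        "median": float(np.median(v)),
        "p10": float(np.percentile(v, 10)),
        "p90": float(np.percentile(v, 90)),
        "skewness": float(skew),
    }

# ADC values (x10^-3 mm^2/s) tend to be lower in malignant tumors;
# these sample ROIs are fabricated solely to show the comparison.
benign = histogram_features([1.8, 2.0, 2.1, 1.9, 2.2])
malignant = histogram_features([0.7, 0.8, 0.9, 0.75, 0.85])
```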

  2. Using GIS in ecological management: green assessment of the impacts of petroleum activities in the state of Texas.

    PubMed

    Merem, Edmund; Robinson, Bennetta; Wesley, Joan M; Yerramilli, Sudha; Twumasi, Yaw A

    2010-05-01

    Geo-information technologies are valuable tools for ecological assessment in stressed environments. Visualizing natural features prone to disasters from the oil sector spatially not only helps in focusing the scope of environmental management with records of changes in affected areas, but it also furnishes information on the pace at which resource extraction affects nature. Notwithstanding the recourse to ecosystem protection, geo-spatial analysis of the impacts remains sketchy. This paper uses GIS and descriptive statistics to assess the ecological impacts of petroleum extraction activities in Texas. While the focus ranges from issues to mitigation strategies, the results point to growth in indicators of ecosystem decline.

  3. Using GIS in Ecological Management: Green Assessment of the Impacts of Petroleum Activities in the State of Texas

    PubMed Central

    Merem, Edmund; Robinson, Bennetta; Wesley, Joan M.; Yerramilli, Sudha; Twumasi, Yaw A.

    2010-01-01

    Geo-information technologies are valuable tools for ecological assessment in stressed environments. Visualizing natural features prone to disasters from the oil sector spatially not only helps in focusing the scope of environmental management with records of changes in affected areas, but it also furnishes information on the pace at which resource extraction affects nature. Notwithstanding the recourse to ecosystem protection, geo-spatial analysis of the impacts remains sketchy. This paper uses GIS and descriptive statistics to assess the ecological impacts of petroleum extraction activities in Texas. While the focus ranges from issues to mitigation strategies, the results point to growth in indicators of ecosystem decline. PMID:20623014

  4. [A new tool for retrieving clinical data from various sources].

    PubMed

    Nielsen, Erik Waage; Hovland, Anders; Strømsnes, Oddgeir

    2006-02-23

    A doctor's tool for extracting clinical data on groups of hospital patients from various sources into one file has been in demand. For this purpose we evaluated Qlikview. Based on clinical information required by two cardiologists, an IT specialist with thorough knowledge of the hospital's data system (www.dips.no) spent 30 days assembling one Qlikview file. Data were also assembled from a pre-hospital ambulance system. The 13 Mb Qlikview file held various information on 12,430 patients admitted to the cardiac unit 26,287 times over the last 21 years. Also included were 530,912 clinical laboratory analyses from these patients during the past five years. Some information required by the cardiologists was inaccessible due to lack of coding or data storage. Some databases could not export their data. Others were encrypted by the software company. A major part of the required data could be extracted to Qlikview. Searches went fast in spite of the huge amount of data. Qlikview could assemble clinical information for doctors from different data systems. Doctors from different hospitals could share and further refine empty Qlikview files for their own use. Once the file is assembled, doctors can, on their own, search for answers to constantly changing clinical questions, also at odd hours.

  5. Mapping care processes within a hospital: from theory to a web-based proposal merging enterprise modelling and ISO normative principles.

    PubMed

    Staccini, Pascal; Joubert, Michel; Quaranta, Jean-François; Fieschi, Marius

    2005-03-01

    Today, the economic and regulatory environment, involving activity-based and prospective payment systems, healthcare quality and risk analysis, traceability of the acts performed and evaluation of care practices, accounts for the current interest in clinical and hospital information systems. The structured gathering of information relative to users' needs and system requirements is fundamental when installing such systems. This stage takes time and is generally misconstrued by caregivers and is of limited efficacy to analysts. We used a modelling technique designed for manufacturing processes (IDEF0/SADT). We enhanced the basic model of an activity with descriptors extracted from the Ishikawa cause-and-effect diagram (methods, men, materials, machines, and environment). We proposed an object data model of a process and its components, and programmed a web-based tool in an object-oriented environment. This tool makes it possible to extract the data dictionary of a given process from the description of its elements and to locate documents (procedures, recommendations, instructions) according to each activity or role. Aimed at structuring needs and storing information provided by directly involved teams regarding the workings of an institution (or at least part of it), the process-mapping approach has an important contribution to make in the analysis of clinical information systems.
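
    The proposed object data model (IDEF0-style activities enhanced with Ishikawa 5M descriptors, plus extraction of a data dictionary from a process description) might be sketched as follows; the class and field names are hypothetical, and Python is used here rather than the authors' web environment:

```python
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    """Ishikawa-style 5M descriptor attached to an activity."""
    category: str   # methods, men, materials, machines, environment
    value: str

@dataclass
class Activity:
    """IDEF0-style activity: inputs flow in, outputs flow out,
    with 5M descriptors characterizing how the work is done."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    descriptors: list = field(default_factory=list)

@dataclass
class Process:
    name: str
    activities: list = field(default_factory=list)

    def data_dictionary(self):
        """Collect every input/output term used across the activities."""
        terms = set()
        for act in self.activities:
            terms.update(act.inputs)
            terms.update(act.outputs)
        return sorted(terms)

admit = Activity("Admit patient",
                 inputs=["referral letter"], outputs=["admission record"],
                 descriptors=[Descriptor("men", "admissions clerk")])
transfuse = Activity("Transfuse",
                     inputs=["admission record", "blood unit"],
                     outputs=["transfusion report"])
care = Process("Transfusion care", [admit, transfuse])
dictionary = care.data_dictionary()
```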

  6. BioRAT: extracting biological information from full-length papers.

    PubMed

    Corney, David P A; Buxton, Bernard F; Langdon, William B; Jones, David T

    2004-11-22

    Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, and which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE. We show first, that BioRAT performs as well as existing systems, when applied to abstracts; and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.

  7. Text mining and its potential applications in systems biology.

    PubMed

    Ananiadou, Sophia; Kell, Douglas B; Tsujii, Jun-ichi

    2006-12-01

    With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.

  8. Automated extraction of chemical structure information from digital raster images

    PubMed Central

    Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro

    2009-01-01

    Background To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed. But their algorithmic performance and utility in cheminformatic research have not been investigated. Results This paper aims to provide critical reviews for these systems and also report our recent development of ChemReader – a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy on extracting molecular substructure patterns. Conclusion The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles.
Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles. PMID:19196483

  9. Topological properties of flat electroencephalography's state space

    NASA Astrophysics Data System (ADS)

    Ken, Tan Lit; Ahmad, Tahir bin; Mohd, Mohd Sham bin; Ngien, Su Kong; Suwa, Tohru; Meng, Ong Sie

    2016-02-01

Neuroinverse problems are often associated with complex neuronal activity. They involve locating the problematic cells, which is highly challenging. While epileptic foci localization is possible with the aid of EEG signals, it relies greatly on the ability to extract hidden information or patterns within EEG signals. Flat EEG, an enhancement of EEG, is a way of viewing an electroencephalograph on the real plane. From the perspective of dynamical systems, Flat EEG is equivalent to an epileptic seizure, making it a great platform to study epileptic seizures. Throughout the years, various mathematical tools have been applied to Flat EEG to extract hidden information that is hardly noticeable by traditional visual inspection. While these tools have given worthy results, a complete understanding of the seizure process has yet to be achieved. Since the underlying structure of Flat EEG is dynamic and is deemed to contain rich information regarding the brainstorm, it would certainly be appealing to explore its structures in depth. To better understand the complex seizure process, this paper studies the event of epileptic seizure via Flat EEG in a more general framework by means of topology, particularly on the state space where the event of Flat EEG lies.

  10. The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement.

    PubMed

    Horvath, Monica M; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-04-01

In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction--the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a Guided Query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. Copyright © 2010 Elsevier Inc. All rights reserved.
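The wizard-to-query translation described above can be sketched as a small filter-to-SQL builder; the table and column names below are invented for illustration and are not DEDUCE's actual schema or code.

```python
def build_cohort_query(filters):
    """Translate wizard-style filters into one parameterized SQL string.

    filters: list of (column, operator, value) triples contributed by
    successive wizard steps; values are bound as parameters, so users
    never write SQL themselves.
    """
    clauses, params = [], []
    for column, op, value in filters:
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT patient_id FROM encounters WHERE {where}", params

sql, params = build_cohort_query([
    ("diagnosis_code", "=", "E11.9"),      # hypothetical filter step 1
    ("encounter_date", ">=", "2010-01-01"),  # hypothetical filter step 2
])
print(sql)
# SELECT patient_id FROM encounters WHERE diagnosis_code = ? AND encounter_date >= ?
```

The parameterized form matters for a tool like this: filter values supplied through a GUI are bound by the database driver rather than spliced into the query text.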

  11. Advanced image collection, information extraction, and change detection in support of NN-20 broad area search and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Petrie, G.M.; Perry, E.M.; Kirkham, R.R.

    1997-09-01

This report describes the work performed at the Pacific Northwest National Laboratory (PNNL) for the U.S. Department of Energy's Office of Nonproliferation and National Security, Office of Research and Development (NN-20). The work supports the NN-20 Broad Area Search and Analysis, a program initiated by NN-20 to improve the detection and classification of undeclared weapons facilities. Ongoing PNNL research activities are described in three main components: image collection, information processing, and change analysis. The Multispectral Airborne Imaging System, which was developed to collect georeferenced imagery in the visible through infrared regions of the spectrum, and flown on a light aircraft platform, will supply current land use conditions. The image information extraction software (dynamic clustering and end-member extraction) uses imagery, like the multispectral data collected by the PNNL multispectral system, to efficiently generate landcover information. The advanced change detection uses a priori (benchmark) information, current landcover conditions, and user-supplied rules to rank suspect areas by probable risk of undeclared facilities or proliferation activities. These components, both separately and combined, provide important tools for improving the detection of undeclared facilities.

  12. Associating Human-Centered Concepts with Social Networks Using Fuzzy Sets

    NASA Astrophysics Data System (ADS)

    Yager, Ronald R.

The rapidly growing global interconnectivity, brought about to a large extent by the Internet, has dramatically increased the importance and diversity of social networks. Modern social networks cut across a spectrum from benign recreational websites such as Facebook, to occupationally oriented websites such as LinkedIn, to criminally focused groups such as drug cartels, to devastation- and terror-focused groups such as Al-Qaeda. Many organizations are interested in analyzing and extracting information related to these social networks. Among these are governmental police and security agencies as well as marketing and sales organizations. To aid these organizations there is a need for technologies to model social networks and intelligently extract information from these models. While established technologies exist for the modeling of relational networks [1-7], few technologies exist for extracting information from them in a way compatible with human perception and understanding. Databases are an example of a technology in which we have tools for representing our information as well as tools for querying and extracting the information contained. Our goal is in some sense analogous. We want to use the relational network model to represent information, in this case about relationships and interconnections, and then be able to query the social network using intelligent human-centered concepts. To extend our capabilities to interact with social relational networks we need to associate human concepts and ideas with these networks. Since human beings predominantly use linguistic terms in which to reason and understand, we need to build bridges between human conceptualization and the formal mathematical representation of the social network. Consider for example a concept such as "leader". An analyst may be able to express, in linguistic terms, using a network-relevant vocabulary, properties of a leader.
Our task is to translate this linguistic description into a mathematical formalism that allows us to determine how true it is that a particular node is a leader. In this work we look at the use of fuzzy set methodologies [8-10] to provide a bridge between the human analyst and the formal model of the network.
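As a toy illustration of that bridge (not the paper's own formalism), a linguistic concept such as "leader" can be encoded as a fuzzy membership function over a measurable network property like normalized degree; the piecewise-linear membership shape below is an assumption.

```python
def high_degree(degree, n_nodes):
    """Fuzzy membership for 'connected to many others'.

    Ramps linearly from 0 at 20% of possible connections to 1 at 80%;
    these breakpoints are illustrative, not from the paper.
    """
    if n_nodes <= 1:
        return 0.0
    ratio = degree / (n_nodes - 1)          # normalized degree centrality
    return max(0.0, min(1.0, (ratio - 0.2) / 0.6))

def leader_truth(node, adjacency):
    """Degree of truth that `node` is a leader of the network."""
    return high_degree(len(adjacency[node]), len(adjacency))

# A four-node star network: "a" is connected to everyone else.
net = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
print(leader_truth("a", net))  # 1.0 (fully satisfies the concept)
print(leader_truth("b", net))  # ~0.22 (weakly satisfies it)
```

The point of the fuzzy encoding is that the query returns a graded truth value per node rather than a yes/no answer, matching how an analyst's linguistic description behaves.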

  13. An automatic rat brain extraction method based on a deformable surface model.

    PubMed

    Li, Jiehua; Liu, Xiaofeng; Zhuo, Jiachen; Gullapalli, Rao P; Zara, Jason M

    2013-08-15

The extraction of the brain from the skull in medical images is a necessary first step before image registration or segmentation. While pre-clinical MR imaging studies on small animals, such as rats, are increasing, fully automatic image processing techniques specific to small animal studies remain lacking. In this paper, we present an automatic rat brain extraction method, the Rat Brain Deformable model method (RBD), which adapts the popular human Brain Extraction Tool (BET) through the incorporation of information on the brain geometry and MR image characteristics of the rat brain. The robustness of the method was demonstrated on T2-weighted MR images of 64 rats and compared with other brain extraction methods (BET, PCNN, PCNN-3D). The results demonstrate that RBD reliably extracts the rat brain with high accuracy (>92% volume overlap) and is robust against signal inhomogeneity in the images. Copyright © 2013 Elsevier B.V. All rights reserved.

  14. Operation Reliability Assessment for Cutting Tools by Applying a Proportional Covariate Model to Condition Monitoring Information

    PubMed Central

    Cai, Gaigai; Chen, Xuefeng; Li, Bing; Chen, Baojia; He, Zhengjia

    2012-01-01

    The reliability of cutting tools is critical to machining precision and production efficiency. The conventional statistic-based reliability assessment method aims at providing a general and overall estimation of reliability for a large population of identical units under given and fixed conditions. However, it has limited effectiveness in depicting the operational characteristics of a cutting tool. To overcome this limitation, this paper proposes an approach to assess the operation reliability of cutting tools. A proportional covariate model is introduced to construct the relationship between operation reliability and condition monitoring information. The wavelet packet transform and an improved distance evaluation technique are used to extract sensitive features from vibration signals, and a covariate function is constructed based on the proportional covariate model. Ultimately, the failure rate function of the cutting tool being assessed is calculated using the baseline covariate function obtained from a small sample of historical data. Experimental results and a comparative study show that the proposed method is effective for assessing the operation reliability of cutting tools. PMID:23201980
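The proportional covariate idea can be sketched in the generic proportional-hazards form, where condition-monitoring features scale a baseline failure rate; the Weibull baseline and the weights below are illustrative assumptions, not the paper's fitted covariate function.

```python
import math

def baseline_hazard(t, shape=2.0, scale=100.0):
    """Weibull baseline failure rate h0(t) = (k/s) * (t/s)**(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def hazard(t, features, weights):
    """Proportional model: monitored features scale the baseline rate.

    h(t; z) = h0(t) * exp(sum_i w_i * z_i), where z are condition
    features (e.g., vibration-derived indicators) and w are weights
    estimated from a small sample of historical data.
    """
    covariate = math.exp(sum(w * f for w, f in zip(weights, features)))
    return baseline_hazard(t) * covariate

# With zero-valued features the covariate term is exp(0) = 1, so the
# failure rate reduces to the baseline; worn-tool features raise it.
print(hazard(50.0, [0.0, 0.0], [0.8, 0.3]) == baseline_hazard(50.0))  # True
print(hazard(50.0, [1.0, 2.0], [0.8, 0.3]) > baseline_hazard(50.0))   # True
```

The multiplicative structure is what lets a single baseline, estimated once, be reused across tools whose individual wear states enter only through the covariate term.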

  15. On-line object feature extraction for multispectral scene representation

    NASA Technical Reports Server (NTRS)

    Ghassemian, Hassan; Landgrebe, David

    1988-01-01

A new on-line unsupervised object-feature extraction method is presented that reduces the complexity and costs associated with the analysis of multispectral image data and with data transmission, storage, archival and distribution. The ambiguity in the object detection process can be reduced if the spatial dependencies that exist among adjacent pixels are intelligently incorporated into the decision-making process. A unity relation that must exist among the pixels of an object was defined. The Automatic Multispectral Image Compaction Algorithm (AMICA) uses the within-object pixel-feature gradient vector as valuable contextual information to construct the object's features, which preserve the class-separability information within the data. For on-line object extraction, the path-hypothesis and the basic mathematical tools for its realization are introduced in terms of a specific similarity measure and adjacency relation. AMICA is applied to several sets of real image data, and the performance and reliability of the features are evaluated.

  16. A graph algebra for scalable visual analytics.

    PubMed

    Shaverdian, Anna A; Zhou, Hao; Michailidis, George; Jagadish, Hosagrahar V

    2012-01-01

Visual analytics (VA), which combines analytical techniques with advanced visualization features, is fast becoming a standard tool for extracting information from graph data. Researchers have developed many tools for this purpose, suggesting a need for formal methods to guide these tools' creation. Increased data demands require redesigning VA tools to consider performance and reliability in the context of analyzing exascale datasets. Furthermore, visual analysts need a way to document their analyses for reuse and results justification. A VA graph framework encapsulated in a graph algebra helps address these needs. Its atomic operators include selection and aggregation. The framework employs a visual operator and supports dynamic attributes of data to enable scalable visual exploration of data.
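A minimal sketch of the two atomic operators named in the abstract, selection and aggregation, over a simple weighted edge list; the data model here is an assumption, not the paper's formal algebra.

```python
def select(edges, pred):
    """Selection: keep only the edges satisfying a predicate."""
    return [e for e in edges if pred(e)]

def aggregate(edges, key):
    """Aggregation: merge nodes into groups and sum edge weights.

    `key` maps each node to its group; parallel edges between the
    same pair of groups collapse into one edge with summed weight.
    """
    merged = {}
    for u, v, w in edges:
        k = (key(u), key(v))
        merged[k] = merged.get(k, 0) + w
    return [(gu, gv, w) for (gu, gv), w in merged.items()]

edges = [("a1", "b1", 1), ("a2", "b1", 2), ("a1", "a2", 5)]
group = lambda n: n[0]          # group nodes by their first character
print(aggregate(edges, group))  # [('a', 'b', 3), ('a', 'a', 5)]
print(select(edges, lambda e: e[2] > 1))  # [('a2', 'b1', 2), ('a1', 'a2', 5)]
```

Composing such operators is what makes the approach scale: an analyst can aggregate a huge graph down to group level, select an interesting region, and only then expand detail.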

  17. Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials

    PubMed Central

    Federer, Callie; Yoo, Minjae

    2016-01-01

Abstract Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AE relationships could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of Python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers to discover drug-AE relationships for developing, repositioning, and repurposing drugs. PMID:27631620
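The regular-expression-plus-dictionary step described above might look like the following sketch; the pattern, dictionary entries, and report format are invented examples, not the authors' actual scripts.

```python
import re

# Hypothetical drug dictionary used to keep only rows whose drug name
# is a known compound.
DRUG_DICT = {"metformin", "aspirin"}

# Hypothetical line format: "DrugName: event name (count)".
AE_PATTERN = re.compile(r"(?P<drug>\w+)\s*:\s*(?P<event>[\w ]+)\s*\((?P<count>\d+)\)")

def extract_aes(text):
    """Return (drug, adverse_event, count) rows found in free text."""
    rows = []
    for m in AE_PATTERN.finditer(text):
        drug = m.group("drug").lower()
        if drug in DRUG_DICT:           # dictionary filter
            rows.append((drug, m.group("event").strip(), int(m.group("count"))))
    return rows

report = "Metformin: nausea (12); Placebo: headache (3); Aspirin: dyspepsia (7)"
print(extract_aes(report))
# [('metformin', 'nausea', 12), ('aspirin', 'dyspepsia', 7)]
```

Rows in this normalized (drug, event, count) shape are exactly what a relational table wants, so the same loop can feed `INSERT` statements directly.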

  18. Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials.

    PubMed

    Federer, Callie; Yoo, Minjae; Tan, Aik Choon

    2016-12-01

Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AE relationships could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of Python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers to discover drug-AE relationships for developing, repositioning, and repurposing drugs.

  19. Visualization of JPEG Metadata

    NASA Astrophysics Data System (ADS)

    Malik Mohamad, Kamaruddin; Deris, Mustafa Mat

There is much more information embedded in a JPEG image than just the graphics. Visualization of its metadata would benefit digital forensic investigators, allowing them to view embedded data, including in corrupted images where no graphics can be displayed, in order to assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools are already available, but they mostly focus on visualizing attribute information of JPEG Exif. However, none visualize metadata by consolidating the markers summary, header structure, Huffman table and quantization table in a single program. In this paper, metadata visualization is done by developing a program able to summarize all existing markers, the header structure, the Huffman table and the quantization table in a JPEG. The result shows that visualization of metadata makes viewing the hidden information within a JPEG much easier.
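The markers-summary step can be sketched as a walk over a JPEG byte stream: each segment starts with 0xFF and a marker byte, followed by a big-endian length field that counts itself. This is a minimal illustration, not the paper's program.

```python
def list_markers(data):
    """Return (marker_name, segment_length) pairs from a JPEG byte string."""
    assert data[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    names = {0xC0: "SOF0", 0xC4: "DHT", 0xDB: "DQT", 0xDA: "SOS", 0xE0: "APP0"}
    markers, i = [("SOI", 0)], 2
    while i + 2 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker == 0xDA:            # entropy-coded image data follows SOS
            markers.append(("SOS", 0))
            break
        length = int.from_bytes(data[i + 2:i + 4], "big")  # includes itself
        markers.append((names.get(marker, hex(marker)), length))
        i += 2 + length               # skip the marker bytes plus segment
    return markers

# A tiny synthetic header: SOI, an APP0 segment of length 4, then SOS.
blob = b"\xff\xd8" + b"\xff\xe0\x00\x04\x00\x00" + b"\xff\xda"
print(list_markers(blob))  # [('SOI', 0), ('APP0', 4), ('SOS', 0)]
```

Because the walk never decodes image data, it still produces a useful summary for corrupted files whose entropy-coded payload cannot be rendered, which is exactly the forensic case motivating the paper.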

  20. Research Costs Investigated: A Study Into the Budgets of Dutch Publicly Funded Drug-Related Research.

    PubMed

    van Asselt, Thea; Ramaekers, Bram; Corro Ramos, Isaac; Joore, Manuela; Al, Maiwenn; Lesman-Leegte, Ivonne; Postma, Maarten; Vemer, Pepijn; Feenstra, Talitha

    2018-01-01

The costs of performing research are an important input in value of information (VOI) analyses but are difficult to assess. The aim of this study was to investigate the costs of research, serving two purposes: (1) estimating research costs for use in VOI analyses; and (2) developing a costing tool to support reviewers of grant proposals in assessing whether the proposed budget is realistic. For granted study proposals from the Netherlands Organization for Health Research and Development (ZonMw), the type of study, potential cost drivers, proposed budget, and general characteristics were extracted. Regression analysis was conducted in an attempt to generate a 'predicted budget' for certain combinations of cost drivers, for implementation in the costing tool. Of 133 drug-related research grant proposals, 74 were included for complete data extraction. Because an association between cost drivers and budgets was not confirmed, we could not generate a predicted budget based on regression analysis, but only historic reference budgets given certain study characteristics. The costing tool was designed accordingly, i.e., given a set of selection criteria, the tool returns the range of budgets of comparable studies. This range can be used in VOI analysis to estimate whether the expected net benefit of sampling will be positive, to decide upon the net value of future research. The absence of association between study characteristics and budgets may indicate inconsistencies in the budgeting or granting process. Nonetheless, the tool generates useful information on historical budgets, and the option to formally relate VOI to budgets. To our knowledge, this is the first attempt at creating such a tool, which can be complemented with new studies being granted, enlarging the underlying database and keeping estimates up to date.
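The tool's behavior, returning historic reference budgets for studies matching given criteria rather than a regression prediction, can be sketched as a simple filter; the record layout and figures below are invented.

```python
def budget_range(studies, **criteria):
    """Return (min, max) of budgets among studies matching all criteria,
    or None when no comparable study exists."""
    matches = [s["budget"] for s in studies
               if all(s.get(k) == v for k, v in criteria.items())]
    return (min(matches), max(matches)) if matches else None

# Hypothetical historical database of granted proposals.
studies = [
    {"type": "RCT", "size": "large", "budget": 900_000},
    {"type": "RCT", "size": "small", "budget": 250_000},
    {"type": "observational", "size": "small", "budget": 120_000},
]
print(budget_range(studies, type="RCT"))  # (250000, 900000)
```

Returning a range instead of a point estimate reflects the study's finding: with no confirmed association between cost drivers and budgets, the honest output is the spread observed in comparable historical grants.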

  1. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

    PubMed

    Scheuch, Matthias; Höper, Dirk; Beer, Martin

    2015-03-03

    Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
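The cascading strategy, running analyses from most to least stringent and passing only still-unassigned reads to the next stage, can be sketched generically; the stage functions below are placeholders, not RIEMS's actual sequence analyses.

```python
def cascade_classify(reads, stages):
    """Assign reads taxonomically by a cascade of analysis stages.

    stages: list of (name, classify_fn) ordered from most to least
    stringent; classify_fn returns a taxon or None. Each stage only
    sees reads the earlier, stricter stages could not assign.
    """
    assignments = {}
    remaining = list(reads)
    for name, classify in stages:
        unassigned = []
        for read in remaining:
            taxon = classify(read)
            if taxon is not None:
                assignments[read] = (name, taxon)
            else:
                unassigned.append(read)
        remaining = unassigned
    return assignments, remaining

# Toy stages: an exact match first, then a looser heuristic.
strict = lambda r: "E. coli" if r == "ACGT" else None
relaxed = lambda r: "bacteria" if "A" in r else None
done, left = cascade_classify(["ACGT", "AAAA", "GGGG"],
                              [("strict", strict), ("relaxed", relaxed)])
print(done)  # {'ACGT': ('strict', 'E. coli'), 'AAAA': ('relaxed', 'bacteria')}
print(left)  # ['GGGG']
```

Recording which stage produced each assignment mirrors the structured result protocol the abstract describes: downstream users can weight a strict-stage hit more heavily than a relaxed-stage one.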

  2. Connecting Architecture and Implementation

    NASA Astrophysics Data System (ADS)

    Buchgeher, Georg; Weinreich, Rainer

    Software architectures are still typically defined and described independently from implementation. To avoid architectural erosion and drift, architectural representation needs to be continuously updated and synchronized with system implementation. Existing approaches for architecture representation like informal architecture documentation, UML diagrams, and Architecture Description Languages (ADLs) provide only limited support for connecting architecture descriptions and implementations. Architecture management tools like Lattix, SonarJ, and Sotoarc and UML-tools tackle this problem by extracting architecture information directly from code. This approach works for low-level architectural abstractions like classes and interfaces in object-oriented systems but fails to support architectural abstractions not found in programming languages. In this paper we present an approach for linking and continuously synchronizing a formalized architecture representation to an implementation. The approach is a synthesis of functionality provided by code-centric architecture management and UML tools and higher-level architecture analysis approaches like ADLs.

  3. VisualUrText: A Text Analytics Tool for Unstructured Textual Data

    NASA Astrophysics Data System (ADS)

    Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

    2018-05-01

The growing amount of unstructured text on the Internet is tremendous. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text Mining is a well-known technique for discovering interesting patterns and trends, i.e., non-trivial knowledge, in massive unstructured text data. Text Mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and visualizing the cleaned text data in multiple forms such as a Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.

  4. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media

    PubMed Central

    Cameron, Delroy; Smith, Gary A.; Daniulaityte, Raminta; Sheth, Amit P.; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z.; Falck, Russel

    2013-01-01

    Objectives The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel Semantic Web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO) (pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC). A combination of lexical, pattern-based and semantics-based techniques is used together with the domain knowledge to extract fine-grained semantic information from UGC. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Methods Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. 
The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, routes of administration, etc. The DAO is also used to help recognize three types of data, namely: 1) entities, 2) relationships and 3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information from UGC, and querying, search, trend analysis and overall content analysis of social media related to prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. 
Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. PMID:23892295

  5. Developing a Satellite Based Automatic System for Crop Monitoring: Kenya's Great Rift Valley, A Case Study

    NASA Astrophysics Data System (ADS)

    Lucciani, Roberto; Laneve, Giovanni; Jahjah, Munzer; Mito, Collins

    2016-08-01

The crop growth stage represents essential information for the management of agricultural areas. In this study we investigate the feasibility of a tool based on remotely sensed satellite (Landsat 8) imagery that is capable of automatically classifying crop fields, and examine how far resolution enhancement based on pan-sharpening techniques and the extraction of phenological information, used to create decision rules that identify the semantic class to assign to an object, can effectively support the classification process. Moreover, we investigate the opportunity to extract vegetation health status information from a remotely sensed assessment of the equivalent water thickness (EWT). Our case study is Kenya's Great Rift Valley, where a ground truth campaign was conducted during August 2015 to collect GPS measurements of crop fields, leaf area index (LAI) and chlorophyll samples.

  6. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes

    PubMed Central

    Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D.; Day, Michele E.; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan

    2016-01-01

    Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort. PMID:28269947

  7. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.

    PubMed

    Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D; Day, Michele E; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan

    2016-01-01

    Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.
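The abstract does not name the seven ensemble methods, but a standard one is majority voting over the concept sets returned by the individual tools; a minimal sketch, with the threshold as an assumption:

```python
def majority_vote(tool_outputs, threshold=0.5):
    """Keep concepts reported by at least `threshold` of the NLP tools.

    tool_outputs: list of sets of extracted concepts, one set per tool.
    """
    counts = {}
    for output in tool_outputs:
        for concept in output:
            counts[concept] = counts.get(concept, 0) + 1
    cutoff = threshold * len(tool_outputs)
    return {c for c, n in counts.items() if n >= cutoff}

# Three hypothetical tools extracting concepts from the same note.
outputs = [{"diabetes", "metformin"}, {"diabetes"}, {"diabetes", "insulin"}]
print(sorted(majority_vote(outputs)))  # ['diabetes']
```

Voting trades recall for precision: singleton findings like "insulin" above are dropped, which is one reason ensemble gains vary by cohort, as the paper reports.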

  8. Extraction of Urban Trees from Integrated Airborne Based Digital Image and LIDAR Point Cloud Datasets - Initial Results

    NASA Astrophysics Data System (ADS)

    Dogon-yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-10-01

Timely and accurate acquisition of information on the condition and structural changes of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building strategies for sustainable development. The conventional techniques used for extracting tree features include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints, such as labour-intensive field work, high financial requirements, and influences of weather conditions and topographical cover, which can be overcome by means of integrated airborne-based LiDAR and very high resolution digital image datasets. This study presents a semi-automated approach for extracting urban trees from integrated airborne-based LiDAR and multispectral digital image datasets over the city of Istanbul, Turkey. The scheme includes the detection and extraction of shadow-free vegetation features based on the spectral properties of the digital images, using shadow index and NDVI techniques, and the automated extraction of 3D information about vegetation features from the integrated processing of the shadow-free vegetation image and the LiDAR point cloud datasets. The performance of the developed algorithms shows promising results as an automated and cost-effective approach to estimating and delineating 3D information on urban trees. The research also proves that integrated datasets are a suitable technology and a viable source of information for city managers to use in urban tree management.
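The NDVI step referenced above has a standard formula, NDVI = (NIR − Red) / (NIR + Red); a thresholded vegetation mask can be sketched as follows (the band values and the 0.3 threshold are illustrative, not from the study):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    return (nir - red) / (nir + red) if (nir + red) else 0.0

def vegetation_mask(nir_band, red_band, threshold=0.3):
    """Boolean mask marking pixels whose NDVI exceeds the threshold."""
    return [[ndvi(n, r) > threshold for n, r in zip(nrow, rrow)]
            for nrow, rrow in zip(nir_band, red_band)]

# Two hypothetical 2x2 reflectance bands: left column is vegetation-like
# (NIR much higher than Red), right column is bare-surface-like.
nir = [[0.8, 0.2], [0.7, 0.1]]
red = [[0.1, 0.2], [0.2, 0.1]]
print(vegetation_mask(nir, red))  # [[True, False], [True, False]]
```

In the study's pipeline such a mask would be intersected with a shadow-index mask before the LiDAR points are attached, so only shadow-free vegetation pixels carry 3D information.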

  9. Integrated Functional and Executional Modelling of Software Using Web-Based Databases

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Marietta, Roberta

    1998-01-01

    NASA's software subsystems undergo extensive modification and updates over the operational lifetimes. It is imperative that modified software should satisfy safety goals. This report discusses the difficulties encountered in doing so and discusses a solution based on integrated modelling of software, use of automatic information extraction tools, web technology and databases.

  10. An Electronic Engineering Curriculum Design Based on Concept-Mapping Techniques

    ERIC Educational Resources Information Center

    Toral, S. L.; Martinez-Torres, M. R.; Barrero, F.; Gallardo, S.; Duran, M. J.

    2007-01-01

    Curriculum design is a concern in European Universities as they face the forthcoming European Higher Education Area (EHEA). This process can be eased by the use of scientific tools such as Concept-Mapping Techniques (CMT), which extract and organize the most relevant information from experts' experience using statistical techniques, and help a…

  11. Modeling the relationship between extractable chlorophyll and SPAD-502 readings for endangered plant species research

    Treesearch

    Tracy S. Hawkins; Emile S. Gardiner; Greg S. Comer

    2009-01-01

    Handheld chlorophyll meters have proven to be useful tools for rapid, nondestructive assessment of chlorophyll and nutrient status in various agricultural and arborescent plant species. We proposed that a SPAD-502 chlorophyll meter would provide valuable information when monitoring life cycle changes and intraspecific variation in...

  12. Interactive Digital Signal Processor

    NASA Technical Reports Server (NTRS)

    Mish, W. H.

    1985-01-01

    Interactive Digital Signal Processor, IDSP, consists of a set of time series analysis "operators" based on various algorithms commonly used for digital signal analysis. Processing of a digital signal time series to extract information is usually achieved by application of a number of fairly standard operations. IDSP is an excellent teaching tool for demonstrating the application of time series operators to artificially generated signals.
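    As an illustration of such an operator applied to an artificially generated signal, a power-spectrum operator (a generic sketch, not the IDSP implementation) could be:

```python
import numpy as np

def power_spectrum(x, fs):
    """A typical time-series operator: one-sided power spectrum of a real signal."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return freqs, (np.abs(X) ** 2) / len(x)

# Artificially generated test signal: a pure 10 Hz tone sampled at 100 Hz
fs = 100.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)
freqs, p = power_spectrum(x, fs)
```

    Because the tone sits exactly on an FFT bin, all the power concentrates at 10 Hz, which makes the operator's behaviour easy to demonstrate in a teaching setting.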

  13. Nontronite mineral identification in nilgiri hills of tamil nadu using hyperspectral remote sensing

    NASA Astrophysics Data System (ADS)

    Vigneshkumar, M.; Yarakkula, Kiran

    2017-11-01

    Hyperspectral remote sensing is a tool to identify minerals along with field investigation. Tamil Nadu has abundant minerals, such as 30% titanium, 52% molybdenum, 59% garnet, 69% dunite, 75% vermiculite and 81% lignite. To meet user and industry requirements, mineral extraction is required, and sophisticated tools are needed to identify the minerals properly. Hyperspectral remote sensing provides continuous extraction of earth surface information in an accurate manner. Nontronite is an iron-rich mineral mainly available in the Nilgiri hills, Tamil Nadu, India. Due to the large number of bands, hyperspectral data require various preprocessing steps, such as bad-band removal, destriping, radiance conversion and atmospheric correction. The atmospheric correction is performed using the FLAASH method. The spectral data reduction is carried out with the minimum noise fraction (MNF) method. The spatial information is reduced using the pixel purity index (PPI) with 10,000 iterations. The selected endmembers are compared with spectral libraries such as USGS, JPL, and JHU. The Nontronite mineral gives a matching probability of 0.85. Finally, the classification is accomplished using the spectral angle mapper (SAM) method.
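    The SAM step can be sketched as below; the toy library spectra and the 0.1-radian angle threshold are illustrative assumptions, not values from the study:

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and a reference spectrum."""
    p, r = np.asarray(pixel, float), np.asarray(reference, float)
    cos = np.dot(p, r) / (np.linalg.norm(p) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def sam_classify(pixel, library, max_angle=0.1):
    """Label the pixel with the endmember of smallest spectral angle, or None."""
    best = min(library, key=lambda name: spectral_angle(pixel, library[name]))
    return best if spectral_angle(pixel, library[best]) <= max_angle else None

# Toy three-band spectra -- illustrative values only
library = {"nontronite": [0.2, 0.4, 0.6], "quartz": [0.5, 0.5, 0.5]}
label = sam_classify([0.1, 0.2, 0.3], library)
```

    Because SAM compares only spectral shape (angle), a pixel that is a scaled copy of an endmember matches it exactly, which makes the method robust to illumination differences.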

  14. Measurement Tools for the Immersive Visualization Environment: Steps Toward the Virtual Laboratory.

    PubMed

    Hagedorn, John G; Dunkers, Joy P; Satterfield, Steven G; Peskin, Adele P; Kelso, John T; Terrill, Judith E

    2007-01-01

    This paper describes a set of tools for performing measurements of objects in a virtual reality based immersive visualization environment. These tools enable the use of the immersive environment as an instrument for extracting quantitative information from data representations that hitherto had been used solely for qualitative examination. We provide, within the virtual environment, ways for the user to analyze and interact with the quantitative data generated. We describe results generated by these methods to obtain dimensional descriptors of tissue engineered medical products. We regard this toolbox as our first step in the implementation of a virtual measurement laboratory within an immersive visualization environment.

  15. GDRMS: a system for automatic extraction of the disease-centre relation

    NASA Astrophysics Data System (ADS)

    Yang, Ronggen; Zhang, Yue; Gong, Lejun

    2012-01-01

    With the rapid increase of biomedical literature, the deluge of new articles is leading to information overload. Extracting the available knowledge from the huge amount of biomedical literature has become a major challenge. GDRMS is developed as a tool that extracts relationships between diseases and genes, and between genes, from the biomedical literature using text-mining technology. It is a rule-based system that also provides disease-centre network visualization, constructs the disease-gene database, and provides a gene engine for understanding the function of the gene. The main focus of GDRMS is to provide a valuable opportunity for the research community to explore the relationship between disease and gene in the etiology of disease.
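    GDRMS's actual rules are not published in the abstract; a toy rule in the same spirit, pairing a disease and a gene that co-occur with a trigger phrase in one sentence (the gazetteer entries and trigger list are invented placeholders), could be:

```python
import re

# Placeholder gazetteers and trigger phrases -- not GDRMS's real resources
DISEASES = {"alzheimer's disease", "parkinson's disease"}
GENES = {"APOE", "LRRK2"}
TRIGGERS = re.compile(r"\b(associated with|linked to|mutations? in)\b", re.I)

def extract_relations(sentence):
    """Rule: emit (gene, disease) pairs that co-occur with a trigger phrase."""
    low = sentence.lower()
    if not TRIGGERS.search(sentence):
        return []
    found_d = [d for d in DISEASES if d in low]
    found_g = [g for g in GENES if g in sentence]
    return [(g, d) for g in found_g for d in found_d]
```

    Real rule-based systems add negation handling, coreference, and sentence parsing, but the co-occurrence-plus-trigger pattern is the core idea.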

  16. Extraction and fusion of spectral parameters for face recognition

    NASA Astrophysics Data System (ADS)

    Boisier, B.; Billiot, B.; Abdessalem, Z.; Gouton, P.; Hardeberg, J. Y.

    2011-03-01

    Many methods have been developed in image processing for face recognition, especially in recent years with the increase of biometric technologies. However, most of these techniques are used on grayscale images acquired in the visible range of the electromagnetic spectrum. The aims of our study are to improve existing tools and to develop new methods for face recognition. The techniques used take advantage of the different spectral ranges, the visible, optical infrared and thermal infrared, by either combining them or analyzing them separately in order to extract the most appropriate information for face recognition. We also verify the consistency of several keypoints extraction techniques in the Near Infrared (NIR) and in the Visible Spectrum.

  17. Situational Awareness Geospatial Application (iSAGA)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sher, Benjamin

    Situational Awareness Geospatial Application (iSAGA) is a geospatial situational awareness software tool that uses an algorithm to extract location data from nearly any internet-based or custom data source and display it geospatially; allows user-friendly conduct of spatial analysis using custom-developed tools; and searches complex Geographic Information System (GIS) databases and accesses high resolution imagery. iSAGA has application at the federal, state and local levels for emergency response, consequence management, law enforcement, emergency operations and other decision makers as a tool to provide complete, visual, situational awareness using data feeds and tools selected by the individual agency or organization. Feeds may be layered and custom tools developed to uniquely suit each subscribing agency or organization. iSAGA may similarly be applied to international agencies and organizations.

  18. Preparation and use of varied natural tools for extractive foraging by bonobos (Pan Paniscus).

    PubMed

    Roffman, Itai; Savage-Rumbaugh, Sue; Rubert-Pugh, Elizabeth; Stadler, André; Ronen, Avraham; Nevo, Eviatar

    2015-09-01

    The tool-assisted extractive foraging capabilities of captive (zoo) and semi-captive (sanctuary) bonobo (Pan paniscus) groups were compared to each other and to those known in wild chimpanzee (Pan troglodytes) cultures. The bonobos were provided with natural raw materials and challenged with tasks not previously encountered, in experimental settings simulating natural contexts where resources requiring special retrieval efforts were hidden. They were shown that food was buried underground or inserted into long bone cavities, and left to tackle the tasks without further intervention. The bonobos used modified branches and unmodified antlers or stones to dig under rocks and in the ground or to break bones to retrieve the food. Antlers, short sticks, long sticks, and rocks were effectively used as mattocks, daggers, levers, and shovels, respectively. One bonobo successively struck a long bone with an angular hammer stone, completely bisecting it longitudinally. Another bonobo modified long branches into spears and used them as attack weapons and barriers. Bonobos in the sanctuary, unlike those in the zoo, used tool sets to perform sequential actions. The competent and diverse tool-assisted extractive foraging by the bonobos corroborates and complements the extensive information on similar tool use by chimpanzees, suggesting that such competence is a shared trait. Better performance by the sanctuary bonobos than the zoo group was probably due to differences in their cultural exposure and housing conditions. The bonobos' foraging techniques resembled some of those attributed to Oldowan hominins, implying that they can serve as referential models. © 2015 Wiley Periodicals, Inc.

  19. Natural Language Processing in Radiology: A Systematic Review.

    PubMed

    Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A

    2016-05-01

    Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. (©) RSNA, 2016 Online supplemental material is available for this article.

  20. Multiplexed Sequence Encoding: A Framework for DNA Communication.

    PubMed

    Zakeri, Bijan; Carr, Peter A; Lu, Timothy K

    2016-01-01

    Synthetic DNA has great propensity for efficiently and stably storing non-biological information. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication-data encoding, data transfer & data extraction-and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA, while reducing the occurrence of DNA homopolymers to improve synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system-Multiplexed Sequence Encoding (MuSE)-that conceals messages between multiple distinct DNA molecules, requiring a combination key to reveal messages. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA.
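    The iKeys mapping itself is not specified in the abstract; the standard trick it alludes to for suppressing homopolymers, choosing each base from the three bases that differ from its predecessor so that no run of identical bases can occur, can be sketched as (the base-3 digit input is an assumed intermediate representation):

```python
def encode_base3(digits, prev="A"):
    """Map base-3 digits to DNA; each base is picked from the three bases
    differing from the previous one, so homopolymer runs cannot occur."""
    out = []
    for d in digits:
        choices = [b for b in "ACGT" if b != prev]  # always exactly 3 options
        prev = choices[d]
        out.append(prev)
    return "".join(out)

dna = encode_base3([0, 0, 0, 2, 1])
```

    Even a constant digit stream produces an alternating sequence, which eases both synthesis and sequencing.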

  1. Automated Design Tools for Integrated Mixed-Signal Microsystems (NeoCAD)

    DTIC Science & Technology

    2005-02-01

    The program developed Model Order Reduction (MOR) tools; system-level, mixed-signal circuit synthesis and optimization tools; and parasitic extraction tools. Keywords: mixed-signal circuit simulation; parasitic extraction; time-domain simulation; IC design flow; model order reduction. The report covers overall program milestones and fast time-domain mixed-signal circuit simulation (HAARSPICE algorithms).

  2. Tuberculosis diagnosis support analysis for precarious health information systems.

    PubMed

    Orjuela-Cañón, Alvaro David; Camargo Mendoza, Jorge Eliécer; Awad García, Carlos Enrique; Vergara Vela, Erika Paola

    2018-04-01

    Pulmonary tuberculosis is a world emergency for the World Health Organization. Techniques and new diagnostic tools are important to battle this bacterial infection. There have been many advances in all those fields, but in developing countries such as Colombia, where resources and infrastructure are limited, new, fast and less expensive strategies are increasingly needed. Artificial neural networks are computational intelligence techniques that can be used in this kind of problem and offer additional support in the tuberculosis diagnosis process, providing a tool for medical staff to make decisions about the management of subjects under suspicion of tuberculosis. A database extracted from 105 subjects under suspicion of pulmonary tuberculosis, with precarious information, was used in this study. Data on sex, age, diabetes, homelessness, AIDS status and a variable encoding clinical knowledge from the medical personnel were used. Models based on artificial neural networks were used, exploring supervised learning to detect the disease. Unsupervised learning was used to create three risk groups based on the available information. The obtained results are comparable with traditional techniques for detection of tuberculosis, showing advantages such as speed and low implementation cost. Sensitivity of 97% and specificity of 71% were achieved. The techniques used allowed us to obtain valuable information that can be useful for physicians who treat the disease in decision-making processes, especially under limited infrastructure and data. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. Comparison of Two Simplification Methods for Shoreline Extraction from Digital Orthophoto Images

    NASA Astrophysics Data System (ADS)

    Bayram, B.; Sen, A.; Selbesoglu, M. O.; Vārna, I.; Petersons, P.; Aykut, N. O.; Seker, D. Z.

    2017-11-01

    Coastal ecosystems are very sensitive to external influences. Coastal resources such as sand dunes, coral reefs and mangroves have vital importance in preventing coastal erosion. Human activities also threaten coastal areas. Therefore, changes in coastal areas should be monitored. Up-to-date, accurate shoreline information is indispensable for coastal managers and decision makers. Remote sensing and image processing techniques provide a great opportunity to obtain reliable shoreline information. In the presented study, the NIR bands of seven 1:5000-scale digital orthophoto images of Riga Bay, Latvia have been used. The object-oriented Simple Linear Clustering method has been utilized to extract the shoreline of Riga Bay. The Bend and Douglas-Peucker methods have been used to simplify the extracted shoreline, to test the effect of both methods. A photogrammetrically digitized shoreline has been taken as reference data for comparing the obtained results. The accuracy assessment has been realised with the Digital Shoreline Analysis tool. As a result, the shoreline achieved by the Bend method has been found to be closer to the shoreline extracted with the Simple Linear Clustering method.
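    Of the two simplification methods compared, Douglas-Peucker is the more widely known; a compact recursive sketch (a textbook version, not the study's implementation) is:

```python
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)

def douglas_peucker(points, eps):
    """Keep endpoints; recurse on the farthest point if it exceeds tolerance eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    i, dmax = max(((k, perp_dist(points[k], a, b))
                   for k in range(1, len(points) - 1)), key=lambda t: t[1])
    if dmax > eps:
        left = douglas_peucker(points[:i + 1], eps)
        return left[:-1] + douglas_peucker(points[i:], eps)
    return [a, b]

pts = [(0, 0), (1, 0.01), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(pts, 0.1)
```

    A larger tolerance removes more vertices; on a shoreline, eps is chosen relative to the source imagery's ground resolution.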

  4. Imaging genetics approach to predict progression of Parkinson's diseases.

    PubMed

    Mansu Kim; Seong-Jin Son; Hyunjin Park

    2017-07-01

    Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches, to better investigate various disease conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e. MDS-UPDRS). Our model yielded a high correlation (r = 0.697, p < 0.001) and a low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging (from 123I-Ioflupane SPECT) predictors of the regression model were computed with an independent component analysis approach. Genetic features were computed using an imaging genetics approach, based on the identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus has the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.

  5. Novel Tool for Complete Digitization of Paper Electrocardiography Data

    PubMed Central

    Harless, Chris; Shah, Amit J.; Wick, Carson A.; Mcclellan, James H.

    2013-01-01

    Objective: We present a Matlab-based tool to convert electrocardiography (ECG) information from paper charts into digital ECG signals. The tool can be used for long-term retrospective studies of cardiac patients to study the evolving features with prognostic value. Methods and procedures: To perform the conversion, we: 1) detect the graphical grid on ECG charts using grayscale thresholding; 2) digitize the ECG signal based on its contour using a column-wise pixel scan; and 3) use template-based optical character recognition to extract patient demographic information from the paper ECG in order to interface the data with the patients' medical record. To validate the digitization technique: 1) correlation between the digital signals and signals digitized from paper ECG are performed and 2) clinically significant ECG parameters are measured and compared from both the paper-based ECG signals and the digitized ECG. Results: The validation demonstrates a correlation value of 0.85–0.9 between the digital ECG signal and the signal digitized from the paper ECG. There is a high correlation in the clinical parameters between the ECG information from the paper charts and digitized signal, with intra-observer and inter-observer correlations of 0.8–0.9 (p < 0.05), and kappa statistics ranging from 0.85 (inter-observer) to 1.00 (intra-observer). Conclusion: The important features of the ECG signal, especially the QRST complex and the associated intervals, are preserved by obtaining the contour from the paper ECG. The differences between the measures of clinically important features extracted from the original signal and the reconstructed signal are insignificant, thus highlighting the accuracy of this technique.
    Clinical impact: By using this type of ECG digitization tool to carry out retrospective studies on large databases that rely on paper ECG records, studies of emerging ECG features can be performed. In addition, this tool can be used to potentially integrate digitized ECG information with digital ECG analysis programs and with the patient's electronic medical record. PMID:26594601
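    The column-wise pixel scan in step 2 can be sketched as below; the binarized trace image and the mm-per-pixel and mV-per-mm scale factors are assumed inputs, and the baseline is taken as the median trace row rather than the paper's grid reference:

```python
import numpy as np

def trace_from_image(binary_img, mm_per_px=0.1, mv_per_mm=0.1):
    """Column-wise scan: each column's sample is the mean row index of trace pixels."""
    signal = np.full(binary_img.shape[1], np.nan)
    for c in range(binary_img.shape[1]):
        rows = np.flatnonzero(binary_img[:, c])
        if rows.size:
            signal[c] = rows.mean()
    baseline = np.nanmedian(signal)                      # assumed baseline row
    return (baseline - signal) * mm_per_px * mv_per_mm   # rows above baseline -> positive mV

img = np.zeros((5, 3), dtype=int)
img[2, 0] = img[1, 1] = img[2, 2] = 1  # a tiny synthetic trace
mv = trace_from_image(img)
```

    Averaging the row indices per column handles trace thickness; a production tool would also interpolate columns where the trace is broken.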

  6. A new version of the ERICA tool to facilitate impact assessments of radioactivity on wild plants and animals.

    PubMed

    Brown, J E; Alfonso, B; Avila, R; Beresford, N A; Copplestone, D; Hosseini, A

    2016-03-01

    A new version of the ERICA Tool (version 1.2) was released in November 2014; this constitutes the first major update of the Tool since its release in 2007. The key features of the update are presented in this article. Of particular note are new transfer databases extracted from an international compilation of concentration ratios (CRwo-media) and the modification of 'extrapolation' approaches used to select transfer data in cases where information is not available. Bayesian updating approaches have been used in some cases to draw on relevant information that would otherwise have been excluded in the process of deriving CRwo-media statistics. All of these efforts have in turn led to the requirement to update Environmental Media Concentration Limits (EMCLs) used in Tier 1 assessments. Some of the significant changes with regard to EMCLs are highlighted. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  7. Advanced Video Analysis Needs for Human Performance Evaluation

    NASA Technical Reports Server (NTRS)

    Campbell, Paul D.

    1994-01-01

    Evaluators of human task performance in space missions make use of video as a primary source of data. Extraction of relevant human performance information from video is often a labor-intensive process requiring a large amount of time on the part of the evaluator. Based on the experiences of several human performance evaluators, needs were defined for advanced tools which could aid in the analysis of video data from space missions. Such tools should increase the efficiency with which useful information is retrieved from large quantities of raw video. They should also provide the evaluator with new analytical functions which are not present in currently used methods. Video analysis tools based on the needs defined by this study would also have uses in U.S. industry and education. Evaluation of human performance from video data can be a valuable technique in many industrial and institutional settings where humans are involved in operational systems and processes.

  8. Application of Natural Language Processing and Network Analysis Techniques to Post-market Reports for the Evaluation of Dose-related Anti-Thymocyte Globulin Safety Patterns.

    PubMed

    Botsis, Taxiarchis; Foster, Matthew; Arya, Nina; Kreimeyer, Kory; Pandey, Abhishek; Arya, Deepa

    2017-04-26

    To evaluate the feasibility of automated dose and adverse event information retrieval in supporting the identification of safety patterns. We extracted all rabbit Anti-Thymocyte Globulin (rATG) reports submitted to the United States Food and Drug Administration Adverse Event Reporting System (FAERS) from the product's initial licensure on April 16, 1984, through February 8, 2016. We processed the narratives using the Medication Extraction (MedEx) and the Event-based Text-mining of Health Electronic Records (ETHER) systems and retrieved the appropriate medication, clinical, and temporal information. When necessary, the extracted information was manually curated. This process resulted in a high-quality dataset that was analyzed with the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA) to explore the association of rATG dosing with post-transplant lymphoproliferative disorder (PTLD). Although manual curation was necessary to improve the data quality, MedEx and ETHER supported the extraction of the appropriate information. We created a final dataset of 1,380 cases with complete information for rATG dosing and date of administration. Analysis in PANACEA found that PTLD was associated with cumulative doses of rATG >8 mg/kg, even in periods where most of the submissions to FAERS reported low doses of rATG. We demonstrated the feasibility of investigating a dose-related safety pattern for a particular product in FAERS using a set of automated tools.

  9. Ganges-Brahmaputra-Meghna Delta Connectivity Analysis Using New Tools for the Automatic Extraction of Channel Networks from Remotely Sensed Imagery

    NASA Astrophysics Data System (ADS)

    Jarriel, T. M.; Isikdogan, F.; Passalacqua, P.; Bovik, A.

    2017-12-01

    River deltas are one of the environmental ecosystems most threatened by climate change and anthropogenic activity. While their low elevation gradients and fertile soil have made them optimal for human inhabitation and diverse ecologic growth, these same features make them susceptible to the adverse effects of sea level rise, flooding, subsidence, and manmade structures such as dams, levees, and dikes. One particularly large and threatened delta, and the focus area of this study, is the Ganges-Brahmaputra-Meghna Delta (GBMD) on the southern coast of Bangladesh and West Bengal, India. In this study we analyze the GBMD channel network, identify areas of maximum change of the network, and use this information to predict how the network will respond under future scenarios. Landsat images of the delta from 1973 to 2017 are analyzed using new tools for the automatic extraction of channel networks from remotely sensed imagery [Isikdogan et al., 2017a, Isikdogan et al., 2017b]. The tools return channel width and channel centerline location at the resolution of the input imagery (30 m). Channel location variance over time is computed using the combined data from 1973 to 2017 and, based on this information, zones of highest change in the system are identified (Figure 1). Network metrics measuring characteristics of the delta's channels and islands are calculated for each year of the study and compared to the variance results in order to identify which metrics capture this change. These results provide both a method to identify zones of the GBMD that are currently experiencing the most change, as well as a means to predict which areas of the delta will experience network changes in the future. This information will be useful for informing coastal sustainability decisions about which areas of such a large and complex network should be the focus of remediation and mitigation efforts. Isikdogan, F., A. Bovik, P.
Passalacqua (2017a), RivaMap: An Automated River Analysis and Mapping Engine, Remote Sensing of Environment, in press. Isikdogan, F., A. Bovik, P. Passalacqua (2017b), River Network Extraction by Deep Convolutional Neural Networks, IEEE Geoscience and Remote Sensing Letters, under review.
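    A simple per-pixel proxy for the channel-location variance described above, computed from a stack of yearly binary channel masks (a sketch; the study works from extracted centerlines and widths rather than raw masks), could be:

```python
import numpy as np

def channel_presence_variance(yearly_masks):
    """Variance of channel presence at each pixel across a stack of binary masks.

    High values flag pixels that flip between channel and land over the years,
    i.e. zones of channel migration.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in yearly_masks])
    return stack.var(axis=0)

# Three toy "years" over a two-pixel reach: pixel 0 stable, pixel 1 migrating
var = channel_presence_variance([[1, 0], [1, 1], [1, 0]])
```

    Stable channel (or stable land) pixels have zero variance; the maximum possible value, 0.25, marks pixels wet in exactly half the observation years.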

  10. a Tool for Crowdsourced Building Information Modeling Through Low-Cost Range Camera: Preliminary Demonstration and Potential

    NASA Astrophysics Data System (ADS)

    Capocchiano, F.; Ravanelli, R.; Crespi, M.

    2017-11-01

    Within the construction sector, Building Information Models (BIMs) are increasingly used thanks to the several benefits they offer in the design of new buildings and the management of existing ones. Frequently, however, BIMs are not available for already built constructions, but, at the same time, range camera technology nowadays provides a cheap, intuitive and effective tool for automatically collecting the 3D geometry of indoor environments. It is thus essential to find new strategies, able to perform the first step of the scan-to-BIM process, by extracting the geometrical information contained in the 3D models that are so easily collected through the range cameras. In this work, a new algorithm to extract planimetries from the 3D models of rooms acquired by means of a range camera is therefore presented. The algorithm was tested on two rooms, characterized by different shapes and dimensions, whose 3D models were captured with the Occipital Structure Sensor™. The preliminary results are promising: the developed algorithm is able to model effectively the 2D shape of the investigated rooms, with an accuracy level in the range of 5–10 cm. It can potentially be used by non-expert users in the first step of BIM generation, when the building geometry is reconstructed, for collecting crowdsourced indoor information in the frame of BIM Volunteered Geographic Information (VGI) generation.

  11. Millimeter wave scattering characteristics and radar cross section measurements of common roadway objects

    NASA Astrophysics Data System (ADS)

    Zoratti, Paul K.; Gilbert, R. Kent; Majewski, Ronald; Ference, Jack

    1995-12-01

    Development of automotive collision warning systems has progressed rapidly over the past several years. A key enabling technology for these systems is millimeter-wave radar. This paper addresses a very critical millimeter-wave radar sensing issue for automotive radar, namely the scattering characteristics of common roadway objects such as vehicles, roadsigns, and bridge overpass structures. The data presented in this paper were collected on ERIM's Fine Resolution Radar Imaging Rotary Platform Facility and processed with ERIM's image processing tools. The value of this approach is that it provides system developers with a 2D radar image from which information about individual point scatterers "within a single target" can be extracted. This information on scattering characteristics will be utilized to refine threat assessment processing algorithms and automotive radar hardware configurations. (1) By evaluating the scattering characteristics identified in the radar image, radar signatures as a function of aspect angle for common roadway objects can be established. These signatures will aid in the refinement of threat assessment processing algorithms. (2) Utilizing ERIM's image manipulation tools, total RCS and RCS as a function of range and azimuth can be extracted from the radar image data. This RCS information will be essential in defining the operational envelope (e.g. dynamic range) within which any radar sensor hardware must be designed.

  12. Automated extraction of radiation dose information from CT dose report images.

    PubMed

    Li, Xinhua; Zhang, Da; Liu, Bob

    2011-06-01

    The purpose of this article is to describe the development of an automated tool for retrieving texts from CT dose report images. Optical character recognition was adopted to perform text recognitions of CT dose report images. The developed tool is able to automate the process of analyzing multiple CT examinations, including text recognition, parsing, error correction, and exporting data to spreadsheets. The results were precise for total dose-length product (DLP) and were about 95% accurate for CT dose index and DLP of scanned series.
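    After optical character recognition, the parsing and error-correction steps can be sketched as below; the report line layout, the regular expression, and the OCR digit-confusion set (O→0, l→1, S→5) are assumptions for illustration, not the tool's actual rules:

```python
import re

# Common OCR digit confusions in numeric fields (assumed set)
FIXES = str.maketrans({"O": "0", "l": "1", "S": "5"})

def parse_dose_report(ocr_text):
    """Parse series name, CTDIvol and DLP from OCR'd dose-report text."""
    records = []
    for line in ocr_text.splitlines():
        m = re.search(r"(\w+)\s+([\dOlS.]+)\s+mGy\s+([\dOlS.]+)\s+mGy[- ]?cm", line)
        if m:
            records.append({
                "series": m.group(1),
                "CTDIvol_mGy": float(m.group(2).translate(FIXES)),
                "DLP_mGycm": float(m.group(3).translate(FIXES)),
            })
    return records

records = parse_dose_report("Head 45.O mGy 9OO.5 mGy-cm\nscanner footer text")
```

    Exporting `records` to a spreadsheet is then a straightforward serialization step; the character-substitution table is where most of the reported accuracy gain for numeric fields would come from.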

  13. Using Information from the Electronic Health Record to Improve Measurement of Unemployment in Service Members and Veterans with mTBI and Post-Deployment Stress

    PubMed Central

    Dillahunt-Aspillaga, Christina; Finch, Dezon; Massengale, Jill; Kretzmer, Tracy; Luther, Stephen L.; McCart, James A.

    2014-01-01

    Objective The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). Design Retrospective cohort study using data from selected progress notes stored in the EHR. Setting Post-deployment Rehabilitation and Evaluation Program (PREP), an in-patient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. Participants Service members and Veterans with TBI who participated in the PREP program (N = 60). Main Outcome Measures Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. Results Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals along with information about cognitive, physical, and behavioral symptoms that may affect return-to-work were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. Conclusions Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort. PMID:25541956

  14. Nuclear surface diffuseness revealed in nucleon-nucleus diffraction

    NASA Astrophysics Data System (ADS)

    Hatakeyama, S.; Horiuchi, W.; Kohama, A.

    2018-05-01

    The nuclear surface provides useful information on the nuclear radius, nuclear structure, and properties of nuclear matter. We discuss the relationship between the nuclear surface diffuseness and the elastic scattering differential cross section at the first diffraction peak of high-energy nucleon-nucleus scattering, as an efficient tool for extracting nuclear surface information from the limited experimental data available for short-lived unstable nuclei. The high-energy reaction is described by a reliable microscopic reaction theory, the Glauber model. Extending the idea of the black sphere model, we find a one-to-one correspondence between nuclear bulk structure information and the proton-nucleus elastic scattering diffraction peak. This implies that we can extract both the nuclear radius and the diffuseness simultaneously, using the position and magnitude of the first diffraction peak of the elastic scattering differential cross section. We confirm the reliability of this approach by using realistic density distributions obtained from a mean-field model.
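
    The black-sphere idea can be sketched numerically. Assuming a black-disk amplitude proportional to J1(qR)/q, the first minimum of the cross section sits at qR ≈ 3.8317 (the first zero of J1), so a measured momentum transfer q at the first minimum yields a radius estimate. This is a strong simplification of the paper's Glauber-model analysis, shown only to make the radius-diffraction connection concrete:

```python
import math

def bessel_j1(x, n=2000):
    """J1(x) via its integral representation, (1/pi) * int_0^pi cos(t - x sin t) dt."""
    h = math.pi / n
    s = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for k in range(1, n):
        t = k * h
        s += math.cos(t - x * math.sin(t))
    return s * h / math.pi

def first_zero_j1():
    """Bisect for the first positive zero of J1 (about 3.8317)."""
    lo, hi = 3.0, 4.5
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bessel_j1(lo) * bessel_j1(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def radius_from_first_minimum(q_min):
    """Black-disk estimate of the radius from the position q_min (fm^-1)
    of the first diffraction minimum: R = x1 / q_min."""
    return first_zero_j1() / q_min
```

Extracting the diffuseness as well requires the peak magnitude, which the black-disk picture above deliberately omits.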

  15. Extracting Metrics for Three-dimensional Root Systems: Volume and Surface Analysis from In-soil X-ray Computed Tomography Data.

    PubMed

    Suresh, Niraj; Stephens, Sean A; Adams, Lexor; Beck, Anthon N; McKinney, Adriana L; Varga, Tamas

    2016-04-26

    Plant roots play a critical role in plant-soil-microbe interactions that occur in the rhizosphere, as well as processes with important implications to climate change and crop management. Quantitative size information on roots in their native environment is invaluable for studying root growth and environmental processes involving plants. X-ray computed tomography (XCT) has been demonstrated to be an effective tool for in situ root scanning and analysis. We aimed to develop a free, efficient tool that approximates the surface and volume of the root, regardless of its shape, from three-dimensional (3D) tomography data. The root structure of a Prairie dropseed (Sporobolus heterolepis) specimen was imaged using XCT. The root was reconstructed, and the primary root structure was extracted from the data using a combination of licensed and open-source software. An isosurface polygonal mesh was then created for ease of analysis. We have developed the standalone application imeshJ, generated in MATLAB, to calculate root volume and surface area from the mesh. The outputs of imeshJ are surface area (in mm²) and volume (in mm³). The process, utilizing a unique combination of tools from imaging to quantitative root analysis, is described. A combination of XCT and open-source software proved to be a powerful combination to noninvasively image plant root samples, segment root data, and extract quantitative information from the 3D data. This methodology of processing 3D data should be applicable to other material/sample systems where there is connectivity between components of similar X-ray attenuation and difficulties arise with segmentation.
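
    imeshJ itself is a MATLAB application; the Python sketch below shows the standard divergence-theorem computation of volume and total surface area from a closed triangle mesh, which is the kind of calculation such a tool performs (illustrative, not the authors' code):

```python
import math

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def mesh_volume_and_area(vertices, faces):
    """Volume (divergence theorem) and total area of a closed,
    outward-oriented triangle mesh."""
    vol = 0.0
    area = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        vol += dot(a, cross(b, c)) / 6.0       # signed tetrahedron volume
        n = cross(sub(b, a), sub(c, a))
        area += 0.5 * math.sqrt(dot(n, n))     # triangle area from cross product
    return abs(vol), area

# toy mesh: unit right tetrahedron, faces wound outward
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
volume, surface = mesh_volume_and_area(verts, faces)
```

The same two loops, applied to the isosurface mesh exported from the segmented XCT data, yield the mm³ and mm² outputs the abstract describes.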

  16. Forensic surface metrology: tool mark evidence.

    PubMed

    Gambino, Carol; McLaughlin, Patrick; Kuo, Loretta; Kammerman, Frani; Shenkin, Peter; Diaczuk, Peter; Petraco, Nicholas; Hamby, James; Petraco, Nicholas D K

    2011-01-01

    Over the last several decades, forensic examiners of impression evidence have come under scrutiny in the courtroom due to analysis methods that rely heavily on subjective morphological comparisons. Currently, there is no universally accepted system that generates numerical data to independently corroborate visual comparisons. Our research attempts to develop such a system for tool mark evidence, proposing a methodology that objectively evaluates the association of striated tool marks with the tools that generated them. In our study, 58 primer shear marks on 9 mm cartridge cases, fired from four Glock model 19 pistols, were collected using high-resolution white light confocal microscopy. The resulting three-dimensional surface topographies were filtered to extract all "waviness surfaces"-the essential "line" information that firearm and tool mark examiners view under a microscope. Extracted waviness profiles were processed with principal component analysis (PCA) for dimension reduction. Support vector machines (SVM) were used to make the profile-gun associations, and conformal prediction theory (CPT) to establish confidence levels. At the 95% confidence level, CPT coupled with PCA-SVM yielded an empirical error rate of 3.5%. Complementary bootstrap-based computations for estimated error rates were 0%, indicating that the error rate for the algorithmic procedure is likely to remain low on larger data sets. Finally, suggestions are made for practical courtroom application of CPT for assigning levels of confidence to SVM identifications of tool marks recorded with confocal microscopy. Copyright © 2011 Wiley Periodicals, Inc.
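
    The conformal-prediction step can be illustrated in a few lines: the confidence in an association comes from a p-value computed against calibration nonconformity scores. The scores below are invented; note that with only ten calibration marks the smallest attainable p-value is 1/11, so real casework needs a larger calibration set:

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration nonconformity scores at least as large as the
    test score, counting the test point itself (transductive conformal p-value)."""
    ge = sum(1 for s in calibration_scores if s >= test_score)
    return (ge + 1) / (len(calibration_scores) + 1)

# invented nonconformity scores for profiles known to come from one gun
calibration = [0.12, 0.08, 0.15, 0.11, 0.09, 0.14, 0.10, 0.13, 0.07, 0.16]

p_typical = conformal_p_value(calibration, 0.10)  # consistent with this gun
p_outlier = conformal_p_value(calibration, 0.90)  # not consistent
```

At a 95% confidence level, candidate gun labels whose p-value falls at or below 0.05 are excluded from the prediction set.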

  17. Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease

    PubMed Central

    Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.

    1998-01-01

    The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this markup process is time-consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.

  18. The utility of an automated electronic system to monitor and audit transfusion practice.

    PubMed

    Grey, D E; Smith, V; Villanueva, G; Richards, B; Augustson, B; Erber, W N

    2006-05-01

    Transfusion laboratories with transfusion committees have a responsibility to monitor transfusion practice and generate improvements in clinical decision-making and red cell usage. However, this can be problematic and expensive because data cannot be readily extracted from most laboratory information systems. To overcome this problem, we developed and introduced a system to electronically extract and collate extensive amounts of data from two laboratory information systems and to link it with ICD10 clinical codes in a new database using standard information technology. Three data files were generated from two laboratory information systems, ULTRA (version 3.2) and TM, using standard information technology scripts. These were patient pre- and post-transfusion haemoglobin, blood group and antibody screen, and cross-match and transfusion data. These data together with ICD10 codes for surgical cases were imported into an MS Access database and linked by means of a unique laboratory number. Queries were then run to extract the relevant information and processed in Microsoft Excel for graphical presentation. We assessed the utility of this data extraction system to audit transfusion practice in a 600-bed adult tertiary hospital over an 18-month period. A total of 52 MB of data were extracted from the two laboratory information systems for the 18-month period and together with 2.0 MB theatre ICD10 data enabled case-specific transfusion information to be generated. The audit evaluated 15,992 blood group and antibody screens, 25,344 cross-matched red cell units and 15,455 transfused red cell units. Data evaluated included cross-matched to transfusion ratios and pre- and post-transfusion haemoglobin levels for a range of clinical diagnoses. Data showed significant differences between clinical units and by ICD10 code.
This method to electronically extract large amounts of data and linkage with clinical databases has provided a powerful and sustainable tool for monitoring transfusion practice. It has been successfully used to identify areas requiring education, training and clinical guidance and allows for comparison with national haemoglobin-based transfusion guidelines.
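
    A minimal sketch of the kind of query the linked database enables, computing the cross-matched-to-transfused (C:T) ratio per ICD10 code from invented linked records:

```python
from collections import defaultdict

# invented linked records: (lab_number, icd10_code, units_crossmatched, units_transfused)
records = [
    ("L001", "I21", 4, 1),
    ("L002", "I21", 2, 1),
    ("L003", "C91", 3, 3),
    ("L004", "C91", 2, 2),
]

def ct_ratio_by_code(rows):
    """Cross-matched-to-transfused ratio per ICD10 code; high values suggest over-ordering."""
    xm, tx = defaultdict(int), defaultdict(int)
    for _, code, crossed, transfused in rows:
        xm[code] += crossed
        tx[code] += transfused
    return {code: xm[code] / tx[code] for code in xm if tx[code]}

ratios = ct_ratio_by_code(records)
```

Grouping the same sum by clinical unit instead of ICD10 code gives the between-unit comparison the audit reports.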

  19. A new algorithm and system for the characterization of handwriting strokes with delta-lognormal parameters.

    PubMed

    Djioua, Moussa; Plamondon, Réjean

    2009-11-01

    In this paper, we present a new analytical method for estimating the parameters of Delta-Lognormal functions and characterizing handwriting strokes. According to the Kinematic Theory of rapid human movements, these parameters contain information on both the motor commands and the timing properties of a neuromuscular system. The new algorithm, called XZERO, exploits relationships between the zero crossings of the first and second time derivatives of a lognormal function and its four basic parameters. The methodology is described and then evaluated under various testing conditions. The new tool allows a greater variety of stroke patterns to be processed automatically. Furthermore, for the first time, the extraction accuracy is quantified empirically, taking advantage of the exponential relationships that link the dispersion of the extraction errors with the signal-to-noise ratio. A new extraction system which combines this algorithm with two other previously published methods is also described and evaluated. This system provides researchers involved in various domains of pattern analysis and artificial intelligence with new tools for the basic study of single strokes as primitives for understanding rapid human movements.
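
    A Delta-Lognormal velocity profile is the difference of two lognormal impulse responses sharing a time origin t0, one for the agonist and one for the antagonist command. The sketch below follows common notation from the Kinematic Theory literature; the parameter values are illustrative assumptions, and nothing here reproduces the XZERO algorithm itself:

```python
import math

def lognormal(t, t0, mu, sigma):
    """Lognormal impulse response Lambda(t; t0, mu, sigma), zero for t <= t0."""
    if t <= t0:
        return 0.0
    x = t - t0
    return math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        x * sigma * math.sqrt(2.0 * math.pi))

def delta_lognormal_velocity(t, D1, D2, t0, mu1, sigma1, mu2, sigma2):
    """v(t) = D1*Lambda1(t) - D2*Lambda2(t): agonist minus antagonist command."""
    return D1 * lognormal(t, t0, mu1, sigma1) - D2 * lognormal(t, t0, mu2, sigma2)
```

The zero crossings of the first and second derivatives of each lognormal, which XZERO exploits, occur at closed-form functions of (t0, mu, sigma); for instance, the peak of Lambda sits at t0 + exp(mu - sigma²).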

  20. Leveraging Semantic Knowledge in IRB Databases to Improve Translation Science

    PubMed Central

    Hurdle, John F.; Botkin, Jeffery; Rindflesch, Thomas C.

    2007-01-01

    We introduce the notion that research administrative databases (RADs), such as those increasingly used to manage information flow in the Institutional Review Board (IRB), offer a novel, useful, and mine-able data source overlooked by informaticists. As a proof of concept, using an IRB database we extracted all titles and abstracts from system startup through January 2007 (n=1,876); formatted these in a pseudo-MEDLINE format; and processed them through the SemRep semantic knowledge extraction system. Even though SemRep is tuned to find semantic relations in MEDLINE citations, we found that it performed comparably well on the IRB texts. When adjusted to eliminate non-healthcare IRB submissions (e.g., economic and education studies), SemRep extracted an average of 7.3 semantic relations per IRB abstract (compared to an average of 11.1 for MEDLINE citations) with a precision of 70% (compared to 78% for MEDLINE). We conclude that RADs, as represented by IRB data, are mine-able with existing tools, but that performance will improve as these tools are tuned for RAD structures. PMID:18693856

  1. Difet: Distributed Feature Extraction Tool for High Spatial Resolution Remote Sensing Images

    NASA Astrophysics Data System (ADS)

    Eken, S.; Aydın, E.; Sayar, A.

    2017-11-01

    In this paper, we propose a distributed feature extraction tool for high spatial resolution remote sensing images. The tool is based on the Apache Hadoop framework and the Hadoop Image Processing Interface. Two corner detection algorithms (Harris and Shi-Tomasi) and five feature descriptors (SIFT, SURF, FAST, BRIEF, and ORB) are considered. Robustness of the tool in the task of feature extraction from Landsat-8 imagery is evaluated in terms of horizontal scalability.
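
    As a serial, single-machine illustration of the first detector the tool distributes, here is the Harris corner response computed over a 3x3 window on a synthetic image. The measure R = det(M) - k·trace(M)² is positive at corners, negative on edges, and near zero on flat regions; the distributed version would evaluate the same measure on image tiles across Hadoop workers:

```python
def harris_response(img, y, x, k=0.04):
    """Harris corner measure R = det(M) - k*trace(M)^2 over a 3x3 window at (y, x)."""
    A = B = C = 0.0
    for yy in range(y - 1, y + 2):
        for xx in range(x - 1, x + 2):
            ix = (img[yy][xx + 1] - img[yy][xx - 1]) / 2.0  # central-difference gradients
            iy = (img[yy + 1][xx] - img[yy - 1][xx]) / 2.0
            A += ix * ix
            B += iy * iy
            C += ix * iy
    return (A * B - C * C) - k * (A + B) ** 2

# synthetic 12x12 image: bright square occupying x >= 6, y >= 6 (corner at (6, 6))
img = [[1.0 if x >= 6 and y >= 6 else 0.0 for x in range(12)] for y in range(12)]
```

Shi-Tomasi differs only in the scoring: it thresholds the smaller eigenvalue of the same structure tensor M instead of R.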

  2. Bayesian decoding using unsorted spikes in the rat hippocampus

    PubMed Central

    Layton, Stuart P.; Chen, Zhe; Wilson, Matthew A.

    2013-01-01

    A fundamental task in neuroscience is to understand how neural ensembles represent information. Population decoding is a useful tool to extract information from neuronal populations based on the ensemble spiking activity. We propose a novel Bayesian decoding paradigm to decode unsorted spikes in the rat hippocampus. Our approach uses a direct mapping between spike waveform features and covariates of interest and avoids accumulation of spike sorting errors. Our decoding paradigm is nonparametric, encoding model-free for representing stimuli, and extracts information from all available spikes and their waveform features. We apply the proposed Bayesian decoding algorithm to a position reconstruction task for freely behaving rats based on tetrode recordings of rat hippocampal neuronal activity. Our detailed decoding analyses demonstrate that our approach is efficient and better utilizes the available information in the nonsortable hash than the standard sorting-based decoding algorithm. Our approach can be adapted to an online encoding/decoding framework for applications that require real-time decoding, such as brain-machine interfaces. PMID:24089403
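
    Stripping away the clusterless waveform-feature machinery, the Bayesian core of population decoding can be sketched as MAP estimation under independent-Poisson spiking. The tuning curves below are invented, and this simplification decodes from counts per cell, which is precisely the sorted-unit dependence the paper's method is designed to avoid:

```python
import math

def decode_position(counts, tuning, positions):
    """MAP position under independent-Poisson spiking:
    argmax_x sum_i [ n_i * log(lambda_i(x)) - lambda_i(x) ]."""
    best, best_ll = None, -math.inf
    for x in positions:
        ll = sum(n * math.log(tuning[i][x]) - tuning[i][x]
                 for i, n in enumerate(counts))
        if ll > best_ll:
            best, best_ll = x, ll
    return best

# invented place-cell rates (expected spikes per time bin) at 3 track positions
tuning = [
    {0: 8.0, 1: 1.0, 2: 0.5},  # cell 0 prefers position 0
    {0: 0.5, 1: 8.0, 2: 1.0},  # cell 1 prefers position 1
    {0: 1.0, 1: 0.5, 2: 8.0},  # cell 2 prefers position 2
]
```

The clusterless variant replaces the per-cell rates lambda_i(x) with a joint intensity over position and spike waveform features, so every spike contributes without being assigned to a unit.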

  3. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable savings in time and effort. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool provides a more accurate archival record of the sequence commanding for MRO.

  4. Computational Tools for Metabolic Engineering

    PubMed Central

    Copeland, Wilbert B.; Bartley, Bryan A.; Chandran, Deepak; Galdzicki, Michal; Kim, Kyung H.; Sleight, Sean C.; Maranas, Costas D.; Sauro, Herbert M.

    2012-01-01

    A great variety of software applications are now employed in the metabolic engineering field. These applications have been created to support a wide range of experimental and analysis techniques. Computational tools are utilized throughout the metabolic engineering workflow to extract and interpret relevant information from large data sets, to present complex models in a more manageable form, and to propose efficient network design strategies. In this review, we present a number of tools that can assist in modifying and understanding cellular metabolic networks. The review covers seven areas of relevance to metabolic engineers. These include metabolic reconstruction efforts, network visualization, nucleic acid and protein engineering, metabolic flux analysis, pathway prospecting, post-structural network analysis and culture optimization. The list of available tools is extensive and we can only highlight a small, representative portion of the tools from each area. PMID:22629572

  5. Decoding 2D-PAGE complex maps: relevance to proteomics.

    PubMed

    Pietrogrande, Maria Chiara; Marchetti, Nicola; Dondi, Francesco; Righetti, Pier Giorgio

    2006-03-20

    This review describes two mathematical approaches useful for decoding the complex signal of 2D-PAGE maps of protein mixtures. These methods are helpful for interpreting the large amount of data in each 2D-PAGE map by extracting all the analytical information hidden therein by spot overlapping. Here the basic theory and its application to 2D-PAGE maps are reviewed: the means for extracting information from the experimental data and their relevance to proteomics are discussed. One method is based on the quantitative theory of the statistical model of peak overlapping (SMO), using the spot experimental data (intensity and spatial coordinates). The second method is based on the study of the 2D-autocovariance function (2D-ACVF) computed on the experimental digitised map. The two methods are independent and extract both common and complementary information from the 2D-PAGE map. Both methods yield fundamental information on sample complexity and separation performance and can single out ordered patterns present in spot positions: the availability of two independent procedures to compute the same separation parameters is a powerful tool for estimating the reliability of the obtained results. The SMO procedure is a unique tool for quantitatively estimating the degree of spot overlapping present in the map, while the 2D-ACVF method is particularly powerful in singling out the presence of order in spot positions from the complexity of the whole 2D map, i.e., spot trains. The procedures were validated by extensive numerical computation on computer-generated maps describing experimental 2D-PAGE gels of protein mixtures. Their applicability to real samples was tested on reference maps obtained from literature sources. The review describes the most relevant information for proteomics: sample complexity, separation performance, overlapping extent, and identification of spot trains related to post-translational modifications (PTMs).
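
    The 2D-ACVF idea can be sketched directly: a spot train with regular spacing produces an autocovariance peak at the repeat lag. The toy map and brute-force implementation below are illustrative assumptions, not the review's FFT-based procedure:

```python
def acvf2d(z, dy, dx):
    """Mean of (z[y][x]-m)(z[y+dy][x+dx]-m) over all valid positions of map z."""
    ny, nx = len(z), len(z[0])
    m = sum(map(sum, z)) / (ny * nx)
    terms = [(z[y][x] - m) * (z[y + dy][x + dx] - m)
             for y in range(ny - dy) for x in range(nx - dx)]
    return sum(terms) / len(terms)

# toy "2D map": a horizontal spot train with 4-pixel spacing
z = [[0.0] * 10 for _ in range(6)]
for x in (1, 5, 9):
    z[3][x] = 1.0
```

A positive autocovariance peak at lag (0, 4), absent at intermediate lags, is the signature of an ordered spot train such as a PTM ladder.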

  6. Distributed telemedicine for the National Information Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forslund, D.W.; Lee, Seong H.; Reverbel, F.C.

    1997-08-01

    TeleMed is an advanced system that provides a distributed multimedia electronic medical record available over a wide area network. It uses object-based computing, distributed data repositories, advanced graphical user interfaces, and visualization tools along with innovative concept extraction of image information for storing and accessing medical records developed in a separate project from 1994-5. In 1996, we began the transition to Java, extended the infrastructure, and worked to begin deploying TeleMed-like technologies throughout the nation. Other applications are mentioned.

  7. Wavelets and their applications past and future

    NASA Astrophysics Data System (ADS)

    Coifman, Ronald R.

    2009-04-01

    As this is a conference on mathematical tools for defense, I would like to dedicate this talk to the memory of Louis Auslander, who, through his insights and visionary leadership, brought powerful new mathematics into DARPA and provided the main impetus for the development and insertion of wavelet-based processing in defense. My goal here is to describe the evolution of a stream of ideas in Harmonic Analysis, ideas which in the past have mostly been applied to the analysis and extraction of information from physical data, and which are now increasingly applied to organize and extract information and knowledge from any set of digital documents, from text to music to questionnaires. This form of signal processing on digital data is part of the future of wavelet analysis.
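
    The simplest instance of the wavelet analysis discussed here is the single-level orthonormal Haar transform, which splits a signal into averages and differences while preserving energy (an illustrative sketch, not tied to any system in the talk):

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail).
    Assumes an even-length signal."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact reconstruction from one Haar level."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out += [s * (a + d), s * (a - d)]
    return out
```

Recursing haar_step on the approximation coefficients yields the full multiresolution decomposition that underlies both the signal-processing and document-organization applications mentioned above.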

  8. MedEx/J: A One-Scan Simple and Fast NLP Tool for Japanese Clinical Texts.

    PubMed

    Aramaki, Eiji; Yano, Ken; Wakamiya, Shoko

    2017-01-01

    Because of the recent replacement of physical documents with electronic medical records (EMR), the importance of information processing in the medical field has increased. In light of this trend, we have been developing MedEx/J, which retrieves important Japanese-language information from medical reports. MedEx/J executes two tasks simultaneously: (1) term extraction, and (2) positive and negative event classification. We designate this approach as a one-scan approach, providing simplicity of systems and reasonable accuracy. MedEx/J performance on the two tasks is described herein: (1) term extraction (F1 = 0.87) and (2) positive-negative classification (F1 = 0.63). This paper also presents discussion and explains remaining issues in the medical natural language processing field.
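
    MedEx/J operates on Japanese clinical text; as an English-language stand-in, the one-scan idea (extract a term and decide its polarity in the same pass) can be sketched with a toy dictionary and negation cues, all invented for illustration and unrelated to MedEx/J's actual resources:

```python
# toy one-scan pass: dictionary terms are extracted and classified in a single sweep
TERMS = {"fever", "cough", "pneumonia"}
NEGATION_CUES = {"no", "denies", "without"}

def one_scan(text):
    """Single pass over tokens; each matched term is flagged negative when a
    negation cue appears among the three preceding tokens."""
    findings = []
    window = []  # tokens seen so far, for the look-back
    for token in text.lower().replace(",", " ").replace(".", " ").split():
        if token in TERMS:
            negated = any(t in NEGATION_CUES for t in window[-3:])
            findings.append((token, "negative" if negated else "positive"))
        window.append(token)
    return findings

report = "Patient presents with cough and fever. Denies pneumonia."
```

Doing both tasks in one sweep is what keeps the system simple, at the cost of the lower polarity F-score the abstract reports.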

  9. MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format.

    PubMed

    Ahmed, Zeeshan; Dandekar, Thomas

    2015-01-01

    Published scientific literature contains millions of figures, including information about the results obtained from different scientific experiments, e.g. PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medical imaging like electroencephalography (EEG), magnetoencephalography (MEG), echocardiography (ECG), and positron-emission tomography (PET) images. The importance of biomedical figures has been widely recognized in the scientific and medical communities, as they play a vital role in providing major original data and experimental and computational results in concise form. One major challenge in implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files by physical and logical document analysis. Here we present a product-line-architecture-based bioinformatics tool, 'Mining Scientific Literature (MSL)', which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, visualization of extracted figures, and extraction of embedded text from all kinds of biological and biomedical figures using Optical Character Recognition (OCR). Moreover, for further analysis and usage, it generates the system's output in different formats including text, PDF, XML and image files. Hence, MSL is an easy-to-install and easy-to-use analysis tool for interpreting published scientific literature in PDF format.

  10. PREDOSE: a semantic web platform for drug abuse epidemiology using social media.

    PubMed

    Cameron, Delroy; Smith, Gary A; Daniulaityte, Raminta; Sheth, Amit P; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z; Falck, Russel

    2013-12-01

    The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO--pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC), through combination of lexical, pattern-based and semantics-based techniques. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration. 
The DAO is also used to help recognize three types of data, namely: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information, which facilitate search, trend analysis and overall content analysis using social media on prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. 
PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in the future. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies

    PubMed Central

    Ni, Yan; Su, Mingming; Qiu, Yunping; Jia, Wei

    2017-01-01

    ADAP-GC is an automated computational pipeline for untargeted, GC-MS-based metabolomics studies. It takes raw mass spectrometry data as input and carries out a sequence of data processing steps including construction of extracted ion chromatograms, detection of chromatographic peak features, deconvolution of co-eluting compounds, and alignment of compounds across samples. Despite the increased accuracy from the original version to version 2.0 in terms of extracting metabolite information for identification and quantitation, ADAP-GC 2.0 requires appropriate specification of a number of parameters and has difficulty in extracting information of compounds that are in low concentration. To overcome these two limitations, ADAP-GC 3.0 was developed to improve both the robustness and sensitivity of compound detection. In this paper, we report how these goals were achieved and compare ADAP-GC 3.0 against three other software tools including ChromaTOF, AnalyzerPro, and AMDIS that are widely used in the metabolomics community. PMID:27461032

  12. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies.

    PubMed

    Ni, Yan; Su, Mingming; Qiu, Yunping; Jia, Wei; Du, Xiuxia

    2016-09-06

    ADAP-GC is an automated computational pipeline for untargeted, GC/MS-based metabolomics studies. It takes raw mass spectrometry data as input and carries out a sequence of data processing steps including construction of extracted ion chromatograms, detection of chromatographic peak features, deconvolution of coeluting compounds, and alignment of compounds across samples. Despite the increased accuracy from the original version to version 2.0 in terms of extracting metabolite information for identification and quantitation, ADAP-GC 2.0 requires appropriate specification of a number of parameters and has difficulty in extracting information on compounds that are in low concentration. To overcome these two limitations, ADAP-GC 3.0 was developed to improve both the robustness and sensitivity of compound detection. In this paper, we report how these goals were achieved and compare ADAP-GC 3.0 against three other software tools including ChromaTOF, AnalyzerPro, and AMDIS that are widely used in the metabolomics community.
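
    Reduced to its simplest form, chromatographic peak detection is a search for local maxima above a noise threshold on an extracted ion chromatogram. The toy sketch below is not ADAP-GC's actual algorithm, which additionally models peak shape, deconvolves coeluting compounds, and aligns peaks across samples:

```python
def detect_peaks(intensity, threshold):
    """Indices of strict local maxima above threshold in a 1-D chromatogram."""
    return [i for i in range(1, len(intensity) - 1)
            if intensity[i] > threshold
            and intensity[i] > intensity[i - 1]
            and intensity[i] > intensity[i + 1]]

# toy extracted ion chromatogram: two peaks over a noisy baseline
eic = [0.1, 0.2, 0.1, 1.0, 5.0, 9.0, 5.0, 1.0, 0.2, 0.3, 2.0, 6.0, 2.0, 0.1]
```

The sensitivity limitation of version 2.0 that the abstract mentions corresponds, in this toy picture, to low-concentration compounds whose apex never clears the threshold.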

  13. Radiomics: a new application from established techniques

    PubMed Central

    Parekh, Vishwa; Jacobs, Michael A.

    2016-01-01

    The increasing use of biomarkers in cancer has led to the concept of personalized medicine for patients. Personalized medicine makes better diagnosis and treatment options available to clinicians. Radiological imaging techniques provide an opportunity to deliver unique data on different types of tissue. However, obtaining useful information from all radiological data is challenging in the era of “big data”. Recent advances in computational power and the use of genomics have generated a new area of research termed Radiomics. Radiomics is defined as the high-throughput extraction of quantitative imaging features or texture from imaging to decode tissue pathology, creating a high-dimensional data set for feature extraction. Radiomic features provide information about gray-scale patterns and inter-pixel relationships. In addition, shape and spectral properties can be extracted within the same regions of interest on radiological images. Moreover, these features can be further used to develop computational models using advanced machine learning algorithms that may serve as a tool for personalized diagnosis and treatment guidance. PMID:28042608
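
    Among the gray-scale pattern and inter-pixel features radiomics pipelines commonly compute are statistics of the gray-level co-occurrence matrix (GLCM), such as contrast and energy. A minimal sketch on a quantized image (illustrative; production pipelines use dedicated feature libraries):

```python
def glcm(img, dy, dx, levels):
    """Normalized gray-level co-occurrence matrix for pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    ny, nx = len(img), len(img[0])
    total = 0
    for y in range(ny - dy):
        for x in range(nx - dx):
            counts[img[y][x]][img[y + dy][x + dx]] += 1
            total += 1
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """Weighted by squared gray-level difference: 0 for a uniform image."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))

def energy(p):
    """Sum of squared entries: 1 for a uniform image, lower for busy textures."""
    return sum(v * v for row in p for v in row)
```

Feature vectors built from many such statistics, over several offsets and gray-level binnings, are what feed the machine learning models the abstract describes.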

  14. Quantifying Human Visible Color Variation from High Definition Digital Images of Orb Web Spiders.

    PubMed

    Tapia-McClung, Horacio; Ajuria Ibarra, Helena; Rao, Dinesh

    2016-01-01

    Digital processing and analysis of high resolution images of 30 individuals of the orb web spider Verrucosa arenata were performed to extract and quantify human visible colors present on the dorsal abdomen of this species. Color extraction was performed with minimal user intervention using an unsupervised algorithm to determine groups of colors on each individual spider, which was then analyzed in order to quantify and classify the colors obtained, both spatially and using energy and entropy measures of the digital images. Analysis shows that the colors cover a small region of the visible spectrum, are not spatially homogeneously distributed over the patterns and from an entropic point of view, colors that cover a smaller region on the whole pattern carry more information than colors covering a larger region. This study demonstrates the use of processing tools to create automatic systems to extract valuable information from digital images that are precise, efficient and helpful for the understanding of the underlying biology.
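
    The unsupervised color-grouping step can be illustrated with a k-means-style clustering of RGB pixels into two groups. The abstract does not name its algorithm, so this is a generic stand-in with a deterministic initialization and invented pixel values:

```python
def cluster_two_colors(pixels, iters=10):
    """Split RGB pixels into two color groups with a tiny k-means (k = 2)."""
    def d2(p, q):
        return sum((p[i] - q[i]) ** 2 for i in range(3))
    # deterministic init: first pixel, plus the pixel farthest from it
    c0 = pixels[0]
    c1 = max(pixels, key=lambda p: d2(p, c0))
    centroids = [c0, c1]
    for _ in range(iters):
        groups = [[], []]
        for p in pixels:
            groups[0 if d2(p, centroids[0]) <= d2(p, centroids[1]) else 1].append(p)
        centroids = [tuple(sum(p[i] for p in g) / len(g) for i in range(3)) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids, groups

# invented "spider abdomen" pixels: a reddish patch and a whitish patch
pixels = [(200, 30, 20), (210, 25, 25), (205, 35, 15),
          (240, 235, 230), (250, 245, 240), (235, 240, 238)]
centroids, groups = cluster_two_colors(pixels)
```

The spatial and entropy analyses in the study would then operate on the pixel masks implied by these groups, e.g. the area fraction each color covers.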

  15. Quantifying Human Visible Color Variation from High Definition Digital Images of Orb Web Spiders

    PubMed Central

    Ajuria Ibarra, Helena; Rao, Dinesh

    2016-01-01

    Digital processing and analysis of high-resolution images of 30 individuals of the orb web spider Verrucosa arenata were performed to extract and quantify the human-visible colors present on the dorsal abdomen of this species. Color extraction was performed with minimal user intervention using an unsupervised algorithm to determine groups of colors on each individual spider, which were then analyzed in order to quantify and classify the colors obtained, both spatially and using energy and entropy measures of the digital images. Analysis shows that the colors cover a small region of the visible spectrum and are not spatially homogeneously distributed over the patterns, and that, from an entropic point of view, colors that cover a smaller region of the whole pattern carry more information than colors covering a larger region. This study demonstrates the use of processing tools to create automatic systems that extract valuable information from digital images and that are precise, efficient, and helpful for understanding the underlying biology. PMID:27902724

  16. Lynx: a database and knowledge extraction engine for integrative medicine.

    PubMed

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

    2014-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu)--a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
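Lynx's enrichment analysis is not spelled out in the record; at its core, any gene-set enrichment test reduces to a hypergeometric tail probability. A minimal sketch with stdlib math only (the gene counts are invented for illustration):

```python
from math import comb

def enrichment_p(N, K, n, k):
    """Hypergeometric upper tail P(X >= k): the chance of seeing at least
    k annotated genes in a size-n hit list drawn from N genes of which K
    are annotated; small values indicate enrichment."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / comb(N, n)

# 20,000 genes, 100 in the pathway, 50 experimental hits, 5 in the pathway.
p = enrichment_p(20000, 100, 50, 5)
print(p < 0.001)  # -> True
```

The expected overlap under the null here is only 50 * 100 / 20000 = 0.25 genes, so observing 5 is highly significant.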

  17. Standardized data sharing in a paediatric oncology research network--a proof-of-concept study.

    PubMed

    Hochedlinger, Nina; Nitzlnader, Michael; Falgenhauer, Markus; Welte, Stefan; Hayn, Dieter; Koumakis, Lefteris; Potamias, George; Tsiknakis, Manolis; Saraceno, Davide; Rinaldi, Eugenia; Ladenstein, Ruth; Schreier, Günter

    2015-01-01

    Data that have been collected in the course of clinical trials are potentially valuable for additional scientific research questions in so-called secondary use scenarios. This is of particular importance in rare disease areas like paediatric oncology. If data from several research projects need to be connected, so-called Core Datasets can be used to define which information needs to be extracted from every involved source system. In this work, the utility of the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) as a format for Core Datasets was evaluated and a web tool was developed which received source ODM XML files and, via Extensible Stylesheet Language Transformation (XSLT), generated standardized Core Dataset ODM XML files. Using this tool, data from different source systems were extracted and pooled for joint analysis in a proof-of-concept study, facilitating both basic syntactic and semantic interoperability.
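The Core Dataset filtering that the XSLT performs can be sketched with the standard library alone. The ODM fragment and item OIDs below are invented and omit the CDISC namespace that real ODM files carry:

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal ODM fragment (real ODM uses the CDISC namespace).
ODM = """<ODM><ClinicalData><SubjectData SubjectKey="P01">
  <ItemData ItemOID="AGE" Value="7"/>
  <ItemData ItemOID="DIAGNOSIS" Value="neuroblastoma"/>
  <ItemData ItemOID="LOCAL_NOTE" Value="not shared"/>
</SubjectData></ClinicalData></ODM>"""

CORE_ITEMS = {"AGE", "DIAGNOSIS"}   # the agreed Core Dataset definition

def extract_core(odm_xml, core_oids):
    """Keep only the ItemData elements named in the Core Dataset --
    the filtering step the XSLT performs in the actual web tool."""
    root = ET.fromstring(odm_xml)
    out = {}
    for subj in root.iter("SubjectData"):
        items = {i.get("ItemOID"): i.get("Value")
                 for i in subj.iter("ItemData") if i.get("ItemOID") in core_oids}
        out[subj.get("SubjectKey")] = items
    return out

print(extract_core(ODM, CORE_ITEMS))
# -> {'P01': {'AGE': '7', 'DIAGNOSIS': 'neuroblastoma'}}
```

Items outside the Core Dataset (here, LOCAL_NOTE) never leave the source system, which is the point of the approach.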

  18. Automated Fluid Feature Extraction from Transient Simulations

    NASA Technical Reports Server (NTRS)

    Haimes, Robert

    2000-01-01

    In the past, feature extraction and identification were interesting concepts, but not required in understanding the physics of a steady flow field. This is because the results of the more traditional tools, like iso-surfaces, cuts, and streamlines, were more interactive and easily abstracted so they could be represented to the investigator. These tools worked and properly conveyed the collected information at the expense of a great deal of interaction. For unsteady flow-fields, the investigator does not have the luxury of spending time scanning only one 'snap-shot' of the simulation. Automated assistance is required in pointing out areas of potential interest contained within the flow. This must not require a heavy compute burden (the visualization should not significantly slow down the solution procedure for co-processing environments like pV3), and methods must be developed to abstract the feature and display it in a manner that physically makes sense.

  19. Wrapping SRS with CORBA: from textual data to distributed objects.

    PubMed

    Coupaye, T

    1999-04-01

    Biological data come in very different shapes. Databanks are maintained and used by distinct organizations. Text is the de facto standard exchange format. The SRS system can integrate heterogeneous textual databanks but lacked a way to structure the extracted data. This paper presents a CORBA interface to the SRS system which manages databanks in a flat file format. SRS Object Servers are CORBA wrappers for SRS. They allow client applications (visualisation tools, data mining tools, etc.) to access and query SRS servers remotely through an Object Request Broker (ORB). They provide loader objects that contain the information extracted from the databanks by SRS. Loader objects are not hard-coded but generated in a flexible way by using loader specifications which allow SRS administrators to package data coming from distinct databanks. The prototype may be available for beta-testing. Please contact the SRS group (http://srs.ebi.ac.uk).

  20. Automated Extraction of Flow Features

    NASA Technical Reports Server (NTRS)

    Dorney, Suzanne (Technical Monitor); Haimes, Robert

    2005-01-01

    Computational Fluid Dynamics (CFD) simulations are routinely performed as part of the design process of most fluid handling devices. In order to efficiently and effectively use the results of a CFD simulation, visualization tools are often used. These tools are used in all stages of the CFD simulation including pre-processing, interim-processing, and post-processing, to interpret the results. Each of these stages requires visualization tools that allow one to examine the geometry of the device, as well as the partial or final results of the simulation. An engineer will typically generate a series of contour and vector plots to better understand the physics of how the fluid is interacting with the physical device. Of particular interest is the detection of features such as shocks, re-circulation zones, and vortices (which will highlight areas of stress and loss). As the demand for CFD analyses continues to increase, the need for automated feature extraction capabilities has become vital. In the past, feature extraction and identification were interesting concepts, but not required in understanding the physics of a steady flow field. This is because the results of the more traditional tools, like iso-surfaces, cuts, and streamlines, were more interactive and easily abstracted so they could be represented to the investigator. These tools worked and properly conveyed the collected information at the expense of a great deal of interaction. For unsteady flow-fields, the investigator does not have the luxury of spending time scanning only one "snapshot" of the simulation. Automated assistance is required in pointing out areas of potential interest contained within the flow. This must not require a heavy compute burden (the visualization should not significantly slow down the solution procedure for co-processing environments). Methods must be developed to abstract the feature of interest and display it in a manner that physically makes sense.

  1. Automated Extraction of Flow Features

    NASA Technical Reports Server (NTRS)

    Dorney, Suzanne (Technical Monitor); Haimes, Robert

    2004-01-01

    Computational Fluid Dynamics (CFD) simulations are routinely performed as part of the design process of most fluid handling devices. In order to efficiently and effectively use the results of a CFD simulation, visualization tools are often used. These tools are used in all stages of the CFD simulation including pre-processing, interim-processing, and post-processing, to interpret the results. Each of these stages requires visualization tools that allow one to examine the geometry of the device, as well as the partial or final results of the simulation. An engineer will typically generate a series of contour and vector plots to better understand the physics of how the fluid is interacting with the physical device. Of particular interest is the detection of features such as shocks, recirculation zones, and vortices (which will highlight areas of stress and loss). As the demand for CFD analyses continues to increase, the need for automated feature extraction capabilities has become vital. In the past, feature extraction and identification were interesting concepts, but not required in understanding the physics of a steady flow field. This is because the results of the more traditional tools, like iso-surfaces, cuts, and streamlines, were more interactive and easily abstracted so they could be represented to the investigator. These tools worked and properly conveyed the collected information at the expense of a great deal of interaction. For unsteady flow-fields, the investigator does not have the luxury of spending time scanning only one "snapshot" of the simulation. Automated assistance is required in pointing out areas of potential interest contained within the flow. This must not require a heavy compute burden (the visualization should not significantly slow down the solution procedure for co-processing environments). Methods must be developed to abstract the feature of interest and display it in a manner that physically makes sense.
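The vortex and recirculation-zone detection discussed in these records typically starts from a derived field such as vorticity. A minimal sketch for a 2-D velocity field on a uniform grid (not the pV3 implementation, just the standard definition):

```python
import numpy as np

def vorticity(u, v, dx=1.0, dy=1.0):
    """z-component of vorticity for a 2-D velocity field on a grid:
    w = dv/dx - du/dy, a standard raw ingredient for vortex detection."""
    dudy = np.gradient(u, dy, axis=0)
    dvdx = np.gradient(v, dx, axis=1)
    return dvdx - dudy

# Solid-body rotation u = -y, v = x has uniform vorticity 2.
y, x = np.mgrid[-2:3, -2:3].astype(float)
w = vorticity(-y, x)
print(np.allclose(w, 2.0))  # -> True
```

An automated extractor would then threshold or segment such a field (or a criterion like lambda-2) to flag candidate regions for the investigator, instead of requiring interactive scanning of every snapshot.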

  2. Support patient search on pathology reports with interactive online learning based data extraction.

    PubMed

    Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng

    2015-01-01

    Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests.
Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human's feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
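The annotate-correct-relearn loop described above can be caricatured in a few lines. This is a toy stand-in, not IDEAL-X's actual model: it only grows a per-field controlled vocabulary from user corrections, where the real system also updates an online machine-learning model:

```python
class OnlineVocabularyExtractor:
    """Toy stand-in for online-learning extraction: each field keeps a
    controlled vocabulary that grows as the user corrects annotations."""

    def __init__(self, fields):
        self.vocab = {f: set() for f in fields}

    def annotate(self, text):
        # Propose a value wherever a known vocabulary term occurs in the text.
        return {f: next((t for t in terms if t in text), None)
                for f, terms in self.vocab.items()}

    def correct(self, field, value):
        # A user correction immediately extends the model for later reports.
        self.vocab[field].add(value)

ex = OnlineVocabularyExtractor(["diagnosis"])
report = "Final diagnosis: glioblastoma, WHO grade IV."
print(ex.annotate(report))   # -> {'diagnosis': None}
ex.correct("diagnosis", "glioblastoma")
print(ex.annotate(report))   # -> {'diagnosis': 'glioblastoma'}
```

Each correction is cheap for the user but pays off on every subsequent report, which is why annotation effort falls over time.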

  3. Fuzzy geometry, entropy, and image information

    NASA Technical Reports Server (NTRS)

    Pal, Sankar K.

    1991-01-01

    Presented here are various uncertainty measures arising from grayness ambiguity and spatial ambiguity in an image, and their possible applications as image information measures. Definitions are given of an image in the light of fuzzy set theory, and of information measures and tools relevant for processing/analysis, e.g., fuzzy geometrical properties, correlation, bound functions, and entropy measures. Also given is a formulation of algorithms, along with management of uncertainties, for segmentation and object extraction, and edge detection. The output obtained here is both fuzzy and nonfuzzy. Ambiguity in the evaluation and assessment of membership functions is also described.
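One standard grayness-ambiguity measure of the kind surveyed here is the De Luca-Termini fuzzy entropy, which is zero for a crisp (0/1) membership map and maximal when every membership is 0.5. A minimal sketch (normalized to [0, 1]):

```python
import math

def fuzzy_entropy(mu):
    """De Luca-Termini entropy of a list of fuzzy memberships: 0 for a
    crisp image, 1 when every membership equals 0.5."""
    def s(m):
        if m in (0.0, 1.0):
            return 0.0
        return -(m * math.log(m) + (1 - m) * math.log(1 - m))
    return sum(s(m) for m in mu) / (len(mu) * math.log(2))

print(fuzzy_entropy([0.0, 1.0, 1.0, 0.0]))  # crisp image -> 0.0
print(fuzzy_entropy([0.5, 0.5, 0.5, 0.5]))  # maximally ambiguous -> 1.0
```

Such a measure quantifies how far a segmentation is from a confident object/background decision.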

  4. The use of experimental structures to model protein dynamics.

    PubMed

    Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L

    2015-01-01

    The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high-for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods-Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the dimeric HIV-1 protease structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs.
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
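The PCA step on an ensemble of aligned structures reduces to an SVD of the centered coordinate matrix. A minimal numpy sketch (the toy "ensemble" below is invented; real input would be superposed PDB coordinates):

```python
import numpy as np

def ensemble_pca(coords):
    """PCA of an ensemble of aligned structures.
    coords: (n_structures, n_atoms * 3) flattened Cartesian coordinates.
    Returns principal components (rows of Vt) and their variances."""
    X = coords - coords.mean(axis=0)          # center on the mean structure
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    variances = S ** 2 / (len(coords) - 1)    # eigenvalues of the covariance
    return Vt, variances

# Toy ensemble: two atoms, motion only along the first coordinate.
rng = np.random.default_rng(0)
coords = np.zeros((100, 6))
coords[:, 0] = rng.normal(size=100)
pcs, var = ensemble_pca(coords)
print(np.isclose(abs(pcs[0, 0]), 1.0), var[1] < 1e-10)  # -> True True
```

The leading PCs are the collective motions sampled by the ensemble, and their overlap with ENM modes is what the congruency measurement in the chapter quantifies.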

  5. The Use of Experimental Structures to Model Protein Dynamics

    PubMed Central

    Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.

    2014-01-01

    The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the dimeric HIV-1 protease structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs.
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965

  6. Experimental "Microcultures" in Young Children: Identifying Biographic, Cognitive, and Social Predictors of Information Transmission

    ERIC Educational Resources Information Center

    Flynn, Emma; Whiten, Andrew

    2012-01-01

    In one of the first open diffusion experiments with young children, a tool-use task that afforded multiple methods to extract an enclosed reward and a child model habitually using one of these methods were introduced into different playgroups. Eighty-eight children, ranging in age from 2 years 8 months to 4 years 5 months, participated. Measures…

  7. ABM Drag_Pass Report Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat

    2008-01-01

    dragREPORT software was developed in parallel with abmREPORT, which is described in the preceding article; both programs were built on the capabilities created during that process. This tool generates a drag_pass report that summarizes vital information from the MRO aerobraking drag_pass build process, both to facilitate sequence reviews and to provide a high-level summarization of the sequence for mission management. The script extracts information from the ENV, SSF, FRF, SCMFmax, and OPTG files, presenting them in a single, easy-to-check report that provides the majority of parameters needed for cross-check and verification as part of the sequence review process. Prior to dragREPORT, all the needed information was spread across a number of different files, each in a different format. This software is a Perl script that extracts vital summarization information and build-process details from a number of source files into a single, concise report format used to aid the MPST sequence review process and to provide a high-level summarization of the sequence for mission management reference. This software could be adapted for future aerobraking missions to provide similar reports and review and summarization information.

  8. Regulatory and Non-regulatory Responses to Hydraulic Fracturing in Local Communities

    NASA Astrophysics Data System (ADS)

    Phartiyal, P.

    2015-12-01

    The practice of extracting oil and gas from tight rock formations using advances in technology, such as hydraulic fracturing and directional drilling, has expanded exponentially in states and localities across the country. As scientific data collection and analysis catch up with the many potential impacts of this unconventional oil and gas development, communities are turning to their local officials to make decisions on whether and how fracking should proceed. While most regulatory authority on the issue rests with state agencies, local officials have experimented with a wide range of regulatory, non-regulatory, and fiscal tools to manage the impacts of fracking. These impacts can affect the local air, water, seismicity, soil, roads, and schools, as well as residents, on-site workers, and emergency and social services. Local officials' approaches are often influenced by their prior experience with mineral extraction in their localities. The speaker will present examples of the kinds of information sources, tools, and approaches communities across the country are using, from noise barriers to setback requirements to information sharing, to balance the promise and perils of oil and gas development in their jurisdictions.

  9. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy and fast to use and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.
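The "single query" metadata step amounts to collecting the same tags from every slice into one table. The sketch below shows only that aggregation with the standard library; the real application parses the DICOM headers themselves (via Matlab's DICOM support) and exports to Excel, and the tag values here are invented:

```python
import csv, io

# Hypothetical per-slice metadata, as a DICOM parser would return it.
slices = [
    {"SOPInstanceUID": "1.2.3.1", "KVP": 120, "SliceLocation": -10.0},
    {"SOPInstanceUID": "1.2.3.2", "KVP": 120, "SliceLocation": -5.0},
]

def metadata_table(slices):
    """Collect the chosen tags from every slice into one table in a single
    pass -- the aggregation step before export to a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(slices[0]))
    writer.writeheader()
    writer.writerows(slices)
    return buf.getvalue()

print(metadata_table(slices).splitlines()[0])
# -> KVP,SOPInstanceUID,SliceLocation
```

One row per slice means scanning parameters that vary along the series (here, SliceLocation) survive into the exported table.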

  10. Phenomenological analysis of medical time series with regular and stochastic components

    NASA Astrophysics Data System (ADS)

    Timashev, Serge F.; Polyakov, Yuriy S.

    2007-06-01

    Flicker-Noise Spectroscopy (FNS), a general approach to the extraction and parameterization of resonant and stochastic components contained in medical time series, is presented. The basic idea of FNS is to treat the correlation links present in sequences of different irregularities, such as spikes, "jumps", and discontinuities in derivatives of different orders, on all levels of the spatiotemporal hierarchy of the system under study as the main information carriers. The tools to extract and analyze the information are power spectra and difference moments (structural functions), which complement each other. The structural function stochastic component is formed exclusively by "jumps" of the dynamic variable, while the power spectrum stochastic component is formed by both spikes and "jumps" on every level of the hierarchy. The information "passport" characteristics that are determined by fitting the derived expressions to the experimental variations for the stochastic components of power spectra and structural functions are interpreted as the correlation times and parameters that describe the rate of "memory loss" on these correlation time intervals for different irregularities. The number of the extracted parameters is determined by the requirements of the problem under study. Application of this approach to the analysis of tremor velocity signals for a Parkinsonian patient is discussed.
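The two complementary FNS tools named above, the power spectrum and the second-order difference moment, can be sketched directly from their definitions (this is the generic computation, not the authors' fitting procedure; the test signal is invented):

```python
import numpy as np

def power_spectrum(x):
    """One-sided power spectrum of a signal (first FNS tool)."""
    X = np.fft.rfft(x - np.mean(x))
    return np.abs(X) ** 2 / len(x)

def difference_moment(x, p=2):
    """Structural function Phi(tau) = <|x(t + tau) - x(t)|^p>
    (second FNS tool), sensitive to 'jumps' of the dynamic variable."""
    n = len(x)
    return np.array([np.mean(np.abs(x[tau:] - x[:-tau]) ** p)
                     for tau in range(1, n // 2)])

t = np.linspace(0, 1, 512, endpoint=False)
x = np.sin(2 * np.pi * 8 * t)              # pure 8-cycle resonance
print(int(np.argmax(power_spectrum(x))))   # -> 8
```

In FNS the "passport" parameters come from fitting analytic expressions to the stochastic parts of these two curves after resonances like the one above are separated out.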

  11. Context-dependent ‘safekeeping’ of foraging tools in New Caledonian crows

    PubMed Central

    Klump, Barbara C.; van der Wal, Jessica E. M.; St Clair, James J. H.; Rutz, Christian

    2015-01-01

    Several animal species use tools for foraging, such as sticks to extract embedded arthropods and honey, or stones to crack open nuts and eggs. While providing access to nutritious foods, these behaviours may incur significant costs, such as the time and energy spent searching for, manufacturing and transporting tools. These costs can be reduced by re-using tools, keeping them safe when not needed. We experimentally investigated what New Caledonian crows do with their tools between successive prey extractions, and whether they express tool ‘safekeeping’ behaviours more often when the costs (foraging at height), or likelihood (handling of demanding prey), of tool loss are high. Birds generally took care of their tools (84% of 176 prey extractions, nine subjects), either trapping them underfoot (74%) or storing them in holes (26%)—behaviours we also observed in the wild (19 cases, four subjects). Moreover, tool-handling behaviour was context-dependent: subjects kept their tools safe significantly more often when foraging at height, and stored tools in holes significantly more often when extracting more demanding prey (under these conditions, foot-trapping proved challenging). In arboreal environments, safekeeping can prevent costly tool losses, removing a potentially important constraint on the evolution of habitual and complex tool behaviour. PMID:25994674

  12. The DEDUCE Guided Query Tool: Providing Simplified Access to Clinical Data for Research and Quality Improvement

    PubMed Central

    Horvath, Monica M.; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-01-01

    In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction—the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a guided query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. PMID:21130181

  13. Evaluation of the reliability, usability, and applicability of AMSTAR, AMSTAR 2, and ROBIS: protocol for a descriptive analytic study.

    PubMed

    Gates, Allison; Gates, Michelle; Duarte, Gonçalo; Cary, Maria; Becker, Monika; Prediger, Barbara; Vandermeer, Ben; Fernandes, Ricardo M; Pieper, Dawid; Hartling, Lisa

    2018-06-13

    Systematic reviews (SRs) of randomised controlled trials (RCTs) can provide the best evidence to inform decision-making, but their methodological and reporting quality varies. Tools exist to guide the critical appraisal of quality and risk of bias in SRs, but evaluations of their measurement properties are limited. We will investigate the interrater reliability (IRR), usability, and applicability of A MeaSurement Tool to Assess systematic Reviews (AMSTAR), AMSTAR 2, and Risk Of Bias In Systematic reviews (ROBIS) for SRs in the fields of biomedicine and public health. An international team of researchers at three collaborating centres will undertake the study. We will use a random sample of 30 SRs of RCTs investigating therapeutic interventions indexed in MEDLINE in February 2014. Two reviewers at each centre will appraise the quality and risk of bias in each SR using AMSTAR, AMSTAR 2, and ROBIS. We will record the time to complete each assessment and for the two reviewers to reach consensus for each SR. We will extract the descriptive characteristics of each SR, the included studies, participants, interventions, and comparators. We will also extract the direction and strength of the results and conclusions for the primary outcome. We will summarise the descriptive characteristics of the SRs using means and standard deviations, or frequencies and proportions. To test for interrater reliability between reviewers and between the consensus agreements of reviewer pairs, we will use Gwet's AC1 statistic. For comparability to previous evaluations, we will also calculate weighted Cohen's kappa and Fleiss' kappa statistics. To estimate usability, we will calculate the mean time to complete the appraisal and to reach consensus for each tool. To inform applications of the tools, we will test for statistical associations between quality scores and risk of bias judgments, and the results and conclusions of the SRs. 
Appraising the methodological and reporting quality of SRs is necessary to determine the trustworthiness of their conclusions. Which tool may be most reliably applied and how the appraisals should be used is uncertain; the usability of newly developed tools is unknown. This investigation of common (AMSTAR) and newly developed (AMSTAR 2, ROBIS) tools will provide empiric data to inform their application, interpretation, and refinement.
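Of the agreement statistics named in the protocol, the simplest is unweighted Cohen's kappa for two raters; this minimal sketch (with invented ratings) shows the chance-correction idea shared by the kappa and AC1 family:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical judgments:
    chance-corrected agreement."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in c1) / n ** 2       # expected by chance
    return (po - pe) / (1 - pe)

r1 = ["low", "low", "high", "high", "low", "high"]
r2 = ["low", "low", "high", "low",  "low", "high"]
print(round(cohens_kappa(r1, r2), 3))  # -> 0.667
```

Gwet's AC1 differs only in how the chance-agreement term pe is estimated, which makes it less sensitive to skewed category prevalence; that is why the protocol reports both.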

  14. Improvement of sand filter and constructed wetland design using an environmental decision support system.

    PubMed

    Turon, Clàudia; Comas, Joaquim; Torrens, Antonina; Molle, Pascal; Poch, Manel

    2008-01-01

    With the aim of improving effluent quality of waste stabilization ponds, different designs of vertical flow constructed wetlands and intermittent sand filters were tested on an experimental full-scale plant within the framework of a European project. The information extracted from this study was completed and updated with heuristic and bibliographic knowledge. The data and knowledge acquired were difficult to integrate into mathematical models because they involve qualitative information and expert reasoning. Therefore, it was decided to develop an environmental decision support system (EDSS-Filter-Design) as a tool to integrate mathematical models and knowledge-based techniques. This paper describes the development of this support tool, emphasizing the collection of data and knowledge and representation of this information by means of mathematical equations and a rule-based system. The developed support tool provides the main design characteristics of filters: (i) required surface, (ii) media type, and (iii) media depth. These design recommendations are based on wastewater characteristics, applied load, and required treatment level data provided by the user. The results of the EDSS-Filter-Design provide appropriate and useful information and guidelines on how to design filters, according to the expert criteria. The encapsulation of the information into a decision support system reduces the design period and provides a feasible, reasoned, and positively evaluated proposal.
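The rule-based half of such a decision support system maps user-supplied load and treatment-level data to design recommendations. The sketch below is purely illustrative: the thresholds, media choices, and outputs are invented for the example and are not EDSS-Filter-Design's validated expert rules:

```python
def filter_design(load_g_bod_m2_d, treatment_level):
    """Illustrative rule base in the spirit of EDSS-Filter-Design.
    All thresholds and recommendations here are hypothetical."""
    media = "sand" if treatment_level == "secondary" else "gravel over sand"
    depth_cm = 60 if load_g_bod_m2_d <= 25 else 80       # deeper bed at high load
    surface_factor = 1.0 if load_g_bod_m2_d <= 25 else 1.5
    return {"media": media, "depth_cm": depth_cm,
            "surface_factor": surface_factor}

print(filter_design(20, "secondary"))
# -> {'media': 'sand', 'depth_cm': 60, 'surface_factor': 1.0}
```

Encapsulating such rules alongside mathematical models is what lets the system turn qualitative expert reasoning into reproducible design recommendations.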

  15. The value of necropsy reports for animal health surveillance.

    PubMed

    Küker, Susanne; Faverjon, Celine; Furrer, Lenz; Berezowski, John; Posthaus, Horst; Rinaldi, Fabio; Vial, Flavie

    2018-06-18

    Animal health data recorded in free text, such as in necropsy reports, can contain valuable information for national surveillance systems. However, these data are rarely utilized because the text format requires labor-intensive classification of records before they can be analyzed using statistical or other software. In a previous study, we designed a text-mining tool to extract data from text in necropsy reports. In the current study, we used the tool to extract data from the reports from pig and cattle necropsies performed between 2000 and 2011 at the Institute of Animal Pathology (ITPA), University of Bern, Switzerland. We evaluated data quality in terms of credibility, completeness and representativeness of the Swiss pig and cattle populations. Data were easily extracted from necropsy reports. Data quality in terms of completeness and validity varied considerably depending on the type of data reported. Diseases of the gastrointestinal system were reported most frequently (54.6% of pig submissions and 40.8% of cattle submissions). Diseases affecting serous membranes were reported in 16.0% of necropsied pigs and 27.6% of cattle. Respiratory diseases were reported in 18.3% of pigs and 21.6% of cattle submissions. This study suggests that extracting data from necropsy reports can provide information of value for animal health surveillance. This data has potential value for monitoring endemic disease syndromes in different age and production groups, or for early detection of emerging or re-emerging diseases. The study identified data entry and other errors that could be corrected to improve the quality and validity of the data. Submissions to veterinary diagnostic laboratories have selection biases and these should be considered when designing surveillance systems that include necropsy reports.

  16. DOCU-TEXT: A tool before the data dictionary

    NASA Technical Reports Server (NTRS)

    Carter, B.

    1983-01-01

    DOCU-TEXT, a proprietary software package that aids in the production of documentation for a data processing organization and that can be installed and operated only on IBM computers, is discussed. In organizing information that ultimately will reside in a data dictionary, DOCU-TEXT proved to be a useful documentation tool for extracting information from existing production jobs, procedure libraries, system catalogs, control data sets and related files. DOCU-TEXT reads these files to derive data that is useful at the system level. The output of DOCU-TEXT is a series of user-selectable reports. These reports can reflect the interactions within a single job stream, a complete system, or all the systems in an installation. Any single report, or group of reports, can be generated in an independent documentation pass.

  17. Urinary bladder stone extraction and instruments compared in textbooks of Abul-Qasim Khalaf Ibn Abbas Alzahrawi (Albucasis) (930-1013) and Serefeddin Sabuncuoglu (1385-1470).

    PubMed

    Elcioglu, Omur; Ozden, Hilmi; Guven, Gul; Kabay, Sahin

    2010-09-01

    We investigated urinary bladder stones, surgical tools, and procedures in the urologic sections of textbooks by Abul-Qasim Khalaf Ibn Abbas Alzahrawi (Albucasis) and Serefeddin Sabuncuoglu. In addition, we compared the relation of their textbooks to urologic surgery. Al-Tasreef Liman Aajaz Aan Al-Taaleef (Al-Tasreef), a surgery textbook written by Alzahrawi (who lived in Al-Andalus between 930 and 1013), and Cerrahiyyetu'l-Haniyye, written by Sabuncuoglu (who lived in Turkey between 1385 and 1470), were evaluated with regard to urinary bladder stones and surgical instruments. The textbooks give information about urinary bladder stones. They include definitions of diseases, etiologies, and surgical techniques, and describe surgical tools. Cerrahiyyetu'l Haniyye is a textbook illustrated with colored miniatures. The urinary bladder stone section in Cerrahiyyetu'l Haniyye is a translation of Al-Tasreef with some additional information and illustrations. Surgical tools and procedures described by the two physicians have survived to the present day. Tools and procedures invented by Alzahrawi persist today in similar or further developed forms.

  18. Trace-fiber color discrimination by electrospray ionization mass spectrometry: a tool for the analysis of dyes extracted from submillimeter nylon fibers.

    PubMed

    Tuinman, Albert A; Lewis, Linda A; Lewis, Samuel A

    2003-06-01

    The application of electrospray ionization mass spectrometry (ESI-MS) to trace-fiber color analysis is explored using acidic dyes commonly employed to color nylon-based fibers, as well as extracts from dyed nylon fibers. Qualitative information about constituent dyes and quantitative information about the relative amounts of those dyes present on a single fiber become readily available using this technique. Sample requirements for establishing the color identity of different samples (i.e., comparative trace-fiber analysis) are shown to be submillimeter. Absolute verification of dye mixture identity (beyond the comparison of molecular weights derived from ESI-MS) can be obtained by expanding the technique to include tandem mass spectrometry (ESI-MS/MS). For dyes of unknown origin, the ESI-MS/MS analyses may offer insights into the chemical structure of the compound-information not available from chromatographic techniques alone. This research demonstrates that ESI-MS is viable as a sensitive technique for distinguishing dye constituents extracted from a minute amount of trace-fiber evidence. A protocol is suggested to establish/refute the proposition that two fibers--one of which is available in minute quantity only--are of the same origin.

  19. Tool use, aye-ayes, and sensorimotor intelligence.

    PubMed

    Sterling, E J; Povinelli, D J

    1999-01-01

    Humans, chimpanzees, capuchins and aye-ayes all display an unusually high degree of encephalization and diverse omnivorous extractive foraging. It has been suggested that the high degree of encephalization in aye-ayes may be the result of their diverse, omnivorous extractive foraging behaviors. In combination with certain forms of tool use, omnivorous extractive foraging has been hypothesized to be linked to higher levels of sensorimotor intelligence (stages 5 or 6). Although free-ranging aye-ayes have not been observed to use tools directly in the context of their extractive foraging activities, they have recently been reported to use lianas as tools in a manner that independently suggests that they may possess stage 5 or 6 sensorimotor intelligence. Although other primate species which display diverse, omnivorous extractive foraging have been tested for sensorimotor intelligence, aye-ayes have not. We report a test of captive aye-ayes' comprehension of tool use in a situation designed to simulate natural conditions. The results support the view that aye-ayes do not achieve stage 6 comprehension of tool use, but rather may use trial-and-error learning to develop tool-use behaviors. Other theories for aye-aye encephalization are considered.

  20. Quantifying Traces of Tool Use: A Novel Morphometric Analysis of Damage Patterns on Percussive Tools

    PubMed Central

    Caruana, Matthew V.; Carvalho, Susana; Braun, David R.; Presnyakova, Darya; Haslam, Michael; Archer, Will; Bobe, Rene; Harris, John W. K.

    2014-01-01

    Percussive technology continues to play an increasingly important role in understanding the evolution of tool use. Comparing the archaeological record with extractive foraging behaviors in nonhuman primates has focused on percussive implements as a key to investigating the origins of lithic technology. Despite this, archaeological approaches towards percussive tools have been obscured by a lack of standardized methodologies. Central to this issue have been the use of qualitative, non-diagnostic techniques to identify percussive tools from archaeological contexts. Here we describe a new morphometric method for distinguishing anthropogenically-generated damage patterns on percussive tools from naturally damaged river cobbles. We employ a geomatic approach through the use of three-dimensional scanning and geographical information systems software to statistically quantify the identification process in percussive technology research. This will strengthen current technological analyses of percussive tools in archaeological frameworks and open new avenues for translating behavioral inferences of early hominins from percussive damage patterns. PMID:25415303

  1. Dynamic Visualization of Co-expression in Systems Genetics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    New, Joshua Ryan; Huang, Jian; Chesler, Elissa J

    2008-01-01

    Biologists hope to address grand scientific challenges by exploring the abundance of data made available through modern microarray technology and other high-throughput techniques. The impact of this data, however, is limited unless researchers can effectively assimilate such complex information and integrate it into their daily research; interactive visualization tools are called for to support the effort. Specifically, typical studies of gene co-expression require novel visualization tools that enable the dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These tools should allow biologists to develop an intuitive understanding of the structure of biological networks and discover genes which reside in critical positions in networks and pathways. By using a graph as a universal data representation of correlation in gene expression data, our novel visualization tool employs several techniques that when used in an integrated manner provide innovative analytical capabilities. Our tool for interacting with gene co-expression data integrates techniques such as: graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized b-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate our system using a real-world workflow from a large-scale, systems genetics study of mammalian gene co-expression.
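
The record does not include code; as an illustrative sketch of the quantitative subgraph-extraction step it describes (the function name, gene labels, and 0.8 threshold are assumptions, not taken from the tool), a thresholded co-expression subgraph can be pulled from a correlation matrix like this:

```python
def extract_subgraph(corr, genes, threshold=0.8):
    """Return the edges of a co-expression subgraph: every gene pair
    whose absolute correlation meets or exceeds the threshold."""
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(corr[i][j]) >= threshold:
                edges.append((genes[i], genes[j], corr[i][j]))
    return edges

# Hypothetical 3-gene correlation matrix for illustration
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
edges = extract_subgraph(corr, ["geneA", "geneB", "geneC"])
# only the geneA-geneB pair clears the 0.8 cutoff
```

A production tool would index the correlation values (the paper mentions an optimized b-tree) rather than scan all pairs, but the extraction criterion is the same.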

  2. A Design Support Framework through Dynamic Deployment of Hypothesis and Verification in the Design Process

    NASA Astrophysics Data System (ADS)

    Nomaguch, Yutaka; Fujita, Kikuo

    This paper proposes a design support framework, named DRIFT (Design Rationale Integration Framework of Three layers), which dynamically captures and manages hypothesis and verification in the design process. The core of DRIFT is a three-layered design process model of action, model operation and argumentation. This model integrates various design support tools and captures design operations performed on them. The action level captures the sequence of design operations. The model operation level captures the transition of design states, which records a design snapshot across design tools. The argumentation level captures the process of setting problems and alternatives. The linkage of the three levels makes it possible to automatically and efficiently capture and manage iterative hypothesis and verification processes through design operations across design tools. In DRIFT, such a linkage is extracted through templates of design operations, which are derived from the patterns embedded in design tools such as Design-For-X (DFX) approaches, and design tools are integrated through ontology-based representation of design concepts. An argumentation model, gIBIS (graphical Issue-Based Information System), is used for representing dependencies among problems and alternatives. A mechanism of TMS (Truth Maintenance System) is used for managing multiple hypothetical design stages. This paper also demonstrates a prototype implementation of DRIFT and its application to a simple design problem. It concludes with a discussion of future issues.

  3. KneeTex: an ontology-driven system for information extraction from MRI reports.

    PubMed

    Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

    2015-01-01

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. 
Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F-measure of 97.81 %, the values of which are in line with human-like performance. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
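
KneeTex itself is driven by the TRAK ontology and sophisticated lexico-semantic rules; as a heavily simplified sketch of the template-filling idea only (the mini-lexicon and slot names below are illustrative stand-ins, not TRAK content), a rule-based filler for a single sentence might look like:

```python
import re

# Hypothetical mini-lexicon standing in for an ontology lookup
ANATOMY = {"acl": "anterior cruciate ligament", "meniscus": "meniscus"}
FINDINGS = {"tear", "rupture", "degeneration"}

def fill_template(sentence):
    """Fill finding/anatomy/negation slots for one report sentence
    by dictionary lookup against the mini-lexicon."""
    words = re.findall(r"[a-z]+", sentence.lower())
    record = {"finding": None, "anatomy": None, "negation": "no" in words}
    for w in words:
        if w in FINDINGS:
            record["finding"] = w
        if w in ANATOMY:
            record["anatomy"] = ANATOMY[w]
    return record

rec = fill_template("No tear of the ACL.")
# {'finding': 'tear', 'anatomy': 'anterior cruciate ligament', 'negation': True}
```

The real system adds coordination, enumeration, ambiguity and co-reference resolution on top of this lookup step, which is what lifts performance to the reported ~98 % range.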

  4. Sophia: An Expedient UMLS Concept Extraction Annotator.

    PubMed

    Divita, Guy; Zeng, Qing T; Gundlapalli, Adi V; Duvall, Scott; Nebeker, Jonathan; Samore, Matthew H

    2014-01-01

    An opportunity exists for meaningful concept extraction and indexing from large corpora of clinical notes in the Veterans Affairs (VA) electronic medical record. Currently available tools such as MetaMap, cTAKES and HITex do not scale up to address this big data need. Sophia, a rapid UMLS concept extraction annotator, was developed to fulfill a mandate and address extraction where high throughput is needed while preserving performance. We report on the development, testing and benchmarking of Sophia against MetaMap and cTAKES. Sophia demonstrated improved performance on recall as compared to cTAKES and MetaMap (0.71 vs 0.66 and 0.38). The overall f-score was similar to cTAKES and an improvement over MetaMap (0.53 vs 0.57 and 0.43). With regard to speed of processing records, we noted Sophia to be several fold faster than cTAKES and the scaled-out MetaMap service. Sophia offers a viable alternative for high-throughput information extraction tasks.
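
The recall and f-score figures quoted here follow the standard definitions; as a small reference sketch (not code from Sophia), precision, recall and F1 over sets of extracted vs. gold-standard concepts are computed as:

```python
def prf(extracted, gold):
    """Precision, recall and F1 of an extracted concept set
    against a gold-standard concept set."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)                      # true positives
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

p, r, f = prf({"C0003873", "C0020538", "C0011849"},
              {"C0003873", "C0020538", "C0027051"})
# 2 of 3 extracted are correct, 2 of 3 gold are found: p = r = f = 2/3
```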

  5. Sophia: An Expedient UMLS Concept Extraction Annotator

    PubMed Central

    Divita, Guy; Zeng, Qing T; Gundlapalli, Adi V.; Duvall, Scott; Nebeker, Jonathan; Samore, Matthew H.

    2014-01-01

    An opportunity exists for meaningful concept extraction and indexing from large corpora of clinical notes in the Veterans Affairs (VA) electronic medical record. Currently available tools such as MetaMap, cTAKES and HITex do not scale up to address this big data need. Sophia, a rapid UMLS concept extraction annotator, was developed to fulfill a mandate and address extraction where high throughput is needed while preserving performance. We report on the development, testing and benchmarking of Sophia against MetaMap and cTAKES. Sophia demonstrated improved performance on recall as compared to cTAKES and MetaMap (0.71 vs 0.66 and 0.38). The overall f-score was similar to cTAKES and an improvement over MetaMap (0.53 vs 0.57 and 0.43). With regard to speed of processing records, we noted Sophia to be several fold faster than cTAKES and the scaled-out MetaMap service. Sophia offers a viable alternative for high-throughput information extraction tasks. PMID:25954351

  6. Assessing the health workforce implications of health policy and programming: how a review of grey literature informed the development of a new impact assessment tool.

    PubMed

    Nove, Andrea; Cometto, Giorgio; Campbell, James

    2017-11-09

    In their adoption of WHA resolution 69.19, World Health Organization Member States requested all bilateral and multilateral initiatives to conduct impact assessments of their funding to human resources for health. The High-Level Commission for Health Employment and Economic Growth similarly proposed that official development assistance for health, education, employment and gender be best aligned with creating decent jobs in the health and social workforce. No standard tools exist for assessing the impact of global health initiatives on the health workforce, but tools exist in other fields. The objectives of this paper are to describe how a review of grey literature informed the development of a draft health workforce impact assessment tool and to introduce the tool. A search of grey literature yielded 72 examples of impact assessment tools and guidance from a wide variety of fields including gender, health and human rights. These examples were reviewed, and information relevant to the development of a health workforce impact assessment was extracted from them using an inductive process. A number of good practice principles were identified from the review. These informed the development of a draft health workforce impact assessment tool, based on an established health labour market framework. The tool is designed to be applied before implementation. It consists of a relatively short and focused screening module to be applied to all relevant initiatives, followed by a more in-depth assessment to be applied only to initiatives for which the screening module indicates that significant implications for HRH are anticipated. It thus aims to strike a balance between maximising rigour and minimising administrative burden. The application of the new tool will help to ensure that health workforce implications are incorporated into global health decision-making processes from the outset and to enhance positive HRH impacts and avoid, minimise or offset negative impacts.

  7. Ferret: a user-friendly Java tool to extract data from the 1000 Genomes Project.

    PubMed

    Limou, Sophie; Taverner, Andrew M; Winkler, Cheryl A

    2016-07-15

    The 1000 Genomes (1KG) Project provides a near-comprehensive resource on human genetic variation in worldwide reference populations. 1KG variants can be accessed through a browser and through the raw and annotated data that are regularly released on an ftp server. We developed Ferret, a user-friendly Java tool, to easily extract genetic variation information from these large and complex data files. From a locus, gene(s) or SNP(s) of interest, Ferret retrieves genotype data for 1KG SNPs and indels, and computes allelic frequencies for 1KG populations and, optionally, for the Exome Sequencing Project populations. By converting the 1KG data into files that can be imported into popular pre-existing tools (e.g. PLINK and HaploView), Ferret offers a straightforward way, even for non-bioinformatics specialists, to manipulate, explore and merge 1KG data with the user's dataset, as well as visualize linkage disequilibrium patterns, infer haplotypes and design tagSNPs. The Ferret tool and source code are publicly available at http://limousophie35.github.io/Ferret/. Contact: ferret@nih.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
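
Allelic frequency computation of the kind Ferret performs is straightforward to illustrate; the sketch below (not Ferret's Java code, and with made-up genotypes) counts alleles across diploid genotype calls:

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Compute allele frequencies from diploid genotype strings,
    e.g. 'AG' for a heterozygote at a biallelic SNP."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

freqs = allele_frequencies(["AA", "AG", "GG", "AG"])
# 4 sampled chromosomes carry A and 4 carry G, out of 8 -> 0.5 each
```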

  8. msBiodat analysis tool, big data analysis for high-throughput experiments.

    PubMed

    Muñoz-Torres, Pau M; Rokć, Filip; Belužic, Robert; Grbeša, Ivana; Vugrek, Oliver

    2016-01-01

    Mass spectrometry (MS) refers to a group of high-throughput techniques used to increase knowledge about biomolecules. These techniques produce a large amount of data, typically presented as a list of hundreds or thousands of proteins. Filtering those data efficiently is the first step toward extracting biologically relevant information. The filtering can be enriched by merging previous data with data obtained from public databases, resulting in an accurate list of proteins that meet the predetermined conditions. In this article we present msBiodat Analysis Tool, a web-based application designed to bring big data analysis to proteomics. With this tool, researchers can easily select the most relevant information from their MS experiments using an easy-to-use web interface. An interesting feature of msBiodat Analysis Tool is the possibility of selecting proteins by their Gene Ontology annotation using their Gene ID, Ensembl or UniProt codes. The msBiodat Analysis Tool is a web-based application that allows researchers with any level of programming experience to take advantage of efficient database querying. Its versatility and user-friendly interface make it easy to perform fast and accurate data screening using complex queries. Once the analysis is finished, the result is delivered by e-mail. msBiodat Analysis Tool is freely available at http://msbiodata.irb.hr.
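
The core filtering operation described, keeping only MS protein hits that carry a given annotation, can be sketched as follows (illustrative only; the function name, identifiers and GO term are assumptions, not msBiodat's API):

```python
def filter_by_annotation(hits, annotations, wanted_term):
    """Keep only the proteins from an MS hit list whose annotation set
    contains the wanted term (e.g. a Gene Ontology accession)."""
    return [p for p in hits if wanted_term in annotations.get(p, set())]

# Hypothetical hit list and GO annotations keyed by UniProt-style IDs
hits = ["P12345", "Q67890", "P99999"]
annotations = {
    "P12345": {"GO:0005739"},                 # mitochondrion
    "P99999": {"GO:0005739", "GO:0003824"},   # mitochondrion, catalytic activity
}
kept = filter_by_annotation(hits, annotations, "GO:0005739")
# ['P12345', 'P99999']
```

In the real tool this lookup runs as a database query over merged public-database records rather than over an in-memory dict, but the selection logic is the same.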

  9. Lynx: a database and knowledge extraction engine for integrative medicine

    PubMed Central

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T. Conrad; Maltsev, Natalia

    2014-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788

  10. Extracting nursing practice patterns from structured labor and delivery data sets.

    PubMed

    Hall, Eric S; Thornton, Sidney N

    2007-10-11

    This study was designed to demonstrate the feasibility of a computerized care process model that provides real-time case profiling and outcome forecasting. A methodology was defined for extracting nursing practice patterns from structured point-of-care data collected using the labor and delivery information system at Intermountain Healthcare. Data collected during January 2006 were retrieved from Intermountain Healthcare's enterprise data warehouse for use in the study. The knowledge discovery in databases process provided a framework for data analysis including data selection, preprocessing, data-mining, and evaluation. Development of an interactive data-mining tool and construction of a data model for stratification of patient records into profiles supported the goals of the study. Five benefits of the practice pattern extraction capability, which extend to other clinical domains, are listed with supporting examples.

  11. Validation of the ICU-DaMa tool for automatically extracting variables for minimum dataset and quality indicators: The importance of data quality assessment.

    PubMed

    Sirgo, Gonzalo; Esteban, Federico; Gómez, Josep; Moreno, Gerard; Rodríguez, Alejandro; Blanch, Lluis; Guardiola, Juan José; Gracia, Rafael; De Haro, Lluis; Bodí, María

    2018-04-01

    Big data analytics promise insights into healthcare processes and management, improving outcomes while reducing costs. However, data quality is a major challenge for reliable results. Business process discovery techniques and an associated data model were used to develop a data management tool, ICU-DaMa, for extracting variables essential for overseeing the quality of care in the intensive care unit (ICU). To determine the feasibility of using ICU-DaMa to automatically extract variables for the minimum dataset and ICU quality indicators from the clinical information system (CIS). The Wilcoxon signed-rank test and Fisher's exact test were used to compare the values extracted from the CIS by ICU-DaMa for 25 variables from all patients treated in a polyvalent ICU during a two-month period against the gold standard of values manually extracted by two trained physicians. Discrepancies with the gold standard were classified into plausibility, conformance, and completeness errors. Data from 149 patients were included. Although there were no significant differences between the automatic method and the manual method, we detected differences in values for five variables, including one plausibility error and two conformance and completeness errors. Plausibility: 1) Sex, ICU-DaMa incorrectly classified one male patient as female (error generated by the Hospital's Admissions Department). Conformance: 2) Reason for isolation, ICU-DaMa failed to detect a human error in which a professional misclassified a patient's isolation. 3) Brain death, ICU-DaMa failed to detect another human error in which a professional likely entered two mutually exclusive values related to the death of the patient (brain death and controlled donation after circulatory death). Completeness: 4) Destination at ICU discharge, ICU-DaMa incorrectly classified two patients due to a professional failing to fill out the patient discharge form when the patients died.
5) Length of continuous renal replacement therapy, data were missing for one patient because the CRRT device was not connected to the CIS. Automatic generation of minimum dataset and ICU quality indicators using ICU-DaMa is feasible. The discrepancies were identified and can be corrected by improving CIS ergonomics, training healthcare professionals in the culture of the quality of information, and using tools for detecting and correcting data errors. Copyright © 2018 Elsevier B.V. All rights reserved.
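
The validation step, comparing automatically extracted values field by field against a manually extracted gold standard, can be sketched like this (illustrative only; record and field names are made up, and the statistical tests reported in the paper would be applied on top of such a comparison):

```python
def compare_extractions(auto, manual):
    """Compare automatically and manually extracted records field by field.
    Returns a list of (record_id, field, auto_value, gold_value) discrepancies,
    treating the manual extraction as the gold standard."""
    discrepancies = []
    for rid, gold_record in manual.items():
        extracted = auto.get(rid, {})
        for field, gold_value in gold_record.items():
            if extracted.get(field) != gold_value:
                discrepancies.append((rid, field, extracted.get(field), gold_value))
    return discrepancies

auto = {"p1": {"sex": "M", "icu_los_days": 3}}
manual = {"p1": {"sex": "F", "icu_los_days": 3}}
diffs = compare_extractions(auto, manual)
# [('p1', 'sex', 'M', 'F')]  -- the kind of plausibility error the study reports
```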

  12. Systematic review of surveillance systems and methods for early detection of exotic, new and re-emerging diseases in animal populations.

    PubMed

    Rodríguez-Prieto, V; Vicente-Rubiano, M; Sánchez-Matamoros, A; Rubio-Guerri, C; Melero, M; Martínez-López, B; Martínez-Avilés, M; Hoinville, L; Vergne, T; Comin, A; Schauer, B; Dórea, F; Pfeiffer, D U; Sánchez-Vizcaíno, J M

    2015-07-01

    In this globalized world, the spread of new, exotic and re-emerging diseases has become one of the most important threats to animal production and public health. This systematic review analyses conventional and novel early detection methods applied to surveillance. In all, 125 scientific documents were considered for this study. Exotic (n = 49) and re-emerging (n = 27) diseases constituted the most frequently represented health threats. In addition, the majority of studies were related to zoonoses (n = 66). The approaches found in the review could be divided into surveillance modalities, both active (n = 23) and passive (n = 5), and tools and methodologies that support surveillance activities (n = 57). Combinations of surveillance modalities and tools (n = 40) were also found. Risk-based approaches were very common (n = 60), especially in the papers describing tools and methodologies (n = 50). The main applications, benefits and limitations of each approach were extracted from the papers. This information will be very useful for informing the development of tools to facilitate the design of cost-effective surveillance strategies. Thus, the current literature review provides key information about the advantages, disadvantages, limitations and potential application of methodologies for the early detection of new, exotic and re-emerging diseases.

  13. Lessons Learned from Development of De-identification System for Biomedical Research in a Korean Tertiary Hospital.

    PubMed

    Shin, Soo-Yong; Lyu, Yongman; Shin, Yongdon; Choi, Hyo Joung; Park, Jihyun; Kim, Woo-Sung; Lee, Jae Ho

    2013-06-01

    The Korean government has enacted two laws, namely, the Personal Information Protection Act and the Bioethics and Safety Act, to prevent the unauthorized use of medical information. To protect patients' privacy by complying with governmental regulations and improve the convenience of research, Asan Medical Center has been developing a de-identification system for biomedical research. We reviewed Korean regulations to define the scope of the de-identification methods and well-known previous biomedical research platforms to extract the functionalities of the systems. Based on these review results, we implemented the necessary programs on the Asan Medical Center Information System framework, which was built using the Microsoft .NET Framework and C#. The developed de-identification system comprises three main components: a de-identification tool, a search tool, and a chart review tool. The de-identification tool can substitute a randomly assigned research ID for a hospital patient ID, remove the identifiers in the structured format, and mask them in the unstructured format, i.e., texts. This tool achieved 98.14% precision and 97.39% recall for 6,520 clinical notes. The search tool can find the number of patients which satisfies given search criteria. The chart review tool can provide de-identified patient's clinical data for review purposes. We found that a clinical data warehouse was essential for successful implementation of the de-identification system, and this system should be tightly linked to an electronic Institutional Review Board system for easy operation of honest brokers. Additionally, we found that a secure cloud environment could be adopted to protect patients' privacy more thoroughly.
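
The substitution step, replacing each hospital patient ID in free text with a stable research ID, can be sketched as below (an illustrative toy, not the Asan Medical Center implementation; the sequential "RID" scheme is an assumption, and a real system would use a securely generated, access-controlled mapping):

```python
import re

def deidentify(text, patient_ids, id_map=None):
    """Replace each known patient ID in free text with a research ID,
    reusing the same research ID for the same patient across documents."""
    id_map = {} if id_map is None else id_map
    for pid in patient_ids:
        # assign a stable research ID the first time this patient is seen
        research_id = id_map.setdefault(pid, f"RID{len(id_map) + 1:04d}")
        text = re.sub(re.escape(pid), research_id, text)
    return text, id_map

out, mapping = deidentify("Patient 12345 admitted.", ["12345"])
# out == 'Patient RID0001 admitted.'
out2, mapping = deidentify("Follow-up for 12345.", ["12345"], mapping)
# the same patient keeps RID0001 in the second note
```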

  14. A new generation of tools for search, recovery and quality evaluation of World Wide Web medical resources.

    PubMed

    Aguillo, I

    2000-01-01

    Although the Internet is already a valuable information resource in medicine, there are important challenges to be faced before physicians and general users will have extensive access to this information. As a result of a research effort to compile a health-related Internet directory, new tools and strategies have been developed to solve key problems derived from the explosive growth of medical information on the Net and the great concern over the quality of such critical information. The current Internet search engines lack some important capabilities. We suggest using second generation tools (client-side based) able to deal with large quantities of data and to increase the usability of the records recovered. We tested the capabilities of these programs to solve health-related information problems, recognising six groups according to the kind of topics addressed: Z39.50 clients, downloaders, multisearchers, tracing agents, indexers and mappers. The evaluation of the quality of health information available on the Internet could require a large amount of human effort. A possible solution may be to use quantitative indicators based on the hypertext visibility of the Web sites. The cybermetric measures are valid for quality evaluation if they are derived from indirect peer review by experts with Web pages citing the site. The hypertext links acting as citations need to be extracted from a controlled sample of quality super-sites.
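    The cybermetric indicator proposed above amounts to counting, for each target site, how many distinct quality pages link to it. A small sketch of that bookkeeping, under the assumption that the input is a pre-extracted list of (citing page URL, target site) pairs from the controlled sample:

```python
from collections import defaultdict
from urllib.parse import urlparse

def hypertext_visibility(links):
    """Count distinct citing hosts per target site.

    links: iterable of (citing_page_url, target_site) pairs extracted from a
    controlled sample of quality super-sites. Self-links are ignored, so the
    count approximates 'indirect peer review' by independent experts.
    """
    citing_hosts = defaultdict(set)
    for page, target in links:
        host = urlparse(page).netloc
        if host and host != target:
            citing_hosts[target].add(host)
    return {site: len(hosts) for site, hosts in citing_hosts.items()}
```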

  15. MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format

    PubMed Central

    Ahmed, Zeeshan; Dandekar, Thomas

    2018-01-01

    Published scientific literature contains millions of figures, including information about the results obtained from different scientific experiments, e.g. PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medical imaging like electroencephalography (EEG), magnetoencephalography (MEG), electrocardiography (ECG), and positron-emission tomography (PET) images. The importance of biomedical figures has been widely recognized in the scientific and medical communities, as they play a vital role in providing major original data and experimental and computational results in concise form. One major challenge in implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files by physical and logical document analysis. Here we present a product-line-architecture-based bioinformatics tool, ‘Mining Scientific Literature (MSL)’, which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, visualization of extracted figures, and extraction of embedded text from all kinds of biological and biomedical figures using Optical Character Recognition (OCR). Moreover, for further analysis and usage, it generates the system’s output in different formats including text, PDF, XML and image files. Hence, MSL is an easy-to-install and easy-to-use analysis tool for interpreting published scientific literature in PDF format. PMID:29721305

  16. Kinect as a Tool for Gait Analysis: Validation of a Real-Time Joint Extraction Algorithm Working in Side View

    PubMed Central

    Cippitelli, Enea; Gasparrini, Samuele; Spinsante, Susanna; Gambi, Ennio

    2015-01-01

    The Microsoft Kinect sensor has attracted attention as a tool for gait analysis for several years. Despite the many advantages the sensor provides, the lack of a native capability to extract joints from the side view of a human body still limits the adoption of the device in a number of relevant applications. This paper presents an algorithm to locate and estimate the trajectories of up to six joints extracted from the side depth view of a human body captured by the Kinect device. The algorithm is then applied to extract data that can be exploited to provide an objective score for the “Get Up and Go Test”, which is typically adopted for gait analysis in rehabilitation fields. Starting from the depth-data stream provided by the Microsoft Kinect sensor, the proposed algorithm relies on anthropometric models only to locate and identify the positions of the joints. Unlike machine learning approaches, this solution avoids complex computations, which usually require significant resources. The reliability of the information about the joint positions output by the algorithm is evaluated by comparison to a marker-based system. Tests show that the trajectories extracted by the proposed algorithm adhere to the reference curves better than the ones obtained from the skeleton generated by the native applications provided within the Microsoft Kinect (Microsoft Corporation, Redmond, WA, USA, 2013) and OpenNI (OpenNI organization, Tel Aviv, Israel, 2013) Software Development Kits. PMID:25594588
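    The idea of locating joints from anthropometric models alone can be sketched very simply: given the vertical extent of the segmented body in a side-view depth frame, place each joint at a fixed fraction of stature. The ratios below are illustrative assumptions, not the values used by the cited algorithm, and a real system would refine these initial guesses against the depth silhouette.

```python
# Assumed anthropometric ratios (fraction of stature, measured up from the
# floor); illustrative values only, not those of the cited algorithm.
JOINT_RATIOS = {
    "ankle": 0.039,
    "knee": 0.285,
    "hip": 0.530,
    "shoulder": 0.818,
    "head": 0.936,
}

def estimate_joint_rows(top_row, bottom_row):
    """Estimate image rows of side-view joints from the body's vertical extent.

    top_row / bottom_row: first and last image rows occupied by the segmented
    body in a depth frame (row index grows downward, so the floor is at
    bottom_row).
    """
    stature = bottom_row - top_row
    return {
        joint: round(bottom_row - ratio * stature)
        for joint, ratio in JOINT_RATIOS.items()
    }
```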

  17. Biomedical discovery acceleration, with applications to craniofacial development.

    PubMed

    Leach, Sonia M; Tipney, Hannah; Feng, Weiguo; Baumgartner, William A; Kasliwal, Priyanka; Schuyler, Ronald P; Williams, Trevor; Spritz, Richard A; Hunter, Lawrence

    2009-03-01

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
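    The abstract describes merging relationships from many sources into one knowledge network with a combined reliability score per gene pair, but does not give the combining formula. A minimal sketch follows, using a noisy-OR combination as an assumed (and common) choice for aggregating independent evidence; the Hanalyzer's actual scoring may differ.

```python
def combine_edges(edge_sources):
    """Merge per-source gene-pair relationships into one knowledge network.

    edge_sources: dict mapping (geneA, geneB) -> list of
    (relation_type, source_reliability) tuples produced by the reading step.
    All relation types are kept on the edge, and a combined reliability is
    computed with a noisy-OR: the edge fails only if every source fails.
    """
    network = {}
    for pair, evidence in edge_sources.items():
        relations = sorted({rel for rel, _ in evidence})
        miss = 1.0
        for _, r in evidence:
            miss *= (1.0 - r)
        network[pair] = {"relations": relations, "reliability": 1.0 - miss}
    return network
```

    With this choice, two independent half-reliable sources (0.5 each) yield a combined reliability of 0.75, i.e. corroborating evidence strengthens an edge without ever exceeding 1.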

  18. An improvement of vehicle detection under shadow regions in satellite imagery

    NASA Astrophysics Data System (ADS)

    Karim, Shahid; Zhang, Ye; Ali, Saad; Asif, Muhammad Rizwan

    2018-04-01

    The processing of satellite imagery depends on image quality; at low resolution it is difficult to extract accurate information for a given application. For vehicle detection under shadow regions, we use HOG for feature extraction and an SVM for classification, HOG having proven a worthwhile descriptor for complex environments. Shadowed images are particularly difficult and yield very low detection rates, so we focus on raising the detection rate under shadow regions through appropriate preprocessing. Vehicles in non-shadow regions are detected precisely, with a higher detection rate than in shadow regions.
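    The HOG-plus-SVM pipeline mentioned above rests on one primitive: a magnitude-weighted histogram of gradient orientations over a small cell. A pure-Python sketch of that core step is shown below; full HOG additionally groups cells into blocks and normalizes them, which this sketch omits.

```python
import math

def orientation_histogram(patch, bins=9):
    """Core of HOG: histogram of gradient orientations over one cell.

    patch: 2-D list of grayscale intensities. Gradients are taken with
    simple central differences; unsigned orientations (0-180 degrees) are
    binned and votes are weighted by gradient magnitude.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = int(ang / (180.0 / bins)) % bins
            hist[b] += mag  # magnitude-weighted vote
    return hist
```

    Concatenated over a grid of cells, such histograms form the feature vector handed to the SVM classifier.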

  19. Integrated Functional and Executional Modelling of Software Using Web-Based Databases

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Marietta, Roberta

    1998-01-01

    NASA's software subsystems undergo extensive modification and updates over their operational lifetimes. It is imperative that modified software satisfy safety goals. This report discusses the difficulties encountered in doing so and describes a solution based on integrated modelling of software, the use of automatic information extraction tools, web technology and databases. To appear in an article of the Journal of Database Management.

  20. Language translation, domain specific languages and ANTLR

    NASA Technical Reports Server (NTRS)

    Craymer, Loring; Parr, Terence

    2002-01-01

    We will discuss the features of ANTLR that make it an attractive tool for rapid development of domain specific language translators and present some practical examples of its use: extraction of information from the Cassini Command Language specification, the processing of structured binary data, and IVL, an English-like language for generating VRML scene graphs, which is used in configuring the jGuru.com server.

  1. Computer-based synthetic data to assess the tree delineation algorithm from airborne LiDAR survey

    Treesearch

    Lei Wang; Andrew G. Birt; Charles W. Lafon; David M. Cairns; Robert N. Coulson; Maria D. Tchakerian; Weimin Xi; Sorin C. Popescu; James M. Guldin

    2013-01-01

    Small Footprint LiDAR (Light Detection And Ranging) has been proposed as an effective tool for measuring detailed biophysical characteristics of forests over broad spatial scales. However, by itself LiDAR yields only a sample of the true 3D structure of a forest. In order to extract useful forestry relevant information, this data must be interpreted using mathematical...

  2. Farm Management Support on Cloud Computing Platform: A System for Cropland Monitoring Using Multi-Source Remotely Sensed Data

    NASA Astrophysics Data System (ADS)

    Coburn, C. A.; Qin, Y.; Zhang, J.; Staenz, K.

    2015-12-01

    Food security is one of the most pressing issues facing humankind. Recent estimates predict that over one billion people don't have enough food to meet their basic nutritional needs. The ability of remote sensing tools to monitor and model crop production and predict crop yield is essential for providing governments and farmers with vital information to ensure food security. Google Earth Engine (GEE) is a cloud computing platform, which integrates storage and processing algorithms for massive remotely sensed imagery and vector data sets. By providing the capabilities of storing and analyzing the data sets, it provides an ideal platform for the development of advanced analytic tools for extracting key variables used in regional and national food security systems. With the high performance computing and storing capabilities of GEE, a cloud-computing based system for near real-time crop land monitoring was developed using multi-source remotely sensed data over large areas. The system is able to process and visualize the MODIS time series NDVI profile in conjunction with Landsat 8 image segmentation for crop monitoring. With multi-temporal Landsat 8 imagery, the crop fields are extracted using the image segmentation algorithm developed by Baatz et al.[1]. The MODIS time series NDVI data are modeled by TIMESAT [2], a software package developed for analyzing time series of satellite data. The seasonality of MODIS time series data, for example, the start date of the growing season, length of growing season, and NDVI peak at a field-level are obtained for evaluating the crop-growth conditions. The system fuses MODIS time series NDVI data and Landsat 8 imagery to provide information of near real-time crop-growth conditions through the visualization of MODIS NDVI time series and comparison of multi-year NDVI profiles. Stakeholders, i.e., farmers and government officers, are able to obtain crop-growth information at crop-field level online. 
This unique utilization of GEE in combination with advanced analytic and extraction techniques provides a vital remote sensing tool for decision makers and scientists with a high-degree of flexibility to adapt to different uses.
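    The seasonality metrics named above (start of season, length of season, NDVI peak) can be sketched with a simple threshold rule on a smoothed NDVI profile. TIMESAT itself fits model curves before extracting these parameters; the sketch below skips the fitting and uses the common convention (an assumption here) that the season starts and ends where NDVI crosses the base level plus 20% of the seasonal amplitude.

```python
def season_metrics(ndvi, threshold_frac=0.2):
    """Derive simple seasonality metrics from a smoothed NDVI time series.

    Start/end of season are the first/last samples where NDVI exceeds the
    base level plus threshold_frac of the seasonal amplitude. Indices are
    positions in the time series (e.g. 16-day MODIS composites).
    """
    base, peak = min(ndvi), max(ndvi)
    level = base + threshold_frac * (peak - base)
    above = [i for i, v in enumerate(ndvi) if v > level]
    start, end = above[0], above[-1]
    return {
        "start": start,
        "end": end,
        "length": end - start + 1,
        "peak_value": peak,
        "peak_time": ndvi.index(peak),
    }
```

    At field level, these per-season numbers are what get compared across years to flag anomalous crop-growth conditions.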

  3. An anaesthesia information management system as a tool for a quality assurance program: 10 years of experience.

    PubMed

    Motamed, Cyrus; Bourgain, Jean Louis

    2016-06-01

    Anaesthesia Information Management Systems (AIMS) generate large amounts of data, which might be useful for quality assurance programs. This study was designed to highlight the multiple contributions of our AIMS in extracting quality indicators over a period of 10 years. The study was conducted from 2002 to 2011. Two methods were used to extract anaesthesia indicators: manual extraction of individual files for monitoring neuromuscular relaxation, and structured query language (SQL) extraction for the other indicators, namely postoperative nausea and vomiting (PONV), pain and sedation scores, pain-related medications, and postoperative hypothermia. For each indicator, a program of information/meetings and adaptation/suggestions for operating room and PACU personnel was initiated to improve quality assurance, while data were extracted each year. The study included 77,573 patients. The mean overall completeness of data for the initial years ranged from 55 to 85% and was indicator-dependent; completeness then improved to 95% for the last 5 years. The incidence of neuromuscular monitoring was initially 67% and then increased to 95% (P<0.05). The rate of pharmacological reversal remained around 53% throughout the study. Regarding SQL data, an improvement in severe postoperative pain and PONV scores was observed throughout the study, while mild postoperative hypothermia remained a challenge, despite efforts at improvement. The AIMS permitted the follow-up of certain indicators through manual sampling, and of many more via SQL extraction, in a sustained and non-time-consuming way across years. However, it requires competent and especially dedicated resources to handle the database. Copyright © 2016 Société française d'anesthésie et de réanimation (Sfar). Published by Elsevier Masson SAS. All rights reserved.
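    SQL extraction of an indicator like yearly PONV incidence can be illustrated in a few lines. The schema and table below are invented for illustration (the study does not publish its AIMS database layout); the pattern of aggregating per year is the point.

```python
import sqlite3

# Illustrative schema; the actual AIMS database layout is not published here.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pacu_records (
    patient_id INTEGER, year INTEGER, ponv_score INTEGER)""")
conn.executemany(
    "INSERT INTO pacu_records VALUES (?, ?, ?)",
    [(1, 2010, 0), (2, 2010, 2), (3, 2010, 1), (4, 2011, 0), (5, 2011, 0)],
)

# Yearly PONV incidence: share of patients with a non-zero PONV score.
rows = conn.execute("""
    SELECT year,
           ROUND(100.0 * SUM(ponv_score > 0) / COUNT(*), 1) AS ponv_pct
    FROM pacu_records
    GROUP BY year
    ORDER BY year
""").fetchall()
```

    Once such queries are written, re-running them each year is essentially free, which is what makes the SQL route "sustained and non-time-consuming" compared with manual chart sampling.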

  4. Asymptotic Cramer-Rao bounds for Morlet wavelet filter bank transforms of FM signals

    NASA Astrophysics Data System (ADS)

    Scheper, Richard

    2002-03-01

    Wavelet filter banks are potentially useful tools for analyzing and extracting information from frequency modulated (FM) signals in noise. Chief among the advantages of such filter banks is the tendency of wavelet transforms to concentrate signal energy while simultaneously dispersing noise energy over the time-frequency plane, thus raising the effective signal to noise ratio of filtered signals. Over the past decade, much effort has gone into devising new algorithms to extract the relevant information from transformed signals while identifying and discarding the transformed noise. Therefore, estimates of the ultimate performance bounds on such algorithms would serve as valuable benchmarks in the process of choosing optimal algorithms for given signal classes. Discussed here is the specific case of FM signals analyzed by Morlet wavelet filter banks. By making use of the stationary phase approximation of the Morlet transform, and assuming that the measured signals are well resolved digitally, the asymptotic form of the Fisher Information Matrix is derived. From this, Cramer-Rao bounds are analytically derived for simple cases.
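    For reference, the quantities invoked above take standard textbook forms (these are reminders, not the paper's own derivation, and normalization conventions for the Morlet wavelet vary):

```latex
% Morlet analyzing wavelet (standard form)
\psi(t) = \pi^{-1/4}\, e^{j\omega_0 t}\, e^{-t^2/2}

% Fisher information matrix for a signal s(t;\theta) observed in complex
% white Gaussian noise of spectral density N_0:
I_{jk}(\theta) = \frac{2}{N_0}\,
  \mathrm{Re}\!\int \frac{\partial s}{\partial \theta_j}
                    \left(\frac{\partial s}{\partial \theta_k}\right)^{\!*} dt

% Cramer-Rao bound for any unbiased estimator \hat{\theta}:
\mathrm{Cov}(\hat{\theta}) \succeq I(\theta)^{-1}
```

    The paper's contribution is evaluating the integral above asymptotically, via the stationary phase approximation of the Morlet transform, for FM signal parameters.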

  5. The virtual library: Coming of age

    NASA Technical Reports Server (NTRS)

    Hunter, Judy F.; Cotter, Gladys A.

    1994-01-01

    With the high-speed networking capabilities, multiple media options, and massive amounts of information that exist in electronic format today, the concept of a 'virtual' library or 'library without walls' is becoming viable. In the virtual library environment, the information processed goes beyond the traditional definition of documents to include the results of scientific and technical research and development (reports, software, data) recorded in any format or medium: electronic, audio, video, or scanned images. Network access to information must include tools to help locate information sources and navigate the networks to connect to the sources, as well as methods to extract the relevant information. Graphical User Interfaces (GUIs) that are intuitive, and navigational tools such as Intelligent Gateway Processors (IGPs), will provide users with seamless and transparent use of high-speed networks to access, organize, and manage information. Traditional libraries will become points of electronic access to information on multiple media. The emphasis will be towards unique collections of information at each library rather than entire collections at every library. It is no longer a question of whether there is enough information available; it is more a question of how to manage the vast volumes of information. The future equation will involve being able to organize knowledge, manage information, and provide access at the point of origin.

  6. Data Flow for the TERRA-REF project

    NASA Astrophysics Data System (ADS)

    Kooper, R.; Burnette, M.; Maloney, J.; LeBauer, D.

    2017-12-01

    The Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) program aims to identify crop traits that are best suited to producing high-energy sustainable biofuels and match those plant characteristics to their genes to speed the plant breeding process. One tool used to achieve this goal is a high-throughput phenotyping robot outfitted with sensors and cameras to monitor the growth of 1.25 acres of sorghum. Data types range from hyperspectral imaging to 3D reconstructions and thermal profiles, all at 1mm resolution. This system produces thousands of daily measurements with high spatiotemporal resolution. The team at NCSA processes, annotates, organizes and stores the massive amounts of data produced by this system - up to 5 TB per day. Data from the sensors is streamed to a local gantry-cache server. The standardized sensor raw data stream is automatically and securely delivered to NCSA using Globus Connect service. Once files have been successfully received by the Globus endpoint, the files are removed from the gantry-cache server. As each dataset arrives or is created the Clowder system automatically triggers different software tools to analyze each file, extract information, and convert files to a common format. Other tools can be triggered to run after all required data is uploaded. For example, a stitched image of the entire field is created after all images of the field become available. Some of these tools were developed by external collaborators based on predictive models and algorithms, others were developed as part of other projects and could be leveraged by the TERRA project. Data will be stored for the lifetime of the project and is estimated to reach 10 PB over 3 years. The Clowder system, BETY and other systems will allow users to easily find data by browsing or searching the extracted information.
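    Two trigger patterns appear in the pipeline above: extractors fired on every matching upload, and extractors fired only once all required inputs are present (e.g. stitching the field image after all tiles arrive). The toy registry below sketches both patterns in plain Python; it is not the Clowder extractor API, which is message-bus based, and all names are invented.

```python
class ExtractorRegistry:
    """Toy sketch of event-triggered extractors (not the Clowder API)."""

    def __init__(self):
        self._per_file = []  # (predicate, extractor) pairs run on every upload
        self._batch = []     # (required_set, extractor, seen_set) triples

    def on_file(self, predicate, extractor):
        """Run extractor on every uploaded file matching the predicate."""
        self._per_file.append((predicate, extractor))

    def when_complete(self, required_files, extractor):
        """Run extractor once all required files have been uploaded."""
        self._batch.append((set(required_files), extractor, set()))

    def upload(self, filename):
        results = []
        for predicate, extractor in self._per_file:
            if predicate(filename):
                results.append(extractor(filename))
        for required, extractor, seen in self._batch:
            seen.add(filename)
            if required <= seen:  # a real system would fire this only once
                results.append(extractor(sorted(required)))
        return results
```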

  7. Reevaluation of microplastics extraction efficiency with the aid of the Munich Plastic Sediment Separator.

    NASA Astrophysics Data System (ADS)

    Zobkov, Mikhail; Esiukova, Elena; Grave, Aleksei; Khatmullina, Liliya

    2017-04-01

    The invasion of the marine environment by microplastics is recognized as a global ecological threat. The specific density of microplastics can vary significantly depending on the polymer type, the technological processes of its production, additives, weathering, and biofouling. Plastic particles can sink or float on the sea surface, but with time most drifting plastics become negatively buoyant and sink to the sea floor due to biofouling or the adherence of denser particles. As a result, the seabed becomes the ultimate repository for microplastic particles and fibres. Studying the microplastics content of aquatic sediments is an important source of information about migration pathways and about sink and accumulation zones. The Munich Plastic Sediment Separator (MPSS), proposed by Imhof et al. (2012), is considered the most effective tool for microplastic extraction. However, we observed that the numbers of marine microplastics extracted with this tool from different kinds of bottom sediments were significantly underestimated. We examined the extraction efficiency of the MPSS by adding artificial reference particles (ARPs) to a marine sediment sample before the extraction procedure. Extraction was performed by two different methods: the modified NOAA method and the MPSS. A separation solution with specific density 1.5 g/ml was used. Subsequent cleaning, drying and microscope detection procedures were identical. The microplastics content was determined in the supernatant fraction, in the bulk of the extraction solution, in the spoil dump fraction of the MPSS and in the instrument wash-out. While the extraction efficiency of ARPs from natural sediments by the MPSS was high (100% in most cases), the extraction efficiency of marine microplastics was up to 10 times lower than that obtained with the modified NOAA method for the same samples. Less than 40% of the total marine microplastics content was successfully extracted with the MPSS. 
Large amounts of marine microplastics were found in the spoil dump and in the bulk solution fractions of the MPSS. Changes in stirring and separation periods had weak impact on the extraction efficiency of ARPs and marine microplastics. Until now, we are unable to find effective working procedures for adequate extraction of marine microplastics with the MPSS. The MPSS was found to be a useful tool for microplastics extraction from large sediment samples for qualitative analysis and to obtain examination specimens. Applying the MPSS for quantitative microplastics analysis requires further testing and elaboration of standardized extraction procedures. The research is supported by the Russian Science Foundation, grant number 15-17-10020 (project MARBLE). Imhof, H. K., Schmid, J., Niessner, R., Ivleva, N. P., Laforsch, C. 2012. A novel, highly efficient method for the separation and quantification of plastic particles in sediments of aquatic environments. Limnology and Oceanography: Methods, 10(7), 524-537. DOI 10.4319/lom.2012.10.524
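    The efficiency bookkeeping the study implies is simple: the recovered share is the particle count in the supernatant divided by the total found across all fractions. A small sketch, with particle counts that are illustrative only (chosen to echo the "less than 40%" finding, not taken from the paper):

```python
def extraction_efficiency(fractions):
    """Percentage of particles recovered in the supernatant fraction.

    fractions: dict of particle counts per fraction, e.g. supernatant,
    bulk solution, spoil dump, and instrument wash-out.
    """
    total = sum(fractions.values())
    if total == 0:
        return 0.0
    return 100.0 * fractions.get("supernatant", 0) / total

# Illustrative counts only, not data from the study.
counts = {"supernatant": 38, "bulk": 30, "spoil_dump": 25, "wash_out": 7}
```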

  8. Overview of image processing tools to extract physical information from JET videos

    NASA Astrophysics Data System (ADS)

    Craciunescu, T.; Murari, A.; Gelfusa, M.; Tiseanu, I.; Zoita, V.; EFDA Contributors, JET

    2014-11-01

    In magnetic confinement nuclear fusion devices such as JET, the last few years have witnessed a significant increase in the use of digital imagery, not only for the surveying and control of experiments, but also for the physical interpretation of results. More than 25 cameras are routinely used for imaging on JET in the infrared (IR) and visible spectral regions. These cameras can produce up to tens of Gbytes per shot and their information content can be very different, depending on the experimental conditions. However, the relevant information about the underlying physical processes is generally of much reduced dimensionality compared to the recorded data. The extraction of this information, which allows full exploitation of these diagnostics, is a challenging task. The image analysis consists, in most cases, of inverse problems which are typically ill-posed mathematically. The typology of objects to be analysed is very wide, and usually the images are affected by noise, low levels of contrast, low grey-level in-depth resolution, reshaping of moving objects, etc. Moreover, the plasma events have time constants of ms or tens of ms, which imposes tough conditions for real-time applications. On JET, in the last few years new tools and methods have been developed for physical information retrieval. The methodology of optical flow has allowed, under certain assumptions, the derivation of information about the dynamics of video objects associated with different physical phenomena, such as instabilities, pellets and filaments. The approach has been extended in order to approximate the optical flow within the MPEG compressed domain, allowing the manipulation of the large JET video databases and, in specific cases, even real-time data processing. The fast visible camera may provide new information that is potentially useful for disruption prediction. 
A set of methods, based on the extraction of structural information from the visual scene, have been developed for the automatic detection of MARFE (multifaceted asymmetric radiation from the edge) occurrences, which precede disruptions in density limit discharges. An original spot detection method has been developed for large surveys of videos in JET, and for the assessment of the long term trends in their evolution. The analysis of JET IR videos, recorded during JET operation with the ITER-like wall, allows the retrieval of data and hence correlation of the evolution of spots properties with macroscopic events, in particular series of intentional disruptions.
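    The compressed-domain shortcut mentioned above works because MPEG encoders already store a motion vector per block, estimated by block matching. The sketch below shows that primitive in one dimension: find the shift minimizing the sum of absolute differences (SAD) between frames. It is a didactic stand-in, not the JET optical-flow method itself.

```python
def block_displacement(prev_row, next_row, max_shift=5):
    """Estimate 1-D displacement between two frames by block matching.

    Tries every shift in [-max_shift, max_shift] and returns the one with
    the lowest mean absolute difference (SAD cost) over the overlapping
    samples. MPEG motion vectors are found with essentially this search in
    2-D, which is why optical flow can be approximated in the compressed
    domain without full decoding.
    """
    n = len(prev_row)
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        cost, count = 0.0, 0
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                cost += abs(prev_row[i] - next_row[j])
                count += 1
        if count < n // 2:  # require substantial overlap
            continue
        cost /= count
        if cost < best_cost:
            best_cost, best_shift = cost, shift
    return best_shift
```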

  9. Tools and Data Services from the GSFC Earth Sciences DAAC for Aura Science Data Users

    NASA Technical Reports Server (NTRS)

    Kempler, S.; Johnson, J.; Leptoukh, G.; Ahmad, S.; Pham, L.; Eng, E.; Berrick, S.; Teng, W.; Vollmer, B.

    2004-01-01

    In these times of rapidly increasing amounts of archived data, tools and data services that manipulate data and uncover nuggets of information that potentially lead to scientific discovery are becoming more and more essential. The Goddard Space Flight Center (GSFC) Earth Sciences (GES) Distributed Active Archive Center (DAAC) has made great strides in facilitating science and applications research by, in consultation with its users, developing innovative tools and data services. That is, as data users become more sophisticated in their research and more savvy with information extraction methodologies, the GES DAAC has been responsive to this evolution. This presentation addresses the tools and data services available and under study at the GES DAAC, applied to the Earth sciences atmospheric data. Now, with the data from NASA's latest Atmospheric Chemistry mission, Aura, being readied for public release, GES DAAC tools, proven successful for past atmospheric science missions such as MODIS, AIRS, TRMM, TOMS, and UARS, provide an excellent basis for similar tools updated for the data from the Aura instruments. GES DAAC resident Aura data sets are from the Microwave Limb Sounder (MLS), Ozone Monitoring Instrument (OMI), and High Resolution Dynamics Limb Sounder (HIRDLS). Data obtained by these instruments afford researchers the opportunity to acquire accurate and continuous visualization and analysis, customized for Aura data, will facilitate the use and increase the usefulness of the new data. The Aura data, together with other heritage data at the GES DAAC, can potentially provide a long time series of data. GES DAAC tools will be discussed, as well as the GES DAAC Near Archive Data Mining (NADM) environment, the GIOVANNI on-line analysis tool, and rich data search and order services. Information can be found at: http://daac.gsfc.nasa.gov/upperatm/aura/. Additional information is contained in the original extended abstract.

  10. Extracting Databases from Dark Data with DeepDive.

    PubMed

    Zhang, Ce; Shin, Jaeho; Ré, Christopher; Cafarella, Michael; Niu, Feng

    2016-01-01

    DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data - scientific papers, Web classified ads, customer service notes, and so on - were instead in a relational database, it would give analysts a massive and valuable new set of "big data." DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. To date we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontology, law enforcement, and others. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive is enabled by an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle. This design is enabled by several core innovations around probabilistic training and inference.
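    The shape of such a pipeline, stripped to its bones, is: generate candidate relation mentions from raw text, then attach a probability to each via features. The toy below uses a regex for candidate generation and a logistic model with hand-set weights for scoring; DeepDive instead learns weights with large-scale probabilistic inference over a factor graph, so everything here (pattern, features, weights) is an illustrative assumption.

```python
import math
import re

# Hand-set feature weights for illustration; DeepDive learns these.
WEIGHTS = {"has_keyword": 2.0, "short_sentence": 0.5, "bias": -1.0}

def candidates(text):
    """Candidate (entity, value) mentions for a toy 'melting point' relation."""
    pat = re.compile(r"(\w+) has a melting point of (\d+)")
    return [(m.group(1), int(m.group(2))) for m in pat.finditer(text)]

def score(sentence):
    """Logistic score that a candidate in this sentence is a true mention."""
    z = WEIGHTS["bias"]
    if "melting point" in sentence:
        z += WEIGHTS["has_keyword"]
    if len(sentence.split()) < 12:
        z += WEIGHTS["short_sentence"]
    return 1.0 / (1.0 + math.exp(-z))
```

    Only candidates whose score clears a threshold would be written to the output relational table, which is how precision is traded against recall.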

  11. Method development towards qualitative and semi-quantitative analysis of multiple pesticides from food surfaces and extracts by desorption electrospray ionization mass spectrometry as a preselective tool for food control.

    PubMed

    Gerbig, Stefanie; Stern, Gerold; Brunn, Hubertus E; Düring, Rolf-Alexander; Spengler, Bernhard; Schulz, Sabine

    2017-03-01

    Direct analysis of fruit and vegetable surfaces is an important tool for in situ detection of food contaminants such as pesticides. We tested three different ways to prepare samples for the qualitative desorption electrospray ionization mass spectrometry (DESI-MS) analysis of 32 pesticides found on nine authentic fruits collected from food control. Best recovery rates for topically applied pesticides (88%) were found by analyzing the surface of a glass slide which had been rubbed against the surface of the food. Pesticide concentration in all samples was at or below the maximum residue level allowed. In addition to the high sensitivity of the method for qualitative analysis, quantitative or, at least, semi-quantitative information is needed in food control. We developed a DESI-MS method for the simultaneous determination of linear calibration curves of multiple pesticides of the same chemical class using normalization to one internal standard (ISTD). The method was first optimized for food extracts and subsequently evaluated for the quantification of pesticides in three authentic food extracts. Next, pesticides and the ISTD were applied directly onto food surfaces, and the corresponding calibration curves were obtained. The determination of linear calibration curves was still feasible, as demonstrated for three different food surfaces. This proof-of-principle method was used to simultaneously quantify two pesticides on an authentic sample, showing that the method developed could serve as a fast and simple preselective tool for disclosure of pesticide regulation violations. Graphical Abstract: Multiple pesticide residues were detected and quantified in situ from an authentic set of food items and extracts in a proof-of-principle study.
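    ISTD normalization followed by a linear fit, as described above, is a short computation: divide each analyte intensity by the internal-standard intensity from the same run, then regress the ratios against concentration. A self-contained sketch (the function names are ours, not from the paper):

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def calibrate(concentrations, analyte_signal, istd_signal):
    """Linear calibration of ISTD-normalized DESI-MS responses.

    Each analyte intensity is divided by the internal-standard intensity
    measured in the same run, which cancels run-to-run signal drift, and
    the ratios are regressed against the known concentrations.
    """
    ratios = [a / s for a, s in zip(analyte_signal, istd_signal)]
    return fit_line(concentrations, ratios)
```

    An unknown sample is then quantified by inverting the fitted line: concentration = (ratio - intercept) / slope.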

  12. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial

    PubMed Central

    Roelofs, Erik; Persoon, Lucas; Nijsten, Sebastiaan; Wiessler, Wolfgang; Dekker, André; Lambin, Philippe

    2016-01-01

    Introduction Collecting trial data in a medical environment is at present mostly performed manually and therefore time-consuming, prone to errors and often incomplete with the complex data considered. Faster and more accurate methods are needed to improve the data quality and to shorten data collection times where information is often scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field. Material and methods In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method. Results The average time per case to collect the NSCLC data manually was 10.4 ± 2.1 min and 4.3 ± 1.1 min when using the automated method (p < 0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p < 0.001). In 3.2% of the data collected for NSCLC and 5.3% for rectal cancer, there was a discrepancy between the manual and automated method. Conclusions Aggregating multiple data sources in a data warehouse combined with tools for extraction of relevant parameters is beneficial for data collection times and offers the ability to improve data quality. The initial investments in digitizing the data are expected to be compensated due to the flexibility of the data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases. PMID:23394741

  13. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial.

    PubMed

    Roelofs, Erik; Persoon, Lucas; Nijsten, Sebastiaan; Wiessler, Wolfgang; Dekker, André; Lambin, Philippe

    2013-07-01

    Collecting trial data in a medical environment is at present a mostly manual process and therefore time-consuming, prone to errors and, given the complexity of the data, often incomplete. Faster and more accurate methods are needed to improve data quality and to shorten collection times when information is scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field. In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method. The average time per case to collect the NSCLC data was 10.4 ± 2.1 min manually and 4.3 ± 1.1 min when using the automated method (p<0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p<0.001). In 3.2% of the data collected for NSCLC and 5.3% for rectal cancer, there was a discrepancy between the manual and automated methods. Aggregating multiple data sources in a data warehouse combined with tools for extraction of relevant parameters shortens data collection times and offers the ability to improve data quality. The initial investment in digitizing the data is expected to be compensated by the flexibility of subsequent data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
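    The discrepancy percentages reported above come from comparing the two collection methods field by field. A minimal sketch of such a check; the record fields and values below are invented for illustration and are not taken from the CAT warehouse:

```python
# Hypothetical sketch: compare a manually collected trial record with an
# automatically extracted one, field by field, and report the discrepancy rate.
# All field names and values are invented for this example.

def discrepancy_rate(manual: dict, automated: dict) -> float:
    """Fraction of shared fields whose values disagree between the two methods."""
    shared = set(manual) & set(automated)
    if not shared:
        return 0.0
    mismatches = sum(1 for k in shared if manual[k] != automated[k])
    return mismatches / len(shared)

manual_record = {"tumour_stage": "IIIa", "dose_Gy": 60, "fractions": 30, "gtv_cc": 52.1}
auto_record = {"tumour_stage": "IIIa", "dose_Gy": 60, "fractions": 30, "gtv_cc": 52.4}

rate = discrepancy_rate(manual_record, auto_record)  # 1 of 4 fields differs -> 0.25
```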

  14. LitPathExplorer: a confidence-based visual text analytics tool for exploring literature-enriched pathway models.

    PubMed

    Soto, Axel J; Zerva, Chrysoula; Batista-Navarro, Riza; Ananiadou, Sophia

    2018-04-15

    Pathway models are valuable resources that help us understand the various mechanisms underpinning complex biological processes. Their curation is typically carried out through manual inspection of published scientific literature to find information relevant to a model, which is a laborious and knowledge-intensive task. Furthermore, models curated manually cannot be easily updated and maintained with new evidence extracted from the literature without automated support. We have developed LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization, to facilitate the exploration and analysis of pathway models using statements (i.e. events) extracted automatically from the literature and organized according to levels of confidence. LitPathExplorer supports pathway modellers and curators alike by: (i) extracting events from the literature that corroborate existing models with evidence; (ii) discovering new events which can update models; and (iii) providing a confidence value for each event that is automatically computed based on linguistic features and article metadata. Our evaluation of event extraction showed a precision of 89% and a recall of 71%. Evaluation of our confidence measure, when used for ranking sampled events, showed an average precision ranging between 61 and 73%, which can be improved to 95% when the user is involved in the semi-supervised learning process. Qualitative evaluation using pair analytics based on the feedback of three domain experts confirmed the utility of our tool within the context of pathway model exploration. LitPathExplorer is available at http://nactem.ac.uk/LitPathExplorer_BI/. sophia.ananiadou@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
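    The ranking evaluation quoted above ("average precision ranging between 61 and 73%") uses average precision over confidence-ordered events. A small illustrative implementation of that metric, not LitPathExplorer's own code:

```python
def average_precision(ranked_relevance):
    """Average precision for a list of relevance booleans, ordered by
    descending confidence: mean of precision@k at each relevant rank k."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

A ranking that places most correct events near the top scores close to 1.0, which is how a better confidence measure shows up as higher average precision.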

  15. Validation of Caregiver-Centered Delirium Detection Tools: A Systematic Review.

    PubMed

    Rosgen, Brianna; Krewulak, Karla; Demiantschuk, Danielle; Ely, E Wesley; Davidson, Judy E; Stelfox, Henry T; Fiest, Kirsten M

    2018-04-18

    To summarize the validity of caregiver-centered delirium detection tools in hospitalized adults and assess associated patient and caregiver outcomes. Systematic review. We searched MEDLINE, EMBASE, PsycINFO, CINAHL, and Scopus from inception to May 15, 2017. Hospitalized adults. Caregiver-centered delirium detection tools. We drafted a protocol from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Two reviewers independently completed abstract and full-text review, data extraction, and quality assessment. We summarized findings using descriptive statistics including mean, median, standard deviation, range, frequencies (percentages), and Cohen's kappa. Studies that reported on the validity of caregiver-centered delirium detection tools or associated patient and caregiver outcomes and were cohort or cross-sectional in design were included. We reviewed 6,056 titles and abstracts, included 6 articles, and identified 6 caregiver-centered tools. All tools were designed to be administered in several minutes or less and had 11 items or fewer. Three tools were caregiver administered (completed independently by caregivers): Family Confusion Assessment Method (FAM-CAM), Informant Assessment of Geriatric Delirium (I-AGeD), and Sour Seven. Three tools were caregiver informed (administered by a healthcare professional using caregiver input): Single Question in Delirium (SQiD), Single Screening Question Delirium (SSQ-Delirium), and Stressful Caregiving Response to Experiences of Dying. Caregiver-administered tools had better psychometric properties (FAM-CAM sensitivity 75%, 95% confidence interval (CI)=35-95%, specificity 91%, 95% CI=74-97%; Sour Seven positive predictive value 89.5%, negative predictive value 90%) than caregiver-informed tools (SQiD: sensitivity 80%, 95% CI=28.4-99.5%; specificity 71%, 95% CI=41.9-91.6%; SSQ-Delirium sensitivity 79.6%, specificity 56.1%). Delirium detection is essential for appropriate delirium management. 
Caregiver-centered delirium detection tools show promise in improving delirium detection and associated patient and caregiver outcomes. Comparative studies using larger sample sizes and multiple centers are required to determine validity and reliability characteristics. © 2018, Copyright the Authors Journal compilation © 2018, The American Geriatrics Society.
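    The psychometric figures quoted above (sensitivity, specificity, positive and negative predictive value) all derive from a standard 2×2 screening table. A brief sketch of those definitions; the counts in the test are invented, not taken from the review:

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening-test statistics from 2x2 counts:
    tp/fp/fn/tn = true positive, false positive, false negative, true negative."""
    return {
        "sensitivity": tp / (tp + fn),  # delirious patients correctly flagged
        "specificity": tn / (tn + fp),  # non-delirious patients correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```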

  16. Metabolomic Analysis and Visualization Engine for LC–MS Data

    PubMed Central

    Melamud, Eugene; Vastag, Livia; Rabinowitz, Joshua D.

    2017-01-01

    Metabolomic analysis by liquid chromatography–high-resolution mass spectrometry results in data sets with thousands of features arising from metabolites, fragments, isotopes, and adducts. Here we describe a software package, Metabolomic Analysis and Visualization ENgine (MAVEN), designed for efficient interactive analysis of LC–MS data, including in the presence of isotope labeling. The software contains tools for all aspects of the data analysis process, from feature extraction to pathway-based graphical data display. To facilitate data validation, a machine learning algorithm automatically assesses peak quality. Users interact with raw data primarily in the form of extracted ion chromatograms, which are displayed with overlaid circles indicating peak quality, and bar graphs of peak intensities for both unlabeled and isotope-labeled metabolite forms. Click-based navigation leads to additional information, such as raw data for specific isotopic forms or for metabolites changing significantly between conditions. Fast data processing algorithms result in nearly delay-free browsing. Drop-down menus provide tools for the overlay of data onto pathway maps. These tools enable animating series of pathway graphs, e.g., to show propagation of labeled forms through a metabolic network. MAVEN is released under an open source license at http://maven.princeton.edu. PMID:21049934
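    As a rough illustration of the extracted ion chromatograms MAVEN displays, one can sum the intensity within an m/z tolerance window for each scan. This is a toy sketch of the concept, not MAVEN's implementation:

```python
def extracted_ion_chromatogram(scans, target_mz, ppm_tol=10.0):
    """Build an EIC: for each scan, sum intensities of peaks whose m/z lies
    within a ppm window of target_mz.

    `scans` is a list of (retention_time, [(mz, intensity), ...]) tuples.
    """
    window = target_mz * ppm_tol / 1e6
    eic = []
    for rt, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= window)
        eic.append((rt, total))
    return eic
```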

  17. PDF text classification to leverage information extraction from publication reports.

    PubMed

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithms. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw text from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared against machine learning classifiers, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved the performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). 
The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
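    A multi-pass sieve applies a sequence of rules ordered from most to least precise and assigns the label of the first rule that fires, falling through to a default category. The sketch below illustrates the idea with invented, simplified rules; these are not the paper's actual sieves:

```python
import re

# Didactic multi-pass sieve for PDF text snippets. Each sieve is a
# (label, predicate) pair tried in order; the first match wins, and
# anything unclaimed falls through to BODYTEXT. Rules are invented
# stand-ins, not those from the cited study.

def classify_pdf_text(snippet, page_position):
    sieves = [
        ("METADATA", lambda s, pos: bool(re.search(r"doi:|©|\bvol\b", s, re.I))),
        ("TITLE", lambda s, pos: pos == 0 and len(s.split()) < 25),
        ("ABSTRACT", lambda s, pos: s.lower().startswith("abstract")),
        ("SEMISTRUCTURE", lambda s, pos: s.count("\t") > 2 or s.isupper()),
    ]
    for label, fires in sieves:
        if fires(snippet, page_position):
            return label
    return "BODYTEXT"  # default pass for everything the sieves did not claim
```

Ordering the sieves by precision is what lets each pass stay simple: high-precision rules remove the easy cases before looser rules see the text.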

  18. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium

    PubMed Central

    Pathak, Jyotishman; Bailey, Kent R; Beebe, Calvin E; Bethard, Steven; Carrell, David S; Chen, Pei J; Dligach, Dmitriy; Endle, Cory M; Hart, Lacey A; Haug, Peter J; Huff, Stanley M; Kaggal, Vinod C; Li, Dingcheng; Liu, Hongfang; Marchant, Kyle; Masanz, James; Miller, Timothy; Oniki, Thomas A; Palmer, Martha; Peterson, Kevin J; Rea, Susan; Savova, Guergana K; Stancl, Craig R; Sohn, Sunghwan; Solbrig, Harold R; Suesse, Dale B; Tao, Cui; Taylor, David P; Westberg, Les; Wu, Stephen; Zhuo, Ning; Chute, Christopher G

    2013-01-01

    Research objective: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction. Materials and methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems—Mayo Clinic and Intermountain Healthcare—were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine. Results: Using CEMs and open-source natural language processing and terminology services engines—namely, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2)—we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria. 
    Conclusions: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts. PMID:24190931
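    Once records are normalized, the QDM-based measure described above reduces to a denominator/numerator filter. An illustrative sketch; the field names are assumptions for the example, not SHARPn's CEM bindings:

```python
# Illustrative sketch (not the SHARPn platform code): evaluating the
# diabetes/LDL quality measure on already-normalized patient records.
# Field names below are invented for this example.

def ldl_measure(patients):
    """Return (denominator, numerator) counts for the QDM-style measure:
    patients aged 18-75 with diabetes; of those, latest LDL < 100 mg/dL."""
    denominator = [p for p in patients
                   if 18 <= p["age"] <= 75 and p["has_diabetes"]]
    numerator = [p for p in denominator if p["latest_ldl_mg_dl"] < 100]
    return len(denominator), len(numerator)

cohort = [
    {"age": 50, "has_diabetes": True, "latest_ldl_mg_dl": 90},   # both counts
    {"age": 80, "has_diabetes": True, "latest_ldl_mg_dl": 85},   # outside age range
    {"age": 40, "has_diabetes": True, "latest_ldl_mg_dl": 120},  # denominator only
    {"age": 40, "has_diabetes": False, "latest_ldl_mg_dl": 95},  # no diabetes
]
```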

  19. Quantitative 3-D Imaging, Segmentation and Feature Extraction of the Respiratory System in Small Mammals for Computational Biophysics Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trease, Lynn L.; Trease, Harold E.; Fowler, John

    2007-03-15

    One of the critical steps toward performing computational biology simulations, using mesh based integration methods, is in using topologically faithful geometry derived from experimental digital image data as the basis for generating the computational meshes. Digital image data representations contain both the topology of the geometric features and experimental field data distributions. The geometric features that need to be captured from the digital image data are three-dimensional, therefore the process and tools we have developed work with volumetric image data represented as data-cubes. This allows us to take advantage of 2D curvature information during the segmentation and feature extraction process. The process is basically: 1) segmenting to isolate and enhance the contrast of the features that we wish to extract and reconstruct, 2) extracting the geometry of the features with an isosurfacing technique, and 3) building the computational mesh using the extracted feature geometry. "Quantitative" image reconstruction and feature extraction is done for the purpose of generating computational meshes, not just for producing graphics "screen"-quality images. For example, the surface geometry that we extract must represent a closed, water-tight surface.
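    Steps 1 and 2 of the pipeline can be caricatured on a small voxel data-cube: threshold-based segmentation followed by volume and exposed-surface measurement. This is a didactic stand-in for the isosurfacing actually used, operating on nested lists rather than real image data:

```python
def threshold_segment(volume, level):
    """Binary mask of voxels at or above `level` in a nested-list data-cube."""
    return [[[1 if v >= level else 0 for v in row] for row in plane]
            for plane in volume]

def voxel_volume_and_surface(mask, voxel_size=1.0):
    """Volume = occupied voxels; surface = voxel faces exposed to empty space.
    A water-tight surface is implied: every exposed face is counted once."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])

    def occupied(z, y, x):
        return 0 <= z < nz and 0 <= y < ny and 0 <= x < nx and mask[z][y][x]

    volume = surface = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x]:
                    continue
                volume += 1
                for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                   (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                    if not occupied(z + dz, y + dy, x + dx):
                        surface += 1
    return volume * voxel_size ** 3, surface * voxel_size ** 2
```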

  20. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions.

    PubMed

    Gundersen, Gregory W; Jones, Matthew R; Rouillard, Andrew D; Kou, Yan; Monteiro, Caroline D; Feldmann, Axel S; Hu, Kevin S; Ma'ayan, Avi

    2015-09-15

    Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, to extract and analyze differentially expressed genes from GEO requires significant computational skills. Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web pages; this code scrapes user-selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such as GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and Firefox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/. avi.maayan@mssm.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
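    Downstream of the SOFT parsing, identifying differentially expressed genes can be as simple as a log2 fold-change cutoff between conditions. The sketch below uses that simplification; GEO2Enrichr's server-side pipeline is richer than this:

```python
import math

def differentially_expressed(control, treated, log2_cutoff=1.0):
    """Toy differential-expression call: genes whose |log2 fold change|
    between mean expression values meets the cutoff, ranked by magnitude.
    `control` and `treated` map gene symbols to positive expression values."""
    hits = []
    for gene in control.keys() & treated.keys():
        lfc = math.log2(treated[gene] / control[gene])
        if abs(lfc) >= log2_cutoff:
            hits.append((gene, lfc))
    return sorted(hits, key=lambda t: -abs(t[1]))

control = {"A": 100.0, "B": 50.0, "C": 10.0}
treated = {"A": 400.0, "B": 55.0, "C": 5.0}
hits = differentially_expressed(control, treated)  # A up 4-fold, C down 2-fold
```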

  1. Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication.

    PubMed

    Allard, Pierre-Marie; Péresse, Tiphaine; Bisson, Jonathan; Gindro, Katia; Marcourt, Laurence; Pham, Van Cuong; Roussi, Fanny; Litaudon, Marc; Wolfender, Jean-Luc

    2016-03-15

    Dereplication represents a key step for rapidly identifying known secondary metabolites in complex biological matrices. In this context, liquid-chromatography coupled to high resolution mass spectrometry (LC-HRMS) is increasingly used and, via untargeted data-dependent MS/MS experiments, massive amounts of detailed information on the chemical composition of crude extracts can be generated. An efficient exploitation of such data sets requires automated data treatment and access to dedicated fragmentation databases. Various novel bioinformatics approaches such as molecular networking (MN) and in-silico fragmentation tools have emerged recently and provide new perspective for early metabolite identification in natural products (NPs) research. Here we propose an innovative dereplication strategy based on the combination of MN with an extensive in-silico MS/MS fragmentation database of NPs. Using two case studies, we demonstrate that this combined approach offers a powerful tool to navigate through the chemistry of complex NPs extracts, dereplicate metabolites, and annotate analogues of database entries.
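    Molecular networking links spectra whose fragmentation patterns are similar, typically via a cosine score over matched peaks. The simplified sketch below conveys the idea; it is not the modified cosine used in production MN tools:

```python
import math

def spectral_cosine(spec_a, spec_b, mz_tol=0.02):
    """Greedy peak matching within an m/z tolerance, then a cosine score
    over the matched intensities. Spectra are lists of (m/z, intensity)
    pairs. A simplified take on the similarity used in molecular networking."""
    matched, used = [], set()
    for mz_a, int_a in spec_a:
        best = None
        for j, (mz_b, _) in enumerate(spec_b):
            if j in used or abs(mz_a - mz_b) > mz_tol:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - spec_b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            matched.append((int_a, spec_b[best][1]))
    if not matched:
        return 0.0
    dot = sum(a * b for a, b in matched)
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    return dot / (norm_a * norm_b)
```

In a network, nodes (spectra) are connected when this score exceeds a threshold, so analogues of a dereplicated compound cluster around it.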

  2. Rapid Phenotyping of Root Systems of Brachypodium Plants Using X-ray Computed Tomography: a Comparative Study of Soil Types and Segmentation Tools

    NASA Astrophysics Data System (ADS)

    Varga, T.; McKinney, A. L.; Bingham, E.; Handakumbura, P. P.; Jansson, C.

    2017-12-01

    Plant roots play a critical role in plant-soil-microbe interactions that occur in the rhizosphere, as well as in processes with important implications to farming and thus human food supply. X-ray computed tomography (XCT) has been proven to be an effective tool for non-invasive root imaging and analysis. Selected Brachypodium distachyon phenotypes were grown in both natural and artificial soil mixes. The specimens were imaged by XCT, and the root architectures were extracted from the data using three different software-based methods: RooTrak, ImageJ-based WEKA segmentation, and the segmentation feature in VG Studio MAX. The 3D root image was successfully segmented at 30 µm resolution by all three methods. In this presentation, ease of segmentation and the accuracy of the extracted quantitative information (root volume and surface area) will be compared between soil types and segmentation methods. The best route to easy and accurate segmentation and root analysis will be highlighted.
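    Agreement between segmentation methods of the kind compared here is often quantified with a Dice coefficient over the segmented voxels. A minimal version, with segmentations represented as sets of voxel indices:

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary segmentations given as sets of
    voxel indices: 2|A ∩ B| / (|A| + |B|), 1.0 for identical masks."""
    if not mask_a and not mask_b:
        return 1.0  # two empty segmentations agree trivially
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))
```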

  3. Temporal tuning in the bat auditory cortex is sharper when studied with natural echolocation sequences.

    PubMed

    Beetz, M Jerome; Hechavarría, Julio C; Kössl, Manfred

    2016-06-30

    Precise temporal coding is necessary for proper acoustic analysis. However, at cortical level, forward suppression appears to limit the ability of neurons to extract temporal information from natural sound sequences. Here we studied how temporal processing can be maintained in the bats' cortex in the presence of suppression evoked by natural echolocation streams that are relevant to the bats' behavior. We show that cortical neurons tuned to target-distance actually profit from forward suppression induced by natural echolocation sequences. These neurons can more precisely extract target distance information when they are stimulated with natural echolocation sequences than during stimulation with isolated call-echo pairs. We conclude that forward suppression does for time domain tuning what lateral inhibition does for selectivity forms such as auditory frequency tuning and visual orientation tuning. When talking about cortical processing, suppression should be seen as a mechanistic tool rather than a limiting element.

  4. Text mining for adverse drug events: the promise, challenges, and state of the art.

    PubMed

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H

    2014-10-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources, such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs, that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

  5. Using Best Practices to Extract, Organize, and Reuse Embedded Decision Support Content Knowledge Rules from Mature Clinical Systems.

    PubMed

    DesAutels, Spencer J; Fox, Zachary E; Giuse, Dario A; Williams, Annette M; Kou, Qing-Hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia

    2016-01-01

    Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. A holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems.

  6. Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art

    PubMed Central

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H.

    2014-01-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. Text mining is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text-mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance. PMID:25151493

  7. Sasquatch Footprint Tool

    NASA Technical Reports Server (NTRS)

    Bledsoe, Kristin

    2013-01-01

    The Crew Exploration Vehicle Parachute Assembly System (CPAS) is the parachute system for NASA's Orion spacecraft. The test program consists of numerous drop tests, wherein a test article rigged with parachutes is extracted or released from an aircraft. During such tests, range safety is paramount, as is the recoverability of the parachutes and test article. It is crucial to establish an aircraft release point that will ensure that the article and all items released from it will land in safe locations. A new footprint predictor tool, called Sasquatch, was created in MATLAB. This tool takes in a simulated trajectory for the test article, information about all released objects, and atmospheric wind data (simulated or actual) to calculate the trajectories of the released objects. Dispersions are applied to the landing locations of those objects, taking into account the variability of winds, aircraft release point, and object descent rate. Sasquatch establishes a payload release point (e.g., where the payload will be extracted from the carrier aircraft) that will ensure that the payload and all objects released from it will land in a specified cleared area. The landing locations (the final points in the trajectories) are plotted on a map of the test range. Sasquatch was originally designed for CPAS drop tests and includes extensive information about both the CPAS hardware and the primary test range used for CPAS testing. However, it can easily be adapted for more complex CPAS drop tests, other NASA projects, and commercial partners. CPAS has developed the Sasquatch footprint tool to ensure range safety during parachute drop tests. Sasquatch is well correlated to test data and continues to ensure the safety of test personnel as well as the safe recovery of all equipment. The tool will continue to be modified based on new test data, improving predictions and providing added capability to meet the requirements of more complex testing.
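    The dispersion logic described above can be sketched as a Monte Carlo drift model: perturb the wind for each sample, integrate drift over the descent, and bound the resulting footprint. This is a toy model under invented parameters, not Sasquatch itself (which is a MATLAB tool with real aerodynamics and range data):

```python
import math
import random

def landing_footprint(release_xy, descent_time_s, wind_mean_mps, wind_sigma,
                      n_samples=1000, seed=0):
    """Monte Carlo landing dispersion: each sample drifts with a wind vector
    drawn around the mean for the duration of the descent. Returns the
    sampled landing points, their centroid, and the radius enclosing them."""
    rng = random.Random(seed)
    x0, y0 = release_xy
    points = []
    for _ in range(n_samples):
        wx = rng.gauss(wind_mean_mps[0], wind_sigma)
        wy = rng.gauss(wind_mean_mps[1], wind_sigma)
        points.append((x0 + wx * descent_time_s, y0 + wy * descent_time_s))
    cx = sum(p[0] for p in points) / n_samples
    cy = sum(p[1] for p in points) / n_samples
    radius = max(math.hypot(px - cx, py - cy) for px, py in points)
    return points, (cx, cy), radius
```

Checking that the enclosing radius stays inside the cleared area, for a grid of candidate release points, is one way to pick a safe release point.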

  8. Fundamental finite key limits for one-way information reconciliation in quantum key distribution

    NASA Astrophysics Data System (ADS)

    Tomamichel, Marco; Martinez-Mateo, Jesus; Pacher, Christoph; Elkouss, David

    2017-11-01

    The security of quantum key distribution protocols is guaranteed by the laws of quantum mechanics. However, a precise analysis of the security properties requires tools from both classical cryptography and information theory. Here, we employ recent results in non-asymptotic classical information theory to show that one-way information reconciliation imposes fundamental limitations on the amount of secret key that can be extracted in the finite key regime. In particular, we find that an often used approximation for the information leakage during information reconciliation is not generally valid. We propose an improved approximation that takes into account finite key effects and numerically test it against codes for two probability distributions that we call binary-binary and binary-Gaussian, which typically appear in quantum key distribution protocols.
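    The "often used approximation" referred to above is the asymptotic leakage estimate leak ≈ f·n·h(Q), with h the binary entropy, Q the error rate, and f an efficiency factor slightly above 1. A small sketch of that baseline; the paper's point is precisely that this estimate breaks down at finite block lengths n:

```python
import math

def binary_entropy(q):
    """Binary entropy h(q) in bits; h(0) = h(1) = 0 by convention."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def leakage_estimate(n, qber, f=1.1):
    """Asymptotic one-way reconciliation leakage estimate leak = f * n * h(Q).
    Treat this as a baseline only: at finite n it can underestimate leakage."""
    return f * n * binary_entropy(qber)
```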

  9. A computational image analysis glossary for biologists.

    PubMed

    Roeder, Adrienne H K; Cunha, Alexandre; Burl, Michael C; Meyerowitz, Elliot M

    2012-09-01

    Recent advances in biological imaging have resulted in an explosion in the quality and quantity of images obtained in a digital format. Developmental biologists are increasingly acquiring beautiful and complex images, thus creating vast image datasets. In the past, patterns in image data have been detected by the human eye. Larger datasets, however, necessitate high-throughput objective analysis tools to computationally extract quantitative information from the images. These tools have been developed in collaborations between biologists, computer scientists, mathematicians and physicists. In this Primer we present a glossary of image analysis terms to aid biologists and briefly discuss the importance of robust image analysis in developmental studies.

  10. Taming Log Files from Game/Simulation-Based Assessments: Data Models and Data Analysis Tools. Research Report. ETS RR-16-10

    ERIC Educational Resources Information Center

    Hao, Jiangang; Smith, Lawrence; Mislevy, Robert; von Davier, Alina; Bauer, Malcolm

    2016-01-01

    Extracting information efficiently from game/simulation-based assessment (G/SBA) logs requires two things: a well-structured log file and a set of analysis methods. In this report, we propose a generic data model specified as an extensible markup language (XML) schema for the log files of G/SBAs. We also propose a set of analysis methods for…
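    With events stored under a generic XML schema, extraction reduces to standard XML traversal. A sketch using an invented log layout; the element and attribute names below are assumptions for illustration, not the schema proposed in the report:

```python
import xml.etree.ElementTree as ET

# Hypothetical G/SBA log: one <log> per student, one <event> per action.
# The layout is invented for this example.
SAMPLE_LOG = """
<log student="s001">
  <event time="3.2" type="move" target="valve"/>
  <event time="7.9" type="answer" target="q1" correct="true"/>
  <event time="12.4" type="answer" target="q2" correct="false"/>
</log>
"""

def extract_events(xml_text, event_type=None):
    """Return (student id, list of event attribute dicts), optionally
    filtered to a single event type."""
    root = ET.fromstring(xml_text)
    events = [e.attrib for e in root.iter("event")
              if event_type is None or e.get("type") == event_type]
    return root.get("student"), events
```

A well-structured schema like this is what makes such one-line queries possible, which is the report's argument for standardizing log formats before analysis.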

  11. MIPS: curated databases and comprehensive secondary data resources in 2010.

    PubMed

    Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

  12. MIPS: curated databases and comprehensive secondary data resources in 2010

    PubMed Central

    Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531

  13. A-MADMAN: Annotation-based microarray data meta-analysis tool

    PubMed Central

    Bisognin, Andrea; Coppe, Alessandro; Ferrari, Francesco; Risso, Davide; Romualdi, Chiara; Bicciato, Silvio; Bortoluzzi, Stefania

    2009-01-01

    Background Publicly available datasets of microarray gene expression signals represent an unprecedented opportunity for extracting genomic relevant information and validating biological hypotheses. However, the exploitation of this exceptionally rich mine of information is still hampered by the lack of appropriate computational tools, able to overcome the critical issues raised by meta-analysis. Results This work presents A-MADMAN, an open source web application which allows the retrieval, annotation, organization and meta-analysis of gene expression datasets obtained from Gene Expression Omnibus. A-MADMAN addresses and resolves several open issues in the meta-analysis of gene expression data. Conclusion A-MADMAN allows i) the batch retrieval from Gene Expression Omnibus and the local organization of raw data files and of any related meta-information, ii) the re-annotation of samples to fix incomplete, or otherwise inadequate, metadata and to create user-defined batches of data, iii) the integrative analysis of data obtained from different Affymetrix platforms through custom chip definition files and meta-normalization. Software and documentation are available on-line at . PMID:19563634

  14. Biobibliometrics (UGDH-TP53-BRCA1) Genes Connections in the Possible Relationship Between Breast Cancer and EEG.

    PubMed

    Martzoukos, Yannis; Papavlasopoulos, Sozon; Poulos, Marios; Syrrou, Maria

    2017-01-01

    In recent years, the amount of data stored in biomedical databases has grown dramatically owing to breakthroughs in biology and bioinformatics; biomedical information is growing exponentially, making efficient information retrieval ever more challenging for scientists. Emerging fields such as bioinformatics provide the tools needed to extract scientifically important data from experimental results and from the information reported in papers and journals. In this paper we implement a custom-made IT system to find connections between genes in the breast cancer pathways, such as BRCA1, and the electrical activity of the human brain, linking to the UGDH gene via the TP53 tumor suppressor gene. The proposed system identifies the appearance of each gene ID and compares the coexistence of two genes in PubMed articles and papers. The final system could become a useful tool for scientists and medical professionals in the near future.
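
The core matching step this record describes, counting how often two gene symbols co-occur in the same abstract, can be sketched as follows. This is a minimal illustration on a toy corpus; the function name and the regex-based matching are assumptions, not the authors' implementation:

```python
import re

def cooccurrence_count(abstracts, gene_a, gene_b):
    """Count abstracts mentioning gene_a alone, gene_b alone, and both."""
    both = only_a = only_b = 0
    for text in abstracts:
        has_a = re.search(rf"\b{re.escape(gene_a)}\b", text) is not None
        has_b = re.search(rf"\b{re.escape(gene_b)}\b", text) is not None
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
    return {"both": both, "only_a": only_a, "only_b": only_b}

# Toy corpus standing in for PubMed abstracts.
abstracts = [
    "BRCA1 interacts with TP53 in the DNA damage response.",
    "TP53 mutations are frequent in breast tumours.",
    "UGDH expression was measured in glioma cell lines.",
    "BRCA1 and TP53 status jointly predict outcome.",
]
print(cooccurrence_count(abstracts, "BRCA1", "TP53"))
```

A real system would pull abstracts via the NCBI E-utilities and normalize gene synonyms before matching; the word-boundary regex above only captures exact symbol mentions.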

  15. Interactive Visualization of Large-Scale Hydrological Data using Emerging Technologies in Web Systems and Parallel Programming

    NASA Astrophysics Data System (ADS)

    Demir, I.; Krajewski, W. F.

    2013-12-01

    As geoscientists are confronted with increasingly massive datasets, from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate the resulting understanding to stakeholders. Recent developments in web technologies make it easy to manage, visualize and share large data sets with the general public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and modify parameters to create custom views, helping them gain insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations or large data repositories. Scientific visualization will be an increasingly important component in building comprehensive environmental information platforms. This presentation provides an overview of the trends and challenges in the field of scientific visualization, and demonstrates information visualization and communication tools developed in light of these challenges.

  16. Imitate or innovate? Children's innovation is influenced by the efficacy of observed behaviour.

    PubMed

    Carr, Kayleigh; Kendal, Rachel L; Flynn, Emma G

    2015-09-01

    This study investigated the age at which children judge it futile to imitate unreliable information, in the form of a visibly ineffective demonstrated solution, and deviate to produce novel solutions ('innovations'). Children aged 4-9 years were presented with a novel puzzle box, the Multiple-Methods Box (MMB), which offered multiple innovation opportunities to extract a reward using different tools, access points and exits. 209 children were assigned to conditions in which eight social demonstrations of a reward retrieval method were provided; each condition differed incrementally in terms of the method's efficacy (0%, 25%, 75%, and 100% success at extracting the reward). An additional 47 children were assigned to a no-demonstration control condition. Innovative reward extractions from the MMB increased with decreasing efficacy of the demonstrated method. However, imitation remained a widely used strategy irrespective of the efficacy of the method being reproduced (90% of children produced at least one imitative attempt, and imitated on an average of 4.9 out of 8 attempt trials). Children were more likely to innovate in relation to the tool than exit, even though the latter would have been more effective. Overall, innovation was rare: only 12.4% of children innovated by discovering at least one novel reward exit. Children's prioritisation of social information is consistent with theories of cultural evolution indicating imitation is a prepotent response following observation of behaviour, and that innovation is a rarity; so much so, that even maladaptive behaviour is copied. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  17. MEMS product engineering: methodology and tools

    NASA Astrophysics Data System (ADS)

    Ortloff, Dirk; Popp, Jens; Schmidt, Thilo; Hahn, Kai; Mielke, Matthias; Brück, Rainer

    2011-03-01

    The development of MEMS comprises the structural design as well as the definition of an appropriate manufacturing process. Technology constraints have a considerable impact on the device design and vice versa. Product design and technology development are therefore concurrent tasks. Based on a comprehensive methodology, the authors introduce a software environment that links commercial design tools from both areas into a common design flow. In this paper, emphasis is put on automatic, low-threshold data acquisition. The intention is to collect and categorize development data for further developments with minimum overhead and minimum disturbance of established business processes. As a first step, software tools are presented that automatically extract data from spreadsheets or file systems and put it in context with existing information. The developments are currently carried out in a European research project.
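
The first-step harvesting tools described above might look roughly like this: walk a directory tree, read every spreadsheet-like file, and tag each row with its source so it can later be put in context. A hedged sketch assuming plain CSV files; `harvest_csvs` and the demo data are invented for illustration:

```python
import csv
import os
import tempfile

def harvest_csvs(root):
    """Collect rows from every CSV under root, tagged with their source file."""
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".csv"):
                path = os.path.join(dirpath, name)
                with open(path, newline="") as fh:
                    for row in csv.DictReader(fh):
                        row["_source"] = path  # provenance for later linking
                        records.append(row)
    return records

# Demo: write one CSV into a temporary tree and harvest it.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "runs.csv"), "w", newline="") as fh:
        fh.write("wafer,etch_depth\nW01,1.2\nW02,1.3\n")
    rows = harvest_csvs(root)
print(len(rows), rows[0]["wafer"])
```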

  18. Systematic review of tools to measure outcomes for young children with autism spectrum disorder.

    PubMed

    McConachie, Helen; Parr, Jeremy R; Glod, Magdalena; Hanratty, Jennifer; Livingstone, Nuala; Oono, Inalegwu P; Robalino, Shannon; Baird, Gillian; Beresford, Bryony; Charman, Tony; Garland, Deborah; Green, Jonathan; Gringras, Paul; Jones, Glenys; Law, James; Le Couteur, Ann S; Macdonald, Geraldine; McColl, Elaine M; Morris, Christopher; Rodgers, Jacqueline; Simonoff, Emily; Terwee, Caroline B; Williams, Katrina

    2015-06-01

    The needs of children with autism spectrum disorder (ASD) are complex and this is reflected in the number and diversity of outcomes assessed and measurement tools used to collect evidence about children's progress. Relevant outcomes include improvement in core ASD impairments, such as communication, social awareness, sensory sensitivities and repetitiveness; skills such as social functioning and play; participation outcomes such as social inclusion; and parent and family impact. To examine the measurement properties of tools used to measure progress and outcomes in children with ASD up to the age of 6 years. To identify outcome areas regarded as important by people with ASD and parents. The MeASURe (Measurement in Autism Spectrum disorder Under Review) research collaboration included ASD experts and review methodologists. We undertook systematic review of tools used in ASD early intervention and observational studies from 1992 to 2013; systematic review, using the COSMIN checklist (Consensus-based Standards for the selection of health Measurement Instruments) of papers addressing the measurement properties of identified tools in children with ASD; and synthesis of evidence and gaps. The review design and process was informed throughout by consultation with stakeholders including parents, young people with ASD, clinicians and researchers. The conceptual framework developed for the review was drawn from the International Classification of Functioning, Disability and Health, including the domains 'Impairments', 'Activity Level Indicators', 'Participation', and 'Family Measures'. In review 1, 10,154 papers were sifted - 3091 by full text - and data extracted from 184; in total, 131 tools were identified, excluding observational coding, study-specific measures and those not in English. In review 2, 2665 papers were sifted and data concerning measurement properties of 57 (43%) tools were extracted from 128 papers. 
Evidence for the measurement properties of the reviewed tools was combined with information about their accessibility and presentation. Twelve tools were identified as having the strongest supporting evidence, the majority measuring autism characteristics and problem behaviour. The patchy evidence and limited scope of outcomes measured mean these tools do not constitute a 'recommended battery' for use. In particular, there is little evidence that the identified tools would be good at detecting change in intervention studies. The obvious gaps in available outcome measurement include well-being and participation outcomes for children, and family quality-of-life outcomes, domains particularly valued by our informants (young people with ASD and parents). This is the first systematic review of the quality and appropriateness of tools designed to monitor progress and outcomes of young children with ASD. Although it was not possible to recommend fully robust tools at this stage, the review consolidates what is known about the field and will act as a benchmark for future developments. With input from parents and other stakeholders, recommendations are made about priority targets for research. Priorities include development of a tool to measure child quality of life in ASD, and validation of a potential primary outcome tool for trials of early social communication intervention. This study is registered as PROSPERO CRD42012002223. The National Institute for Health Research Health Technology Assessment programme.

  19. Combining TXRF, FT-IR and GC-MS information for identification of inorganic and organic components in black pigments of rock art from Alero Hornillos 2 (Jujuy, Argentina).

    PubMed

    Vázquez, Cristina; Maier, Marta S; Parera, Sara D; Yacobaccio, Hugo; Solá, Patricia

    2008-06-01

    Archaeological samples are complex in composition since they generally comprise a mixture of materials submitted to deterioration factors largely dependent on the environmental conditions. Therefore, the integration of analytical tools such as TXRF, FT-IR and GC-MS can maximize the amount of information provided by the sample. Recently, two black rock art samples of camelid figures at Alero Hornillos 2, an archaeological site located near the town of Susques (Jujuy Province, Argentina), were investigated. TXRF, selected for inorganic information, showed the presence of manganese and iron among other elements, consistent with an iron and manganese oxide as the black pigment. Aiming at the detection of any residual organic compounds, the samples were extracted with a chloroform-methanol mixture and the extracts were analyzed by FT-IR, showing the presence of bands attributable to lipids. Analysis by GC-MS of the carboxylic acid methyl esters prepared from the sample extracts, indicated that the main organic constituents were saturated (C(16:0) and C(18:0)) fatty acids in relative abundance characteristic of degraded animal fat. The presence of minor C(15:0) and C(17:0) fatty acids and branched-chain iso-C(16:0) pointed to a ruminant animal source.

  20. Revealing the properties of oils from their dissolved hydrocarbon compounds in water with an integrated sensor array system.

    PubMed

    Qi, Xiubin; Crooke, Emma; Ross, Andrew; Bastow, Trevor P; Stalvies, Charlotte

    2011-09-21

    This paper presents a system and method developed to identify a source oil's characteristic properties by testing the oil's dissolved components in water. Through close examination of the oil dissolution process in water, we hypothesise that when oil is in contact with water, the resulting oil-water extract, a complex hydrocarbon mixture, carries the signature property information of the parent oil. If the dominating differences in compositions between such extracts of different oils can be identified, this information could guide the selection of various sensors, capable of capturing such chemical variations. When used as an array, such a sensor system can be used to determine parent oil information from the oil-water extract. To test this hypothesis, 22 oils' water extracts were prepared and selected dominant hydrocarbons analyzed with Gas Chromatography-Mass Spectrometry (GC-MS); the subsequent Principal Component Analysis (PCA) indicates that the major difference between the extract solutions is the relative concentration between the volatile mono-aromatics and fluorescent polyaromatics. An integrated sensor array system that is composed of 3 volatile hydrocarbon sensors and 2 polyaromatic hydrocarbon sensors was built accordingly to capture the major and subtle differences of these extracts. It was tested by exposure to a total of 110 water extract solutions diluted from the 22 extracts. The sensor response data collected from the testing were processed with two multivariate analysis tools to reveal information retained in the response patterns of the arrayed sensors: by conducting PCA, we were able to demonstrate the ability to qualitatively identify and distinguish different oil samples from their sensor array response patterns. 
When a supervised counterpart of PCA, Linear Discriminant Analysis (LDA), was applied, quantitative classification could also be achieved: the multivariate model generated from the LDA correctly classified 89.7% of the oil samples by type. By grouping the samples based on level of viscosity and density, we were able to reveal the correlation between the oil extracts' sensor array responses and the feature properties of their parent oils. The equipment and method developed in this study have promising potential to be readily applied in field studies and marine surveys for oil exploration or oil spill monitoring.
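
The PCA-then-discriminant workflow described in this record can be sketched on synthetic sensor-array data. Everything here is illustrative: the two class means, the noise level, and the 5-sensor layout are assumptions, not the authors' data, and the two-class Fisher discriminant stands in for a full LDA:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic responses of a 5-sensor array for two hypothetical oil classes.
light = rng.normal([1.0, 0.8, 0.9, 0.2, 0.1], 0.05, size=(20, 5))
heavy = rng.normal([0.3, 0.2, 0.3, 0.9, 1.0], 0.05, size=(20, 5))
X = np.vstack([light, heavy])
y = np.array([0] * 20 + [1] * 20)

# PCA via SVD of the centered data: qualitative separation along PC1.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

# Two-class Fisher discriminant: w ∝ Sw^-1 (m1 - m0), then a midpoint threshold.
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
w = np.linalg.solve(Sw, m1 - m0)
threshold = ((X[y == 0] @ w).mean() + (X[y == 1] @ w).mean()) / 2
pred = (X @ w > threshold).astype(int)
accuracy = (pred == y).mean()
print("discriminant accuracy:", accuracy)
```

On real sensor responses the classes overlap far more than in this toy setup, which is why the study reports 89.7% rather than perfect classification.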

  1. Wavelet extractor: A Bayesian well-tie and wavelet extraction program

    NASA Astrophysics Data System (ADS)

    Gunning, James; Glinsky, Michael E.

    2006-06-01

    We introduce a new open-source toolkit for the well-tie or wavelet extraction problem of estimating seismic wavelets from seismic data, time-to-depth information, and well-log suites. The wavelet extraction model is formulated as a Bayesian inverse problem, and the software will simultaneously estimate wavelet coefficients, other parameters associated with uncertainty in the time-to-depth mapping, positioning errors in the seismic imaging, and useful amplitude-variation-with-offset (AVO) related parameters in multi-stack extractions. It is capable of multi-well, multi-stack extractions, and uses continuous seismic data-cube interpolation to cope with the problem of arbitrary well paths. Velocity constraints in the form of checkshot data, interpreted markers, and sonic logs are integrated in a natural way. The Bayesian formulation allows computation of full posterior uncertainties of the model parameters, and the important problem of the uncertain wavelet span is addressed using a multi-model posterior developed from Bayesian model selection theory. The wavelet extraction tool is distributed as part of the Delivery seismic inversion toolkit. A simple log and seismic viewing tool is included in the distribution. The code is written in Java, and is thus platform independent, but the Seismic Unix (SU) data model makes the inversion particularly suited to Unix/Linux environments. It is a natural companion piece of software to Delivery, having the capacity to produce maximum likelihood wavelet and noise estimates, but will also be of significant utility to practitioners wanting to produce wavelet estimates for other inversion codes or purposes. The generation of full parameter uncertainties is a crucial function for workers wishing to investigate questions of wavelet stability before proceeding to more advanced inversion studies.
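
The toolkit's full Bayesian machinery is beyond a short example, but the underlying forward model, a seismic trace as the convolution of a reflectivity series with a wavelet, admits a minimal least-squares analogue. This is a sketch on synthetic data, not the toolkit's algorithm, and it ignores time-to-depth uncertainty, positioning error, and span selection entirely:

```python
import numpy as np

def estimate_wavelet(reflectivity, trace, nw):
    """Least-squares wavelet estimate: solve trace ~= conv(reflectivity, w)."""
    n = len(trace)
    # Convolution matrix C such that C @ w == np.convolve(reflectivity, w)[:n].
    C = np.zeros((n, nw))
    for j in range(nw):
        C[j:, j] = reflectivity[: n - j]
    w, *_ = np.linalg.lstsq(C, trace, rcond=None)
    return w

rng = np.random.default_rng(1)
r = rng.normal(0, 1, 200)                   # synthetic reflectivity series
true_w = np.array([0.2, 1.0, 0.5, -0.3])    # hidden wavelet to recover
trace = np.convolve(r, true_w)[:200] + rng.normal(0, 0.01, 200)
print(np.round(estimate_wavelet(r, trace, 4), 2))
```

The Bayesian formulation in the record replaces this point estimate with a posterior over wavelet coefficients, which is what yields the uncertainty and model-selection capabilities described.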

  2. Lessons Learned from Development of De-identification System for Biomedical Research in a Korean Tertiary Hospital

    PubMed Central

    Shin, Soo-Yong; Lyu, Yongman; Shin, Yongdon; Choi, Hyo Joung; Park, Jihyun; Kim, Woo-Sung

    2013-01-01

    Objectives The Korean government has enacted two laws, namely, the Personal Information Protection Act and the Bioethics and Safety Act, to prevent the unauthorized use of medical information. To protect patients' privacy by complying with governmental regulations and improve the convenience of research, Asan Medical Center has been developing a de-identification system for biomedical research. Methods We reviewed Korean regulations to define the scope of the de-identification methods and well-known previous biomedical research platforms to extract the functionalities of the systems. Based on these review results, we implemented the necessary programs based on the Asan Medical Center Information System framework, which was built using the Microsoft .NET Framework and C#. Results The developed de-identification system comprises three main components: a de-identification tool, a search tool, and a chart review tool. The de-identification tool can substitute a randomly assigned research ID for a hospital patient ID, remove the identifiers in the structured format, and mask them in the unstructured format, i.e., texts. This tool achieved 98.14% precision and 97.39% recall for 6,520 clinical notes. The search tool can find the number of patients which satisfies given search criteria. The chart review tool can provide de-identified patient clinical data for review purposes. Conclusions We found that a clinical data warehouse was essential for successful implementation of the de-identification system, and that this system should be tightly linked to an electronic Institutional Review Board system for easy operation of honest brokers. Additionally, we found that a secure cloud environment could be adopted to protect patients' privacy more thoroughly. PMID:23882415
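
The de-identification tool's two basic operations, substituting a stable random research ID for a hospital patient ID and masking identifiers in free text, can be sketched as below. The regex patterns and ID format are invented for illustration and are not the Asan implementation:

```python
import re
import secrets

id_map = {}  # hospital patient ID -> stable random research ID

def research_id(patient_id):
    """Substitute a randomly assigned research ID for a hospital patient ID."""
    if patient_id not in id_map:
        id_map[patient_id] = "R-" + secrets.token_hex(4).upper()
    return id_map[patient_id]

def mask_note(text):
    """Mask obvious identifiers in free text (illustrative patterns only)."""
    text = re.sub(r"\b\d{8}\b", "[PATIENT-ID]", text)            # 8-digit chart numbers
    text = re.sub(r"\b\d{2,3}-\d{3,4}-\d{4}\b", "[PHONE]", text) # phone numbers
    return text

note = "Patient 20130041 (call 02-3010-1234) was seen for follow-up."
print(research_id("20130041"), "|", mask_note(note))
```

Production systems reach the reported 98% precision with much richer pattern sets, dictionaries, and context rules than two regexes; the point here is only the shape of the substitute-and-mask pipeline.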

  3. Assessment of Anaerobic Metabolic Activity and Microbial Diversity in a Petroleum-Contaminated Aquifer Using Push-Pull Tests in Combination With Molecular Tools and Stable Isotopes

    NASA Astrophysics Data System (ADS)

    Schroth, M. H.; Kleikemper, J.; Pombo, S. A.; Zeyer, J.

    2002-12-01

    In the past, studies on microbial communities in natural environments have typically focused on either their structure or on their metabolic function. However, linking structure and function is important for understanding microbial community dynamics, in particular in contaminated environments. We will present results of a novel combination of a hydrogeological field method (push-pull tests) with molecular tools and stable isotope analysis, which was employed to quantify anaerobic activities and associated microbial diversity in a petroleum-contaminated aquifer in Studen, Switzerland. Push-pull tests consisted of the injection of test solution containing a conservative tracer and reactants (electron acceptors, 13C-labeled carbon sources) into the aquifer anoxic zone. Following an incubation period, the test solution/groundwater mixture was extracted from the same location. Metabolic activities were computed from solute concentrations measured during extraction. Simultaneously, microbial diversity in sediment and groundwater was characterized by using fluorescence in situ hybridization (FISH), denaturing gradient gel electrophoresis (DGGE), as well as phospholipids fatty acid (PLFA) analysis in combination with 13C isotopic measurements. Results from DGGE analyses provided information on the general community structure before, during and after the tests, while FISH yielded information on active populations. Moreover, using 13C-labeling of microbial PLFA we were able to directly link carbon source assimilation in an aquifer to indigenous microorganisms while providing quantitative information on respective carbon source consumption.
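
The step of computing metabolic activities from solute concentrations measured during extraction can be illustrated with a simplified first-order treatment: the log of the tracer-normalized reactant concentration declines linearly at rate k. This is a sketch on synthetic breakthrough data; the published field analyses also account for injection duration and dilution, which are omitted here:

```python
import numpy as np

def first_order_rate(t, c_react, c_tracer, c_react0, c_tracer0):
    """First-order rate coefficient from push-pull breakthrough data:
    fit ln[(C_r/C_t) / (C_r0/C_t0)] = -k * t (simplified treatment)."""
    y = np.log((c_react / c_tracer) / (c_react0 / c_tracer0))
    k, _intercept = np.polyfit(t, y, 1)
    return -k

t = np.array([2.0, 4.0, 8.0, 12.0])         # hours since injection
c_tracer = np.array([0.8, 0.6, 0.35, 0.2])  # conservative tracer, mg/L
true_k = 0.1
c_react = c_tracer * np.exp(-true_k * t)    # reactant consumed at k = 0.1/h
print(round(first_order_rate(t, c_react, c_tracer, 1.0, 1.0), 3))
```

Normalizing by the conservative tracer is what separates reaction from physical dilution: both solutes dilute identically, so only consumption changes their ratio.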

  4. Dissecting children's observational learning of complex actions through selective video displays.

    PubMed

    Flynn, Emma; Whiten, Andrew

    2013-10-01

    Children can learn how to use complex objects by watching others, yet the relative importance of different elements they may observe, such as the interactions of the individual parts of the apparatus, a model's movements, and desirable outcomes, remains unclear. In total, 140 3-year-olds and 140 5-year-olds participated in a study where they observed a video showing tools being used to extract a reward item from a complex puzzle box. Conditions varied according to the elements that could be seen in the video: (a) the whole display, including the model's hands, the tools, and the box; (b) the tools and the box but not the model's hands; (c) the model's hands and the tools but not the box; (d) only the end state with the box opened; and (e) no demonstration. Children's later attempts at the task were coded to establish whether they imitated the hierarchically organized sequence of the model's actions, the action details, and/or the outcome. Children's successful retrieval of the reward from the box and the replication of hierarchical sequence information were reduced in all but the whole display condition. Only once children had attempted the task and witnessed a second demonstration did the display focused on the tools and box prove to be better for hierarchical sequence information than the display focused on the tools and hands only. Copyright © 2013 Elsevier Inc. All rights reserved.

  5. Automated Data Cleansing in Data Harvesting and Data Migration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, Mark; Vowell, Lance; King, Ian

    2011-03-16

    In the proposal for this project, we noted how the explosion of digitized information available through corporate databases, data stores and online search systems has resulted in the knowledge worker being bombarded by information. Knowledge workers typically spend more than 20-30% of their time seeking and sorting information, only finding the information 50-60% of the time. This information exists as unstructured, semi-structured and structured data. The problem of information overload is compounded by the production of duplicate or near-duplicate information. In addition, near-duplicate items frequently have different origins, creating a situation in which each item may have unique information of value, but their differences are not significant enough to justify maintaining them as separate entities. Effective tools can be provided to eliminate duplicate and near-duplicate information. The proposed approach was to extract unique information from data sets and consolidate that information into a single comprehensive file.
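
Near-duplicate detection of the kind motivated above is commonly done by comparing word-shingle sets with Jaccard similarity; a minimal sketch (not the project's actual method, and the example documents are invented):

```python
def shingles(text, k=3):
    """Word k-shingles of a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

doc1 = "the quarterly report covers revenue growth in all regions"
doc2 = "the quarterly report covers revenue growth in most regions"
doc3 = "maintenance schedule for the cooling system"
print(round(jaccard(doc1, doc2), 3), jaccard(doc1, doc3))
```

Pairs whose similarity exceeds a chosen threshold (say 0.5) would be flagged for consolidation; at scale, MinHash signatures replace the exact set comparison.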

  6. GeneTools--application for functional annotation and statistical hypothesis testing.

    PubMed

    Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid

    2006-10-24

    Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster, testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) the NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch-search mode; ii) the GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest; these user-defined GO annotations can be used in further analysis or exported for public distribution; iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool to do so, eGOn supports hypothesis testing for three different situations (the master-target situation, the mutually exclusive target-target situation and the intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis.
GeneTools is the first "all in one" annotation tool, providing users with rapid extraction of highly relevant gene annotation data for, e.g., thousands of genes or clones at once. It allows a user to define and archive new GO annotations, and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no
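
The GO category representation test that a tool like eGOn performs for the master-target situation is, at its core, a one-sided test on a 2x2 contingency: how surprising is it that k of n target genes fall in a category covering K of N annotated genes? A hedged sketch using the exact hypergeometric tail (the counts are invented, and eGOn's exact statistical procedure may differ):

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeom(N, K, n): enrichment p-value for
    observing k of n target genes in a GO category covering K of N genes."""
    denom = comb(N, n)
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / denom

# Hypothetical counts: 12 of 50 target genes fall in a category that
# covers 100 of 10,000 annotated genes (expected ~0.5 by chance).
p = hypergeom_tail(12, 50, 100, 10000)
print(f"enrichment p = {p:.2e}")
```

In practice this per-category p-value is computed for every GO term and then corrected for multiple testing before any category is reported as over-represented.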

  7. TH-C-18A-08: A Management Tool for CT Dose Monitoring, Analysis, and Protocol Review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, J; Chan, F; Newman, B

    2014-06-15

    Purpose: To develop a customizable tool for enterprise-wide management of CT protocols and analysis of radiation dose information of CT exams for a variety of quality control applications. Methods: All clinical CT protocols implemented on the 11 CT scanners at our institution were extracted in digital format. The original protocols had been preset by our CT management team. A commercial CT dose tracking software (DoseWatch, GE Healthcare, WI) was used to collect exam information (exam date, patient age, etc.), scanning parameters, and radiation doses for all CT exams. We developed a Matlab-based program (MathWorks, MA) with a graphical user interface which allows the user to analyze the scanning protocols together with the actual dose estimates, and to compare the data to national (ACR, AAPM) and internal reference values for CT quality control. Results: The CT protocol review portion of our tool allows the user to look up the scanning and image reconstruction parameters of any protocol on any of the installed CT systems, among about 120 protocols per scanner. In the dose analysis tool, dose information of all CT exams (from 05/2013 to 02/2014) was stratified on a protocol level, and within a protocol down to series level, i.e., each individual exposure event. This allows numerical and graphical review of dose information of any combination of scanner models, protocols and series. The key functions of the tool include: statistics of CTDI, DLP and SSDE; dose monitoring using user-set CTDI/DLP/SSDE thresholds; look-up of any CT exam dose data; and CT protocol review. Conclusion: Our in-house CT management tool provides radiologists, technologists and administration first-hand, near real-time, enterprise-wide knowledge of CT dose levels of different exam types. Medical physicists use this tool to manage CT protocols, and to compare and optimize dose levels across different scanner models.
It provides technologists with feedback on CT scanning operation, and knowledge of important dose baselines and thresholds.
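
The threshold-based dose monitoring function described in this record can be sketched as follows. The CTDIvol thresholds below are placeholders for illustration only, not ACR/AAPM reference values, and the exam records are invented:

```python
# Flag CT exams whose CTDIvol exceeds a per-protocol threshold.
thresholds = {"head": 75.0, "abdomen": 25.0}  # CTDIvol in mGy, illustrative

exams = [
    {"protocol": "head", "ctdi": 58.2},
    {"protocol": "head", "ctdi": 81.0},
    {"protocol": "abdomen", "ctdi": 19.4},
    {"protocol": "abdomen", "ctdi": 31.7},
]

def flag_exams(exams, thresholds):
    """Return exams whose dose exceeds the threshold for their protocol."""
    return [e for e in exams if e["ctdi"] > thresholds[e["protocol"]]]

for e in flag_exams(exams, thresholds):
    print(f"ALERT: {e['protocol']} exam at {e['ctdi']} mGy")
```

A production tool would stratify by DLP and SSDE as well and drive the alerts from a live dose-tracking database rather than a static list, but the comparison logic is this simple.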

  8. Vaccine adverse event text mining system for extracting features from vaccine safety reports.

    PubMed

    Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert

    2012-01-01

    To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (e.g., symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second-level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal to the sensitivity of the text classifier (83.1%) and better than that of the online tool (40.7%). Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
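
The text-mining metrics used to evaluate VaeTM are computed from true-positive, false-positive and false-negative counts over the extracted features; a minimal sketch (the tallies below are invented, not VaeTM's results):

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from extraction counts."""
    precision = tp / (tp + fp)          # extracted features that were correct
    recall = tp / (tp + fn)             # gold features that were extracted
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Hypothetical feature-extraction tally over a batch of reports.
p, r, f = prf(tp=87, fp=10, fn=13)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

A "lenient" F-measure such as the one reported in this record typically counts partial matches between extracted and gold spans as true positives, which raises all three numbers relative to strict exact-match scoring.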

  9. ReGaTE: Registration of Galaxy Tools in Elixir

    PubMed Central

    Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé

    2017-01-01

    Abstract Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE. PMID:28402416
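
    The metadata-enrichment step ReGaTE performs can be pictured as a record transform from a Galaxy tool description to a registry entry; this is a hedged sketch with illustrative field names, not the real bio.tools schema or the BioBlend calls ReGaTE actually makes:

```python
def galaxy_to_biotools(tool):
    """Map a Galaxy tool record to a minimal bio.tools-style entry.

    The output keys are illustrative only; the actual bio.tools
    schema and ReGaTE's enrichment logic are far richer.
    """
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "version": tool.get("version", "unknown"),
        "homepage": tool.get("link", ""),
    }

entry = galaxy_to_biotools(
    {"name": "bwa_mem", "description": "Map reads with BWA-MEM",
     "version": "0.7.17"})
```

    In the real utility, such entries are pushed to the registry via its web API rather than returned as dictionaries.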

  10. Early visual analysis tool using magnetoencephalography for treatment and recovery of neuronal dysfunction.

    PubMed

    Rasheed, Waqas; Neoh, Yee Yik; Bin Hamid, Nor Hisham; Reza, Faruque; Idris, Zamzuri; Tang, Tong Boon

    2017-10-01

    Functional neuroimaging modalities play an important role in deciding the diagnosis and course of treatment of neuronal dysfunction and degeneration. This article presents an analytical tool with visualization by exploiting the strengths of the MEG (magnetoencephalographic) neuroimaging technique. The tool automates MEG data import (in tSSS format), channel information extraction, time/frequency decomposition, and circular graph visualization (connectogram) for simple result inspection. For advanced users, the tool also provides magnitude squared coherence (MSC) values allowing personalized threshold levels, and the computation of default model from MEG data of control population. Default model obtained from healthy population data serves as a useful benchmark to diagnose and monitor neuronal recovery during treatment. The proposed tool further provides optional labels with international 10-10 system nomenclature in order to facilitate comparison studies with EEG (electroencephalography) sensor space. Potential applications in epilepsy and traumatic brain injury studies are also discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.
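
    Magnitude squared coherence (MSC), the connectivity measure the tool exposes, can be estimated by Welch-style averaging of cross- and auto-spectra over signal segments; a self-contained sketch (segment length and window are arbitrary choices here, not the tool's defaults):

```python
import numpy as np

def msc(x, y, nperseg=256):
    """Welch-averaged magnitude squared coherence |Pxy|^2 / (Pxx * Pyy)."""
    nseg = min(len(x), len(y)) // nperseg
    win = np.hanning(nperseg)
    nfreq = nperseg // 2 + 1
    Pxx, Pyy = np.zeros(nfreq), np.zeros(nfreq)
    Pxy = np.zeros(nfreq, dtype=complex)
    for k in range(nseg):
        xs = x[k * nperseg:(k + 1) * nperseg] * win
        ys = y[k * nperseg:(k + 1) * nperseg] * win
        X, Y = np.fft.rfft(xs), np.fft.rfft(ys)
        Pxx += np.abs(X) ** 2   # averaged auto-spectrum of x
        Pyy += np.abs(Y) ** 2   # averaged auto-spectrum of y
        Pxy += X * np.conj(Y)   # averaged cross-spectrum
    return np.abs(Pxy) ** 2 / (Pxx * Pyy)

rng = np.random.default_rng(0)
common = rng.standard_normal(4096)
a = common + 0.1 * rng.standard_normal(4096)  # two "sensors" sharing a source
b = common + 0.1 * rng.standard_normal(4096)
coh_ab = msc(a, b)                            # high across frequencies
coh_ind = msc(rng.standard_normal(4096), rng.standard_normal(4096))  # low
```

    Thresholding such per-frequency MSC values between sensor pairs is what yields the connectogram edges described above.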

  11. 1,4-Dioxane Remediation by Extreme Soil Vapor Extraction (XSVE). Screening-Level Feasibility Assessment and Design Tool in Support of 1,4-Dioxane Remediation by Extreme Soil Vapor Extraction (XSVE) ESTCP Project ER 201326

    DTIC Science & Technology

    2017-10-01

    USER GUIDE: 1,4-Dioxane Remediation by Extreme Soil Vapor Extraction (XSVE). Screening-Level Feasibility Assessment and Design Tool in Support of 1,4-Dioxane Remediation by Extreme Soil Vapor Extraction (XSVE), ESTCP Project ER-201326, October 2017. Rob Hinchee, Integrated Science...Technology, Inc., 1509 Coastal Highway, Panacea, FL 32346. Performance period: 8/8/2013 - 8/8/2018.

  12. SLIDE - a web-based tool for interactive visualization of large-scale -omics data.

    PubMed

    Ghosh, Soumita; Datta, Abhik; Tan, Kaisen; Choi, Hyungwon

    2018-06-28

    Data visualization is often regarded as a post hoc step for verifying statistically significant results in the analysis of high-throughput data sets. This common practice leaves a large amount of raw data behind, from which more information can be extracted. However, existing solutions do not provide capabilities to explore large-scale raw datasets using biologically sensible queries, nor do they allow real-time customization of graphics based on user interaction. To address these drawbacks, we have designed an open-source, web-based tool called Systems-Level Interactive Data Exploration, or SLIDE, to visualize large-scale -omics data interactively. SLIDE's interface makes it easier for scientists to explore quantitative expression data at multiple resolutions in a single screen. SLIDE is publicly available under the BSD license, both as an online version and as a stand-alone version, at https://github.com/soumitag/SLIDE. Supplementary information is available at Bioinformatics online.

  13. CoPub: a literature-based keyword enrichment tool for microarray data analysis.

    PubMed

    Frijters, Raoul; Heupers, Bart; van Beek, Pieter; Bouwhuis, Maurice; van Schaik, René; de Vlieg, Jacob; Polman, Jan; Alkema, Wynand

    2008-07-01

    Medline is a rich information source, from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight in the relationships between genes and keywords, and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl.
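
    Keyword enrichment of the kind CoPub reports is typically scored with a hypergeometric over-representation test against the literature-wide background; a sketch under that assumption (the statistic and numbers below are illustrative, not CoPub's published implementation):

```python
from math import comb

def hypergeom_pval(N, K, n, k):
    """P(X >= k) when drawing n genes from N background genes of
    which K co-occur with the keyword: the over-representation
    p-value for seeing k or more keyword-linked genes."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# 20000 background genes, 100 co-occur with the keyword in Medline,
# input set of 50 genes of which 5 carry the keyword
p = hypergeom_pval(20000, 100, 50, 5)
```

    A small p-value flags the keyword as significantly correlated with the input gene set; in practice such p-values are also corrected for multiple testing across all keywords.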

  14. Semantic Interoperability of Health Risk Assessments

    PubMed Central

    Rajda, Jay; Vreeman, Daniel J.; Wei, Henry G.

    2011-01-01

    The health insurance and benefits industry has administered Health Risk Assessments (HRAs) at an increasing rate. These are used to collect data on modifiable health risk factors for wellness and disease management programs. However, there is significant variability in the semantics of these assessments, making it difficult to compare data sets from the output of 2 different HRAs. There is also an increasing need to exchange this data with Health Information Exchanges and Electronic Medical Records. To standardize the data and concepts from these tools, we outline a process to determine presence of certain common elements of modifiable health risk extracted from these surveys. This information is coded using concept identifiers, which allows cross-survey comparison and analysis. We propose that using LOINC codes or other universal coding schema may allow semantic interoperability of a variety of HRA tools across the industry, research, and clinical settings. PMID:22195174

  15. Real-time Social Internet Data to Guide Forecasting Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Del Valle, Sara Y.

    Our goal is to improve decision support by monitoring and forecasting events using social media, mathematical models, and quantifying model uncertainty. Our approach is real-time, data-driven forecasts with quantified uncertainty: not just for weather anymore. Information flow from human observations of events through an Internet system and classification algorithms is used to produce quantitatively uncertain forecasts. In summary, we want to develop new tools to extract useful information from Internet data streams, develop new approaches to assimilate real-time information into predictive models, and validate these approaches by forecasting events; our ultimate goal is to develop an event forecasting system using mathematical approaches and heterogeneous data streams.

  16. Twitter for travel medicine providers.

    PubMed

    Mills, Deborah J; Kohl, Sarah E

    2016-03-01

    Travel medicine practitioners, perhaps more so than medical practitioners working in other areas of medicine, require a constant flow of information to stay up-to-date, and provide best practice information and care to their patients. Many travel medicine providers are unaware of the popularity and potential of the Twitter platform. Twitter use among our travellers, as well as by physicians and health providers, is growing exponentially. There is a rapidly expanding body of published literature on this information tool. This review provides a brief overview of the ways Twitter is being used by health practitioners, the advantages that are peculiar to Twitter as a platform of social media, and how the interested practitioner can get started. Some key points about the dark side of Twitter are highlighted, as well as the potential benefits of using Twitter as a way to disseminate accurate medical information to the public. This article will help readers develop an increased understanding of Twitter as a tool for extracting useful facts and insights from the ever increasing volume of health information. © International Society of Travel Medicine, 2016. All rights reserved. Published by Oxford University Press. For permissions, please e-mail: journals.permissions@oup.com.

  17. Intelligent multi-sensor integrations

    NASA Technical Reports Server (NTRS)

    Volz, Richard A.; Jain, Ramesh; Weymouth, Terry

    1989-01-01

    Growth in the intelligence of space systems requires the use and integration of data from multiple sensors. Generic tools are being developed for extracting and integrating information obtained from multiple sources. The full spectrum is addressed for issues ranging from data acquisition, to characterization of sensor data, to adaptive systems for utilizing the data. In particular, there are three major aspects to the project, multisensor processing, an adaptive approach to object recognition, and distributed sensor system integration.
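
    A classic building block for integrating data from multiple sensors is inverse-variance weighted fusion of redundant measurements; a minimal illustration of the idea (not code from the project):

```python
def fuse(estimates):
    """Inverse-variance weighted fusion of (value, variance) pairs
    from multiple sensors observing the same quantity; returns the
    fused value and its (smaller) variance."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total

# Two range sensors observing the same target, the second more precise
fused, var = fuse([(10.2, 0.04), (9.8, 0.01)])
```

    The fused estimate is pulled toward the more reliable sensor, and the fused variance is lower than either input variance, which is the basic payoff of multi-sensor integration.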

  18. Advanced data management for optimising the operation of a full-scale WWTP.

    PubMed

    Beltrán, Sergio; Maiza, Mikel; de la Sota, Alejandro; Villanueva, José María; Ayesa, Eduardo

    2012-01-01

    The lack of appropriate data management tools is presently a limiting factor for a broader implementation and a more efficient use of sensors and analysers, monitoring systems and process controllers in wastewater treatment plants (WWTPs). This paper presents a technical solution for advanced data management of a full-scale WWTP. The solution is based on an efficient and intelligent use of the plant data by a standard centralisation of the heterogeneous data acquired from different sources, effective data processing to extract adequate information, and a straightforward connection to other emerging tools focused on the operational optimisation of the plant such as advanced monitoring and control or dynamic simulators. A pilot study of the advanced data manager tool was designed and implemented in the Galindo-Bilbao WWTP. The results of the pilot study showed its potential for agile and intelligent plant data management by generating new enriched information combining data from different plant sources, facilitating the connection of operational support systems, and developing automatic plots and trends of simulated results and actual data for plant performance and diagnosis.

  19. Building a protein name dictionary from full text: a machine learning term extraction approach.

    PubMed

    Shi, Lei; Campagne, Fabien

    2005-04-07

    The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
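
    The first stage described, collecting high-frequency terms from a full-text article before SVM classification, can be sketched as follows (the tokenisation is a crude stand-in for the published method, and no SVM step is shown):

```python
import re
from collections import Counter

def high_frequency_terms(text, min_count=3):
    """Collect candidate terms that recur in a full-text article.
    The real pipeline feeds such candidates to an SVM that decides
    which ones are biological entity names."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]+", text)
    counts = Counter(tokens)
    return {t: c for t, c in counts.items() if c >= min_count}

doc = ("Rag1 binds Rag2. Rag1 and Rag2 form the Rag1/Rag2 complex. "
       "Rag1 is essential.")
terms = high_frequency_terms(doc, min_count=3)
```

    Frequency alone is of course a weak signal; the point of the SVM stage is to separate recurrent entity names from recurrent ordinary words.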

  20. Heterogeneous postsurgical data analytics for predictive modeling of mortality risks in intensive care units.

    PubMed

    Yun Chen; Hui Yang

    2014-01-01

    The rapid advancements of biomedical instrumentation and healthcare technology have resulted in data-rich environments in hospitals. However, the meaningful information extracted from rich datasets is limited. There is a dire need to go beyond current medical practices and develop data-driven methods and tools that will enable and help (i) the handling of big data, (ii) the extraction of data-driven knowledge, and (iii) the exploitation of acquired knowledge for optimizing clinical decisions. The present study focuses on the prediction of mortality rates in Intensive Care Units (ICU) using patient-specific healthcare recordings. It is worth mentioning that postsurgical monitoring in ICU leads to massive datasets with unique properties, e.g., variable heterogeneity, patient heterogeneity, and time asynchronization. To cope with the challenges in ICU datasets, we developed a postsurgical decision support system with a series of analytical tools, including data categorization, data pre-processing, feature extraction, feature selection, and predictive modeling. Experimental results show that the proposed data-driven methodology outperforms traditional approaches and yields better results based on the evaluation of real-world ICU data from 4000 subjects in the database. This research shows great potential for the use of data-driven analytics to improve the quality of healthcare services.
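
    Feature extraction from irregularly sampled, time-asynchronized ICU variables, one of the pipeline steps listed, might look like the following sketch (the variable and feature set are illustrative, not the study's):

```python
def summarise(series):
    """Turn an irregularly sampled ICU variable, given as
    (hour, value) pairs, into fixed-length features usable by a
    predictive model: first, last, mean, and a crude trend."""
    series = sorted(series)                  # order by time
    values = [v for _, v in series]
    first, last = values[0], values[-1]
    mean = sum(values) / len(values)
    span = series[-1][0] - series[0][0]
    trend = (last - first) / span if span else 0.0
    return {"first": first, "last": last, "mean": mean, "trend": trend}

hr = [(0, 80.0), (3, 95.0), (7, 110.0)]      # hypothetical heart-rate readings
features = summarise(hr)
```

    Summaries like these give every patient the same feature vector length regardless of how often each variable was recorded, which is what makes heterogeneous ICU data usable by standard predictive models.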

  1. Building a protein name dictionary from full text: a machine learning term extraction approach

    PubMed Central

    Shi, Lei; Campagne, Fabien

    2005-01-01

    Background The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt. PMID:15817129

  2. Semi-Automated Approach for Mapping Urban Trees from Integrated Aerial LiDAR Point Cloud and Digital Imagery Datasets

    NASA Astrophysics Data System (ADS)

    Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-09-01

    Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from these detailed, up-to-date data sources. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building up strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints such as labour-intensive field work and high financial requirements, which can be overcome by means of integrated LiDAR and digital image datasets. Compared to the predominant studies on tree extraction, mainly in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a workflow for a semi-automated approach to extracting urban trees from integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper reveals that the integrated datasets are a suitable technology and a viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition and extent of trees in the study area, useful to city planners and other decision makers in order to understand how much canopy cover exists, identify new planting, removal, or reforestation opportunities, and determine which locations have the greatest need or potential to maximize the benefits of return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.
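
    A common way to combine the two data sources is to intersect a LiDAR-derived canopy height model with an image-derived NDVI layer: pixels that are both tall and vegetated are tree candidates. A hedged sketch with illustrative thresholds (the paper's actual workflow is more elaborate):

```python
import numpy as np

def tree_mask(height, ndvi, min_height=2.0, min_ndvi=0.3):
    """Flag candidate tree pixels: tall in the LiDAR-derived canopy
    height model AND vegetated in the image-derived NDVI layer.
    Thresholds are illustrative, not calibrated values."""
    return (height > min_height) & (ndvi > min_ndvi)

chm = np.array([[0.1, 5.2], [8.0, 3.1]])     # canopy height model (m)
ndvi = np.array([[0.05, 0.6], [0.1, 0.45]])  # NDVI from multispectral image
mask = tree_mask(chm, ndvi)
```

    The height criterion alone would also flag buildings, and NDVI alone would also flag grass; the intersection is what makes the combined datasets more informative than either source on its own.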

  3. What can management theories offer evidence-based practice? A comparative analysis of measurement tools for organisational context.

    PubMed

    French, Beverley; Thomas, Lois H; Baker, Paula; Burton, Christopher R; Pennington, Lindsay; Roddam, Hazel

    2009-05-19

    Given the current emphasis on networks as vehicles for innovation and change in health service delivery, the ability to conceptualize and measure organisational enablers for the social construction of knowledge merits attention. This study aimed to develop a composite tool to measure the organisational context for evidence-based practice (EBP) in healthcare. A structured search of the major healthcare and management databases for measurement tools from four domains: research utilisation (RU), research activity (RA), knowledge management (KM), and organisational learning (OL). Included studies were reports of the development or use of measurement tools that included organisational factors. Tools were appraised for face and content validity, plus development and testing methods. Measurement tool items were extracted, merged across the four domains, and categorised within a constructed framework describing the absorptive and receptive capacities of organisations. Thirty measurement tools were identified and appraised. Eighteen tools from the four domains were selected for item extraction and analysis. The constructed framework consists of seven categories relating to three core organisational attributes of vision, leadership, and a learning culture, and four stages of knowledge need, acquisition of new knowledge, knowledge sharing, and knowledge use. Measurement tools from RA or RU domains had more items relating to the categories of leadership, and acquisition of new knowledge; while tools from KM or learning organisation domains had more items relating to vision, learning culture, knowledge need, and knowledge sharing. There was equal emphasis on knowledge use in the different domains. If the translation of evidence into knowledge is viewed as socially mediated, tools to measure the organisational context of EBP in healthcare could be enhanced by consideration of related concepts from the organisational and management sciences. 
Comparison of measurement tools across domains suggests that there is scope within EBP for supplementing the current emphasis on human and technical resources to support information uptake and use by individuals. Consideration of measurement tools from the fields of KM and OL shows more content related to social mechanisms to facilitate knowledge recognition, translation, and transfer between individuals and groups.

  4. What can management theories offer evidence-based practice? A comparative analysis of measurement tools for organisational context

    PubMed Central

    French, Beverley; Thomas, Lois H; Baker, Paula; Burton, Christopher R; Pennington, Lindsay; Roddam, Hazel

    2009-01-01

    Background Given the current emphasis on networks as vehicles for innovation and change in health service delivery, the ability to conceptualise and measure organisational enablers for the social construction of knowledge merits attention. This study aimed to develop a composite tool to measure the organisational context for evidence-based practice (EBP) in healthcare. Methods A structured search of the major healthcare and management databases for measurement tools from four domains: research utilisation (RU), research activity (RA), knowledge management (KM), and organisational learning (OL). Included studies were reports of the development or use of measurement tools that included organisational factors. Tools were appraised for face and content validity, plus development and testing methods. Measurement tool items were extracted, merged across the four domains, and categorised within a constructed framework describing the absorptive and receptive capacities of organisations. Results Thirty measurement tools were identified and appraised. Eighteen tools from the four domains were selected for item extraction and analysis. The constructed framework consists of seven categories relating to three core organisational attributes of vision, leadership, and a learning culture, and four stages of knowledge need, acquisition of new knowledge, knowledge sharing, and knowledge use. Measurement tools from RA or RU domains had more items relating to the categories of leadership, and acquisition of new knowledge; while tools from KM or learning organisation domains had more items relating to vision, learning culture, knowledge need, and knowledge sharing. There was equal emphasis on knowledge use in the different domains. 
Conclusion If the translation of evidence into knowledge is viewed as socially mediated, tools to measure the organisational context of EBP in healthcare could be enhanced by consideration of related concepts from the organisational and management sciences. Comparison of measurement tools across domains suggests that there is scope within EBP for supplementing the current emphasis on human and technical resources to support information uptake and use by individuals. Consideration of measurement tools from the fields of KM and OL shows more content related to social mechanisms to facilitate knowledge recognition, translation, and transfer between individuals and groups. PMID:19454008

  5. KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process

    NASA Technical Reports Server (NTRS)

    Gettig, Gary A.

    1988-01-01

    Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.
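
    KAM's heuristic extraction of relations and facts from free text can be caricatured as pattern-driven matching against a rule library; the rules below are hypothetical stand-ins, not KAM's actual heuristics:

```python
import re

# Hypothetical heuristic rules in the spirit of KAM's rule library:
# each maps a lexical pattern in text to a formal relation name.
HEURISTICS = [
    (re.compile(r"(\w+) is a (\w+)"), "isa"),
    (re.compile(r"(\w+) causes (\w+)"), "causes"),
]

def extract_relations(text):
    """Scan free text and emit (relation, arg1, arg2) triples."""
    triples = []
    for pattern, rel in HEURISTICS:
        for m in pattern.finditer(text):
            triples.append((rel, m.group(1), m.group(2)))
    return triples

facts = extract_relations("Corrosion causes failure. Titanium is a metal.")
```

    As the abstract notes, the real system lets users add, delete or customise such rules and cluster extracts by topic; the formal triples are what downstream expert systems consume.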

  6. Extracting biomedical events from pairs of text entities

    PubMed Central

    2015-01-01

    Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, enabling to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478
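
    Casting event extraction as classification of entity pairs starts with enumerating candidate (trigger, argument) pairs; a sketch of only that enumeration step (the SVM classification and the recursive re-entry of extracted events are omitted):

```python
from itertools import product

def candidate_pairs(triggers, arguments):
    """Enumerate (trigger, argument) candidates for classification.
    Each pair is later labelled with an event type or 'no event';
    extracted events can re-enter the argument list, which is how
    events involving other events are captured."""
    return [(t, a) for t, a in product(triggers, arguments) if t != a]

triggers = ["phosphorylation", "regulation"]
arguments = ["TRAF2", "phosphorylation"]  # an event can be an argument
pairs = candidate_pairs(triggers, arguments)
```

    Keeping the candidate set small and feeding each pair to a multiclass classifier is what gives this formulation its speed advantage over joint Markov Random Field models.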

  7. Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.

    PubMed

    Becker, Matthias; Böckmann, Britta

    2016-01-01

    Automatic extraction of medical concepts from medical reports, and their classification with semantic standards, is useful for standardization and for clinical research. This paper presents an approach to UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for German-language text, i.e., whether it can identify UMLS concepts and map them to SNOMED-CT. The German UMLS database and German OpenNLP models extended the natural language processing pipeline, so that the pipeline can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset translated into German was used. The implemented algorithms were tested with a set of 199 German reports, obtaining an average F1 measure of 0.36 without German stemming, pre- and post-processing of the reports.
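
    At its core, a cTAKES-style concept annotator performs dictionary lookup of text spans against UMLS terms; a heavily simplified greedy longest-match sketch with hypothetical CUIs (real UMLS concept identifiers and the cTAKES pipeline are far more involved):

```python
def tag_concepts(text, dictionary):
    """Greedy longest-match dictionary lookup: prefer the longest
    term starting at each token, as a (simplified) concept annotator
    would when normalising text to an ontology."""
    tokens = text.lower().split()
    hits, i = [], 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):  # longest span first
            span = " ".join(tokens[i:i + n])
            if span in dictionary:
                hits.append((span, dictionary[span]))
                i += n
                break
        else:
            i += 1
    return hits

# Hypothetical German-term-to-CUI lexicon, for illustration only
lexicon = {"diabetes mellitus": "C0000001", "diabetes": "C0000002"}
annotations = tag_concepts("Patient mit Diabetes mellitus", lexicon)
```

    The longest-match rule is why "Diabetes mellitus" is annotated as one concept rather than falling back to the shorter entry "diabetes".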

  8. A Tale of Two Regions: Landscape Ecological Planning for Shale Gas Energy Futures

    NASA Astrophysics Data System (ADS)

    Murtha, T., Jr.; Schroth, O.; Orland, B.; Goldberg, L.; Mazurczyk, T.

    2015-12-01

    As we increasingly embrace deep shale gas deposits to meet global energy demands, new and dispersed local and regional policy and planning challenges emerge. Even in regions with long histories of energy extraction, such as coal mining, shale gas and the infrastructure needed to produce the gas and transport it to market offer uniquely complex transformations in land use and land cover not previously experienced. These transformations are fast paced, dispersed and can overwhelm local and regional planning and regulatory processes. Coupled to these transformations is a structural confounding factor: while extraction and testing are carried out locally, regulation and decision-making is multilayered, often influenced by national and international factors. Using a geodesign framework, this paper applies a set of geospatial landscape ecological planning tools in two shale gas settings. First, we describe and detail a series of ongoing studies and tools that we have developed for communities in the Marcellus Shale region of the eastern United States, specifically the northern tier of Pennsylvania. Second, we apply a subset of these tools to potential gas development areas of the Fylde region in Lancashire, United Kingdom. For the past five years we have tested, applied and refined a set of place-based and data-driven geospatial models for forecasting, envisioning, analyzing and evaluating shale gas activities in northern Pennsylvania. These models are continuously compared to important landscape ecological planning challenges and priorities in the region, e.g. visual and cultural resource preservation. Adapting and applying these tools to a different landscape allows us not only to isolate and define important regulatory and policy exigencies in each specific setting, but also to develop and refine these models for broader application.
As we continue to explore increasingly complex energy solutions globally, we need an equally complex comparative set of landscape ecological planning tools to inform policy, design and regional planning. Adapting tools and techniques developed in Pennsylvania, where shale gas extraction is ongoing, to Lancashire, where industry is still in the exploratory phase, offers a key opportunity to test and refine more generalizable models.

  9. Clinical records anonymisation and text extraction (CRATE): an open-source software system.

    PubMed

    Cardinal, Rudolf N

    2017-04-26

    Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
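
    A "public secure cryptographic method" for mapping patient identifiers to research pseudonyms can be realised with keyed hashing such as HMAC-SHA256, so that the mapping cannot be reversed or recomputed without the secret key; a sketch of the idea (CRATE's actual construction may differ in detail):

```python
import hashlib
import hmac

def pseudonymise(patient_id, secret_key):
    """Map a patient identifier to a stable research pseudonym via
    HMAC-SHA256. The same (id, key) pair always yields the same
    pseudonym, enabling record linkage within the research database
    without exposing the original identifier."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

rid = pseudonymise("NHS1234567", b"institutional-secret")
```

    Because HMAC is keyed, an attacker holding the research database cannot brute-force pseudonyms back to identifiers by hashing candidate IDs, unlike a plain unsalted hash.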

  10. Investigation and Evaluation of the open source ETL tools GeoKettle and Talend Open Studio in terms of their ability to process spatial data

    NASA Astrophysics Data System (ADS)

    Kuhnert, Kristin; Quedenau, Jörn

    2016-04-01

    Integration and harmonization of large spatial data sets has been a major issue since well before the introduction of the INSPIRE spatial data infrastructure. Extracting and combining spatial data from heterogeneous source formats, transforming those data to the quality required for a particular purpose, and loading them into a data store are common tasks. This procedure of Extraction, Transformation and Loading is called an ETL process. Geographic Information Systems (GIS) can take over many of these tasks, but they are often not suitable for processing large datasets. ETL tools can make the implementation and execution of ETL processes convenient and efficient. One reason for choosing ETL tools for data integration is that they ease maintenance through a clear (graphical) presentation of the transformation steps. Developers and administrators are provided with tools for identifying errors, analyzing processing performance and managing the execution of ETL processes. Another benefit of ETL tools is that most tasks require little or no scripting, so researchers without a programming background can work with them easily. Investigations of ETL tools for business applications have been available for a long time; however, little work has been published on the capability of these tools to handle spatial data. In this work, we review and compare the open-source ETL tools GeoKettle and Talend Open Studio in terms of processing spatial data sets of different formats. For the evaluation, ETL processes are performed with both software packages on air quality data measured during the BÄRLIN2014 Campaign initiated by the Institute for Advanced Sustainability Studies (IASS). The aim of the BÄRLIN2014 Campaign is to better understand the sources and distribution of particulate matter in Berlin. The air quality data are available in heterogeneous formats because they were measured with different instruments.
For further data analysis, the instrument data have been complemented by other georeferenced data provided by the local environmental authorities. This includes both vector and raster data, e.g. on land use categories or building heights, extracted from flat files and OGC-compliant web services. Requirements on the ETL tools therefore include, for instance, extracting different input datasets such as Web Feature Services or vector datasets and loading them into databases. The tools also have to manage transformations on spatial datasets, such as applying spatial functions (e.g. intersection, union) or changing spatial reference systems. Preliminary results suggest that many complex transformation tasks can be accomplished with the existing set of components in both software tools, although there are still many gaps in the range of available features. The two ETL tools differ in functionality and in how various steps are implemented. For some tasks no predefined components are available at all, which can partly be compensated by use of the respective API (freely configurable components in Java or JavaScript).
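The extract-transform-load pattern described above can be sketched independently of either tool. The following minimal stdlib-Python example (station names, fields and values are invented) extracts CSV measurements, rejects records with missing coordinates in the transform step, and loads the result into a SQLite table. In GeoKettle or Talend Open Studio each of these three steps would be a graphical component, and the transform would typically involve spatial operations such as reprojection or intersection rather than a simple filter:

```python
import csv
import io
import sqlite3

# Extract: parse measurements from a CSV source (here an in-memory sample).
raw = io.StringIO(
    "station;lon;lat;pm10\n"
    "A;13.40;52.52;31.5\n"
    "B;;52.48;27.0\n"          # missing coordinate -> rejected in transform
    "C;13.35;52.50;44.2\n"
)
rows = list(csv.DictReader(raw, delimiter=";"))

# Transform: type conversion plus basic quality filtering.
clean = [
    (r["station"], float(r["lon"]), float(r["lat"]), float(r["pm10"]))
    for r in rows
    if r["lon"] and r["lat"]
]

# Load: write the cleaned records into a target database table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pm10 (station TEXT, lon REAL, lat REAL, value REAL)")
db.executemany("INSERT INTO pm10 VALUES (?, ?, ?, ?)", clean)
count, = db.execute("SELECT COUNT(*) FROM pm10").fetchone()  # 2 valid records
```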

  11. [Construction of chemical information database based on optical structure recognition technique].

    PubMed

    Lv, C Y; Li, M N; Zhang, L R; Liu, Z M

    2018-04-18

    To create a protocol that can be used to construct a chemical information database from scientific literature quickly and automatically. Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as the fundamental dataset. Chemical structures were transformed from published documents and images into machine-readable data using name conversion technology and the optical structure recognition tool CLiDE. In the process of molecular structure information extraction, Markush structures were enumerated into well-defined monomer molecules by means of the QueryTools in the molecule editor ChemDraw. The document management software EndNote X8 was applied to acquire bibliographical references, including title, author, journal and year of publication. The text mining toolkit ChemDataExtractor was adopted to retrieve information from figures, tables, and textual paragraphs that could be used to populate a structured chemical database. After this step, detailed manual revision and annotation were conducted in order to ensure the accuracy and completeness of the data. In addition to the literature data, the computing simulation platform Pipeline Pilot 7.5 was utilized to calculate physical and chemical properties and predict molecular attributes. Furthermore, the open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated: a molecular structure data file, molecular information, bibliographical references, predicted attributes and known bioactivities. With canonical SMILES (simplified molecular input line entry specification) as the primary key, the metadata files were associated through common key nodes, including molecular number and PDF number, to construct an integrated chemical information database. A practical construction protocol for a chemical information database was created successfully.
A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 were collected as the essential data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol, containing 3 262 molecules and 19 821 records. This data aggregation protocol greatly helps in constructing chemical information databases from original documents with accuracy, comprehensiveness and efficiency. A structured chemical information database can facilitate access to medical intelligence and accelerate the transformation of scientific research achievements.
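The described linkage, separate metadata tables joined through common key nodes with canonical SMILES as the primary key, can be illustrated with a toy relational schema. The table and column names below are hypothetical, not those of PKU-MNPD, and the sample row is invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE molecules (
    smiles TEXT PRIMARY KEY,   -- canonical SMILES as primary key
    mol_no INTEGER UNIQUE      -- common key node across metadata files
);
CREATE TABLE properties (
    mol_no INTEGER REFERENCES molecules(mol_no),
    mol_weight REAL            -- e.g. a calculated attribute
);
CREATE TABLE bibliography (
    mol_no INTEGER REFERENCES molecules(mol_no),
    pdf_no INTEGER,            -- common key node back to the source PDFs
    citation TEXT
);
""")
db.execute("INSERT INTO molecules VALUES ('CCO', 1)")
db.execute("INSERT INTO properties VALUES (1, 46.07)")
db.execute("INSERT INTO bibliography VALUES (1, 101, 'sample source article')")

# An integrated record is recovered by joining on the shared key nodes.
row = db.execute("""
    SELECT m.smiles, p.mol_weight, b.citation
    FROM molecules m
    JOIN properties p ON p.mol_no = m.mol_no
    JOIN bibliography b ON b.mol_no = m.mol_no
    WHERE m.smiles = 'CCO'
""").fetchone()
```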

  12. Characterizing Task-Based OpenMP Programs

    PubMed Central

    Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats

    2015-01-01

    Programmers struggle to understand the performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing the exposed task parallelism and per-task instruction profiles of benchmarks in the widely-used Barcelona OpenMP Tasks Suite. Programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools by using our method to characterize task-based performance. PMID:25860023

  13. Tools and Data Services from the NASA Earth Satellite Observations for Remote Sensing Commercial Applications

    NASA Technical Reports Server (NTRS)

    Vicente, Gilberto

    2005-01-01

    Several commercial applications of remote sensing data, such as water resources management, environmental monitoring, climate prediction, agriculture, forestry, and preparation for and mitigation of extreme weather events, require access to vast amounts of archived high-quality data, along with software tools and services for data manipulation and information extraction. These, in turn, require a detailed understanding of the data's internal structure and of the physical implementation of data reduction, combination and data product production. This time-consuming task must be undertaken before the core investigation can begin, and it is an especially difficult challenge when science objectives require users to deal with large multi-sensor data sets of different formats, structures, and resolutions.

  14. Single-trial event-related potential extraction through one-unit ICA-with-reference

    NASA Astrophysics Data System (ADS)

    Lih Lee, Wee; Tan, Tele; Falkmer, Torbjörn; Leung, Yee Hong

    2016-12-01

    Objective. In recent years, ICA has been one of the more popular methods for extracting event-related potentials (ERPs) at the single-trial level. It is a blind source separation technique that allows the extraction of an ERP without making strong assumptions on its temporal and spatial characteristics. However, the problem with traditional ICA is that the extraction is not direct and is time-consuming due to the need for source selection. In this paper, the application of a one-unit ICA-with-Reference (ICA-R), a constrained ICA method, is proposed. Approach. In cases where the time region of the desired ERP is known a priori, this timing information is used to generate a reference signal, which then guides the one-unit ICA-R to extract the source signal of the desired ERP directly. Main results. Our results showed that, compared to traditional ICA, ICA-R is a more effective method for analysing ERPs because it avoids manual source selection and requires less computation, resulting in faster ERP extraction. Significance. In addition, since the method is automated, it reduces the risk of subjective bias in ERP analysis. It is also a potential tool for extracting ERPs in online applications.

  15. Single-trial event-related potential extraction through one-unit ICA-with-reference.

    PubMed

    Lee, Wee Lih; Tan, Tele; Falkmer, Torbjörn; Leung, Yee Hong

    2016-12-01

    In recent years, ICA has been one of the more popular methods for extracting event-related potentials (ERPs) at the single-trial level. It is a blind source separation technique that allows the extraction of an ERP without making strong assumptions on its temporal and spatial characteristics. However, the problem with traditional ICA is that the extraction is not direct and is time-consuming due to the need for source selection. In this paper, the application of a one-unit ICA-with-Reference (ICA-R), a constrained ICA method, is proposed. In cases where the time region of the desired ERP is known a priori, this timing information is used to generate a reference signal, which then guides the one-unit ICA-R to extract the source signal of the desired ERP directly. Our results showed that, compared to traditional ICA, ICA-R is a more effective method for analysing ERPs because it avoids manual source selection and requires less computation, resulting in faster ERP extraction. In addition, since the method is automated, it reduces the risk of subjective bias in ERP analysis. It is also a potential tool for extracting ERPs in online applications.
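A minimal numpy sketch of the idea, reference-guided one-unit extraction, is given below. This is a simplification, not the published ICA-R algorithm: it initialises a standard one-unit FastICA fixed-point iteration from the reference direction rather than enforcing ICA-R's closeness constraint explicitly, and all signals (a synthetic "ERP-like" square wave, a nuisance source, and the mixing matrix) are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
t = np.arange(n)

# Two latent sources: a target waveform and background activity.
s1 = np.sign(np.sin(2 * np.pi * t / 200.0))   # target source
s2 = rng.laplace(size=n)                      # nuisance source
A = np.array([[1.0, 0.6], [0.5, 1.0]])        # invented mixing matrix
X = A @ np.vstack([s1, s2])                   # observed mixtures

# Whiten the mixtures: Z = E D^{-1/2} E^T (X - mean).
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / n)
Z = (E / np.sqrt(d)) @ E.T @ Xc

# Reference: a crude, noisy template of the target's time course.
ref = s1 + 0.5 * rng.normal(size=n)

# One-unit fixed-point (FastICA) update, initialised from the reference
# so the iteration heads for the referenced source directly.
w = Z @ ref
w /= np.linalg.norm(w)
for _ in range(100):
    y = w @ Z
    g, gp = np.tanh(y), 1.0 - np.tanh(y) ** 2
    w = (Z * g).mean(axis=1) - gp.mean() * w
    w /= np.linalg.norm(w)

y = w @ Z                                     # extracted single component
corr = abs(np.corrcoef(y, s1)[0, 1])          # agreement with the target
```

With well-separated synthetic sources the extracted component correlates almost perfectly with the target, without any manual source selection.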

  16. Classification Features of US Images Liver Extracted with Co-occurrence Matrix Using the Nearest Neighbor Algorithm

    NASA Astrophysics Data System (ADS)

    Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen

    2011-12-01

    The co-occurrence matrix has been applied successfully for echographic image characterization because it contains information about the spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of US images of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extracted with the co-occurrence matrix. The analyzed US images were grouped into two distinct sets: healthy liver and steatosis (fatty) liver. These two sets of echographic images of the liver build a database that includes only histologically confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects were used to compute the four textural indices and serve as the control dataset. We chose to study this disease because steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and can be used to characterize the irregularity of tissues. The goal is to classify this information using the Nearest Neighbor algorithm. The K-NN algorithm is a powerful tool for classifying texture features, grouping the features of healthy liver into a training set, on the one hand, and the texture features of steatosis liver into a holdout set, on the other. The results could be used to quantify the texture information and allow a clear distinction between healthy and steatosis liver.
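The feature-extraction step can be sketched in plain numpy: build a normalised co-occurrence matrix, derive the four texture indices named above, and classify a patch by its nearest neighbour in feature space. The synthetic "smooth" and "noisy" patches below are invented stand-ins for healthy and steatosis regions of interest, not echographic data:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for yy in range(h - dy):
        for xx in range(w - dx):
            i, j = img[yy, xx], img[yy + dy, xx + dx]
            P[i, j] += 1
            P[j, i] += 1
    return P / P.sum()

def texture_features(P):
    """Contrast, dissimilarity, correlation and entropy of a GLCM."""
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()
    dissimilarity = (np.abs(i - j) * P).sum()
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * P).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * P).sum())
    correlation = (((i - mu_i) * (j - mu_j)) * P).sum() / (sd_i * sd_j)
    nz = P[P > 0]
    entropy = -(nz * np.log2(nz)).sum()
    return np.array([contrast, dissimilarity, correlation, entropy])

rng = np.random.default_rng(1)
LEVELS = 8

def patch(noisy):
    """Synthetic ROI: a gentle gradient (smooth) or random speckle (noisy)."""
    if noisy:
        return rng.integers(0, LEVELS, size=(8, 8))
    base = np.tile(np.arange(LEVELS), (8, 1))
    return np.clip(base + rng.integers(0, 2, size=(8, 8)), 0, LEVELS - 1)

# Training set of labelled feature vectors (0 = smooth-like, 1 = noisy-like).
train_X = np.array([texture_features(glcm(patch(noisy), LEVELS))
                    for noisy in [False] * 5 + [True] * 5])
train_y = np.array([0] * 5 + [1] * 5)

def nearest_neighbour(x):
    return train_y[np.argmin(np.linalg.norm(train_X - x, axis=1))]

pred_smooth = nearest_neighbour(texture_features(glcm(patch(False), LEVELS)))
pred_noisy = nearest_neighbour(texture_features(glcm(patch(True), LEVELS)))
```

On these synthetic patches the speckled texture has markedly higher contrast and entropy, so even 1-NN on the four indices separates the two classes cleanly.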

  17. The Promise of Information and Communication Technology in Healthcare: Extracting Value From the Chaos.

    PubMed

    Mamlin, Burke W; Tierney, William M

    2016-01-01

    Healthcare is an information business with expanding use of information and communication technologies (ICTs). Current ICT tools are immature, but a brighter future looms. We examine 7 areas of ICT in healthcare: electronic health records (EHRs), health information exchange (HIE), patient portals, telemedicine, social media, mobile devices and wearable sensors and monitors, and privacy and security. In each of these areas, we examine the current status and future promise, highlighting how each might reach its promise. Steps to better EHRs include a universal programming interface, universal patient identifiers, improved documentation and improved data analysis. HIEs require federal subsidies for sustainability and support from EHR vendors, targeting seamless sharing of EHR data. Patient portals must bring patients into the EHR with better design and training, greater provider engagement and leveraging HIEs. Telemedicine needs sustainable payment models, clear rules of engagement, quality measures and monitoring. Social media needs consensus on rules of engagement for providers, better data mining tools and approaches to counter disinformation. Mobile and wearable devices benefit from a universal programming interface, improved infrastructure, more rigorous research and integration with EHRs and HIEs. Laws for privacy and security need updating to match current technologies, and data stewards should share information on breaches and standardize best practices. ICT tools are evolving quickly in healthcare and require a rational and well-funded national agenda for development, use and assessment. Copyright © 2016 Southern Society for Clinical Investigation. Published by Elsevier Inc. All rights reserved.

  18. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  19. Extracting Databases from Dark Data with DeepDive

    PubMed Central

    Zhang, Ce; Shin, Jaeho; Ré, Christopher; Cafarella, Michael; Niu, Feng

    2016-01-01

    DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data — scientific papers, Web classified ads, customer service notes, and so on — were instead in a relational database, it would give analysts a massive and valuable new set of “big data.” DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. To date we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontology, law enforcement, and others. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive is enabled by an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle. This design is enabled by several core innovations around probabilistic training and inference. PMID:28316365

  20. Five shared decision-making tools in 5 months: use of rapid reviews to develop decision boxes for seniors living with dementia and their caregivers.

    PubMed

    Lawani, Moulikatou Adouni; Valéra, Béatriz; Fortier-Brochu, Émilie; Légaré, France; Carmichael, Pierre-Hugues; Côté, Luc; Voyer, Philippe; Kröger, Edeltraut; Witteman, Holly; Rodriguez, Charo; Giguere, Anik M C

    2017-03-15

    Decision support tools build upon comprehensive and timely syntheses of the literature. Rapid reviews may support their development by omitting certain components of traditional systematic reviews. We thus aimed to describe a rapid review approach underlying the development of decision support tools, i.e., five decision boxes (DBs) for shared decision-making between seniors living with dementia, their caregivers, and healthcare providers. We included studies based on PICO questions (Participant, Intervention, Comparison, Outcome) describing each of the five specific decisions. We gave priority to higher-quality evidence (e.g., systematic reviews). For each DB, we first identified secondary sources of literature, namely clinical summaries, clinical practice guidelines, and systematic reviews. After an initial extraction, we searched for primary studies in academic databases and grey literature to fill gaps in evidence. We extracted study designs, sample sizes, populations, and probabilities of benefits/harms of the health options. A single reviewer conducted the literature search and study selection. The data extracted by one reviewer were verified by a second, experienced reviewer. Two reviewers assessed the quality of the evidence. We converted all probabilities into absolute risks for ease of understanding. Two to five experts validated the content of each DB. We conducted descriptive statistical analyses on the review processes and resources required. The approach allowed screening of a limited number of references (range: 104 to 406 per review). For each review, we included 15 to 26 studies, 2 to 10 health options, and 11 to 62 health outcomes, and we conducted 9 to 47 quality assessments. A team of ten reviewers with varying levels of expertise was supported at specific steps by an information specialist, a biostatistician, and a graphic designer. The time required to complete a rapid review varied from 7 to 31 weeks per review (mean ± SD, 19 ± 10 weeks).
Data extraction required the most time (8 ± 6.8 weeks). The average estimated cost of a rapid review was C$11,646 (SD = C$10,914). This approach enabled the development of clinical tools more rapidly than with a traditional systematic review. Future studies should evaluate the applicability of this approach to other teams/tools.

  1. Seeing is believing: on the use of image databases for visually exploring plant organelle dynamics.

    PubMed

    Mano, Shoji; Miwa, Tomoki; Nishikawa, Shuh-ichi; Mimura, Tetsuro; Nishimura, Mikio

    2009-12-01

    Organelle dynamics vary dramatically depending on cell type, developmental stage and environmental stimuli, so that various parameters, such as size, number and behavior, are required for the description of the dynamics of each organelle. Imaging techniques are superior to other techniques for describing organelle dynamics because these parameters are visually exhibited. Therefore, as the results can be seen immediately, investigators can more easily grasp organelle dynamics. At present, imaging techniques are emerging as fundamental tools in plant organelle research, and the development of new methodologies to visualize organelles and the improvement of analytical tools and equipment have allowed the large-scale generation of image and movie data. Accordingly, image databases that accumulate information on organelle dynamics are an increasingly indispensable part of modern plant organelle research. In addition, image databases are potentially rich data sources for computational analyses, as image and movie data reposited in the databases contain valuable and significant information, such as size, number, length and velocity. Computational analytical tools support image-based data mining, such as segmentation, quantification and statistical analyses, to extract biologically meaningful information from each database and combine them to construct models. In this review, we outline the image databases that are dedicated to plant organelle research and present their potential as resources for image-based computational analyses.

  2. IBM's Health Analytics and Clinical Decision Support.

    PubMed

    Kohn, M S; Sun, J; Knoop, S; Shabo, A; Carmeli, B; Sow, D; Syed-Mahmood, T; Rapp, W

    2014-08-15

    This survey explores the role of big data and health analytics developed by IBM in supporting the transformation of healthcare by augmenting evidence-based decision-making. Some problems in healthcare and strategies for change are described. It is argued that change requires better decisions, which, in turn, require better use of the many kinds of healthcare information. Analytic resources that address each of the information challenges are described. Examples of the role of each of the resources are given. There are powerful analytic tools that utilize the various kinds of big data in healthcare to help clinicians make more personalized, evidenced-based decisions. Such resources can extract relevant information and provide insights that clinicians can use to make evidence-supported decisions. There are early suggestions that these resources have clinical value. As with all analytic tools, they are limited by the amount and quality of data. Big data is an inevitable part of the future of healthcare. There is a compelling need to manage and use big data to make better decisions to support the transformation of healthcare to the personalized, evidence-supported model of the future. Cognitive computing resources are necessary to manage the challenges in employing big data in healthcare. Such tools have been and are being developed. The analytic resources, themselves, do not drive, but support healthcare transformation.

  3. PubChemSR: A search and retrieval tool for PubChem

    PubMed Central

    Hur, Junguk; Wild, David J

    2008-01-01

    Background Recent years have seen an explosion in the amount of publicly available chemical and related biological information. A significant step has been the emergence of PubChem, which contains property information for millions of chemical structures, and acts as a repository of compounds and bioassay screening data for the NIH Roadmap. There is a strong need for tools designed for scientists that permit easy download and use of these data. We present one such tool, PubChemSR. Implementation PubChemSR (Search and Retrieve) is a freely available desktop application written for Windows using Microsoft .NET that is designed to assist scientists in search, retrieval and organization of chemical and biological data from the PubChem database. It employs SOAP web services made available by NCBI for extraction of information from PubChem. Results and Discussion The program supports a wide range of searching techniques, including queries based on assay or compound keywords and chemical substructures. Results can be examined individually or downloaded and exported in batch for use in other programs such as Microsoft Excel. We believe that PubChemSR makes it straightforward for researchers to utilize the chemical, biological and screening data available in PubChem. We present several examples of how it can be used. PMID:18482452

  4. Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2017-10-01

    We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.

  5. Evaluation of a web based informatics system with data mining tools for predicting outcomes with quantitative imaging features in stroke rehabilitation clinical trials

    NASA Astrophysics Data System (ADS)

    Wang, Ximing; Kim, Bokkyu; Park, Ji Hoon; Wang, Erik; Forsyth, Sydney; Lim, Cody; Ravi, Ragini; Karibyan, Sarkis; Sanchez, Alexander; Liu, Brent

    2017-03-01

    Quantitative imaging biomarkers are used widely in clinical trials for tracking and evaluation of medical interventions. Previously, we have presented a web-based informatics system utilizing quantitative imaging features for predicting outcomes in stroke rehabilitation clinical trials. The system integrates imaging feature extraction tools and a web-based statistical analysis tool. The tools include a generalized linear mixed model (GLMM) that can investigate potential significance and correlation based on features extracted from clinical data and quantitative biomarkers. The imaging feature extraction tools allow the user to collect imaging features, and the GLMM module allows the user to select clinical data and imaging features, such as stroke lesion characteristics, from the database as regressors and regressands. This paper discusses the application scenario and evaluation results of the system in a stroke rehabilitation clinical trial. The system was utilized to manage clinical data and extract imaging biomarkers including stroke lesion volume, location and ventricle/brain ratio. The GLMM module was validated, and the efficiency of data analysis was also evaluated.

  6. Development of Mackintosh Probe Extractor

    NASA Astrophysics Data System (ADS)

    Rahman, Noor Khazanah A.; Kaamin, Masiri; Suwandi, Amir Khan; Sahat, Suhaila; Jahaya Kesot, Mohd

    2016-11-01

    Dynamic probing is a continuous soil investigation technique and one of the simplest soil penetration tests. It basically consists of repeatedly driving a metal-tipped probe into the ground using a drop weight of fixed mass and travel. Testing is carried out continuously from ground level to the final penetration depth. Once the soil investigation work is done, it is difficult to pull the probe rod out of the ground, because the soil structure grips the probe cone strongly and prevents the rod from being withdrawn. Thus, a tool named the Extracting Probe was created to assist in retracting the probe rod from the ground. The Extracting Probe can also reduce the time needed to extract the probe rod compared with the conventional method. At the same time, it can reduce manpower costs, because only one worker is needed to handle this tool, whereas the conventional method requires two or more workers. From the experiments that have been done, we found that the time difference between the conventional tools and the Extracting Probe is significant: the average time difference is 155 minutes. In addition, the Extracting Probe reduces manpower usage and hence the labour cost of operating the tool. All these advantages give this tool the potential to be marketed.

  7. A fresh approach to forecasting in astroparticle physics and dark matter searches

    NASA Astrophysics Data System (ADS)

    Edwards, Thomas D. P.; Weniger, Christoph

    2018-02-01

    We present a toolbox of new techniques and concepts for the efficient forecasting of experimental sensitivities. These are applicable to a large range of scenarios in (astro-)particle physics and are based on the Fisher information formalism. Fisher information provides an answer to the question 'what is the maximum extractable information from a given observation?'. It is a common tool for the forecasting of experimental sensitivities in many branches of science, but rarely used in astroparticle physics or searches for particle dark matter. After briefly reviewing the Fisher information matrix of general Poisson likelihoods, we propose very compact expressions for estimating expected exclusion and discovery limits ('equivalent counts method'). We demonstrate by comparison with Monte Carlo results that they remain surprisingly accurate even deep in the Poisson regime. We show how correlated background systematics can be efficiently accounted for by a treatment based on Gaussian random fields. Finally, we introduce the novel concept of Fisher information flux. It can be thought of as a generalization of the commonly used signal-to-noise ratio, while accounting for the non-local properties and saturation effects of background and instrumental uncertainties. It is a powerful and flexible tool ready to be used as a core concept for informed strategy development in astroparticle physics and searches for particle dark matter.
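For a binned Poisson likelihood with per-bin expectations mu_i = theta*s_i + b_i (signal strength theta, signal template s_i, background b_i), the Fisher information takes the simple closed form I(theta) = sum_i s_i^2 / mu_i, and its inverse square root is the Cramér-Rao bound on the achievable uncertainty. The following numpy illustration uses invented bin values and only this standard textbook formula, not the paper's equivalent-counts expressions:

```python
import numpy as np

def fisher_information(theta, s, b):
    """Fisher information of signal strength theta for a binned Poisson
    likelihood with per-bin expectations mu_i = theta*s_i + b_i:
        I(theta) = sum_i s_i**2 / (theta*s_i + b_i)
    """
    return float(np.sum(s ** 2 / (theta * s + b)))

s = np.array([3.0, 5.0, 2.0, 1.0])     # signal counts template (invented)
b = np.array([10.0, 20.0, 15.0, 5.0])  # expected background counts (invented)

theta = 1.0
info = fisher_information(theta, s, b)
sigma = 1.0 / np.sqrt(info)  # Cramér-Rao lower bound on sigma(theta)
z = theta / sigma            # rough expected detection significance
```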

  8. Text Mining for Protein Docking

    PubMed Central

    Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.

    2015-01-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small-ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset.
The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
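
    The bag-of-words filtering step described above can be sketched as follows. The vocabulary, training snippets, and labels are invented for illustration, and a simple perceptron stands in for the Support Vector Machine models the authors trained:

    ```python
    from collections import Counter

    def bag_of_words(text, vocab):
        # Represent a text as counts over a fixed feature vocabulary.
        counts = Counter(w.strip(".,?!").lower() for w in text.split())
        return [counts[w] for w in vocab]

    def train_perceptron(X, y, epochs=20):
        # Minimal linear classifier standing in for the paper's SVM.
        w = [0.0] * len(X[0])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                s = sum(wj * xj for wj, xj in zip(w, xi)) + b
                if yi * s <= 0:  # misclassified -> update weights
                    w = [wj + yi * xj for wj, xj in zip(w, xi)]
                    b += yi
        return w, b

    def predict(w, b, x):
        return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

    # Hypothetical snippets: +1 = relevant to docking, -1 = irrelevant.
    vocab = ["residue", "binding", "interface", "mutation", "expression", "pathway"]
    train = [("The binding interface residue was mutated", 1),
             ("Key residue contacts at the binding interface", 1),
             ("Gene expression changes in this pathway", -1),
             ("The pathway regulates expression levels", -1)]
    X = [bag_of_words(t, vocab) for t, _ in train]
    y = [label for _, label in train]
    w, b = train_perceptron(X, y)
    ```

    Abstracts scored below the decision boundary would be discarded as irrelevant before constraint extraction.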

  9. Definition of information technology architectures for continuous data management and medical device integration in diabetes.

    PubMed

    Hernando, M Elena; Pascual, Mario; Salvador, Carlos H; García-Sáez, Gema; Rodríguez-Herrero, Agustín; Martínez-Sarriegui, Iñaki; Gómez, Enrique J

    2008-09-01

    The growing availability of continuous data from medical devices in diabetes management makes it crucial to define novel information technology architectures for efficient data storage, data transmission, and data visualization. The new paradigm of care demands the sharing of information in interoperable systems as the only way to support patient care in a continuum of care scenario. The technological platforms should support all the services required by the actors involved in the care process, located in different scenarios and managing diverse information for different purposes. This article presents basic criteria for defining flexible and adaptive architectures that are capable of interoperating with external systems, and integrating medical devices and decision support tools to extract all the relevant knowledge to support diabetes care.

  10. A brief understanding of process optimisation in microwave-assisted extraction of botanical materials: options and opportunities with chemometric tools.

    PubMed

    Das, Anup Kumar; Mandal, Vivekananda; Mandal, Subhash C

    2014-01-01

    Extraction forms the very basic step in research on natural products for drug discovery, and a poorly optimised and planned extraction methodology can jeopardise the entire mission. This review aims to provide a vivid picture of the different chemometric tools and the planning required for process optimisation and method development in the extraction of botanical material, with emphasis on microwave-assisted extraction (MAE). A review of studies applying chemometric tools in combination with MAE of botanical materials was undertaken in order to identify the significant extraction factors. To optimise a response by fine-tuning those factors, experimental design, or statistical design of experiments (DoE), a core area of chemometrics, was then used for statistical analysis and interpretation. In this review a brief explanation of the different aspects and methodologies related to MAE of botanical materials that were subjected to experimental design is presented, along with some general chemometric tools and the steps involved in the practice of MAE. A detailed study of the various factors and responses involved in the optimisation is also presented. This article will assist in obtaining a better insight into the chemometric strategies of process optimisation and method development, which will in turn improve the decision-making process when selecting influential extraction parameters. Copyright © 2013 John Wiley & Sons, Ltd.
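
    As a minimal illustration of the DoE idea discussed above, the sketch below builds a two-level full-factorial design for three MAE factors and computes main effects from a synthetic response; the factor names and the response function are invented for illustration:

    ```python
    from itertools import product

    # Hypothetical MAE factors at two coded levels (-1, +1).
    factors = ["microwave_power", "extraction_time", "solvent_ratio"]

    def full_factorial(n_factors):
        # All 2^n combinations of coded factor levels.
        return list(product((-1, 1), repeat=n_factors))

    def main_effects(design, responses):
        # Main effect of a factor: mean response at +1 minus mean at -1.
        effects = []
        for j in range(len(design[0])):
            hi = [r for run, r in zip(design, responses) if run[j] == 1]
            lo = [r for run, r in zip(design, responses) if run[j] == -1]
            effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
        return effects

    design = full_factorial(len(factors))
    # Synthetic yield: power helps (+4), time helps (+2), ratio is inert.
    responses = [50 + 4 * p + 2 * t + 0 * s for p, t, s in design]
    effects = main_effects(design, responses)
    ```

    Ranking the absolute effects identifies the influential extraction parameters that a subsequent response-surface design would then fine-tune.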

  11. Kudi: A free open-source python library for the analysis of properties along reaction paths.

    PubMed

    Vogt-Geisse, Stefan

    2016-05-01

    With increasing computational capabilities, an ever growing amount of data is generated in computational chemistry, containing a vast amount of chemically relevant information. It is therefore imperative to create new computational tools in order to process and extract these data in a sensible way. Kudi is an open source library that aids in the extraction of chemical properties from reaction paths. The straightforward structure of Kudi makes it easy for users to work with, allows effortless implementation of new capabilities, and permits extension to any quantum chemistry package. A use case for Kudi is shown for the tautomerization reaction of formic acid. Kudi is available free of charge at www.github.com/stvogt/kudi.
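
    Kudi's own API is not reproduced here; as an illustration of the kind of task it automates, the sketch below extracts a property (total energy) along a reaction path from hypothetical output text and locates the barrier. The file format and all values are invented:

    ```python
    import re

    # Hypothetical quantum-chemistry output: one line per path point.
    output = """\
    point 0  coordinate -1.0  energy -188.761
    point 1  coordinate -0.5  energy -188.742
    point 2  coordinate  0.0  energy -188.718
    point 3  coordinate  0.5  energy -188.739
    point 4  coordinate  1.0  energy -188.758
    """

    def parse_path(text):
        # Extract (reaction coordinate, energy) pairs from the output.
        pat = re.compile(r"coordinate\s+(-?\d+\.\d+)\s+energy\s+(-?\d+\.\d+)")
        return [(float(c), float(e)) for c, e in pat.findall(text)]

    def barrier(path):
        # Activation energy relative to the first point, same units as input.
        energies = [e for _, e in path]
        return max(energies) - energies[0]

    path = parse_path(output)
    ```

    A library like Kudi wraps this parse-then-derive pattern behind one interface so the same analysis runs unchanged on output from different quantum chemistry packages.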

  12. Making a protein extract from plant pathogenic fungi for gel- and LC-based proteomics.

    PubMed

    Fernández, Raquel González; Redondo, Inmaculada; Jorrin-Novo, Jesus V

    2014-01-01

    Proteomic technologies have become a successful tool to provide relevant information on fungal biology. In the case of plant pathogenic fungi, this approach would allow a deeper knowledge of the interaction and the biological cycle of the pathogen, as well as the identification of pathogenicity and virulence factors. These two elements open up new possibilities for crop disease diagnosis and environment-friendly crop protection. Phytopathogenic fungi, owing to their particular cellular characteristics, can be considered recalcitrant biological material, which makes it difficult to obtain quality protein samples for proteomic analysis. This chapter focuses on protein extraction for gel- and LC-based proteomics, with specific protocols from our current research with Botrytis cinerea.

  13. The potential of satellite data to study individual wildfire events

    NASA Astrophysics Data System (ADS)

    Benali, Akli; López-Saldana, Gerardo; Russo, Ana; Sá, Ana C. L.; Pinto, Renata M. S.; Nikos, Koutsias; Owen, Price; Pereira, Jose M. C.

    2014-05-01

    Large wildfires have important social, economic and environmental impacts. In order to minimize their impacts, understand their main drivers and study their dynamics, different approaches have been used. The reconstruction of individual wildfire events is usually done by collection of field data, interviews and by implementing fire spread simulations. All these methods have clear limitations in terms of spatial and temporal coverage, accuracy, subjectivity of the collected information and lack of objective independent validation information. In this sense, remote sensing is a promising tool with the potential to provide relevant information for stakeholders and the research community, by complementing or filling gaps in existing information and providing independent accurate quantitative information. In this work we show the potential of satellite data to provide relevant information regarding the dynamics of individual large wildfire events, filling an important gap in wildfire research. We show how MODIS active-fire data, acquired up to four times per day, and satellite-derived burnt perimeters can be combined to extract relevant information on wildfire events by describing the methods involved and presenting results for four regions of the world: Portugal, Greece, SE Australia and California. The information that can be retrieved encompasses the start and end date of a wildfire event and its ignition area. We perform an evaluation of the information retrieved by comparing the satellite-derived parameters with national databases, highlighting the strengths and weaknesses of both and showing how the former can complement the latter, leading to more complete and accurate datasets. We also show how the spatio-temporal distribution of wildfire spread dynamics can be reconstructed using satellite-derived active fires and how relevant descriptors can be extracted. Applying graph theory to satellite active-fire data, we define the major fire spread paths that yield information about the major spatial corridors through which fires spread, and their relative importance in the full fire event. These major fire paths are then used to extract relevant descriptors, such as the distribution of fire spread direction, rate of spread and fire intensity (i.e. energy emitted). The reconstruction of the fire spread is shown for some case studies for Portugal and is also compared with fire progressions obtained by air-borne sensors for SE Australia. The approach shows solid results, providing a valuable tool for reconstructing individual fire events, understanding their complex spread patterns and identifying the main drivers of fire propagation. The major fire paths and the spatio-temporal distribution of active fires are currently being combined with fire spread simulations within the scope of the FIRE-MODSAT project, to provide useful information to support and improve fire suppression strategies.
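
    The graph-based reconstruction described above can be sketched as follows: active-fire detections become nodes, edges connect detections that are close in space and consecutive in time, and a longest time-ordered path serves as a simple stand-in for a major spread path. The coordinates, time steps, and distance threshold are all invented:

    ```python
    from math import hypot

    # Hypothetical active-fire detections: (time step, x, y) in arbitrary units.
    detections = [(0, 0.0, 0.0), (1, 1.0, 0.2), (1, 0.2, 1.0),
                  (2, 2.0, 0.3), (3, 3.1, 0.4), (3, 0.3, 2.1)]

    def build_edges(dets, max_dist=1.5):
        # Edge i -> j if detection j occurred one step later and close in space.
        edges = {i: [] for i in range(len(dets))}
        for i, (ti, xi, yi) in enumerate(dets):
            for j, (tj, xj, yj) in enumerate(dets):
                if tj == ti + 1 and hypot(xj - xi, yj - yi) <= max_dist:
                    edges[i].append(j)
        return edges

    def major_path(dets, edges):
        # Longest path in the time-ordered DAG (dynamic programming).
        order = sorted(range(len(dets)), key=lambda i: dets[i][0])
        best = {i: [i] for i in order}
        for i in order:
            for j in edges[i]:
                if len(best[i]) + 1 > len(best[j]):
                    best[j] = best[i] + [j]
        return max(best.values(), key=len)

    path = major_path(detections, build_edges(detections))
    ```

    Descriptors such as spread direction and rate of spread then follow from the coordinates and timestamps of consecutive nodes on the path.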

  14. SERDP and ESTCP Expert Panel Workshop on Research and Development Needs for the Environmental Remediation Application of Molecular Biological Tools

    DTIC Science & Technology

    2005-10-01

    used to infer metabolic rates in marine systems. For example, there is evidence from both pure cultures and environmental samples that rbcL...It includes many useful bioinformatics features such as constructing a neighbor-joining tree for a subset of sequences, downloading a subset of...further provide software that allow users to extract useful information from sequences. The most commonly used feature is probe/primer design

  15. ReGaTE: Registration of Galaxy Tools in Elixir.

    PubMed

    Doppelt-Azeroual, Olivia; Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé

    2017-06-01

    Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE . © The Author 2017. Published by Oxford University Press.
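
    A rough sketch of the registration flow described above: the field mapping below is illustrative, not the actual bio.tools schema, while the guarded fetch uses BioBlend's real `GalaxyInstance` and `tools.get_tools()` entry points (it needs a live Galaxy server and an API key, so it is not executed here):

    ```python
    def to_registry_entry(galaxy_tool):
        # Map Galaxy tool metadata to a simplified, bio.tools-like record.
        # The field names on the left are illustrative, not the real schema.
        return {
            "name": galaxy_tool["name"],
            "version": galaxy_tool.get("version", "unknown"),
            "description": galaxy_tool.get("description", ""),
            "homepage": galaxy_tool.get("link", ""),
        }

    def fetch_galaxy_tools(url, api_key):
        # Requires the bioblend package and a reachable Galaxy instance.
        from bioblend.galaxy import GalaxyInstance
        gi = GalaxyInstance(url=url, key=api_key)
        return [to_registry_entry(t) for t in gi.tools.get_tools()]

    # Offline example with a hand-written tool record:
    example = {"name": "bowtie2", "version": "2.4.2",
               "description": "Fast read aligner"}
    entry = to_registry_entry(example)
    ```

    ReGaTE additionally enriches such records with the scientific annotations (EDAM topics and operations) that bio.tools requires before pushing them to the registry.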

  16. Surface EMG signals based motion intent recognition using multi-layer ELM

    NASA Astrophysics Data System (ADS)

    Wang, Jianhui; Qi, Lin; Wang, Xiao

    2017-11-01

    The upper-limb rehabilitation robot is regarded as a useful tool to help patients with hemiplegia perform repetitive exercise. Surface electromyography (sEMG) signals contain motion information, as these electric signals are generated by, and related to, nerve-muscle activity. The sEMG signals, representing a person's intention of active motion, are introduced into the rehabilitation robot system to recognize upper-limb movements. Traditionally, feature extraction is an indispensable step in drawing significant information from the original signals, and a tedious task requiring rich, relevant experience. This paper employs a deep learning scheme to extract the internal features of the sEMG signals using an Extreme Learning Machine based auto-encoder (ELM-AE). The information contained in the multi-layer structure of the ELM-AE is used as the high-level representation of the internal features of the sEMG signals, and a simple ELM then post-processes the extracted features, forming the entire multi-layer ELM (ML-ELM) algorithm. The method is then employed for sEMG-based recognition of neural intentions. The case studies show that the adopted deep learning algorithm (ELM-AE) is capable of yielding higher classification accuracy than a Principal Component Analysis (PCA) scheme across 5 different types of upper-limb motions. This indicates the effectiveness and learning capability of the ML-ELM in such motion intent recognition applications.
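
    The ELM auto-encoder step can be sketched in a few lines of NumPy: random, untrained input weights, a closed-form solve for the output weights β, and features obtained by projecting the data onto βᵀ. This is a simplified, unregularized variant on synthetic data, not the authors' implementation:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def elm_ae_features(X, n_hidden):
        # Random, fixed input weights and biases (never trained).
        n_features = X.shape[1]
        W = rng.standard_normal((n_features, n_hidden))
        b = rng.standard_normal(n_hidden)
        H = np.tanh(X @ W + b)            # hidden-layer activations
        # Output weights solved in closed form: beta = pinv(H) @ X,
        # so that H @ beta reconstructs X (the auto-encoding objective).
        beta = np.linalg.pinv(H) @ X
        # ELM-AE representation: project the data onto beta^T.
        return X @ beta.T

    # Synthetic "sEMG feature" matrix: 100 windows x 8 channels.
    X = rng.standard_normal((100, 8))
    F = elm_ae_features(X, n_hidden=20)
    ```

    Stacking several such layers, then feeding the final representation to a plain ELM classifier, yields the ML-ELM pipeline the abstract describes.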

  17. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association

    PubMed Central

    Ma, Jian; Casey, Cameron P.; Zheng, Xueyun; Ibrahim, Yehia M.; Wilkins, Christopher S.; Renslow, Ryan S.; Thomas, Dennis G.; Payne, Samuel H.; Monroe, Matthew E.; Smith, Richard D.; Teeguarden, Justin G.; Baker, Erin S.; Metz, Thomas O.

    2017-01-01

    Motivation: Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. Availability and implementation: PIXiE is an open-source tool, freely available on GitHub. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library. Contact: erin.baker@pnnl.gov or thomas.metz@pnnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28505286
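
    The core of a multi-field extraction can be sketched as follows: drift time through a tube scales as 1/V, so arrival times measured at several drift voltages fall on a line in 1/V whose slope gives the mobility K (from which CCS follows via the Mason-Schamp equation). The drift length, voltages, and mobility below are synthetic, and this is an illustration of the principle, not PIXiE's implementation:

    ```python
    import numpy as np

    # Drift time through a tube of length L at voltage V:
    #   t = L^2 / (K * V) + t0,
    # so arrival time is linear in 1/V with slope L^2 / K.
    L = 0.9          # drift length in m (hypothetical)
    K_true = 2.0e-4  # mobility in m^2 / (V s) (hypothetical)
    t0 = 1.0e-4      # time spent outside the drift region, in s

    voltages = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
    arrival = L**2 / (K_true * voltages) + t0

    def fit_mobility(voltages, arrival, length):
        # Least-squares line through (1/V, t); slope = L^2 / K.
        slope, intercept = np.polyfit(1.0 / voltages, arrival, 1)
        return length**2 / slope, intercept

    K_est, t0_est = fit_mobility(voltages, arrival, L)
    ```

    In practice the arrival times themselves must first be picked out of noisy drift spectra, which is where PIXiE's global data association comes in.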

  18. Development of a UPLC-MS/MS method for the determination of ten anticancer drugs in hospital and urban wastewaters, and its application for the screening of human metabolites assisted by information-dependent acquisition tool (IDA) in sewage samples.

    PubMed

    Ferrando-Climent, L; Rodriguez-Mozaz, S; Barceló, D

    2013-07-01

    In the present work, the development, optimization, and validation (including a whole stability study) of a fast, reliable, and comprehensive method for the analysis of ten anticancer drugs in hospital and urban wastewater is described. Extraction of these pharmaceutical compounds was performed using automated off-line solid-phase extraction followed by their determination by ultra-performance liquid chromatography coupled to a triple quadrupole-linear ion trap mass spectrometer. Target compounds include nine cytotoxic agents: cyclophosphamide, ifosfamide, docetaxel, paclitaxel, etoposide, vincristine, tamoxifen, methotrexate, and azathioprine; and the cytotoxic quinolone ciprofloxacin. Method detection limits (MDL) ranged from 0.8 to 24 ng/L. Levels of cytostatic agents found in hospital and urban wastewater influents did not differ significantly; therefore, hospitals cannot be considered the primary source of this type of contaminant. All the target compounds were detected in at least one of the influent samples analyzed: ciprofloxacin, cyclophosphamide, tamoxifen, and azathioprine were found in most of them, reaching maximum levels of 14.725, 0.201, 0.133, and 0.188 μg/L, respectively. The remaining target cancer drugs were detected less frequently, at values ranging between the MDL and 0.406 μg/L. Furthermore, a feasible, useful, and advantageous approach based on an information-dependent acquisition (IDA) tool was used for the screening of human metabolites in hospital effluents, where hydroxytamoxifen, endoxifen, and carboxyphosphamide were detected.

  19. Entropic Profiler – detection of conservation in genomes using information theory

    PubMed Central

    Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

    2009-01-01

    Background In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP are local information plots related to DNA randomness, based on information theory and statistical concepts. They express the weighted relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes them very efficiently by resorting to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows the user to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or by resuming a previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion EP are directly related to the statistical significance of motifs and can be considered a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
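
    A toy version of the idea: score each position by how over-represented the motifs ending there are relative to a uniform expectation. This is a simplified stand-in for the actual EP formula (the weighting here is invented, and real implementations use suffix trees instead of brute-force counting):

    ```python
    from collections import Counter

    def motif_counts(seq, max_len):
        # Count all substrings of length 1..max_len in the sequence.
        counts = Counter()
        for l in range(1, max_len + 1):
            for i in range(len(seq) - l + 1):
                counts[seq[i:i + l]] += 1
        return counts

    def entropic_profile(seq, max_len=4, alphabet=4):
        # Score each position by the over-representation of the motifs
        # ending there; expected counts assume a uniform random sequence.
        counts = motif_counts(seq, max_len)
        profile = []
        for i in range(len(seq)):
            score = 0.0
            for l in range(1, min(max_len, i + 1) + 1):
                motif = seq[i - l + 1:i + 1]
                expected = (len(seq) - l + 1) / alphabet**l
                score += counts[motif] / expected   # invented weighting
            profile.append(score)
        return profile

    profile = entropic_profile("ACGTACGTAAAAACGT")
    ```

    Positions ending repeated motifs score high, which is what makes the profile useful for flagging statistically significant regions.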

  20. Automated detection of qualitative spatio-temporal features in electrocardiac activation maps.

    PubMed

    Ironi, Liliana; Tentoni, Stefania

    2007-02-01

    This paper describes work aimed at the realization of a tool for the automated interpretation of electrocardiac maps. Such maps can capture a number of electrical conduction pathologies, such as arrhythmia, that can be missed by the analysis of traditional electrocardiograms. However, their introduction into clinical practice is still far off, as their interpretation requires skills that belong to very few experts. An automated interpretation tool would therefore bridge the gap between the established research outcome and clinical practice, with a consequent great impact on health care. Qualitative spatial reasoning can play a crucial role in the identification of spatio-temporal patterns and salient features that characterize the heart's electrical activity. We adopted the spatial aggregation (SA) conceptual framework and an interplay of numerical and qualitative information to extract features from epicardial maps, and to make them available for reasoning tasks. Our focus is on epicardial activation isochrone maps, as they are a synthetic representation of the spatio-temporal aspects of the propagation of the electrical excitation. We provide a computational SA-based methodology to extract, from 3D epicardial data gathered over time, (1) the excitation wavefront structure, and (2) the salient features that characterize wavefront propagation and visually correspond to specific geometric objects. The proposed methodology provides a robust and efficient way to identify salient pieces of information in activation time maps. The hierarchical structure of the abstracted geometric objects, crucial in capturing the prominent information, facilitates the definition of the general rules necessary to infer the correlation between pathophysiological patterns and wavefront structure and propagation.
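
    One low-level ingredient of such a pipeline, locating early-activation (breakthrough) sites as local minima of the activation time map, can be sketched on a synthetic 2D grid. The map and the 4-neighborhood are invented for illustration; the actual method works on 3D epicardial data and builds a full hierarchy of geometric objects on top of steps like this:

    ```python
    def local_minima(grid):
        # Breakthrough candidates: cells strictly earlier than all 4-neighbors.
        rows, cols = len(grid), len(grid[0])
        minima = []
        for r in range(rows):
            for c in range(cols):
                neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                vals = [grid[i][j] for i, j in neighbors
                        if 0 <= i < rows and 0 <= j < cols]
                if all(grid[r][c] < v for v in vals):
                    minima.append((r, c))
        return minima

    # Synthetic activation time map (ms): two early-activation sites.
    activation = [
        [9, 8, 7, 8, 9],
        [8, 3, 6, 7, 8],
        [7, 6, 5, 4, 7],
        [8, 7, 4, 2, 8],
        [9, 8, 7, 8, 9],
    ]
    sites = local_minima(activation)
    ```

    Isochrones are then the level sets of the same map, and their geometry around these sites characterizes wavefront propagation.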

  1. Extraction, integration and analysis of alternative splicing and protein structure distributed information

    PubMed Central

    D'Antonio, Matteo; Masseroli, Marco

    2009-01-01

    Background Alternative splicing has been demonstrated to affect most human genes; different isoforms from the same gene encode proteins which differ by a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results of their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user access to the PASS Web application and store users' data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available, well-known bioinformatics tools in order to generate structural information on isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075

  2. Automated Fluid Feature Extraction from Transient Simulations

    NASA Technical Reports Server (NTRS)

    Haimes, Robert

    1998-01-01

    In the past, feature extraction and identification were interesting concepts, but not required to understand the underlying physics of a steady flow field. This is because the results of the more traditional tools like iso-surfaces, cuts and streamlines were more interactive and easily abstracted so they could be represented to the investigator. These tools worked and properly conveyed the collected information, at the expense of much interaction. For unsteady flow-fields, the investigator does not have the luxury of spending time scanning only one 'snap-shot' of the simulation. Automated assistance is required in pointing out areas of potential interest contained within the flow. This must not require a heavy compute burden (the visualization should not significantly slow down the solution procedure for co-processing environments like pV3), and methods must be developed to abstract the feature and display it in a manner that physically makes sense. The following is a list of the important physical phenomena found in transient (and steady-state) fluid flow: Shocks; Vortex cores; Regions of Recirculation; Boundary Layers; Wakes.

  3. An application of Chan-Vese method used to determine the ROI area in CT lung screening

    NASA Astrophysics Data System (ADS)

    Prokop, Paweł; Surtel, Wojciech

    2016-09-01

    The article presents two approaches to determining the ROI area in CT lung screening. The first approach is based on a classic method of framing the image in order to determine the ROI by using the MaZda tool. The second approach is based on segmentation of CT images of the lungs and removal of redundant information from the image. Among active contour techniques, the Chan-Vese method was chosen. In order to determine the effectiveness of each approach, an analysis of the resulting ROI textures and extraction of textural features were performed using the MaZda tool. The results were compared and presented in the form of radar graphs. The second approach proved to be effective and appropriate, and consequently it is used for further analysis of CT images in the computer-aided diagnosis of sarcoidosis.
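
    The core of the Chan-Vese update, alternating region means with a reassignment of pixels to the closer mean, can be sketched as below. This is a deliberately simplified version without the length/regularization term of the full level-set formulation, run on a synthetic image:

    ```python
    import numpy as np

    def chan_vese_simplified(image, n_iter=20):
        # Initial partition: pixels above the global mean are "inside".
        inside = image > image.mean()
        for _ in range(n_iter):
            c1 = image[inside].mean()      # mean intensity inside
            c2 = image[~inside].mean()     # mean intensity outside
            # Reassign each pixel to the region whose mean is closer
            # (the data-fidelity part of the Chan-Vese energy).
            new_inside = (image - c1) ** 2 < (image - c2) ** 2
            if np.array_equal(new_inside, inside):
                break
            inside = new_inside
        return inside

    # Synthetic image: bright square on a dark background, plus noise.
    rng = np.random.default_rng(1)
    img = rng.normal(0.2, 0.05, size=(32, 32))
    img[8:24, 8:24] += 0.6
    mask = chan_vese_simplified(img)
    ```

    The full method (available, for instance, as `skimage.segmentation.chan_vese`) adds a curvature penalty that keeps the contour smooth, which matters on real, low-contrast CT data.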

  4. SACA: Software Assisted Call Analysis--an interactive tool supporting content exploration, online guidance and quality improvement of counseling dialogues.

    PubMed

    Trinkaus, Hans L; Gaisser, Andrea E

    2010-09-01

    Nearly 30,000 individual inquiries are answered annually by the telephone cancer information service (CIS, KID) of the German Cancer Research Center (DKFZ). The aim was to develop a tool for evaluating these calls, and to support the complete counseling process interactively. A novel software tool is introduced, based on a structure similar to a music score. Treating the interaction as a "duet", guided by the CIS counselor, the essential contents of the dialogue are extracted automatically. For this, "trained speech recognition" is applied to the (known) counselor's part, and "keyword spotting" is used on the (unknown) client's part to pick out specific items from the "word streams". The outcomes fill an abstract score representing the dialogue. Pilot tests performed on a prototype of SACA (Software Assisted Call Analysis) resulted in a basic proof of concept: Demographic data as well as information regarding the situation of the caller could be identified. The study encourages following up on the vision of an integrated SACA tool for supporting calls online and performing statistics on its knowledge database offline. Further research perspectives are to check SACA's potential in comparison with established interaction analysis systems like RIAS. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.
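
    The keyword-spotting half of the pipeline can be sketched as a scan of the (already transcribed) client utterances against a slot vocabulary that fills the abstract "score". The slot names and keywords below are invented for illustration:

    ```python
    # Hypothetical slot vocabulary: slot name -> trigger keywords.
    slots = {
        "diagnosis": {"breast", "prostate", "lung", "melanoma"},
        "topic": {"therapy", "treatment", "side", "effects", "prognosis"},
        "role": {"patient", "relative", "friend"},
    }

    def spot_keywords(utterances, slots):
        # Fill the abstract "score" with the slots triggered by the caller.
        filled = {name: set() for name in slots}
        for utterance in utterances:
            for word in utterance.lower().split():
                word = word.strip(".,?!")
                for name, keywords in slots.items():
                    if word in keywords:
                        filled[name].add(word)
        return filled

    calls = ["My mother is a breast cancer patient.",
             "We worry about the side effects of her therapy."]
    score = spot_keywords(calls, slots)
    ```

    In the real system this runs on the output of speech recognition over the caller's audio stream, while the counselor's side is handled by trained recognition.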

  5. Towards elicitation of users requirements for hospital information system: from a care process modelling technique to a web based collaborative tool.

    PubMed Central

    Staccini, Pascal M.; Joubert, Michel; Quaranta, Jean-Francois; Fieschi, Marius

    2002-01-01

    Growing attention is being given to the use of process modeling methodology for user requirements elicitation. In the analysis phase of hospital information systems, the usefulness of care-process models has been investigated to evaluate the conceptual applicability and practical understandability by clinical staff and members of users teams. Nevertheless, there still remains a gap between users and analysts in their mutual ability to share conceptual views and vocabulary, keeping the meaning of clinical context while providing elements for analysis. One of the solutions for filling this gap is to consider the process model itself in the role of a hub as a centralized means of facilitating communication between team members. Starting with a robust and descriptive technique for process modeling called IDEF0/SADT, we refined the basic data model by extracting concepts from ISO 9000 process analysis and from enterprise ontology. We defined a web-based architecture to serve as a collaborative tool and implemented it using an object-oriented database. The prospects of such a tool are discussed notably regarding to its ability to generate data dictionaries and to be used as a navigation tool through the medium of hospital-wide documentation. PMID:12463921

  6. Towards elicitation of users requirements for hospital information system: from a care process modelling technique to a web based collaborative tool.

    PubMed

    Staccini, Pascal M; Joubert, Michel; Quaranta, Jean-Francois; Fieschi, Marius

    2002-01-01

    Growing attention is being given to the use of process modeling methodology for user requirements elicitation. In the analysis phase of hospital information systems, the usefulness of care-process models has been investigated to evaluate the conceptual applicability and practical understandability by clinical staff and members of users teams. Nevertheless, there still remains a gap between users and analysts in their mutual ability to share conceptual views and vocabulary, keeping the meaning of clinical context while providing elements for analysis. One of the solutions for filling this gap is to consider the process model itself in the role of a hub as a centralized means of facilitating communication between team members. Starting with a robust and descriptive technique for process modeling called IDEF0/SADT, we refined the basic data model by extracting concepts from ISO 9000 process analysis and from enterprise ontology. We defined a web-based architecture to serve as a collaborative tool and implemented it using an object-oriented database. The prospects of such a tool are discussed notably regarding to its ability to generate data dictionaries and to be used as a navigation tool through the medium of hospital-wide documentation.

  7. Fluctuating Finite Element Analysis (FFEA): A continuum mechanics software tool for mesoscale simulation of biomolecules.

    PubMed

    Solernou, Albert; Hanson, Benjamin S; Richardson, Robin A; Welch, Robert; Read, Daniel J; Harlen, Oliver G; Harris, Sarah A

    2018-03-01

    Fluctuating Finite Element Analysis (FFEA) is a software package designed to perform continuum mechanics simulations of proteins and other globular macromolecules. It combines conventional finite element methods with stochastic thermal noise, and is appropriate for simulations of large proteins and protein complexes at the mesoscale (length-scales in the range of 5 nm to 1 μm), where there is currently a paucity of modelling tools. It requires 3D volumetric information as input, which can be low-resolution structural information such as cryo-electron tomography (cryo-ET) maps or much higher resolution atomistic co-ordinates from which volumetric information can be extracted. In this article we introduce our open source software package for performing FFEA simulations, which we have released under a GPLv3 license. The software package includes a C++ implementation of FFEA, together with tools to assist the user in setting up the system from Electron Microscopy Data Bank (EMDB) or Protein Data Bank (PDB) data files. We also provide a PyMOL plugin to perform basic visualisation, and additional Python tools for the analysis of FFEA simulation trajectories. This manuscript provides a basic background to the FFEA method, describing the implementation of the core mechanical model and how intermolecular interactions and the solvent environment are included within this framework. We provide prospective FFEA users with a practical overview of how to set up an FFEA simulation, with reference to our publicly available online tutorials and manuals that accompany this first release of the package.

  8. A Model-Driven Visualization Tool for Use with Model-Based Systems Engineering Projects

    NASA Technical Reports Server (NTRS)

    Trase, Kathryn; Fink, Eric

    2014-01-01

    Model-Based Systems Engineering (MBSE) promotes increased consistency between a system's design and its design documentation through the use of an object-oriented system model. The creation of this system model facilitates data presentation by providing a mechanism from which information can be extracted by automated manipulation of model content. Existing MBSE tools enable model creation, but are often too complex for the unfamiliar model viewer to use easily. These tools do not yet provide many opportunities for easing into the development and use of a system model when system design documentation already exists. This study creates a Systems Modeling Language (SysML) Document Traceability Framework (SDTF) for integrating design documentation with a system model, and develops an Interactive Visualization Engine for SysML Tools (InVEST) that exports consistent, clear, and concise views of SysML model data. These exported views are each meaningful to a variety of project stakeholders with differing subjects of concern and depth of technical involvement. InVEST allows a model user to generate multiple views and reports from an MBSE model, including wiki pages and interactive visualizations of data. System data can also be filtered to present only the information relevant to a particular stakeholder, resulting in a view that is consistent both with the larger system model and with other model views. Viewing the relationships between system artifacts and documentation, and filtering through data to see specialized views, improves the value of the system as a whole, as data becomes information.

  9. Sieve-based relation extraction of gene regulatory networks from biological literature

    PubMed Central

    2015-01-01

    Background Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. Results We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. 
Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Conclusions Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains. PMID:26551454

  10. Sieve-based relation extraction of gene regulatory networks from biological literature.

    PubMed

    Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko

    2015-01-01

    Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. 
Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains.
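The skip-mention transformation described above can be sketched as follows: for skip k, the mention sequence is split into k+1 subsequences, each keeping every (k+1)-th mention, so that mentions originally k positions apart become adjacent and a first-order (linear-chain) model can relate them. This is our reading of the transformation named in the abstract, not the authors' exact code; the example mentions are invented.

```python
def skip_mention_sequences(mentions, skip):
    """Build skip-mention sequences: for skip k, return k+1 subsequences,
    each containing every (k+1)-th mention, making originally distant
    mentions adjacent for a first-order sequence model (hypothetical
    reading of the transformation described in the abstract)."""
    step = skip + 1
    return [mentions[start::step] for start in range(step)]

mentions = ["sigK", "activates", "spoIVA", "represses", "sigF"]
# skip=1 makes mentions two positions apart adjacent:
seqs = skip_mention_sequences(mentions, skip=1)
# seqs == [["sigK", "spoIVA", "sigF"], ["activates", "represses"]]
```

With skip=0 the transformation is the identity, so the original linear-chain model is recovered as a special case.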

  11. Data Content and Exchange in General Practice: a Review

    PubMed Central

    Kalankesh, Leila R; Farahbakhsh, Mostafa; Rahimi, Niloofar

    2014-01-01

    Background: Efficient communication of data is an essential requirement for general practice. Any issue in data content and its exchange among GPs and other related entities hinders continuity of patient care. Methods: The literature search for this review was conducted on three electronic databases: Medline, Scopus and Science Direct. Results: Through reviewing the papers, we extracted information on GP data content, use cases of GP information exchange, its participants, tools and methods, incentives and barriers. Conclusion: Considering the importance of data content and exchange for GP systems, more research is needed toward providing a comprehensive framework for data content and exchange in GP systems. PMID:25648317

  12. Protocols for the Investigation of Information Processing in Human Assessment of Fundamental Movement Skills.

    PubMed

    Ward, Brodie J; Thornton, Ashleigh; Lay, Brendan; Rosenberg, Michael

    2017-01-01

    Fundamental movement skill (FMS) assessment remains an important tool in classifying individuals' level of FMS proficiency. The collection of FMS performances for assessment and monitoring has remained unchanged over the last few decades, but new motion capture technologies offer opportunities to automate this process. To achieve this, a greater understanding of the human process of movement skill assessment is required. The authors present the rationale and protocols of a project in which they aim to investigate the visual search patterns and information extraction employed by human assessors during FMS assessment, as well as the implementation of the Kinect system for FMS capture.

  13. 3D Feature Extraction for Unstructured Grids

    NASA Technical Reports Server (NTRS)

    Silver, Deborah

    1996-01-01

    Visualization techniques provide tools that help scientists identify observed phenomena in scientific simulation. To be useful, these tools must allow the user to extract regions, classify and visualize them, abstract them for simplified representations, and track their evolution. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This article explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and those from Finite Element Analysis.
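Region extraction of the kind described above can be illustrated with a flood fill that labels connected cells exceeding a threshold. This is a minimal stand-in on a regular 2D grid; the article's actual algorithms handle scalar fields on unstructured 2D and 3D grids.

```python
from collections import deque

def extract_regions(grid, threshold):
    """Label 4-connected regions of cells whose scalar value exceeds
    `threshold` on a regular 2D grid (a simplified stand-in for the
    unstructured-grid segmentation discussed in the article)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] > threshold and not seen[r][c]:
                region, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    i, j = queue.popleft()
                    region.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] > threshold
                                and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
                regions.append(region)
    return regions

grid = [[0, 5, 0],
        [0, 5, 0],
        [5, 0, 5]]
regions = extract_regions(grid, threshold=1)  # three separate regions
```

Once regions are extracted this way, per-region statistics (size, centroid, extent) support the classification and tracking steps the abstract mentions.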

  14. Ontologies in medicinal chemistry: current status and future challenges.

    PubMed

    Gómez-Pérez, Asunción; Martínez-Romero, Marcos; Rodríguez-González, Alejandro; Vázquez, Guillermo; Vázquez-Naya, José M

    2013-01-01

    Recent years have seen a dramatic increase in the amount and availability of data in the diverse areas of medicinal chemistry, making it possible to achieve significant advances in fields such as the design, synthesis and biological evaluation of compounds. However, with this data explosion, the storage, management and analysis of available data to extract relevant information has become an even more complex task that offers challenging research issues to Artificial Intelligence (AI) scientists. Ontologies have emerged in AI as a key tool to formally represent and semantically organize aspects of the real world. Beyond glossaries or thesauri, ontologies facilitate communication between experts and allow the application of computational techniques to extract useful information from available data. In medicinal chemistry, multiple ontologies have been developed in recent years that contain knowledge about chemical compounds and processes of synthesis of pharmaceutical products. This article reviews the principal standards and ontologies in medicinal chemistry, analyzes their main applications and suggests future directions.

  15. Using Best Practices to Extract, Organize, and Reuse Embedded Decision Support Content Knowledge Rules from Mature Clinical Systems

    PubMed Central

    DesAutels, Spencer J.; Fox, Zachary E.; Giuse, Dario A.; Williams, Annette M.; Kou, Qing-hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia

    2016-01-01

    Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems. PMID:28269846

  16. Utilization of remotely-sensed data in the management of inland wetlands

    NASA Technical Reports Server (NTRS)

    Carter, V.; Smith, D. G.

    1973-01-01

    The author has identified the following significant results. ERTS-1 data and aerial photography are proving to be a useful tool for the inventory and management of inland wetlands. Two examples of the application of remotely-sensed data to specific wetland management needs or requirements are discussed. Studies of the Great Dismal Swamp are utilizing ERTS-1 imagery and color IR photography in: (1) study area selection; (2) field inspection; (3) vegetation mapping; (4) identification of drainage characteristics and moisture regime; (5) location of intensive study areas; and (6) detection of change. Thematic extractions of ERTS-1 data made using the United States Geological Survey's Autographic Theme Extraction System are aiding analyses of swamp hydrologic regime and providing information pertinent to quick recognition and inventory of wetlands from ERTS-1. DCPs in south Florida wetlands provide near-real-time data for water resources managers. Data relayed by satellite can be entered into models to provide predictive data and water storage information for long-term and short-term decision making.

  17. An evaluation of the suitability of ERTS data for the purposes of petroleum exploration. [Anadarko Basin of Texas and Oklahoma

    NASA Technical Reports Server (NTRS)

    Collins, R. J.; Mccown, F. P.; Stonis, L. P.; Petzel, G.; Everett, J. R.

    1974-01-01

    This experiment was designed to determine the types and amounts of information valuable to petroleum exploration extractable from ERTS data and the cost of obtaining the information using traditional or conventional means. It was desired that an evaluation of this new petroleum exploration tool be made in a geologically well known area in order to assess its usefulness in an unknown area. The Anadarko Basin lies in western Oklahoma and the panhandle of Texas. It was chosen as a test site because there is a great deal of published information available on the surface and subsurface geology of the area, and there are many known structures that act as traps for hydrocarbons. This basin is similar to several other large epicontinental sedimentary basins. It was found that ERTS imagery is an excellent tool for reconnaissance exploration of large sedimentary basins or new exploration provinces. For the first time, small and medium size oil companies can rapidly and effectively analyze exploration provinces as a whole.

  18. Text mining applications in psychiatry: a systematic literature review.

    PubMed

    Abbe, Adeline; Grouin, Cyril; Zweigenbaum, Pierre; Falissard, Bruno

    2016-06-01

    The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry, and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) Psychopathology (i.e. observational studies focusing on mental illnesses) (2) the Patient perspective (i.e. patients' thoughts and opinions), (3) Medical records (i.e. safety issues, quality of care and description of treatments), and (4) Medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits, limits, and further applications of this tool in the future. Copyright © 2015 John Wiley & Sons, Ltd.

  19. RadSearch: a RIS/PACS integrated query tool

    NASA Astrophysics Data System (ADS)

    Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

    2008-03-01

    Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings; administrators can easily obtain complex practice management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; teaching institutions can quickly and effectively create thorough teaching files.
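The "simple text parsing techniques to index text-based radiology reports" can be sketched with an inverted index mapping tokens to report identifiers, with AND-semantics search over query terms. This is a hedged illustration of the approach, not RadSearch's actual implementation; the report texts and ids are invented.

```python
import re
from collections import defaultdict

def build_index(reports):
    """Map each lowercased token to the set of report ids containing it
    (a minimal stand-in for the RadSearch Toolkit's report indexing)."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(report_id)
    return index

def search(index, query):
    """Return ids of reports containing *all* query terms."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

reports = {
    "R1": "MRI brain: no acute infarct.",
    "R2": "CT chest: small pleural effusion.",
    "R3": "MRI spine: disc herniation at L4-L5.",
}
index = build_index(reports)
```

Multi-criteria retrospective queries of the kind the abstract mentions reduce to intersecting the posting sets of several terms, which is why even simple token indexing makes such searches fast.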

  20. CRF: detection of CRISPR arrays using random forest.

    PubMed

    Wang, Kai; Liang, Chun

    2017-01-01

    CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Different from other CRISPR detection tools, a random forest classifier was used in CRF to filter out invalid CRISPR arrays from all putative candidates and accordingly enhanced detection accuracy. In CRF, particularly, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
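The sequence-content half of the triplet elements can be illustrated by counting overlapping nucleotide 3-mers of a candidate repeat. This is a simplification assumed for illustration: CRF's published triplet elements also fold in secondary-structure state, which is omitted here.

```python
from collections import Counter

def triplet_features(sequence):
    """Count overlapping nucleotide triplets (3-mers) in a CRISPR repeat
    candidate. A simplified analogue of CRF's triplet elements; the
    published tool additionally combines structural information."""
    seq = sequence.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))

feats = triplet_features("GTTTTAGAG")  # 7 overlapping triplets
```

Such per-triplet counts form a fixed-length feature vector over the 64 possible triplets, which is the kind of input a random forest classifier can consume directly.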

  1. NET: a new framework for the vectorization and examination of network data.

    PubMed

    Lasser, Jana; Katifori, Eleni

    2017-01-01

    The analysis of complex networks both in general and in particular as pertaining to real biological systems has been the focus of intense scientific attention in the past and present. In this paper we introduce two tools that provide fast and efficient means for the processing and quantification of biological networks like Drosophila tracheoles or leaf venation patterns: the Network Extraction Tool (NET) to extract data and the Graph-edit-GUI (GeGUI) to visualize and modify networks. NET is especially designed for high-throughput semi-automated analysis of biological datasets containing digital images of networks. The framework starts with the segmentation of the image and then proceeds to vectorization using methodologies from optical character recognition. After a series of steps to clean and improve the quality of the extracted data the framework produces a graph in which the network is represented only by its nodes and neighborhood-relations. The final output contains information about the adjacency matrix of the graph, the width of the edges and the positions of the nodes in space. NET also provides tools for statistical analysis of the network properties, such as the number of nodes or total network length. Other, more complex metrics can be calculated by importing the vectorized network to specialized network analysis packages. GeGUI is designed to facilitate manual correction of non-planar networks as these may contain artifacts or spurious junctions due to branches crossing each other. It is tailored for but not limited to the processing of networks from microscopy images of Drosophila tracheoles. The networks extracted by NET closely approximate the network depicted in the original image. NET is fast, yields reproducible results and is able to capture the full geometry of the network, including curved branches. Additionally GeGUI allows easy handling and visualization of the networks.
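Given the output NET describes (node positions plus an adjacency matrix), summary statistics such as total network length reduce to a sum over edges. A minimal sketch, assuming straight edges between node positions (NET itself captures curved branches, which this simplification ignores):

```python
import math

def total_network_length(positions, adjacency):
    """Sum Euclidean lengths of all edges of an undirected graph given
    node positions and a symmetric adjacency matrix (the kind of
    summary statistic NET reports for an extracted network)."""
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j]:
                total += math.dist(positions[i], positions[j])
    return total

positions = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
adjacency = [[0, 1, 1],
             [1, 0, 0],
             [1, 0, 0]]
length = total_network_length(positions, adjacency)  # 5.0 + 3.0
```

More complex metrics would, as the abstract suggests, be computed by exporting the same adjacency structure to a dedicated network analysis package.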

  2. SKIMMR: facilitating knowledge discovery in life sciences by machine-aided skim reading

    PubMed Central

    Burns, Gully A.P.C.

    2014-01-01

    Background. Unlike full reading, ‘skim-reading’ involves the process of looking quickly over information in an attempt to cover more material whilst still being able to retain a superficial view of the underlying content. Within this work, we specifically emulate this natural human activity by providing a dynamic graph-based view of entities automatically extracted from text. For the extraction, we use shallow parsing, co-occurrence analysis and semantic similarity computation techniques. Our main motivation is to assist biomedical researchers and clinicians in coping with increasingly large amounts of potentially relevant articles that are being published on an ongoing basis in the life sciences. Methods. To construct the high-level network overview of articles, we extract weighted binary statements from the text. We consider two types of these statements, co-occurrence and similarity, both organised in the same distributional representation (i.e., in a vector-space model). For the co-occurrence weights, we use point-wise mutual information that indicates the degree of non-random association between two co-occurring entities. For computing the similarity statement weights, we use cosine distance based on the relevant co-occurrence vectors. These statements are used to build fuzzy indices of terms, statements and provenance article identifiers, which support fuzzy querying and subsequent result ranking. These indexing and querying processes are then used to construct a graph-based interface for searching and browsing entity networks extracted from articles, as well as articles relevant to the networks being browsed. Last but not least, we describe a methodology for automated experimental evaluation of the presented approach. The method uses formal comparison of the graphs generated by our tool to relevant gold standards based on manually curated PubMed, TREC challenge and MeSH data. Results. 
We provide a web-based prototype (called ‘SKIMMR’) that generates a network of inter-related entities from a set of documents which a user may explore through our interface. When a particular area of the entity network looks interesting to a user, the tool displays the documents that are the most relevant to those entities of interest currently shown in the network. We present this as a methodology for browsing a collection of research articles. To illustrate the practical applicability of SKIMMR, we present examples of its use in the domains of Spinal Muscular Atrophy and Parkinson’s Disease. Finally, we report on the results of experimental evaluation using the two domains and one additional dataset based on the TREC challenge. The results show how the presented method for machine-aided skim reading outperforms tools like PubMed regarding focused browsing and informativeness of the browsing context. PMID:25097821
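The two statement weights named in the abstract, point-wise mutual information for co-occurrence and cosine similarity over co-occurrence vectors, can be computed directly from counts. A minimal sketch with invented counts; SKIMMR's actual smoothing and normalisation choices are not reproduced here.

```python
import math

def pmi(cooc_count, count_a, count_b, total):
    """Point-wise mutual information of two entities from co-occurrence
    counts: log( p(a,b) / (p(a) * p(b)) ). Positive values indicate
    non-random association; zero indicates independence."""
    p_ab = cooc_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log(p_ab / (p_a * p_b))

def cosine_similarity(u, v):
    """Cosine of the angle between two co-occurrence vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

# Entities a and b co-occur in 10 of 100 contexts; individually they
# appear in 20 and 25 contexts:
w = pmi(cooc_count=10, count_a=20, count_b=25, total=100)   # log 2 > 0
s = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])     # parallel: 1.0
```

Since 0.10 > 0.20 * 0.25, the PMI is positive, i.e. the pair co-occurs more often than chance would predict, which is exactly the signal the co-occurrence statements encode.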

  3. Ad-Hoc Queries over Document Collections - A Case Study

    NASA Astrophysics Data System (ADS)

    Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

    We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" and our system GOOLAP.info are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.
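The density-ensuring DJ operator can be sketched as a join over extracted records that discards any result with a missing field. The field layout, `None`-as-missing convention, and function name are our assumptions for illustration; the paper's exact operator semantics may differ.

```python
def dj_join(left, right, key):
    """Density-join (DJ) sketch: join two lists of extracted records on
    `key` and keep only complete results, i.e. merged records in which
    no field is None (our reading of "ensures data density by removing
    incomplete records from results")."""
    right_by_key = {}
    for rec in right:
        if rec.get(key) is not None:
            right_by_key.setdefault(rec[key], []).append(rec)
    joined = []
    for rec in left:
        for match in right_by_key.get(rec.get(key), []):
            merged = {**rec, **match}
            if all(v is not None for v in merged.values()):
                joined.append(merged)
    return joined

# An extraction failure left one CEO field empty:
companies = [{"company": "Acme", "ceo": "A. Smith"},
             {"company": "Globex", "ceo": None}]
revenues = [{"company": "Acme", "revenue": 10.5},
            {"company": "Globex", "revenue": 7.2}]
complete = dj_join(companies, revenues, key="company")  # only Acme survives
```

Because incomplete records are pruned at each DJ step, the order in which such joins are applied changes how many complete records survive, which is why join reordering matters in this setting.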

  4. DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe.

    PubMed

    Wang, Tianmin; Mori, Hiroshi; Zhang, Chong; Kurokawa, Ken; Xing, Xin-Hui; Yamada, Takuji

    2015-03-21

    Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated enzymes and the actual number in the protein universe, which significantly limits our ability to extract additional biologically relevant functional information from the available sequencing data. To reliably expand the enzyme space, we developed DomSign, a highly accurate domain signature-based enzyme functional prediction tool to assign Enzyme Commission (EC) digits. DomSign is a top-down prediction engine that yields results comparable, or superior, to those from many benchmark EC number prediction tools, including BLASTP, when a homolog with an identity >30% is not available in the database. Performance tests showed that DomSign is a highly reliable enzyme EC number annotation tool. After multiple tests, the accuracy is thought to be greater than 90%. Thus, DomSign can be applied to large-scale datasets, with the goal of expanding the enzyme space with high fidelity. Using DomSign, we successfully increased the percentage of EC-tagged enzymes from 12% to 30% in UniProt-TrEMBL. In the Kyoto Encyclopedia of Genes and Genomes bacterial database, the percentage of EC-tagged enzymes for each bacterial genome could be increased from 26.0% to 33.2% on average. Metagenomic mining was also efficient, as exemplified by the application of DomSign to the Human Microbiome Project dataset, recovering nearly one million new EC-labeled enzymes. Our results offer preliminary confirmation of the existence of the hypothesized huge number of "hidden enzymes" in the protein universe, the identification of which could substantially further our understanding of the metabolisms of diverse organisms and also facilitate bioengineering by providing a richer enzyme resource. 
Furthermore, our results highlight the necessity of using more advanced computational tools than BLAST in protein database annotations to extract additional biologically relevant functional information from the available biological sequences.
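The "top-down" flavour of domain signature-based EC assignment can be sketched as: look up a protein's domain signature in a table of annotated signatures, then report the deepest EC prefix on which all matching annotations agree (class first, then subclass, and so on). The table, agreement rule, and domain identifiers below are illustrative assumptions, not DomSign's published procedure.

```python
def assign_ec(domains, signature_table, min_level=1):
    """Top-down EC assignment sketch: retrieve full EC numbers annotated
    to this domain signature, then keep the deepest shared EC prefix
    (hypothetical reading of DomSign's top-down digit assignment)."""
    ecs = signature_table.get(frozenset(domains), [])
    if not ecs:
        return None
    split = [ec.split(".") for ec in ecs]
    agreed = []
    for level in range(4):
        digits = {s[level] for s in split}
        if len(digits) != 1:
            break
        agreed.append(digits.pop())
    if len(agreed) < min_level:
        return None
    return ".".join(agreed)

# Illustrative signature table (Pfam-style ids, invented annotations):
table = {
    frozenset({"PF00107", "PF08240"}): ["1.1.1.1", "1.1.1.2"],
}
ec = assign_ec({"PF00107", "PF08240"}, table)  # annotations agree to "1.1.1"
```

Reporting the agreed prefix rather than forcing a full four-digit call is what lets a top-down scheme stay reliable when the signature is ambiguous at the finest level.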

  5. CRIE: An automated analyzer for Chinese texts.

    PubMed

    Sung, Yao-Ting; Chang, Tao-Hsing; Lin, Wei-Chun; Hsieh, Kuan-Sheng; Chang, Kuo-En

    2016-12-01

    Textual analysis has been applied to various fields, such as discourse analysis, corpus studies, text leveling, and automated essay evaluation. Several tools have been developed for analyzing texts written in alphabetic languages such as English and Spanish. However, currently there is no tool available for analyzing Chinese-language texts. This article introduces a tool for the automated analysis of simplified and traditional Chinese texts, called the Chinese Readability Index Explorer (CRIE). Composed of four subsystems and incorporating 82 multilevel linguistic features, CRIE is able to conduct the major tasks of segmentation, syntactic parsing, and feature extraction. Furthermore, the integration of linguistic features with machine learning models enables CRIE to provide leveling and diagnostic information for texts in language arts, texts for learning Chinese as a foreign language, and texts with domain knowledge. The usage and validation of the functions provided by CRIE are also introduced.

  6. The Simple Video Coder: A free tool for efficiently coding social video data.

    PubMed

    Barto, Daniel; Bird, Clark W; Hamilton, Derek A; Fink, Brandi C

    2017-08-01

    Videotaping of experimental sessions is a common practice across many disciplines of psychology, ranging from clinical therapy, to developmental science, to animal research. Audio-visual data are a rich source of information that can be easily recorded; however, analysis of the recordings presents a major obstacle to project completion. Coding behavior is time-consuming and often requires ad-hoc training of a student coder. In addition, existing software is either prohibitively expensive or cumbersome, which leaves researchers with inadequate tools to quickly process video data. We offer the Simple Video Coder: free, open-source software for behavior coding that is flexible in accommodating different experimental designs, is intuitive for students to use, and produces outcome measures of event timing, frequency, and duration. Finally, the software also offers extraction tools to splice video into coded segments suitable for training future human coders or for use as input for pattern classification algorithms.
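The outcome measures named above (event timing, frequency, and duration) can be derived from coded segments with a simple aggregation. A minimal sketch assuming segments are (onset, offset, label) tuples in seconds; the Simple Video Coder's actual export format is not reproduced here.

```python
def event_measures(segments):
    """Summarise coded (onset, offset, label) segments into per-label
    (frequency, total duration) pairs, the kind of outcome measures a
    behavior-coding tool reports. Times are in the input's units."""
    measures = {}
    for onset, offset, label in segments:
        freq, dur = measures.get(label, (0, 0.0))
        measures[label] = (freq + 1, dur + (offset - onset))
    return measures

segments = [(0.0, 2.5, "gesture"), (3.0, 4.0, "gesture"), (4.0, 9.0, "speech")]
measures = event_measures(segments)  # gesture: 2 events, 3.5 s total
```

The same (onset, offset) pairs are also what a splicing tool needs to cut the video into labeled clips for coder training or classifier input.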

  7. De-blending deep Herschel surveys: A multi-wavelength approach

    NASA Astrophysics Data System (ADS)

    Pearson, W. J.; Wang, L.; van der Tak, F. F. S.; Hurley, P. D.; Burgarella, D.; Oliver, S. J.

    2017-07-01

    Aims: Cosmological surveys in the far-infrared are known to suffer from confusion. The Bayesian de-blending tool, XID+, currently provides one of the best ways to de-confuse deep Herschel SPIRE images, using a flat flux density prior. This work is to demonstrate that existing multi-wavelength data sets can be exploited to improve XID+ by providing an informed prior, resulting in more accurate and precise extracted flux densities. Methods: Photometric data for galaxies in the COSMOS field were used to constrain spectral energy distributions (SEDs) using the fitting tool CIGALE. These SEDs were used to create Gaussian prior estimates in the SPIRE bands for XID+. The multi-wavelength photometry and the extracted SPIRE flux densities were run through CIGALE again to allow us to compare the performance of the two priors. Inferred ALMA flux densities (F_infer^ALMA), at 870 μm and 1250 μm, from the best fitting SEDs from the second CIGALE run were compared with measured ALMA flux densities (F_meas^ALMA) as an independent performance validation. Similar validations were conducted with the SED modelling and fitting tool MAGPHYS and modified black-body functions to test for model dependency. Results: We demonstrate a clear improvement in agreement between the flux densities extracted with XID+ and existing data at other wavelengths when using the new informed Gaussian prior over the original uninformed prior. The residuals between F_meas^ALMA and F_infer^ALMA were calculated. For the Gaussian priors these residuals, expressed as a multiple of the ALMA error (σ), have a smaller standard deviation, 7.95σ for the Gaussian prior compared to 12.21σ for the flat prior; reduced mean, 1.83σ compared to 3.44σ; and have reduced skew to positive values, 7.97 compared to 11.50. These results were determined to not be significantly model dependent. This results in statistically more reliable SPIRE flux densities and hence statistically more reliable infrared luminosity estimates. 
Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
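The residual comparison above (mean, standard deviation, and skew of measured-minus-inferred flux in units of the measurement error) can be computed as follows. A minimal sketch with invented fluxes; the sign convention and the use of population moments are our assumptions.

```python
import math

def residual_stats(f_meas, f_infer, sigma):
    """Residuals (measured minus inferred flux) in units of the
    measurement error sigma, summarised by mean, standard deviation
    and Fisher skewness, as in the paper's prior comparison. The exact
    residual convention is assumed, not taken from the paper."""
    r = [(m - i) / s for m, i, s in zip(f_meas, f_infer, sigma)]
    n = len(r)
    mean = sum(r) / n
    var = sum((x - mean) ** 2 for x in r) / n
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in r) / (n * sd ** 3) if sd else 0.0
    return mean, sd, skew

mean, sd, skew = residual_stats([10.0, 12.0, 8.0],
                                [9.0, 11.5, 9.0],
                                [1.0, 1.0, 1.0])
```

A smaller mean and standard deviation of these normalised residuals, as reported for the informed Gaussian prior, indicates extracted fluxes that sit closer to the independent ALMA measurements.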

  8. An R-Shiny-Based Phenology Analysis System and Case Study Using a Digital Camera Dataset

    NASA Astrophysics Data System (ADS)

    Zhou, Y. K.

    2018-05-01

    Accurate extraction of vegetation phenology information plays an important role in exploring the effects of climate change on vegetation. Repeat photography from digital cameras is a rich and voluminous data source for phenological analysis. Processing and mining phenological data remains a challenge: there is no single tool or universal solution for big data processing and visualization in the field of phenology extraction. In this paper, we propose an R-Shiny-based web application for vegetation phenological parameter extraction and analysis. Its main functions include phenological site distribution visualization, ROI (Region of Interest) selection, vegetation index calculation and visualization, data filtering, growth trajectory fitting, and phenology parameter extraction. The long-term observational photography data from the Freemanwood site in 2013 are processed by this system as an example. The results show that: (1) the system is capable of analyzing large data volumes using a distributed framework; (2) the combination of multiple parameter extraction and growth curve fitting methods can effectively extract the key phenology parameters, although there are discrepancies between different method combinations in particular study areas. Vegetation with a single growth peak is best fitted with the double logistic model, while vegetation with multiple growth peaks is better fitted with the spline method.
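The double logistic model recommended above for single-peak growth trajectories can be written as a rising and a falling sigmoid. A minimal sketch: the parameterisation below is a common phenology convention, not necessarily the exact form used by the described system, and fitting it to data would require an optimiser, which is omitted.

```python
import math

def double_logistic(t, vmin, vmax, sos, k1, eos, k2):
    """Double logistic greenness model: baseline vmin rising to vmax
    around start of season `sos` and falling back around end of season
    `eos`; k1 and k2 control the steepness of green-up and senescence.
    A standard phenology parameterisation, assumed for illustration."""
    rise = 1.0 / (1.0 + math.exp(-k1 * (t - sos)))
    fall = 1.0 / (1.0 + math.exp(-k2 * (t - eos)))
    return vmin + (vmax - vmin) * (rise - fall)

# Green-up around day 120, senescence around day 280 (illustrative values):
series = [double_logistic(t, 0.30, 0.45, 120, 0.1, 280, 0.1)
          for t in range(0, 365, 30)]
```

Phenology parameters such as start and end of season then fall out of the fitted curve, e.g. as the inflection points of the two sigmoids, which is why a single-peak trajectory suits this model while multi-peak trajectories do not.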

  9. Quality evaluation of Hypericum ascyron extract by two-dimensional high-performance liquid chromatography coupled with the colorimetric 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide method.

    PubMed

    Li, Xiu-Mei; Luo, Xue-Gang; Zhang, Chao-Zheng; Wang, Nan; Zhang, Tong-Cun

    2015-02-01

    In this paper, a heart-cutting two-dimensional high-performance liquid chromatography method coupled with the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay was established for controlling the quality of different batches of Hypericum ascyron extract for the first time. In comparison with the common one-dimensional fingerprint, the second-dimensional fingerprint compiled additional spectral data and was hence more informative. The quality of H. ascyron extract was further evaluated by similarity measures, with consistent results: the correlation coefficients of the similarity of ten batches of H. ascyron extract were >0.99. Furthermore, we also evaluated the quality of the ten batches of H. ascyron extract by antibacterial activity. The results demonstrated that the quality of the ten batches of H. ascyron extract, as assessed by MTT, was not significantly different. Finally, we demonstrated that the second-dimensional fingerprint coupled with the MTT method was a more powerful tool to characterize the quality of samples from batch to batch. Therefore the proposed method could be used to comprehensively conduct the quality control of traditional Chinese medicines. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Etna_NETVIS: A dedicated tool for automatically pre-processing high frequency data useful to extract geometrical parameters and track the evolution of the lava field

    NASA Astrophysics Data System (ADS)

    Marsella, Maria; Junior Valentino D'Aranno, Peppe; De Bonis, Roberto; Nardinocchi, Carla; Scifoni, Silvia; Scutti, Marianna; Sonnessa, Alberico; Wahbeh, Wissam; Biale, Emilio; Coltelli, Mauro; Pecora, Emilio; Prestifilippo, Michele; Proietti, Cristina

    2016-04-01

    In volcanic areas, where it can be difficult to gain access to the most critical zones for carrying out direct surveys, digital photogrammetry techniques are rarely employed, although in many cases they have proved to have remarkable potential, such as the possibility of following the evolution of volcanic processes (fracturing, vent positions, lava fields, lava front positions) and deformation processes (inflation/deflation and instability phenomena induced by volcanic activity). These results can be obtained, in the framework of standard surveillance activities, by acquiring multi-temporal datasets including Digital Orthophotos (DO) and Digital Elevation Models (DEM) to be used for implementing a quantitative and comparative analysis. The frequency of the surveys can be intensified during emergency phases to implement quasi real-time monitoring for supporting civil protection actions. The high level of accuracy and the short time required for image processing make digital photogrammetry a suitable tool for controlling the evolution of volcanic processes, which are usually characterized by large and rapid mass displacements. In order to optimize and extend the existing permanent ground NEtwork of Thermal and VIsible Sensors located on Mt. Etna (Etna_NETVIS) and to improve the observation of the most active areas, an approach for monitoring surface syn-eruptive processes was implemented. A dedicated tool for automatically pre-processing high frequency data, useful to extract geometrical parameters as well as to track the evolution of the lava field, was developed and tested both in simulated and real scenarios. The tool extracts a coherent multi-temporal dataset of orthophotos, useful for evaluating the active flow area and estimating effusion rates. Furthermore, Etna_NETVIS data were used to downscale the information derived from satellite data and/or to integrate the satellite datasets in case of incomplete coverage or missing acquisitions.
This work was developed in the framework of the EU-FP7 project "MED-SUV" (MEDiterranean SUpersite Volcanoes).

  11. Web-Based Tools for Data Visualization and Decision Support for South Asia

    NASA Astrophysics Data System (ADS)

    Jones, N.; Nelson, J.; Pulla, S. T.; Ames, D. P.; Souffront, M.; David, C. H.; Zaitchik, B. F.; Gatlin, P. N.; Matin, M. A.

    2017-12-01

    The objective of the NASA SERVIR project is to assist developing countries in using information provided by Earth observing satellites to assess and manage climate risks, land use, and water resources. We present a collection of web apps that integrate earth observations and in situ data to facilitate deployment of data and water resources models as decision-making tools in support of this effort. The interactive nature of web apps makes this an excellent medium for creating decision support tools that harness cutting edge modeling techniques. Thin client apps hosted in a cloud portal eliminate the need for decision makers to procure and maintain the high performance hardware required by the models, deal with issues related to software installation and platform incompatibilities, or monitor and install software updates, a problem that is exacerbated for many of the regional SERVIR hubs where both financial and technical capacity may be limited. All that is needed to use the system is an Internet connection and a web browser. We take advantage of these technologies to develop tools which can be centrally maintained but openly accessible. Advanced mapping and visualization make results intuitive and the derived information actionable. We also take advantage of the emerging standards for sharing water information across the web using the OGC and WMO approved WaterML standards. This makes our tools interoperable and extensible via application programming interfaces (APIs), so that other projects can both consume and contribute to the tools and data developed in our project. Our approach enables the integration of multiple types of data and models, thus facilitating collaboration between science teams in SERVIR.
The apps developed thus far by our team process time-varying netCDF files from Earth observations and large-scale computer simulations and allow visualization and exploration via raster animation and extraction of time series at selected points and/or regions.
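
The point-extraction step described above can be illustrated with a minimal nearest-grid-cell lookup over a (time, lat, lon) data cube; the grid, coordinates, and values below are toy assumptions, not SERVIR output:

```python
def extract_time_series(cube, lats, lons, lat0, lon0):
    """Nearest-neighbour time-series extraction from a (time, lat, lon)
    data cube, the basic operation behind point extraction in the apps.
    `cube` is a nested list indexed as cube[t][i][j]."""
    i = min(range(len(lats)), key=lambda k: abs(lats[k] - lat0))
    j = min(range(len(lons)), key=lambda k: abs(lons[k] - lon0))
    return [frame[i][j] for frame in cube]

# Toy 2-step, 2x3 grid (hypothetical values):
cube = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]
series = extract_time_series(cube, [10.0, 20.0], [70.0, 80.0, 90.0],
                             19.0, 81.0)
# series holds the value at the grid cell nearest (19N, 81E) per step
```

In practice the cube would come from a netCDF file; the lookup logic is the same once the coordinate arrays are in hand.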

  12. Development of the SOFIA Image Processing Tool

    NASA Technical Reports Server (NTRS)

    Adams, Alexander N.

    2011-01-01

    The Stratospheric Observatory for Infrared Astronomy (SOFIA) is a Boeing 747SP carrying a 2.5 meter infrared telescope capable of operating at altitudes between twelve and fourteen kilometers, which is above more than 99 percent of the water vapor in the atmosphere. The ability to make observations above most water vapor, coupled with the ability to make observations from anywhere at any time, makes SOFIA one of the world's premier infrared observatories. SOFIA uses three visible light CCD imagers to assist in pointing the telescope. The data from these imagers are stored in archive files, as is housekeeping data containing information such as boresight and area-of-interest locations. A tool that could both extract and process data from the archive files was developed.

  13. Real-Time Aerodynamic Flow and Data Visualization in an Interactive Virtual Environment

    NASA Technical Reports Server (NTRS)

    Schwartz, Richard J.; Fleming, Gary A.

    2005-01-01

    Significant advances have been made in non-intrusive flow field diagnostics in the past decade. Camera-based techniques are now capable of determining physical qualities such as surface deformation, surface pressure and temperature, flow velocities, and molecular species concentration. In each case, extracting the pertinent information from the large volume of acquired data requires powerful and efficient data visualization tools. The additional requirement for real-time visualization is fueled by an increased emphasis on minimizing test time in expensive facilities. This paper will address a capability titled LiveView3D, which is the first step in the development of an in-depth, real-time data visualization and analysis tool for use in aerospace testing facilities.

  14. Supernova bangs as a tool to study big bang

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blinnikov, S. I., E-mail: Sergei.Blinnikov@itep.ru

    Supernovae and gamma-ray bursts are the most powerful explosions in the observed Universe. This educational review describes supernovae and their applications in cosmology. It explains how to understand the production of light in the most luminous events with the minimum required explosion energy. These most luminous phenomena can serve as primary cosmological distance indicators. By comparing the observed dependence of distance on redshift with theoretical models, one can extract information on the evolution of the Universe from the Big Bang until our epoch.
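
The distance-redshift comparison mentioned above rests on the luminosity distance. A minimal numerical sketch for a flat Lambda-CDM model follows; the default parameter values (H0 = 70, Omega_m = 0.3) are illustrative conventions, not figures from the review:

```python
import math

C_KM_S = 299792.458  # speed of light, km/s

def luminosity_distance(z, h0=70.0, om=0.3, steps=1000):
    """Luminosity distance (Mpc) in a flat Lambda-CDM universe:
    d_L = (1+z) * (c/H0) * integral_0^z dz' / E(z'),
    with E(z) = sqrt(Om*(1+z)^3 + (1-Om)). Trapezoidal integration."""
    e = lambda zp: math.sqrt(om * (1 + zp) ** 3 + (1 - om))
    dz = z / steps
    integral = sum((1 / e(k * dz) + 1 / e((k + 1) * dz)) / 2 * dz
                   for k in range(steps))
    return (1 + z) * (C_KM_S / h0) * integral

def distance_modulus(z, **kw):
    """mu = 5 log10(d_L / 10 pc): the quantity compared with supernova
    apparent magnitudes to constrain the expansion history."""
    return 5 * math.log10(luminosity_distance(z, **kw) * 1e6 / 10)
```

Fitting observed supernova magnitudes against this predicted modulus over a range of redshifts is how cosmological parameters are constrained.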

  15. Two-Dimensional Spectroscopy Is Being Used to Address Core Scientific Questions in Biology and Materials Science.

    PubMed

    Petti, Megan K; Lomont, Justin P; Maj, Michał; Zanni, Martin T

    2018-02-15

    Two-dimensional spectroscopy is a powerful tool for extracting structural and dynamic information from a wide range of chemical systems. We provide a brief overview of the ways in which two-dimensional visible and infrared spectroscopies are being applied to elucidate fundamental details of important processes in biological and materials science. The topics covered include amyloid proteins, photosynthetic complexes, ion channels, photovoltaics, batteries, as well as a variety of promising new methods in two-dimensional spectroscopy.

  16. Open-Source Programming for Automated Generation of Graphene Raman Spectral Maps

    NASA Astrophysics Data System (ADS)

    Vendola, P.; Blades, M.; Pierre, W.; Jedlicka, S.; Rotkin, S. V.

    Raman microscopy is a useful tool for studying the structural characteristics of graphene deposited onto substrates. However, extracting useful information from the Raman spectra requires data processing and 2D map generation. An existing home-built confocal Raman microscope was optimized for graphene samples and programmed to automatically generate Raman spectral maps across a specified area. In particular, an open source data collection scheme was generated to allow the efficient collection and analysis of the Raman spectral data for future use. NSF ECCS-1509786.
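
The 2D map generation step can be sketched as a band-intensity reduction over a grid of spectra; the function name and band limits (roughly the graphene G band) are assumptions for illustration, not the group's actual pipeline:

```python
def g_band_map(spectra, shifts, band=(1500.0, 1650.0)):
    """Reduce a grid of Raman spectra to a 2D intensity map by taking,
    at every scan position, the maximum counts inside a wavenumber band.
    `spectra[y][x]` is a list of counts aligned with `shifts` (cm^-1)."""
    idx = [k for k, s in enumerate(shifts) if band[0] <= s <= band[1]]
    return [[max(pixel[k] for k in idx) for pixel in row]
            for row in spectra]
```

Applied across a scanned area, the resulting 2D array can be rendered directly as a spectral map of the deposited graphene.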

  17. Final Report on the Creation of the Wind Integration National Dataset (WIND) Toolkit and API: October 1, 2013 - September 30, 2015

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hodge, Bri-Mathias

    2016-04-08

    The primary objective of this work was to create a state-of-the-art national wind resource data set and to provide detailed wind plant output data for specific sites based on that data set. Corresponding retrospective wind forecasts were also included at all selected locations. The combined information from these activities was used to create the Wind Integration National Dataset (WIND), and an extraction tool was developed to allow web-based data access.

  18. Perspectives in astrophysical databases

    NASA Astrophysics Data System (ADS)

    Frailis, Marco; de Angelis, Alessandro; Roberto, Vito

    2004-07-01

    Astrophysics has become a domain extremely rich in scientific data. Data mining tools are needed for information extraction from such large data sets. This calls for an approach to data management emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods, and simplicity is achieved by properly handling metadata. Moreover, clustering and classification techniques on large data sets pose additional requirements in terms of computation and memory scalability and interpretability of results. In this study we review some possible solutions.

  19. An application programming interface for extreme precipitation and hazard products

    NASA Astrophysics Data System (ADS)

    Kirschbaum, D.; Stanley, T.; Cappelaere, P. G.; Reed, J.; Lammers, M.

    2016-12-01

    Remote sensing data provides situational awareness of extreme events and hazards over large areas in a way that is impossible to achieve with in situ data. However, more valuable than raw data is actionable information based on user needs. This information can take the form of derived products, extraction of a subset of variables in a larger data matrix, or data processing for a specific goal. These products can then stream to the end users, who can use these data to improve local to global decision making. This presentation will outline both the science and methodology of two new data products and tools that can provide relevant climate and hazard data for response and support. The Global Precipitation Measurement (GPM) mission provides near real-time information on rain and snow around the world every thirty minutes. Through a new applications programing interface (API), this data can be freely accessed by consumers to visualize, analyze, and communicate where, when and how much rain is falling worldwide. The second tool is a global landslide model that provides situational awareness of potential landslide activity in near real-time, utilizing several remotely sensed data products. This hazard information is also provided through an API and is being ingested by the emergency response community, international aid organizations, and others around the world. This presentation will highlight lessons learned through the development, implementation, and communication of these products and tools with the goal of enabling better and more effective decision making.

  20. Identification of Cichlid Fishes from Lake Malawi Using Computer Vision

    PubMed Central

    Joo, Deokjin; Kwan, Ye-seul; Song, Jongwoo; Pinho, Catarina; Hey, Jody; Won, Yong-Jin

    2013-01-01

    Background The explosively radiating evolution of cichlid fishes of Lake Malawi has yielded an amazing number of haplochromine species, estimated at as many as 500 to 800, with a surprising degree of diversity not only in color and stripe pattern but also in the shapes of jaw and body. As this morphological diversity has been a central subject of adaptive speciation and taxonomic classification, it could serve as a foundation for automating species identification of cichlids. Methodology/Principal Findings Here we demonstrate a method for automatic classification of the Lake Malawi cichlids based on computer vision and geometric morphometrics. To this end we developed a pipeline that integrates multiple image processing tools to automatically extract informative features of color and stripe patterns from a large set of photographic images of wild cichlids. The extracted information was evaluated by the statistical classifiers Support Vector Machine and Random Forests. Both classifiers performed better when body shape information was added to the color and stripe features. Beyond the coloration and stripe pattern, body shape variables boosted the accuracy of classification by about 10%. The programs were able to classify 594 live cichlid individuals belonging to 12 different classes (species and sexes) with an average accuracy of 78%, in contrast to a mere 42% success rate by human eyes. The variables that contributed most to the accuracy were body height and the hue of the most frequent color. Conclusions Computer vision showed notable performance in extracting information from the color and stripe patterns of Lake Malawi cichlids, although the information was not enough for errorless species identification. Our results indicate that there appears to be an unavoidable difficulty in automatic species identification of cichlid fishes, which may arise from short divergence times and gene flow between closely related species. PMID:24204918

  1. Layout pattern analysis using the Voronoi diagram of line segments

    NASA Astrophysics Data System (ADS)

    Dey, Sandeep Kumar; Cheilaris, Panagiotis; Gabrani, Maria; Papadopoulou, Evanthia

    2016-01-01

    Early identification of problematic patterns in very large scale integration (VLSI) designs is of great value as the lithographic simulation tools face significant timing challenges. To reduce the processing time, such a tool selects only a fraction of possible patterns which have a probable area of failure, with the risk of missing some problematic patterns. We introduce a fast method to automatically extract patterns based on their structure and context, using the Voronoi diagram of line-segments as derived from the edges of VLSI design shapes. Designers put line segments around the problematic locations in patterns called "gauges," along which the critical distance is measured. The gauge center is the midpoint of a gauge. We first use the Voronoi diagram of VLSI shapes to identify possible problematic locations, represented as gauge centers. Then we use the derived locations to extract windows containing the problematic patterns from the design layout. The problematic locations are prioritized by the shape and proximity information of the design polygons. We perform experiments for pattern selection in a portion of a 22-nm random logic design layout. The design layout had 38,584 design polygons (consisting of 199,946 line segments) on layer Mx, and 7079 markers generated by an optical rule checker (ORC) tool. The optical rules specify requirements for printing circuits with minimum dimension. Markers are the locations of some optical rule violations in the layout. We verify our approach by comparing the coverage of our extracted patterns to the ORC-generated markers. We further derive a similarity measure between patterns and between layouts. The similarity measure helps to identify a set of representative gauges that reduces the number of patterns for analysis.
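
The window-extraction step around gauge centers can be sketched with a simple bounding-box clip; the coordinates and the window half-width below are hypothetical, not the paper's parameters:

```python
def clip_windows(centers, polygons, half=50):
    """For each gauge center (x, y), collect the layout polygons whose
    bounding box intersects a square window of side 2*half around it;
    a minimal sketch of pattern-window extraction from a layout."""
    def bbox(poly):
        xs, ys = zip(*poly)
        return min(xs), min(ys), max(xs), max(ys)
    out = []
    for cx, cy in centers:
        wx0, wy0, wx1, wy1 = cx - half, cy - half, cx + half, cy + half
        hits = [p for p in polygons
                if not (bbox(p)[2] < wx0 or bbox(p)[0] > wx1
                        or bbox(p)[3] < wy0 or bbox(p)[1] > wy1)]
        out.append(hits)
    return out

# Toy layout (coordinates hypothetical): one shape near the gauge
# center, one far away.
polys = [[(0, 0), (10, 0), (10, 10), (0, 10)],
         [(200, 200), (210, 200), (210, 210), (200, 210)]]
windows = clip_windows([(5, 5)], polys)
```

The real method derives the candidate centers from the Voronoi diagram of the design's line segments; only the final clipping step is shown here.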

  2. Advanced metrology by offline SEM data processing

    NASA Astrophysics Data System (ADS)

    Lakcher, Amine; Schneider, Loïc.; Le-Gratiet, Bertrand; Ducoté, Julien; Farys, Vincent; Besacier, Maxime

    2017-06-01

    Today's technology nodes contain more and more complex designs, bringing increasing challenges to chip manufacturing process steps. An efficient metrology is necessary to assess the process variability of these complex patterns and thus extract relevant data to generate process-aware design rules and to improve OPC models. Today, process variability is mostly addressed through the analysis of in-line monitoring features, which are often designed to support robust measurements and as a consequence are not always very representative of critical design rules. CD-SEM is the main CD metrology technique used in chip manufacturing, but it is challenged when it comes to measuring metrics like tip to tip, tip to line, areas or necking in high quantity and with robustness. CD-SEM images contain a lot of information that is not always used in metrology. Suppliers have provided tools that allow engineers to extract the SEM contours of their features and to convert them into a GDS. A contour can be seen as the signature of the shape, as it contains all the dimensional data. The methodology is thus to use the CD-SEM to take high quality images, generate SEM contours, and create a database out of them. The contours are used to feed an offline metrology tool that processes them to extract different metrics. It was shown in two previous papers that it is possible to perform complex measurements on hotspots at different process steps (lithography, etch, copper CMP) by using SEM contours with an in-house offline metrology tool. In the current paper, the methodology presented previously will be expanded to improve its robustness, combined with the use of phylogeny to classify the SEM images according to their geometrical proximity.
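
One of the metrics named above, tip to tip, can be approximated on extracted contour vertices; this is a simplified sketch (minimum vertex-to-vertex distance), not the in-house tool's edge-to-edge algorithm:

```python
import math

def tip_to_tip(contour_a, contour_b):
    """Minimum point-to-point distance between two SEM contours,
    each given as a list of (x, y) vertices. A simplified stand-in
    for a tip-to-tip measurement on contour data."""
    return min(math.dist(p, q) for p in contour_a for q in contour_b)
```

On dense contours the vertex-to-vertex minimum converges to the true edge-to-edge distance; production tools interpolate along the segments as well.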

  3. A comparative study of the svm and k-nn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals

    PubMed Central

    2014-01-01

    Background Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosing respiratory pathologies using respiratory sounds from the R.A.L.E database. Results The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database. PMID:24970564

  4. Development and applications of the Veterans Health Administration's Stratification Tool for Opioid Risk Mitigation (STORM) to improve opioid safety and prevent overdose and suicide.

    PubMed

    Oliva, Elizabeth M; Bowe, Thomas; Tavakoli, Sara; Martins, Susana; Lewis, Eleanor T; Paik, Meenah; Wiechers, Ilse; Henderson, Patricia; Harvey, Michael; Avoundjian, Tigran; Medhanie, Amanuel; Trafton, Jodie A

    2017-02-01

    Concerns about opioid-related adverse events, including overdose, prompted the Veterans Health Administration (VHA) to launch an Opioid Safety Initiative and Overdose Education and Naloxone Distribution program. To mitigate risks associated with opioid prescribing, a holistic approach that takes into consideration both risk factors (e.g., dose, substance use disorders) and risk mitigation interventions (e.g., urine drug screening, psychosocial treatment) is needed. This article describes the Stratification Tool for Opioid Risk Mitigation (STORM), a tool developed in VHA that reflects this holistic approach and facilitates patient identification and monitoring. STORM prioritizes patients for review and intervention according to their modeled risk for overdose/suicide-related events and displays risk factors and risk mitigation interventions obtained from VHA electronic medical record (EMR)-data extracts. Patients' estimated risk is based on a predictive risk model developed using fiscal year 2010 (FY2010: 10/1/2009-9/30/2010) EMR-data extracts and mortality data among 1,135,601 VHA patients prescribed opioid analgesics to predict risk for an overdose/suicide-related event in FY2011 (2.1% experienced an event). Cross-validation was used to validate the model, with receiver operating characteristic curves for the training and test data sets performing well (>.80 area under the curve). The predictive risk model distinguished patients based on risk for overdose/suicide-related adverse events, allowing for identification of high-risk patients and enrichment of target populations of patients with greater safety concerns for proactive monitoring and application of risk mitigation interventions. Results suggest that clinical informatics can leverage EMR-extracted data to identify patients at-risk for overdose/suicide-related events and provide clinicians with actionable information to mitigate risk. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
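
The area under the ROC curve used to validate the model can be computed directly from risk scores and observed outcomes; the scores below are toy values, not VHA data:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case outranks a randomly chosen negative
    one (the Mann-Whitney U statistic), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores for six patients; 1 marks an
# overdose/suicide-related event in the follow-up year.
auc = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0, 0])
```

An AUC above 0.80, as reported for STORM's training and test sets, means a randomly chosen patient who went on to have an event received a higher risk score than a randomly chosen patient who did not at least 80% of the time.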

  5. A comparative study of the SVM and K-nn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals.

    PubMed

    Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian

    2014-06-27

    Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosing respiratory pathologies using respiratory sounds from the R.A.L.E database. The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database.
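
The K-nn scheme that performed best here can be sketched in a few lines; the 2-D toy features below merely stand in for MFCC vectors and are not from the RALE database:

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Plain K-nearest-neighbour classification: Euclidean distance,
    majority vote among the k closest training vectors."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda p: math.dist(p[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

# Toy 2-D feature vectors: two well-separated classes (hypothetical).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
train_y = ["normal", "normal", "normal",
           "obstruction", "obstruction", "obstruction"]
label = knn_predict(train_x, train_y, (4.8, 5.1))
```

In the study each vector would hold the MFCC coefficients of one recording, and the three labels would be normal, airway obstruction, and parenchymal pathology.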

  6. PLACNETw: a web-based tool for plasmid reconstruction from bacterial genomes.

    PubMed

    Vielva, Luis; de Toro, María; Lanza, Val F; de la Cruz, Fernando

    2017-12-01

    PLACNET is a graph-based tool for reconstruction of plasmids from next generation sequence pair-end datasets. PLACNET graphs contain two types of nodes (assembled contigs and reference genomes) and two types of edges (scaffold links and homology to references). Manual pruning of the graphs is a necessary requirement in PLACNET, but this is difficult for users without a solid bioinformatics background. PLACNETw, a webtool based on PLACNET, provides an interactive graphic interface, automates BLAST searches, and extracts the relevant information for decision making. It allows a user with domain expertise to visualize the scaffold graphs and related information of contigs as well as reference sequences, so that the pruning operations can be done interactively from a personal computer without the need for additional tools. After successful pruning, each plasmid becomes a separate connected component subgraph. The resulting data are automatically downloaded by the user. PLACNETw is freely available at https://castillo.dicom.unican.es/upload/. delacruz@unican.es. A tutorial video and several solved examples are available at https://castillo.dicom.unican.es/placnetw_video/ and https://castillo.dicom.unican.es/examples/. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
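
The "each plasmid becomes a separate connected component" step can be illustrated with a plain breadth-first component search over the pruned graph; the node names are hypothetical:

```python
from collections import deque

def connected_components(nodes, edges):
    """Connected components of an undirected graph, such as a pruned
    PLACNET graph where each component groups the contigs and
    references of one plasmid."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = [], deque([n])
        seen.add(n)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps

# Toy pruned graph: two plasmids' contigs (names hypothetical).
comps = connected_components(["c1", "c2", "c3", "c4"],
                             [("c1", "c2"), ("c3", "c4")])
```

After pruning removes spurious homology edges, each component like those above is reported as one reconstructed plasmid.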

  7. Neuronize: a tool for building realistic neuronal cell morphologies

    PubMed Central

    Brito, Juan P.; Mata, Susana; Bayona, Sofia; Pastor, Luis; DeFelipe, Javier; Benavides-Piccione, Ruth

    2013-01-01

    This study presents a tool, Neuronize, for building realistic three-dimensional models of neuronal cells from the morphological information extracted through computer-aided tracing applications. Neuronize consists of a set of methods designed to build 3D neural meshes that approximate the cell membrane at different resolution levels, allowing a balance to be reached between the complexity and the quality of the final model. The main contribution of the present study is the proposal of a novel approach to build a realistic and accurate 3D shape of the soma from the incomplete information stored in the digitally traced neuron, which usually consists of a 2D cell body contour. This technique is based on the deformation of an initial shape driven by the position and thickness of the first order dendrites. The addition of a set of spines along the dendrites completes the model, building a final 3D neuronal cell suitable for its visualization in a wide range of 3D environments. PMID:23761740

  8. A computational study on outliers in world music.

    PubMed

    Panteli, Maria; Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as 'outliers'. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the 'uniqueness' of the music of each country.
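
The outlier-detection step can be illustrated with a simplified distance-to-centroid rule over per-recording feature vectors; this stands in for, and is not, the paper's actual method (which also accounts for spatial correlation):

```python
import math
from statistics import mean, stdev

def flag_outliers(features, threshold=2.0):
    """Flag feature vectors whose Euclidean distance to the corpus
    centroid exceeds the mean distance by more than `threshold`
    standard deviations of all distances."""
    dims = list(zip(*features))
    centroid = [mean(d) for d in dims]
    dists = [math.dist(f, centroid) for f in features]
    mu, sd = mean(dists), stdev(dists)
    return [d > mu + threshold * sd for d in dists]

# Toy feature vectors (hypothetical): eight similar recordings
# and one clearly distinct one.
feats = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0.5),
         (0.5, 0), (1, 0.5), (0.5, 1), (10, 10)]
flags = flag_outliers(feats)
```

In the study, the flagged recordings per country are then aggregated to rank countries by the distinctiveness of their music.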

  9. A computational study on outliers in world music

    PubMed Central

    Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country. PMID:29253027

  10. Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

    PubMed Central

    2005-01-01

    Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQLServer2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments is stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB. PMID:16046824
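
The roll-up aggregation at the heart of an OLAP cube can be sketched with a dictionary keyed by dimension values; the field names below are illustrative, not the actual SGMD schema:

```python
from collections import defaultdict

def rollup(rows, dims, measure):
    """Aggregate fact rows along the requested dimensions: the core
    roll-up operation an OLAP cube precomputes so that queries like
    'total expression per gene' return quickly. `rows` are dicts."""
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)

# Toy fact table (hypothetical values, not SGMD data).
rows = [
    {"gene": "g1", "time": "6h", "expr": 2.0},
    {"gene": "g1", "time": "12h", "expr": 3.0},
    {"gene": "g2", "time": "6h", "expr": 1.0},
]
by_gene = rollup(rows, ["gene"], "expr")
by_gene_time = rollup(rows, ["gene", "time"], "expr")
```

A real OLAP engine materializes many such aggregates in advance and serves them over OLE DB, which is what makes cube queries faster than re-clustering the raw expression data.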

  11. Beverage and water intake of healthy adults in some European countries.

    PubMed

    Nissensohn, Mariela; Castro-Quezada, Itandehui; Serra-Majem, Lluis

    2013-11-01

    Nutritional surveys frequently collect some data on beverage consumption; however, information from different sources and different methodologies raises issues of comparability. The main objective of this review was to examine the available techniques used for assessing beverage intake in European epidemiological studies and to describe the most frequently applied assessment method. Information on beverage intake available from European surveys and nutritional epidemiological investigations was obtained from gray literature. Twelve articles were included and relevant data were extracted. The studies were carried out on healthy adults using different types of assessments. The most frequently used tool was a 7-d dietary record. Only Germany used a specific beverage assessment tool (Beverage Dietary History). From the limited data available and the diversity of the methodology used, the results show that consumption of beverages differs between countries. Current epidemiological studies in Europe focusing on beverage intake are scarce. Further research is needed to clarify the amount of beverage intake in European populations.

  12. Neuronize: a tool for building realistic neuronal cell morphologies.

    PubMed

    Brito, Juan P; Mata, Susana; Bayona, Sofia; Pastor, Luis; Defelipe, Javier; Benavides-Piccione, Ruth

    2013-01-01

    This study presents a tool, Neuronize, for building realistic three-dimensional models of neuronal cells from the morphological information extracted through computer-aided tracing applications. Neuronize consists of a set of methods designed to build 3D neural meshes that approximate the cell membrane at different resolution levels, allowing a balance to be reached between the complexity and the quality of the final model. The main contribution of the present study is the proposal of a novel approach to build a realistic and accurate 3D shape of the soma from the incomplete information stored in the digitally traced neuron, which usually consists of a 2D cell body contour. This technique is based on the deformation of an initial shape driven by the position and thickness of the first order dendrites. The addition of a set of spines along the dendrites completes the model, building a final 3D neuronal cell suitable for its visualization in a wide range of 3D environments.

  13. Interactive access to LP DAAC satellite data archives through a combination of open-source and custom middleware web services

    USGS Publications Warehouse

    Davis, Brian N.; Werpy, Jason; Friesz, Aaron M.; Impecoven, Kevin; Quenzer, Robert; Maiersperger, Tom; Meyer, David J.

    2015-01-01

    Current methods of searching for and retrieving data from satellite land remote sensing archives do not allow for interactive information extraction. Instead, Earth science data users are required to download files over low-bandwidth networks to local workstations and process data before science questions can be addressed. New methods of extracting information from data archives need to become more interactive to meet user demands for deriving increasingly complex information from rapidly expanding archives. Moving the tools required for processing data to computer systems of data providers, and away from systems of the data consumer, can improve turnaround times for data processing workflows. The implementation of middleware services was used to provide interactive access to archive data. The goal of this middleware services development is to enable Earth science data users to access remote sensing archives for immediate answers to science questions instead of links to large volumes of data to download and process. Exposing data and metadata to web-based services enables machine-driven queries and data interaction. Also, product quality information can be integrated to enable additional filtering and sub-setting. Only the reduced content required to complete an analysis is then transferred to the user.
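
    The server-side filtering and sub-setting described above can be sketched as follows. The record fields (`lat`, `lon`, `quality`, `ndvi`) and thresholds are hypothetical, chosen only to show how quality filtering and spatial sub-setting reduce the content transferred to the user:

```python
def subset(records, bbox, min_quality):
    """Server-side filtering: return only records inside a bounding box
    that meet a quality threshold, so clients download reduced content
    instead of whole files."""
    lat0, lat1, lon0, lon1 = bbox
    return [r for r in records
            if lat0 <= r["lat"] <= lat1 and lon0 <= r["lon"] <= lon1
            and r["quality"] >= min_quality]

# Hypothetical archive records with a quality flag per observation
records = [
    {"lat": 43.1, "lon": -101.2, "quality": 0.9, "ndvi": 0.61},
    {"lat": 43.2, "lon": -101.0, "quality": 0.4, "ndvi": 0.58},
    {"lat": 51.0, "lon": -95.0, "quality": 0.95, "ndvi": 0.40},
]
print(subset(records, (42.0, 44.0, -102.0, -100.0), 0.5))
```

    Running such a filter on the provider's side, exposed through a web service, is what lets only the reduced content needed for an analysis cross the network.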

  14. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from protein-RNA complexes present in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
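
    Scanning an RNA sequence for a consensus motif, as the ATtRACT search tools do, can be sketched with a regular expression built from IUPAC ambiguity codes. This is an assumed, simplified approach, not ATtRACT's actual algorithm; the motif and sequence are used purely as examples:

```python
import re

# Map IUPAC nucleotide codes to regex character classes (RNA alphabet)
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "W": "[AU]", "S": "[CG]",
         "K": "[GU]", "M": "[AC]", "N": "[ACGU]"}

def scan(sequence, motif):
    """Return (position, match) pairs for every occurrence of an
    IUPAC consensus motif in an RNA sequence."""
    pattern = "".join(IUPAC[c] for c in motif)
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]

# Toy sequence scanned for the example consensus UGCAUG
print(scan("AAUGCAUGCCUGCAUGAA", "UGCAUG"))  # → [(2, 'UGCAUG'), (10, 'UGCAUG')]
```

    Ambiguity codes make the same function handle degenerate consensi, e.g. `scan(seq, "UGCWUG")` matches both A and U at the fourth position.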

  15. Videomicroscopic extraction of specific information on cell proliferation and migration in vitro

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Debeir, Olivier; Megalizzi, Veronique; Warzee, Nadine

    2008-10-01

    In vitro cell imaging is a useful exploratory tool for cell behavior monitoring with a wide range of applications in cell biology and pharmacology. Combined with appropriate image analysis techniques, this approach has been shown to provide useful information on the detection and dynamic analysis of cell events. In this context, numerous efforts have been focused on cell migration analysis. In contrast, the cell division process has been the subject of fewer investigations. The present work focuses on this latter aspect and shows that, in complement to cell migration data, interesting information related to cell division can be extracted from phase-contrast time-lapse image series, in particular cell division duration, which is not provided by standard cell assays using endpoint analyses. We illustrate our approach by analyzing the effects induced by two sigma-1 receptor ligands (haloperidol and 4-IBP) on the behavior of two glioma cell lines using two in vitro cell models, i.e., the low-density individual cell model and the high-density scratch wound model. This illustration also shows that the data provided by our approach are suggestive as to the mechanism of action of compounds, and are thus capable of informing the appropriate selection of further time-consuming and more expensive biological evaluations required to elucidate a mechanism.

  16. Quantitative photothermal phase imaging of red blood cells using digital holographic photothermal microscope.

    PubMed

    Vasudevan, Srivathsan; Chen, George C K; Lin, Zhiping; Ng, Beng Koon

    2015-05-10

    Photothermal microscopy (PTM), a noninvasive pump-probe high-resolution microscopy, has been applied as a bioimaging tool in many biomedical studies. PTM utilizes a conventional phase contrast microscope to obtain highly resolved photothermal images. However, phase information cannot be extracted from these photothermal images, as they are not quantitative. Moreover, the problem of halos inherent in conventional phase contrast microscopy needs to be tackled. Hence, a digital holographic photothermal microscopy technique is proposed as a solution to obtain quantitative phase images. The proposed technique is demonstrated by extracting phase values of red blood cells from their photothermal images. These phase values can potentially be used to determine the temperature distribution from the photothermal images, which is important for live cell monitoring applications.

  17. Tools and Data Services from the NASA Earth Satellite Observations for Climate Applications

    NASA Technical Reports Server (NTRS)

    Vicente, Gilberto A.

    2005-01-01

    Climate science and applications require access to vast amounts of archived high-quality data, together with software tools and services for data manipulation and information extraction. These in turn require a detailed understanding of the data's internal structure and physical implementation prior to data reduction, combination and data product production. This time-consuming task must be undertaken before the core investigation can begin and is an especially difficult challenge when science objectives require users to deal with large multi-sensor data sets of different formats, structures, and resolutions. In order to address these issues the Goddard Space Flight Center (GSFC) Earth Sciences (GES) Data and Information Services Center (DISC) Distributed Active Archive Center (DAAC) has made great progress in facilitating science and applications research by developing innovative tools and data services applied to Earth science atmospheric and climate data. The GES/DISC/DAAC has successfully implemented and maintained a long-term climate satellite data archive and developed tools and services for a variety of atmospheric science missions and instruments, including AIRS, AVHRR, MODIS, SeaWiFS, SORCE, TOMS, TOVS, TRMM, UARS and Aura, providing researchers with excellent opportunities to acquire accurate and continuous atmospheric measurements. Since the number of climate science products from these various missions is steadily increasing as a result of more sophisticated sensors and new science algorithms, the main challenge for data centers like the GES/DISC/DAAC is to guide users through the variety of data sets and products, provide tools to visualize and reduce the volume of the data, and secure uninterrupted and reliable access to data and related products.
This presentation will describe the effort at the GES/DISC/DAAC to build a bridge between multi-sensor data and the effective scientific use of the data, with an emphasis on the heritage satellite observations and science products for climate applications. The intent is to inform users of the existence of this large collection of data and products; suggest starting points for cross-platform science projects and data mining activities and provide data services and tools information. More information about the GES/DISC/DAAC satellite data and products, tools, and services can be found at http://daac.gsfc.nasa.gov.

  18. Using latent semantic analysis and the predication algorithm to improve extraction of meanings from a diagnostic corpus.

    PubMed

    Jorge-Botana, Guillermo; Olmos, Ricardo; León, José Antonio

    2009-11-01

    There is currently widespread interest in indexing and extracting taxonomic information from large text collections. An example is the automatic categorization of informally written medical or psychological diagnoses, followed by the extraction of epidemiological information or even terms and structures needed to formulate guiding questions as a heuristic tool for helping doctors. Vector space models have been successfully used to this end (Lee, Cimino, Zhu, Sable, Shanker, Ely & Yu, 2006; Pakhomov, Buntrock & Chute, 2006). In this study we use a computational model known as Latent Semantic Analysis (LSA) on a diagnostic corpus with the aim of retrieving definitions (in the form of lists of semantic neighbors) of common structures it contains (e.g. "storm phobia", "dog phobia") or less common structures that might be formed by logical combinations of categories and diagnostic symptoms (e.g. "gun personality" or "germ personality"). In the quest to bring definitions into line with the meaning of structures and make them in some way representative, various problems commonly arise while recovering content using vector space models. We propose some approaches which bypass these problems, such as Kintsch's (2001) predication algorithm and some corrections to the way lists of neighbors are obtained, which have already been tested on semantic spaces in a non-specific domain (Jorge-Botana, León, Olmos & Hassan-Montero, under review). The results support the idea that the predication algorithm may also be useful for extracting more precise meanings of certain structures from scientific corpora, and that the introduction of some corrections based on vector length may increase its efficiency on non-representative terms.
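
    Retrieving lists of semantic neighbors by cosine similarity, the basic operation underlying the LSA analyses above, can be sketched on a toy vector space. Real LSA vectors come from a singular value decomposition of a term-document matrix, and Kintsch's predication algorithm is not reproduced here; the terms and their three-dimensional vectors below are illustrative assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def neighbors(term, space, k=2):
    """Rank the other terms in a vector space by cosine similarity."""
    ranked = sorted(((cosine(space[term], vec), t)
                     for t, vec in space.items() if t != term), reverse=True)
    return [t for _, t in ranked[:k]]

# Toy 3-dimensional "semantic space" (real LSA vectors come from SVD)
space = {"phobia": [0.9, 0.1, 0.0], "fear": [0.8, 0.2, 0.1],
         "dog":    [0.1, 0.9, 0.0], "storm": [0.2, 0.8, 0.1]}
print(neighbors("phobia", space))  # → ['fear', 'storm']
```

    The vector-length corrections the paper discusses address exactly the failure mode visible here: short, infrequent terms get noisy vectors, so raw cosine rankings over-weight them.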

  19. Multiplexed Sequence Encoding: A Framework for DNA Communication

    PubMed Central

    Zakeri, Bijan; Carr, Peter A.; Lu, Timothy K.

    2016-01-01

    Synthetic DNA has great propensity for efficiently and stably storing non-biological information. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication (data encoding, data transfer, and data extraction) and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA, while reducing the occurrence of DNA homopolymers to improve synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system, Multiplexed Sequence Encoding (MuSE), that conceals messages between multiple distinct DNA molecules, requiring a combination key to reveal messages. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA. PMID:27050646
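
    The secret-sharing idea behind MuSE, a message concealed across multiple fragments that only the full combination reveals, can be illustrated with simple XOR secret sharing over bytes rather than DNA molecules. This is an analogy, not the paper's actual encoding scheme:

```python
import secrets

def split_message(message, n=3):
    """Split a message into n shares; every share is needed to recover
    it, analogous to fragmenting a message across distinct molecules."""
    data = message.encode()
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    final = data
    for s in shares:  # XOR the plaintext with every random share
        final = bytes(a ^ b for a, b in zip(final, s))
    return shares + [final]

def join_message(shares):
    """XOR all shares back together to reveal the message."""
    out = shares[0]
    for s in shares[1:]:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out.decode()

print(join_message(split_message("hello")))  # → hello
```

    Any subset missing even one share is indistinguishable from random bytes, which is the property the combination key provides in MuSE.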

  20. A study on building data warehouse of hospital information system.

    PubMed

    Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo

    2011-08-01

    Existing hospital information systems with simple statistical functions cannot meet current management needs. Hospital resources are distributed among independently owned hospitals, as in the case of the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system. The method can also be employed for a distributed-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: datacenter layer, system-function layer, and user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information, from the establishment of data themes to the design of a data model to the establishment of the data warehouse. Online analytical processing tools support user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and summarizes clinical and hospital information for decision making. This paper proposes the use of a data warehouse for a hospital information system, specifically a data warehouse organized around hospital information themes, covering the determination of granularity, modeling and so on. The processing of patient information is given as an example that demonstrates the usefulness of this method for hospital information management.
Data warehouse technology is still evolving, and further research is required on extracting more decision-support information through data mining and decision-making technologies.

  1. CURB-65 Score is Equal to NEWS for Identifying Mortality Risk of Pneumonia Patients: An Observational Study.

    PubMed

    Brabrand, Mikkel; Henriksen, Daniel Pilsgaard

    2018-06-01

    The CURB-65 score is widely implemented as a prediction tool for identifying patients with community-acquired pneumonia (CAP) at increased risk of 30-day mortality. However, since most components of CURB-65 are used in general prediction tools, it is likely that other prediction tools, e.g. the British National Early Warning Score (NEWS), could be as good as CURB-65 at predicting the fate of CAP patients. To determine whether NEWS is better than CURB-65 at predicting 30-day mortality of CAP patients. This was a single-centre, 6-month observational study using patients' vital signs and demographic information registered upon admission, survival status extracted from the Danish Civil Registration System after discharge, and blood test results extracted from a local database. The study was conducted in the medical admission unit (MAU) at the Hospital of South West Jutland, a regional teaching hospital in Denmark. The participants consisted of 570 CAP patients, 291 female and 279 male, median age 74 (range 20-102) years. The CURB-65 score had a discriminatory power of 0.728 (0.667-0.789) and NEWS 0.710 (0.645-0.775), both with good calibration and no statistically significant difference between them. CURB-65 was not demonstrated to be statistically significantly better than NEWS at identifying CAP patients at risk of 30-day mortality.
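
    For reference, the CURB-65 score as commonly published assigns one point each for confusion, urea above 7 mmol/L, respiratory rate of 30/min or more, low blood pressure (systolic below 90 mmHg or diastolic of 60 mmHg or less), and age 65 or over. The sketch below encodes those commonly cited thresholds and should be checked against a clinical reference before any real use:

```python
def curb65(confusion, urea_mmol_l, resp_rate, sys_bp, dia_bp, age):
    """CURB-65: one point per criterion.
    Commonly cited bands: 0-1 low, 2 intermediate, 3-5 high risk."""
    return sum([
        confusion,                     # C: new-onset confusion
        urea_mmol_l > 7,               # U: urea > 7 mmol/L
        resp_rate >= 30,               # R: respiratory rate >= 30/min
        sys_bp < 90 or dia_bp <= 60,   # B: low blood pressure
        age >= 65,                     # 65: age >= 65 years
    ])

print(curb65(confusion=False, urea_mmol_l=8.2, resp_rate=32,
             sys_bp=85, dia_bp=55, age=74))  # → 4
```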

  2. Parent experiences and information needs relating to procedural pain in children: a systematic review protocol.

    PubMed

    Gates, Allison; Shave, Kassi; Featherstone, Robin; Buckreus, Kelli; Ali, Samina; Scott, Shannon; Hartling, Lisa

    2017-06-06

    There exist many evidence-based interventions available to manage procedural pain in children and neonates, yet they are severely underutilized. Parents play an important role in the management of their child's pain; however, many do not possess adequate knowledge of how to effectively do so. The purpose of the planned study is to systematically review and synthesize current knowledge of the experiences and information needs of parents with regard to the management of their child's pain and distress related to medical procedures in the emergency department. We will conduct a systematic review using rigorous methods and reporting based on the PRISMA statement. We will conduct a comprehensive search of literature published between 2000 and 2016 reporting on parents' experiences and information needs with regard to helping their child manage procedural pain and distress. Ovid MEDLINE, Ovid PsycINFO, CINAHL, and PubMed will be searched. We will also search reference lists of key studies and gray literature sources. Two reviewers will screen the articles following inclusion criteria defined a priori. One reviewer will then extract the data from each article following a data extraction form developed by the study team. The second reviewer will check the data extraction for accuracy and completeness. Any disagreements with regard to study inclusion or data extraction will be resolved via discussion. Data from qualitative studies will be summarized thematically, while those from quantitative studies will be summarized narratively. The second reviewer will confirm the overarching themes resulting from the qualitative and quantitative data syntheses. The Critical Appraisal Skills Programme Qualitative Research Checklist and the Quality Assessment Tool for Quantitative Studies will be used to assess the quality of the evidence from each included study. 
To our knowledge, no published review exists that comprehensively reports on the experiences and information needs of parents related to the management of their child's procedural pain and distress. A systematic review of parents' experiences and information needs will help to inform strategies to empower them with the knowledge necessary to ensure their child's comfort during a painful procedure. PROSPERO CRD42016043698.

  3. IBM’s Health Analytics and Clinical Decision Support

    PubMed Central

    Sun, J.; Knoop, S.; Shabo, A.; Carmeli, B.; Sow, D.; Syed-Mahmood, T.; Rapp, W.

    2014-01-01

    Summary Objectives This survey explores the role of big data and health analytics developed by IBM in supporting the transformation of healthcare by augmenting evidence-based decision-making. Methods Some problems in healthcare and strategies for change are described. It is argued that change requires better decisions, which, in turn, require better use of the many kinds of healthcare information. Analytic resources that address each of the information challenges are described. Examples of the role of each of the resources are given. Results There are powerful analytic tools that utilize the various kinds of big data in healthcare to help clinicians make more personalized, evidence-based decisions. Such resources can extract relevant information and provide insights that clinicians can use to make evidence-supported decisions. There are early suggestions that these resources have clinical value. As with all analytic tools, they are limited by the amount and quality of data. Conclusion Big data is an inevitable part of the future of healthcare. There is a compelling need to manage and use big data to make better decisions to support the transformation of healthcare to the personalized, evidence-supported model of the future. Cognitive computing resources are necessary to manage the challenges in employing big data in healthcare. Such tools have been and are being developed. The analytic resources themselves do not drive, but rather support, healthcare transformation. PMID:25123736

  4. One- and two-dimensional dopant/carrier profiling for ULSI

    NASA Astrophysics Data System (ADS)

    Vandervorst, W.; Clarysse, T.; De Wolf, P.; Trenkler, T.; Hantschel, T.; Stephenson, R.; Janssens, T.

    1998-11-01

    Dopant/carrier profiles constitute the basis of the operation of a semiconductor device and thus play a decisive role in the performance of a transistor. They are subject to the same scaling laws as the other constituents of a modern semiconductor device and continuously evolve towards shallower and more complex configurations. This evolution has increased the demands on profiling techniques, in particular in terms of resolution and quantification, such that constant reevaluation and improvement of the tools is required. As no single technique provides all the necessary information (dopant distribution, electrical activation, ...) with the requested spatial and depth resolution, the present paper attempts to provide an assessment of those tools which can be considered as the main metrology technologies for ULSI applications. For 1D dopant profiling, secondary ion mass spectrometry (SIMS) has progressed towards a generally accepted tool meeting the requirements. For 1D carrier profiling, spreading resistance profiling and microwave surface impedance profiling are envisaged as the best choices, but extra developments are required to promote them to routinely applicable methods. As no main metrology tool exists for 2D dopant profiling, the main emphasis is on 2D carrier profiling tools based on scanning probe microscopy. Scanning spreading resistance microscopy (SSRM) and scanning capacitance microscopy (SCM) are the preferred methods, although neither of them yet meets all the requirements. Complementary information can be extracted from nanopotentiometry, which samples the device operation in more detail. Concurrent use of carrier profiling tools, nanopotentiometry, analysis of device characteristics and simulations is required to provide a complete characterization of deep submicron devices.

  5. An innovative approach for characteristic analysis and state-of-health diagnosis for a Li-ion cell based on the discrete wavelet transform

    NASA Astrophysics Data System (ADS)

    Kim, Jonghoon; Cho, B. H.

    2014-08-01

    This paper introduces an innovative approach to analyzing the electrochemical characteristics and diagnosing the state-of-health (SOH) of a Li-ion cell based on the discrete wavelet transform (DWT). In this approach, the DWT is applied as a powerful tool in the analysis of the discharging/charging voltage signal (DCVS), with its non-stationary and transient phenomena, for a Li-ion cell. Specifically, DWT-based multi-resolution analysis (MRA) is used to extract information on the electrochemical characteristics in the time and frequency domains simultaneously. Through MRA-based wavelet decomposition, information on the electrochemical characteristics of a Li-ion cell can be extracted from the DCVS over a wide frequency range. Wavelet decomposition based on the selection of the order 3 Daubechies wavelet (dB3) and scale 5 as the best wavelet function and optimal decomposition scale is implemented. In particular, the present approach develops these investigations one step further by showing low- and high-frequency components (approximation component An and detail component Dn, respectively) extracted from Li-ion cells with different electrochemical characteristics caused by aging effects. Experimental results show the effectiveness of the DWT-based approach for reliable diagnosis of the SOH of a Li-ion cell.
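
    A single level of wavelet decomposition, the building block of the MRA described above, can be sketched with the Haar wavelet (the simplest case; the paper itself selects the order 3 Daubechies wavelet at scale 5, which is not reproduced here):

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: pairwise averages give the
    approximation (A1) and pairwise differences the detail (D1),
    each scaled by 1/sqrt(2). Signal length must be even."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

# Toy voltage samples; a constant pair yields zero detail coefficient
a1, d1 = haar_step([4.0, 2.0, 5.0, 5.0])
print(a1, d1)  # approximation ≈ [4.243, 7.071], detail ≈ [1.414, 0.0]
```

    Repeating the step on the approximation output yields the deeper scales (A2, D2, ...) of a multi-resolution analysis; smooth stretches of the signal show up in An, transients in Dn.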

  6. Visualization of DNA in highly processed botanical materials.

    PubMed

    Lu, Zhengfei; Rubinsky, Maria; Babajanian, Silva; Zhang, Yanjun; Chang, Peter; Swanson, Gary

    2018-04-15

    DNA-based methods have been gaining recognition as a tool for botanical authentication in herbal medicine; however, their application to processed botanical materials is challenging due to the low quality and quantity of DNA left after extensive manufacturing processes. The low amount of DNA recovered from processed materials, especially extracts, is "invisible" to current technology, which has cast doubt on the presence of amplifiable botanical DNA. A method using adapter ligation and PCR amplification was successfully applied to visualize the "invisible" DNA in botanical extracts. The size of the "invisible" DNA fragments in botanical extracts was around 20-220 bp, compared to fragments of around 600 bp for the more easily visualized DNA in botanical powders. This technique is the first to allow characterization and visualization of small fragments of DNA in processed botanical materials and will provide key information to guide the development of appropriate DNA-based botanical authentication methods in the future. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Automated software system for checking the structure and format of ACM SIG documents

    NASA Astrophysics Data System (ADS)

    Mirza, Arsalan Rahman; Sah, Melike

    2017-04-01

    Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents, representing the structure and format of these documents in OWL (Web Ontology Language). Then, the metadata is extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, system evaluation and user study evaluations.
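
    Reading structural metadata out of the OOXML parts of a Word document can be sketched with the standard library alone: a .docx file is a zip archive whose main part is `word/document.xml`. The snippet below lists paragraph style ids from a minimal in-memory document; it is an assumed illustration of the extraction step only, not the ADFCS ontology or rule engine:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by OOXML elements and attributes
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraph_styles(docx_file):
    """Return the style id of each paragraph in a .docx file by
    reading the OOXML part word/document.xml directly."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    styles = []
    for para in root.iter(W + "p"):
        style = para.find(f"{W}pPr/{W}pStyle")
        styles.append(style.get(W + "val") if style is not None else "Normal")
    return styles

# Build a minimal in-memory .docx with one styled and one plain paragraph
doc = ('<w:document xmlns:w="http://schemas.openxmlformats.org/'
       'wordprocessingml/2006/main"><w:body>'
       '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr></w:p>'
       '<w:p/></w:body></w:document>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", doc)
print(paragraph_styles(buf))  # → ['Heading1', 'Normal']
```

    A format checker in the spirit of ADFCS would compare such extracted style sequences against rules (e.g. a title style must precede author styles), here expressed over plain lists rather than OWL/RDF.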

  8. [Systematic Readability Analysis of Medical Texts on Websites of German University Clinics for General and Abdominal Surgery].

    PubMed

    Esfahani, B Janghorban; Faron, A; Roth, K S; Grimminger, P P; Luers, J C

    2016-12-01

    Background: Besides functioning as one of the main contact points, websites of hospitals serve as medical information portals. As medical information texts should be understood by all patients, independent of their literacy skills and educational level, online texts should have an appropriate structure to ease understandability. Materials and Methods: Patient information texts on the websites of clinics for general surgery at German university hospitals (n = 36) were systematically analysed. For 9 different surgical topics, representative medical information texts were extracted from each website. Using common readability tools and 5 different readability indices, the texts were analysed concerning their readability and structure. The analysis was furthermore stratified by geographical region in Germany. Results: For the definitive analysis the texts of 196 internet websites could be used. On average the texts consisted of 25 sentences and 368 words. The readability analysis tools congruously showed that all texts had rather low readability, demanding a high literacy level from the readers. Conclusion: Patient information texts on German university hospital websites are difficult to understand for most patients. To fulfill the ambition of informing the general population adequately about medical issues, a revision of most medical texts on the websites of German surgical hospitals is recommended. Georg Thieme Verlag KG Stuttgart · New York.
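
    One readability index of the kind such analyses rely on can be sketched as follows, using Amstad's German adaptation of the Flesch Reading Ease formula with a crude vowel-group syllable count. The study's exact tools and indices are not specified here, and real readability software counts syllables far more carefully:

```python
import re

def flesch_german(text):
    """Amstad's German adaptation of Flesch Reading Ease:
    180 - (words per sentence) - 58.5 * (syllables per word).
    Syllables are approximated by counting vowel groups."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyäöü]+", w.lower())))
                    for w in words)
    return 180 - len(words) / sentences - 58.5 * syllables / len(words)

score = flesch_german("Der Arzt erklärt den Eingriff. "
                      "Die Patienten verstehen ihn.")
print(round(score, 1))  # → 78.0 (higher scores read more easily)
```

    Long compound words and long sentences, both common in German medical prose, drive the score down, which is exactly the pattern the study reports for hospital websites.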

  9. Using Airborne Remote Sensing to Increase Situational Awareness in Civil Protection and Humanitarian Relief - the Importance of User Involvement

    NASA Astrophysics Data System (ADS)

    Römer, H.; Kiefl, R.; Henkel, F.; Wenxi, C.; Nippold, R.; Kurz, F.; Kippnich, U.

    2016-06-01

    Enhancing situational awareness in real-time (RT) civil protection and emergency response scenarios requires the development of comprehensive monitoring concepts combining classical remote sensing disciplines with geospatial information science. In the VABENE++ project of the German Aerospace Center (DLR), monitoring tools are being developed in which innovative data acquisition approaches are combined with information extraction as well as the generation and dissemination of information products to specific users. DLR's 3K and 4k camera systems, which allow for RT acquisition and pre-processing of high-resolution aerial imagery, are applied in two application examples conducted with end users: a civil protection exercise with humanitarian relief organisations and a large open-air music festival in cooperation with a festival organising company. This study discusses how airborne remote sensing can significantly contribute to both situational assessment and awareness, focussing on the downstream processes required for extracting information from imagery and for visualising and disseminating imagery in combination with other geospatial information. Valuable user feedback and impetus for further developments have been obtained from both applications, referring to innovations in thematic image analysis (supporting festival site management) and product dissemination (editable web services). Thus, this study emphasises the important role of user involvement in application-related research, i.e. aligning it more closely to users' requirements.

  10. Building Knowledge Graphs for NASA's Earth Science Enterprise

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, T. J.; Ramachandran, R.; Shi, R.; Bao, Q.; Gatlin, P. N.; Weigel, A. M.; Maskey, M.; Miller, J. J.

    2016-12-01

    Inspired by Google Knowledge Graph, we have been building a prototype Knowledge Graph for Earth scientists, connecting information and data in NASA's Earth science enterprise. Our primary goal is to advance the state of the art in NASA knowledge extraction capability by going beyond traditional catalog search and linking distributed information of different kinds (such as data, publications, services, tools and people). This will enable a more efficient pathway to knowledge discovery. While Google Knowledge Graph provides impressive semantic-search and aggregation capabilities, it is limited to search topics for the general public. We use a similar knowledge graph approach to semantically link information gathered from a wide variety of sources within the NASA Earth science enterprise. Our prototype serves as a proof of concept for the viability of building an operational "knowledge base" system for NASA Earth science. Information is pulled from structured sources (such as the NASA CMR catalog, GCMD, and the Climate and Forecast Conventions) and unstructured sources (such as research papers). Leveraging modern techniques of machine learning, information retrieval, and deep learning, we provide an integrated data mining and information discovery environment to help Earth scientists use the best data, tools, methodologies, and models available to answer a hypothesis. Our knowledge graph would be able to answer questions like: Which articles discuss topics investigating similar hypotheses? How have these methods been tested for accuracy? Which approaches have been highly cited within the scientific community? What variables were used for this method and what datasets were used to represent them? What processing was necessary to use this data? These questions then lead researchers and citizen scientists to the sources where data can be found, available user guides, information on how the data was acquired, and available tools and models to use with this data.
    As a proof of concept, we focus on a well-defined domain, Hurricane Science, linking research articles and their findings, data, people and tools/services. Modern information retrieval, natural language processing, machine learning and deep learning techniques are applied to build the knowledge network.
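    The entity names below are hypothetical, but they sketch the underlying idea: distributed resources become subject-predicate-object triples, and questions such as "which articles investigate similar hypotheses?" become pattern matches over those triples:

```python
# Hypothetical triples linking papers, datasets, and variables
triples = [
    ("paper:Smith2015", "investigates", "topic:hurricane_intensity"),
    ("paper:Smith2015", "uses_dataset", "dataset:TRMM_3B42"),
    ("dataset:TRMM_3B42", "provides_variable", "var:precipitation_rate"),
    ("paper:Lee2016", "investigates", "topic:hurricane_intensity"),
]

def query(graph, pred=None, obj=None):
    """Return sorted subjects whose edges match the given predicate/object pattern."""
    return sorted({s for s, p, o in graph
                   if (pred is None or p == pred) and (obj is None or o == obj)})

# "Which articles investigate similar hypotheses?"
related = query(triples, pred="investigates", obj="topic:hurricane_intensity")
```

    Real knowledge graphs add an ontology and a triple store with a query language such as SPARQL, but the traversal pattern is the same.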

  11. Development of a comprehensive list of criteria for evaluating consumer education materials on colorectal cancer screening

    PubMed Central

    2013-01-01

    Background Appropriate patient information materials may support the consumer’s decision to attend or not to attend colorectal cancer (CRC) screening tests (fecal occult blood test and screening colonoscopy). The aim of this study was to develop a list of criteria to assess whether written health information materials on CRC screening provide balanced, unbiased, quantified, understandable, and evidence-based health information (EBHI) about CRC and CRC screening. Methods The list of criteria was developed based on recommendations and assessment tools for health information in the following steps: (1) systematic literature search in 13 electronic databases (search period: 2000–2010), complemented by an Internet search; (2) extraction of identified criteria; (3) grouping of criteria into categories and domains; (4) compilation of a manual of adequate answers derived from systematic reviews and S3 guidelines; (5) review by external experts; (6) modification; and (7) final discussion with external experts. Results Thirty-one publications on health information tools and recommendations were identified. The final list of criteria includes a total of 230 single criteria in three generic domains (formal issues, presentation and understandability, and neutrality and balance) and one CRC-specific domain. A multi-dimensional rating approach was used whenever appropriate (e.g., rating for the presence, correctness, presentation and level of evidence of information). Free-text input was allowed to ensure the transparency of assessment. The answer manual proved to be essential to the rating process. Quantitative analyses can be made depending on the level and dimensions of the criteria. Conclusions This comprehensive list of criteria clearly has a wider range of evaluation than previous assessment tools. It is not intended as a final quality assessment tool, but as a first step toward thorough evaluation of specific information materials for their adherence to EBHI requirements.
This criteria list may also be used to revise leaflets and to develop evidence-based health information on CRC screening. After adjustment for different procedure-specific criteria, the list of criteria can also be applied to other cancer screening procedures. PMID:24028691

  12. Collective Intelligence Generation from User Contributed Content

    NASA Astrophysics Data System (ADS)

    Solachidis, Vassilios; Mylonas, Phivos; Geyer-Schulz, Andreas; Hoser, Bettina; Chapman, Sam; Ciravegna, Fabio; Lanfranchi, Vita; Scherp, Ansgar; Staab, Steffen; Contopoulos, Costis; Gkika, Ioanna; Bakaimis, Byron; Smrz, Pavel; Kompatsiaris, Yiannis; Avrithis, Yannis

    In this paper we provide a foundation for a new generation of services and tools. We define new ways of capturing, sharing and reusing the information and intelligence provided by single users, communities and organizations, by enabling the extraction, generation, interpretation and management of Collective Intelligence from user-generated digital multimedia content. Different layers of intelligence are generated, which together constitute the notion of Collective Intelligence. The automatic generation of Collective Intelligence constitutes a departure from traditional methods of information sharing, since information from both the multimedia content and its social aspects is merged, while at the same time the social dynamics are taken into account. In the context of this work, we present two case studies: an Emergency Response case study and a Consumers Social Group case study.

  13. Using texts in science education: cognitive processes and knowledge representation.

    PubMed

    van den Broek, Paul

    2010-04-23

    Texts form a powerful tool in teaching concepts and principles in science. How do readers extract information from a text, and what are the limitations in this process? Central to comprehension of and learning from a text is the construction of a coherent mental representation that integrates the textual information and relevant background knowledge. This representation engenders learning if it expands the reader's existing knowledge base or if it corrects misconceptions in this knowledge base. The Landscape Model captures the reading process and the influences of reader characteristics (such as working-memory capacity, reading goal, prior knowledge, and inferential skills) and text characteristics (such as content/structure of presented information, processing demands, and textual cues). The model suggests factors that can optimize, or jeopardize, learning science from text.

  14. Semiautomated Device for Batch Extraction of Metabolites from Tissue Samples

    PubMed Central

    2012-01-01

    Metabolomics has become a mainstream analytical strategy for investigating metabolism. The quality of data derived from these studies is proportional to the consistency of the sample preparation. Although considerable research has been devoted to finding optimal extraction protocols, most of the established methods require extensive sample handling. Manual sample preparation can be highly effective in the hands of skilled technicians, but an automated tool for purifying metabolites from complex biological tissues would be of obvious utility to the field. Here, we introduce the semiautomated metabolite batch extraction device (SAMBED), a new tool designed to simplify metabolomics sample preparation. We discuss SAMBED’s design and show that SAMBED-based extractions are of comparable quality to extracts produced through traditional methods (13% mean coefficient of variation from SAMBED versus 16% from manual extractions). Moreover, we show that aqueous SAMBED-based methods can be completed in less than a quarter of the time required for manual extractions. PMID:22292466
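    The 13% versus 16% figures are mean coefficients of variation (CV) across replicate extractions; a minimal sketch of that comparison, with hypothetical peak intensities, is:

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, expressed as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical peak intensities for one metabolite across four replicate extractions
sambed_reps = [1020, 980, 1010, 990]
manual_reps = [1100, 900, 1050, 950]

cv_sambed = coefficient_of_variation(sambed_reps)
cv_manual = coefficient_of_variation(manual_reps)
```

    A lower CV means more consistent replicates, which is the sense in which the automated device matches or beats manual preparation.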

  15. Non-label bioimaging utilizing scattering lights

    NASA Astrophysics Data System (ADS)

    Watanabe, Tomonobu M.; Ichimura, Taro; Fujita, Hideaki

    2017-04-01

    Optical microscopy is an indispensable tool for the medical and life sciences. In particular, microscopes utilizing scattered light offer detailed internal observation of living specimens in real time because of their non-labeling and non-invasive capability. We here focus on two kinds of scattered light: Raman scattering light and second harmonic generation light. Raman scattering light carries information on all the molecular vibration modes of the molecules present, and can be used to distinguish cell types and/or states. Second harmonic generation light derives from the electric polarity of proteins in the specimen and enables the detection of their structural changes. In this conference contribution, we introduce our efforts to extract biological information from these scattered lights.

  16. Aqueous biphasic systems in the separation of food colorants.

    PubMed

    Santos, João H P M; Capela, Emanuel V; Boal-Palheiros, Isabel; Coutinho, João A P; Freire, Mara G; Ventura, Sónia P M

    2018-04-25

    Aqueous biphasic systems (ABS) composed of polypropylene glycol and carbohydrates, two benign substances, are proposed to separate two food colorants (E122 and E133). ABS are promising extractive platforms, particularly for biomolecules, due to their aqueous and mild nature (pH and temperature), reduced environmental impact and low processing costs. Another major aspect considered, particularly useful in downstream processing, is the ability to "tune" the extraction and purification behaviour of these systems by a proper choice of the ABS components. In this work, our intention is to demonstrate the concept of ABS as an alternative, volatile-organic-solvent-free tool to separate two different biomolecules in a simple way, so simple that teachers can effectively adopt it in their classes to explain the concept of bioseparation processes. Informative documents and general information about the preparation of binodal curves and their use in the partition of biomolecules are provided in this work for teachers to use in their classes. The students use different carbohydrates to build ABS and then study the partition of two synthetic food color dyes, evaluating the ability of the systems to separate the two food colorants. Through these experiments, the students become acquainted with ABS, learn how to determine solubility curves and perform extraction procedures using food colorant additives, procedures that can also be applied to the extraction of various (bio)molecules. © 2018 The International Union of Biochemistry and Molecular Biology.

  17. miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

    PubMed

    Gupta, Samir; Ross, Karen E; Tudor, Catalina O; Wu, Cathy H; Schmidt, Carl J; Vijay-Shanker, K

    2016-04-29

    MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. 
    When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well, with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list. miRiaD, a high-performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
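    The nDCG figure of 0.977 measures how closely the system's informativeness ranking matches the annotator's ideal ordering; a minimal sketch of the metric, with hypothetical relevance scores, is:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked):
    """Normalised DCG: system ranking scored against the ideal (sorted) ranking."""
    return dcg(ranked) / dcg(sorted(ranked, reverse=True))

# Hypothetical sentence informativeness scores, in the order the system ranked them
score = ndcg([3, 2, 3, 0, 1])
```

    An nDCG of 1.0 means the system's order is indistinguishable from the ideal order; values near 0.98, as reported here, indicate only minor misorderings.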

  18. dSED: A database tool for modeling sediment early diagenesis

    NASA Astrophysics Data System (ADS)

    Katsev, S.; Rancourt, D. G.; L'Heureux, I.

    2003-04-01

    Sediment early diagenesis reaction transport models (RTMs) are becoming powerful tools in providing kinetic descriptions of the metal and nutrient diagenetic cycling in marine, lacustrine, estuarine, and other aquatic sediments, as well as of exchanges with the water column. Whereas there exist several good database/program combinations for thermodynamic equilibrium calculations in aqueous systems, at present there exist no database tools for classification and analysis of the kinetic data essential to RTM development. We present a database tool that is intended to serve as an online resource for information about chemical reactions, solid phase and solute reactants, sorption reactions, transport mechanisms, and kinetic and equilibrium parameters that are relevant to sediment diagenesis processes. The list of reactive substances includes but is not limited to organic matter, Fe and Mn oxides and oxyhydroxides, sulfides and sulfates, calcium, iron, and manganese carbonates, phosphorus-bearing minerals, and silicates. Aqueous phases include dissolved carbon dioxide, oxygen, methane, hydrogen sulfide, sulfate, nitrate, phosphate, some organic compounds, and dissolved metal species. A number of filters allow extracting information according to user-specified criteria, e.g., about a class of substances contributing to the cycling of iron. The database also includes bibliographic information about published diagenetic models and the reactions and processes that they consider. At the time of preparing this abstract, dSED contained 128 reactions and 12 pre-defined filters. dSED is maintained by the Lake Sediment Structure and Evolution (LSSE) group at the University of Ottawa (www.science.uottawa.ca/LSSE/dSED) and we invite input from the geochemical community.
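    The filter mechanism described (e.g., extracting the reactions that contribute to iron cycling) can be pictured as simple predicates over structured reaction records; the records and field names below are hypothetical, not dSED's actual schema:

```python
# Hypothetical reaction records mimicking entries in a diagenesis database
reactions = [
    {"name": "Fe oxyhydroxide reduction", "cycle": "iron"},
    {"name": "Sulfate reduction", "cycle": "sulfur"},
    {"name": "Siderite precipitation", "cycle": "iron"},
]

def filter_by_cycle(records, cycle):
    """User-specified filter: reactions contributing to one element's cycling."""
    return [r["name"] for r in records if r["cycle"] == cycle]

iron_reactions = filter_by_cycle(reactions, "iron")
```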

  19. Fluctuating Finite Element Analysis (FFEA): A continuum mechanics software tool for mesoscale simulation of biomolecules

    PubMed Central

    Solernou, Albert

    2018-01-01

    Fluctuating Finite Element Analysis (FFEA) is a software package designed to perform continuum mechanics simulations of proteins and other globular macromolecules. It combines conventional finite element methods with stochastic thermal noise, and is appropriate for simulations of large proteins and protein complexes at the mesoscale (length-scales in the range of 5 nm to 1 μm), where there is currently a paucity of modelling tools. It requires 3D volumetric information as input, which can be low-resolution structural information such as cryo-electron tomography (cryo-ET) maps or much higher resolution atomistic co-ordinates from which volumetric information can be extracted. In this article we introduce our open source software package for performing FFEA simulations, which we have released under a GPLv3 license. The software package includes a C++ implementation of FFEA, together with tools to assist the user to set up the system from Electron Microscopy Data Bank (EMDB) or Protein Data Bank (PDB) data files. We also provide a PyMOL plugin to perform basic visualisation and additional Python tools for the analysis of FFEA simulation trajectories. This manuscript provides a basic background to the FFEA method, describing the implementation of the core mechanical model and how intermolecular interactions and the solvent environment are included within this framework. We provide prospective FFEA users with a practical overview of how to set up an FFEA simulation with reference to our publicly available online tutorials and manuals that accompany this first release of the package. PMID:29570700

  20. Automated Comparative Metabolite Profiling of Large LC-ESIMS Data Sets in an ACD/MS Workbook Suite Add-in, and Data Clustering on a New Open-Source Web Platform FreeClust.

    PubMed

    Božičević, Alen; Dobrzyński, Maciej; De Bie, Hans; Gafner, Frank; Garo, Eliane; Hamburger, Matthias

    2017-12-05

    The technological development of LC-MS instrumentation has led to significant improvements in performance and sensitivity, enabling high-throughput analysis of complex samples such as plant extracts. Most software suites allow preprocessing of LC-MS chromatograms to obtain comprehensive information on single constituents. However, more advanced processing needs, such as the systematic and unbiased comparative metabolite profiling of large numbers of complex LC-MS chromatograms, remain a challenge. Currently, users have to rely on different tools to perform such data analyses. We developed a two-step protocol comprising a comparative metabolite profiling tool integrated in the ACD/MS Workbook Suite, and a web platform developed in the R language designed for clustering and visualization of chromatographic data. Initially, all relevant chromatographic and spectroscopic data (retention time, molecular ions with their respective ion abundances, and sample names) are automatically extracted and assembled in an Excel spreadsheet. The file is then loaded into an online web application that includes various statistical algorithms and provides the user with tools to compare and visualize the results in intuitive 2D heatmaps. We applied this workflow to LC-ESIMS profiles obtained from 69 honey samples. Within a few hours of calculation on a standard PC, the honey samples were preprocessed and organized in clusters based on their metabolite profile similarities, thereby highlighting the common metabolite patterns and distributions among samples. Implementation in the ACD/Laboratories software package enables subsequent integration of other analytical data and of in silico prediction tools for modern drug discovery.
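    One simple way to quantify the "metabolite profile similarity" that drives such clustering is Jaccard similarity over binned (retention time, m/z) features; the sketch below uses hypothetical bins and is not the implementation described in the record:

```python
def jaccard(a, b):
    """Fraction of (retention-time bin, m/z bin) features two samples share."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical binned LC-MS features per honey sample
profiles = {
    "honey_A": {(1, 163), (2, 301), (5, 447)},
    "honey_B": {(1, 163), (2, 301), (7, 593)},
    "honey_C": {(3, 179), (8, 609)},
}

sim_AB = jaccard(profiles["honey_A"], profiles["honey_B"])  # 2 shared of 4 distinct
```

    A pairwise similarity matrix built this way is exactly what hierarchical clustering algorithms and 2D heatmaps consume.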

  1. Interrater agreement of an observational tool to code knockouts and technical knockouts in mixed martial arts.

    PubMed

    Lawrence, David W; Hutchison, Michael G; Cusimano, Michael D; Singh, Tanveer; Li, Luke

    2014-09-01

    Interrater agreement evaluation of a tool to document and code the situational factors and mechanisms of knockouts (KOs) and technical knockouts (TKOs) in mixed martial arts (MMA). Retrospective case series. Professional MMA matches from the Ultimate Fighting Championship-2006-2012. Two nonmedically trained independent raters. The MMA Knockout Tool (MMA-KT) consists of 20 factors and captures and codes information on match characteristics, situational context preceding KOs and TKOs, as well as describing competitor states during these outcomes. The MMA-KT also evaluates the mechanism of action and subsequent events surrounding a KO. The 2 raters coded 125 unique events for a total of 250 events. The 8 factors of Part A had an average κ of 0.87 (SD = 0.10; range = 0.65-0.98); 7 were considered "substantial" agreement and 1 "moderate." Part B consists of 12 factors with an average κ of 0.84 (SD = 0.16; range = 0.59-1.0); 7 classified as "substantial" agreement, 4 "moderate," and 1 "fair." The majority of the factors in the MMA-KT demonstrated substantial interrater agreement, with an average κ of 0.86 (SD = 0.13; range = 0.59-1.0). The MMA-KT is a reliable tool to extract and code relevant information to investigate the situational factors and mechanism of KOs and TKOs in MMA competitions.
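    The agreement statistic reported is Cohen's kappa, which corrects raw agreement for chance agreement; a minimal sketch with hypothetical KO/TKO codings:

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    chance = sum((r1.count(c) / n) * (r2.count(c) / n) for c in set(r1) | set(r2))
    return (observed - chance) / (1 - chance)

# Hypothetical codings of six match outcomes by two independent raters
rater1 = ["KO", "TKO", "KO", "KO", "TKO", "KO"]
rater2 = ["KO", "TKO", "KO", "TKO", "TKO", "KO"]
kappa = cohens_kappa(rater1, rater2)
```

    On the conventional scale, kappa values of 0.61-0.80 are "substantial" and 0.41-0.60 "moderate", which is how the MMA-KT factors above are classified.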

  2. NEFI: Network Extraction From Images

    PubMed Central

    Dirnberger, M.; Kehl, T.; Neumann, A.

    2015-01-01

    Networks are amongst the central building blocks of many systems. Given a graph of a network, methods from graph theory enable a precise investigation of its properties. Software for the analysis of graphs is widely available and has been applied to study various types of networks. In some applications, graph acquisition is relatively simple. However, for many networks data collection relies on images where graph extraction requires domain-specific solutions. Here we introduce NEFI, a tool that extracts graphs from images of networks originating in various domains. Regarding previous work on graph extraction, theoretical results are fully accessible only to an expert audience and ready-to-use implementations for non-experts are rarely available or insufficiently documented. NEFI provides a novel platform allowing practitioners to easily extract graphs from images by combining basic tools from image processing, computer vision and graph theory. Thus, NEFI constitutes an alternative to tedious manual graph extraction and special purpose tools. We anticipate NEFI to enable time-efficient collection of large datasets. The analysis of these novel datasets may open up the possibility to gain new insights into the structure and function of various networks. NEFI is open source and available at http://nefi.mpi-inf.mpg.de. PMID:26521675

  3. Assessment of ecotoxicity and total volatile organic compound (TVOC) emissions from food and children's toy products.

    PubMed

    Szczepańska, Natalia; Marć, Mariusz; Kudłak, Błażej; Simeonov, Vasil; Tsakovski, Stefan; Namieśnik, Jacek

    2018-09-30

    The development of new methods for identifying a broad spectrum of analytes, as well as highly selective tools to provide the most accurate information regarding the processes and relationships in the world, has been an area of interest for researchers for many years. The information obtained with these tools provides valuable data that complement existing knowledge but, above all, identify and determine previously unknown hazards. Recently, attention has been paid to the migration of xenobiotics from the surfaces of various everyday objects and the resulting impacts on human health. Since children are among those most vulnerable to health consequences, one of the main subjects of interest is the migration of low-molecular-weight compounds from toys and products intended for children. This migration has become a stimulus for research aimed at determining the degree of release of compounds from popular commercially available chocolate/toy sets. One of the main objectives of this research was to determine the impact of time on the ecotoxicity (assessed with Vibrio fischeri bioluminescent bacteria) of extracts of products intended for children and to assess the correlation with total volatile organic compound emissions using basic chemometric methods. The studies on the endocrine potential of the extracts (with XenoScreen YES/YAS) showed that compounds released from the studied objects (including packaging foils, plastic capsules storing toys, most of the toys studied and all chocolate samples) exhibit mostly androgenic antagonistic behavior, and that using artificial saliva as the extraction medium increased the observed impact. The effect of time was in most cases positive and increased with prolonged extraction time. The small-scale stationary environmental test chamber system (μ-CTE™ 250) was employed to determine the profile of total volatile organic compound (TVOC) emissions. These measurements showed that the objects releasing the greatest amounts of contaminants are the plastic containers (with the emission rate falling from 3273 to 2280 ng/g of material after 6 h of conditioning at elevated temperature). Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Validation of a general practice audit and data extraction tool.

    PubMed

    Peiris, David; Agaliotis, Maria; Patel, Bindu; Patel, Anushka

    2013-11-01

    We assessed how accurately a common general practitioner (GP) audit tool extracts data from two software systems. First, pathology test codes were audited at 33 practices covering nine companies. Second, a manual audit of chronic disease data from 200 random patient records at two practices was compared with audit tool data. Pathology review: all companies assigned correct codes for cholesterol, creatinine and glycated haemoglobin; four companies assigned incorrect codes for albuminuria tests, precluding accurate detection with the audit tool. Case record review: there was strong agreement between the manual audit and the tool for all variables except chronic kidney disease diagnoses, which was due to a tool-related programming error. The audit tool accurately detected most chronic disease data in two GP record systems. The one exception, however, highlights the importance of surveillance systems to promptly identify errors. This will maximise potential for audit tools to improve healthcare quality.

  5. Dynamic quantitative photothermal monitoring of cell death of individual human red blood cells upon glucose depletion

    NASA Astrophysics Data System (ADS)

    Vasudevan, Srivathsan; Chen, George Chung Kit; Andika, Marta; Agarwal, Shuchi; Chen, Peng; Olivo, Malini

    2010-09-01

    Red blood cells (RBCs) have been found to undergo "programmed cell death," or eryptosis, and understanding this process can provide more information about apoptosis of nucleated cells. Photothermal (PT) response, a label-free, noninvasive photothermal technique, is proposed as a tool to monitor the cell death process of living human RBCs upon glucose depletion. Since the physiological status of dying cells is highly sensitive to photothermal parameters (e.g., thermal diffusivity, absorption, etc.), we applied the linear PT response to continuously monitor the death mechanism of RBCs depleted of glucose. The kinetics of the assay, in which the cell's PT response transforms from the linear to the nonlinear regime, are reported. In addition, quantitative monitoring was performed by extracting the relevant photothermal parameters from the PT response. Twofold changes, an increase in thermal diffusivity and a reduction in size, were found in the linear PT response during cell death. Our results reveal that photothermal parameters change earlier than phosphatidylserine externalization (used in fluorescence studies), allowing us to detect the initial stage of eryptosis in a quantitative manner. Hence, the proposed tool, in addition to detecting eryptosis earlier than fluorescence, could also reveal the physiological status of the cells through quantitative photothermal parameter extraction.

  6. Development of a data entry auditing protocol and quality assurance for a tissue bank database.

    PubMed

    Khushi, Matloob; Carpenter, Jane E; Balleine, Rosemary L; Clarke, Christine L

    2012-03-01

    Human transcription error is an acknowledged risk when extracting information from paper records for entry into a database. For a tissue bank, it is critical that accurate data are provided to researchers with approved access to tissue bank material. The challenges of tissue bank data collection include manual extraction of data from complex medical reports that are accessed from a number of sources and that differ in style and layout. As a quality assurance measure, the Breast Cancer Tissue Bank (http://www.abctb.org.au) has implemented an auditing protocol and, in order to efficiently execute the process, has developed an open source database plug-in tool (eAuditor) to assist in auditing of data held in our tissue bank database. Using eAuditor, we have identified that human entry errors range from 0.01% when entering donors' clinical follow-up details to 0.53% when entering pathological details, highlighting the importance of an audit protocol tool such as eAuditor in a tissue bank database. eAuditor was developed and tested on the Caisis open source clinical-research database; however, it can be integrated into other databases where similar functionality is required.

  7. BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

    PubMed

    Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

    2018-06-05

    With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, is rapidly being produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information for understanding the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method initially transforms the sequences, representing them as complex networks. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated on the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods across all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.
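    BASiNET itself is an R package; as a language-neutral illustration of the alignment-free idea, the Python sketch below maps a sequence to a network of adjacent k-mers and extracts one topological measure (average degree) as a feature. The sequence and parameters are hypothetical, not BASiNET's actual construction:

```python
from collections import defaultdict

def kmer_graph(seq, k=3):
    """Undirected graph: nodes are k-mers, edges join consecutively occurring k-mers."""
    adj = defaultdict(set)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for a, b in zip(kmers, kmers[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_degree(adj):
    """One simple topological measure usable as a classification feature."""
    return sum(len(neigh) for neigh in adj.values()) / len(adj)

feature = average_degree(kmer_graph("AUGGCCAUGGCGCC"))
```

    A vector of several such measures (degree statistics, clustering coefficient, path lengths, etc.) then feeds a standard classifier, which is what makes the approach alignment-free.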

  8. New Caledonian crows attend to multiple functional properties of complex tools

    PubMed Central

    St Clair, James J. H.; Rutz, Christian

    2013-01-01

    The ability to attend to the functional properties of foraging tools should affect energy-intake rates, fitness components and ultimately the evolutionary dynamics of tool-related behaviour. New Caledonian crows Corvus moneduloides use three distinct tool types for extractive foraging: non-hooked stick tools, hooked stick tools and tools cut from the barbed edges of Pandanus spp. leaves. The latter two types exhibit clear functional polarity, because of (respectively) a single terminal, crow-manufactured hook and natural barbs running along one edge of the leaf strip; in each case, the ‘hooks’ can only aid prey capture if the tool is oriented correctly by the crow during deployment. A previous experimental study of New Caledonian crows found that subjects paid little attention to the barbs of supplied (wide) pandanus tools, resulting in non-functional tool orientation during foraging. This result is puzzling, given the presumed fitness benefits of consistently orienting tools functionally in the wild. We investigated whether the lack of discrimination with respect to (wide) pandanus tool orientation also applies to hooked stick tools. We experimentally provided subjects with naturalistic replica tools in a range of orientations and found that all subjects used these tools correctly, regardless of how they had been presented. In a companion experiment, we explored the extent to which normally co-occurring tool features (terminal hook, curvature of the tool shaft and stripped bark at the hooked end) inform tool-orientation decisions, by forcing birds to deploy ‘unnatural’ tools, which exhibited these traits at opposite ends. Our subjects attended to at least two of the three tool features, although, as expected, the location of the hook was of paramount importance. We discuss these results in the context of earlier research and propose avenues for future work. PMID:24101625

  9. New Caledonian crows attend to multiple functional properties of complex tools.

    PubMed

    St Clair, James J H; Rutz, Christian

    2013-11-19

    The ability to attend to the functional properties of foraging tools should affect energy-intake rates, fitness components and ultimately the evolutionary dynamics of tool-related behaviour. New Caledonian crows Corvus moneduloides use three distinct tool types for extractive foraging: non-hooked stick tools, hooked stick tools and tools cut from the barbed edges of Pandanus spp. leaves. The latter two types exhibit clear functional polarity, because of (respectively) a single terminal, crow-manufactured hook and natural barbs running along one edge of the leaf strip; in each case, the 'hooks' can only aid prey capture if the tool is oriented correctly by the crow during deployment. A previous experimental study of New Caledonian crows found that subjects paid little attention to the barbs of supplied (wide) pandanus tools, resulting in non-functional tool orientation during foraging. This result is puzzling, given the presumed fitness benefits of consistently orienting tools functionally in the wild. We investigated whether the lack of discrimination with respect to (wide) pandanus tool orientation also applies to hooked stick tools. We experimentally provided subjects with naturalistic replica tools in a range of orientations and found that all subjects used these tools correctly, regardless of how they had been presented. In a companion experiment, we explored the extent to which normally co-occurring tool features (terminal hook, curvature of the tool shaft and stripped bark at the hooked end) inform tool-orientation decisions, by forcing birds to deploy 'unnatural' tools, which exhibited these traits at opposite ends. Our subjects attended to at least two of the three tool features, although, as expected, the location of the hook was of paramount importance. We discuss these results in the context of earlier research and propose avenues for future work.

  10. Building a diabetes screening population data repository using electronic medical records.

    PubMed

    Tuan, Wen-Jan; Sheehy, Ann M; Smith, Maureen A

    2011-05-01

    There has been a rapid advancement of information technology in the area of clinical and population health data management since 2000. However, with the fast growth of electronic medical records (EMRs) and the increasing complexity of information systems, it has become challenging for researchers to effectively access, locate, extract, and analyze information critical to their research. This article introduces an outpatient encounter data framework designed to construct an EMR-based population data repository for diabetes screening research. The outpatient encounter data framework is developed on a hybrid data structure of entity-attribute-value models, dimensional models, and relational models. This design preserves a small number of subject-specific tables essential to key clinical constructs in the data repository. It enables atomic information to be maintained in a transparent and meaningful way to researchers and health care practitioners who need to access data and still achieve the same performance level as conventional data warehouse models. A six-layer information processing strategy is developed to extract and transform EMRs to the research data repository. The data structure also complies with both Health Insurance Portability and Accountability Act regulations and the institutional review board's requirements. Although developed for diabetes screening research, the design of the outpatient encounter data framework is suitable for other types of health service research. It may also provide organizations a tool to improve health care quality and efficiency, consistent with the "meaningful use" objectives of the Health Information Technology for Economic and Clinical Health Act. © 2011 Diabetes Technology Society.
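
A minimal sketch of the hybrid design described above: a conventional relational table for core subject data plus an entity-attribute-value (EAV) table for sparse clinical observations, here in `sqlite3` so the example is self-contained. The table, column, and attribute names are illustrative, not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Conventional subject-specific table for key clinical constructs.
cur.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, birth_year INTEGER)")
# EAV table: one row per atomic observation, transparent to researchers.
cur.execute("""CREATE TABLE observation (
    patient_id INTEGER REFERENCES patient(id),
    attribute  TEXT,    -- e.g. 'hba1c', 'fasting_glucose' (hypothetical)
    value      REAL,
    recorded   TEXT)""")

cur.execute("INSERT INTO patient VALUES (1, 1960)")
cur.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)",
                [(1, "hba1c", 6.9, "2011-01-05"),
                 (1, "fasting_glucose", 118.0, "2011-01-05")])

# Pivot the EAV rows back into a wide, researcher-friendly view.
cur.execute("""SELECT p.id,
                      MAX(CASE WHEN o.attribute = 'hba1c' THEN o.value END),
                      MAX(CASE WHEN o.attribute = 'fasting_glucose' THEN o.value END)
               FROM patient p JOIN observation o ON o.patient_id = p.id
               GROUP BY p.id""")
print(cur.fetchall())  # → [(1, 6.9, 118.0)]
```

The pivot query illustrates the trade-off the framework addresses: EAV rows stay atomic and schema-stable while queries can still reconstruct a conventional wide layout.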

  11. Spectrum image analysis tool - A flexible MATLAB solution to analyze EEL and CL spectrum images.

    PubMed

    Schmidt, Franz-Philipp; Hofer, Ferdinand; Krenn, Joachim R

    2017-02-01

    Spectrum imaging techniques, which simultaneously acquire structural (image) and spectroscopic data, require appropriate and careful processing to extract the information in the dataset. In this article we introduce MATLAB based software that uses three dimensional data (EEL/CL spectrum image in dm3 format (Gatan Inc.'s DigitalMicrograph ® )) as input. A graphical user interface enables fast and easy mapping of spectrally dependent images and position dependent spectra. First, data processing such as background subtraction, deconvolution and denoising; second, multiple display options including an EEL/CL moviemaker; and third, applicability to large numbers of datasets with a small workload make this program an interesting tool to visualize otherwise hidden details. Copyright © 2016 Elsevier Ltd. All rights reserved.
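
The article does not spell out its processing algorithms, but background subtraction in EELS is commonly done with a power-law fit over a pre-edge window. The sketch below implements that standard approach in pure Python under those assumptions, not as a description of the tool's actual code.

```python
import math

def powerlaw_background(energies, counts, fit_lo, fit_hi):
    """Fit A * E**(-r) over a pre-edge window [fit_lo, fit_hi] by
    least squares in log-log space, then return the background
    extrapolated over the whole spectrum."""
    xs = [math.log(e) for e, c in zip(energies, counts) if fit_lo <= e <= fit_hi]
    ys = [math.log(c) for e, c in zip(energies, counts) if fit_lo <= e <= fit_hi]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx            # log A;  slope = -r
    return [math.exp(intercept + slope * math.log(e)) for e in energies]

# Synthetic spectrum that is a pure power law, so subtraction leaves ~0.
energies = [float(e) for e in range(100, 200)]
counts = [1e6 * e ** -3.0 for e in energies]
bg = powerlaw_background(energies, counts, 100, 150)
signal = [c - b for c, b in zip(counts, bg)]
print(max(abs(s) for s in signal) < 1e-6)  # → True
```

In a real spectrum image this subtraction would be applied pixel by pixel before mapping spectrally dependent images.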

  12. Social Media Use Among Nurses: Literature Review.

    PubMed

    Cordoş, Ariana Anamaria; Bolboacă, Sorana D

    2016-01-01

    The scope of the research was to increase the understanding of social media's influence among nurses while highlighting gaps in the literature and areas for further research. A search of the PubMed database was performed in November 2015, using terms to identify peer-reviewed articles that describe the use of social media by nursing students or nurse practitioners. A systematic approach was used to retrieve papers and extract relevant data. Twenty-three full-text articles involving social media and nurse-related terminology were identified. The majority of the studies were interventional (n = 20) and assessed social media as a teaching tool. Podcasts, multiplayer virtual worlds and mixed social media platforms have also been assessed. Social media is used as an information tool by nurses, mainly as a means of engaging and communicating.

  13. A Haptic-Enhanced System for Molecular Sensing

    NASA Astrophysics Data System (ADS)

    Comai, Sara; Mazza, Davide

    The science of haptics has received enormous attention in the last decade. One of the major application trends of haptics technology is data visualization and training. In this paper, we present a haptically-enhanced system for the manipulation and tactile exploration of molecules. The geometrical models of molecules are extracted from either theoretical or empirical data using file formats widely adopted in the chemical and biological fields. The addition of information computed with computational chemistry tools allows users to feel the interaction forces between an explored molecule and a charge associated with the haptic device, and to visualize a huge amount of numerical data in a more comprehensible way. The developed tool can be used for either teaching or research purposes due to its reliance on both theoretical and experimental data.
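
The force felt between a probe charge and a molecule can be sketched, in its simplest electrostatic form, as a sum of Coulomb forces from the molecule's partial charges. This is a minimal sketch of the force-feedback idea only; the actual system also draws on computational-chemistry data not modelled here, and the positions and charges below are invented.

```python
import math

K = 8.9875517923e9  # Coulomb constant, N·m²/C²

def coulomb_force(probe_pos, probe_q, atoms):
    """Net electrostatic force (N) on a probe charge from a list of
    ((x, y, z), partial_charge) atom tuples."""
    fx = fy = fz = 0.0
    for (ax, ay, az), q in atoms:
        dx, dy, dz = probe_pos[0] - ax, probe_pos[1] - ay, probe_pos[2] - az
        r2 = dx * dx + dy * dy + dz * dz
        r = math.sqrt(r2)
        f = K * probe_q * q / r2          # magnitude; repulsive if same sign
        fx += f * dx / r
        fy += f * dy / r
        fz += f * dz / r
    return (fx, fy, fz)

# Probe of +1 nC at 1 m from a single +1 nC charge: repulsion along +x.
fx, fy, fz = coulomb_force((1.0, 0.0, 0.0), 1e-9, [((0.0, 0.0, 0.0), 1e-9)])
print(fx > 0 and abs(fy) < 1e-15 and abs(fz) < 1e-15)  # → True
```

A haptic loop would evaluate such a force at the device's current position on every update tick and render it to the user.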

  14. GLYDE-II: The GLYcan data exchange format

    PubMed Central

    Ranzinger, Rene; Kochut, Krys J.; Miller, John A.; Eavenson, Matthew; Lütteke, Thomas; York, William S.

    2017-01-01

    Summary The GLYcan Data Exchange (GLYDE) standard has been developed for the representation of the chemical structures of monosaccharides, glycans and glycoconjugates using a connection table formalism formatted in XML. This format allows structures, including those that do not exist in any database, to be unambiguously represented and shared by diverse computational tools. GLYDE implements a partonomy model based on human language along with rules that provide consistent structural representations, including a robust namespace for specifying monosaccharides. This approach facilitates the reuse of data processing software at the level of granularity that is most appropriate for extraction of the desired information. GLYDE-II has already been used as a key element of several glycoinformatics tools. The philosophical and technical underpinnings of GLYDE-II and recent implementation of its enhanced features are described. PMID:28955652
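
A connection-table XML format of the kind described can be consumed with a standard XML parser. The toy document below is illustrative only: its element and attribute names are invented for this sketch and are not the real GLYDE-II schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified connection table: residues plus links.
doc = """
<glycan>
  <residue id="r1" name="b-D-Glcp"/>
  <residue id="r2" name="b-D-Galp"/>
  <link from="r2" to="r1" linkage="1-4"/>
</glycan>
"""

root = ET.fromstring(doc)
residues = {r.get("id"): r.get("name") for r in root.findall("residue")}
links = [(l.get("from"), l.get("linkage"), l.get("to"))
         for l in root.findall("link")]
print(residues["r1"], links)  # → b-D-Glcp [('r2', '1-4', 'r1')]
```

Because the structure is an explicit graph of residues and links rather than a flat string encoding, tools can consume it at whatever granularity they need, which is the reuse benefit the abstract highlights.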

  15. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offers a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol for building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784
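
The core of building a composite network is merging edge lists from several sources while remembering where each edge came from. The chapter does this interactively in Cytoscape; the sketch below shows the same idea programmatically, with placeholder source names and gene symbols.

```python
from collections import defaultdict

def merge_networks(sources):
    """sources: {source_name: [(protein_a, protein_b), ...]}.
    Returns {frozenset(edge): set(source_names)}, treating edges as
    undirected so (A, B) and (B, A) merge into one edge."""
    edges = defaultdict(set)
    for name, pairs in sources.items():
        for a, b in pairs:
            edges[frozenset((a, b))].add(name)
    return edges

net = merge_networks({
    "BioGRID": [("TP53", "MDM2"), ("TP53", "EP300")],
    "IntAct":  [("MDM2", "TP53")],           # same edge, different order
})
# Edges supported by more than one database are higher-confidence.
multi = [sorted(e) for e, srcs in net.items() if len(srcs) > 1]
print(multi)  # → [['MDM2', 'TP53']]
```

Tagging each edge with its supporting databases is what lets a composite network say more than any single source: overlap becomes evidence.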

  16. The BioLexicon: a large-scale terminological resource for biomedical text mining

    PubMed Central

    2011-01-01

    Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring. PMID:21992002

  17. The BioLexicon: a large-scale terminological resource for biomedical text mining.

    PubMed

    Thompson, Paul; McNaught, John; Montemagni, Simonetta; Calzolari, Nicoletta; del Gratta, Riccardo; Lee, Vivian; Marchi, Simone; Monachini, Monica; Pezik, Piotr; Quochi, Valeria; Rupp, C J; Sasaki, Yutaka; Venturi, Giulia; Rebholz-Schuhmann, Dietrich; Ananiadou, Sophia

    2011-10-12

    Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.

  18. Relation extraction for biological pathway construction using node2vec.

    PubMed

    Kim, Munui; Baek, Seung Han; Song, Min

    2018-06-13

    Systems biology is an important field for understanding whole biological mechanisms composed of interactions between biological components. One approach for understanding complex and diverse mechanisms is to analyze biological pathways. However, because these pathways consist of important interactions and information on these interactions is disseminated in a large number of biomedical reports, text-mining techniques are essential for extracting these relationships automatically. In this study, we applied node2vec, an algorithmic framework for feature learning in networks, for relationship extraction. To this end, we extracted genes from paper abstracts using pkde4j, a text-mining tool for detecting entities and relationships. Using the extracted genes, a co-occurrence network was constructed and node2vec was used with the network to generate a latent representation. To demonstrate the efficacy of node2vec in extracting relationships between genes, performance was evaluated for gene-gene interactions involved in a type 2 diabetes pathway. Moreover, we compared the results of node2vec to those of baseline methods such as co-occurrence and DeepWalk. Node2vec outperformed existing methods in detecting relationships in the type 2 diabetes pathway, demonstrating that this method is appropriate for capturing the relatedness between pairs of biological entities involved in biological pathways. The results demonstrated that node2vec is useful for automatic pathway construction.
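
The first stages of the pipeline above can be sketched in a few lines: build a gene co-occurrence network from per-abstract gene lists, then sample random walks to feed an embedding model. The uniform walks below are DeepWalk-style; node2vec additionally biases transitions with return and in-out parameters p and q, which are not implemented here. The gene lists are placeholders, not data from the study.

```python
import itertools
import random
from collections import defaultdict

# Toy per-abstract gene mentions (as pkde4j-style extraction might yield).
abstracts = [["INS", "INSR", "IRS1"], ["IRS1", "PIK3CA"], ["INS", "IRS1"]]

# Build an undirected co-occurrence network: an edge links two genes
# that appear in the same abstract.
graph = defaultdict(set)
for genes in abstracts:
    for a, b in itertools.combinations(set(genes), 2):
        graph[a].add(b)
        graph[b].add(a)

def random_walk(start, length, rng):
    """Uniform random walk over the co-occurrence graph."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(sorted(graph[walk[-1]])))
    return walk

rng = random.Random(0)
walks = [random_walk(g, 5, rng) for g in sorted(graph) for _ in range(10)]
print(len(walks), all(len(w) == 5 for w in walks))  # → 40 True
```

The walks would then be treated as "sentences" for a skip-gram model, giving each gene a latent vector whose neighbourhood structure supports the relation scoring the paper evaluates.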

  19. PylotDB - A Database Management, Graphing, and Analysis Tool Written in Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnette, Daniel W.

    2012-01-04

    PylotDB, written completely in Python, provides a user interface (UI) with which to interact with, analyze, graph data from, and manage open source databases such as MySQL. The UI spares the user from needing in-depth knowledge of the database application programming interface (API). PylotDB allows the user to generate various kinds of plots from user-selected data; generate statistical information on text as well as numerical fields; back up and restore databases; compare database tables across different databases as well as across different servers; extract information from any field to create new fields; generate, edit, and delete databases, tables, and fields; generate CSV data from a table or read CSV data into one; and perform similar operations. Since much of the database information is brought under control of the Python language, PylotDB is not intended for huge databases, for which MySQL and Oracle, for example, are better suited. PylotDB is better suited to the smaller databases typically needed by a small research group. PylotDB can also be used as a learning tool for database applications in general.
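
Two of the operations listed above, loading CSV data into a table and generating statistics on a numeric field, can be sketched as follows. The example uses `sqlite3` rather than MySQL so it is self-contained, and the table and column names are invented for illustration.

```python
import csv
import io
import sqlite3
import statistics

# CSV data as it might arrive from a benchmarking run (hypothetical).
csv_data = "run,walltime\nA,12.5\nB,10.0\nC,14.5\n"
rows = list(csv.DictReader(io.StringIO(csv_data)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (run TEXT, walltime REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(r["run"], float(r["walltime"])) for r in rows])

# Statistical information on a numeric field.
times = [t for (t,) in conn.execute("SELECT walltime FROM results")]
print(statistics.mean(times), min(times), max(times))
```

A UI like PylotDB's essentially wraps this read-load-summarize loop behind widgets, so the user never touches the database API directly.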

  20. Integrating Data Sources for Process Sustainability ...

    EPA Pesticide Factsheets

    To perform a chemical process sustainability assessment requires significant data about chemicals, process design specifications, and operating conditions. The required information includes the identity of the chemicals used, the quantities of the chemicals within the context of the sustainability assessment, physical properties of these chemicals, equipment inventory, as well as health, environmental, and safety properties of the chemicals. Much of these data are currently available to the process engineer either from the process design in the chemical process simulation software or online through chemical property and environmental, health, and safety databases. Examples of these databases include the U.S. Environmental Protection Agency’s (USEPA’s) Aggregated Computational Toxicology Resource (ACToR), National Institute for Occupational Safety and Health’s (NIOSH’s) Hazardous Substance Database (HSDB), and National Institute of Standards and Technology’s (NIST’s) Chemistry Webbook. This presentation will provide methods and procedures for extracting chemical identity and flow information from process design tools (such as chemical process simulators) and chemical property information from the online databases. The presentation will also demonstrate acquisition and compilation of the data for use in the EPA’s GREENSCOPE process sustainability analysis tool. This presentation discusses acquisition of data for use in rapid LCI development.

  1. An evaluation of the suitability of ERTS data for the purposes of petroleum exploration. [Anadarko Basin in Oklahoma and Texas

    NASA Technical Reports Server (NTRS)

    Everett, J. R.; Petzel, G.

    1974-01-01

    This investigation was undertaken to determine the types and amounts of information valuable to petroleum exploration that are extractable from ERTS data and to determine the cost of obtaining the information from ERTS relative to costs using traditional or conventional means. In particular, it was desirable to evaluate this new petroleum exploration tool in a geologically well-known area in order to assess its potential usefulness in an unknown area. In light of the current energy situation, it is felt that such an evaluation is important in order to best utilize technical efforts with customary exploration tools, by rapidly focusing attention on the most promising areas in order to reduce the time required to go through the exploration cycle and to maximize cost savings. The Anadarko Basin lies in western Oklahoma and the panhandle of Texas (Figure 1). It was chosen as a test site because there is a great deal of published information available on the surface and subsurface geology of the area, there are many known structures that act as traps for hydrocarbons, and it is similar to several other large epicontinental sedimentary basins.

  2. Mapping care processes within a hospital: a web-based proposal merging enterprise modelling and ISO normative principles.

    PubMed

    Staccini, Pascal; Joubert, Michel; Quaranta, Jean-François; Fieschi, Marius

    2003-01-01

    Today, the economic and regulatory environment is pressuring hospitals and healthcare professionals to account for their results and methods of care delivery. The evaluation of the quality and safety of care, the traceability of the acts performed and the evaluation of practices are some of the reasons underpinning current interest in clinical and hospital information systems. The structured collection of users' needs and system requirements is fundamental when installing such systems. This stage takes time, is generally misunderstood by caregivers, and is of limited analytical efficacy. We used a modelling technique designed for manufacturing processes (SADT: Structured Analysis and Design Technique). We enhanced the initial activity model of this method and programmed a web-based tool in an object-oriented environment. This tool makes it possible to extract the data dictionary from the description of a given process and to locate documents (procedures, recommendations, instructions). Aimed at structuring needs and storing information provided by the teams directly involved regarding the workings of an institution (or at least part of it), the process mapping approach has an important contribution to make to the analysis of clinical information systems.

  3. Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature.

    PubMed

    Ozyurt, Ibrahim Burak; Grethe, Jeffrey S; Martone, Maryann E; Bandrowski, Anita E

    2016-01-01

    The NIF Registry developed and maintained by the Neuroscience Information Framework is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the Web. RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor for papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. 
The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
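
The "URL extractor from papers" component can be sketched as a regex pass over article text followed by light normalization. This is only the first, simplest stage of RDW; the real pipeline layers screening, change tracking, and learned classifiers on top, and the example text below is invented.

```python
import re

# Match http(s) URLs, stopping at whitespace and common delimiters.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_urls(text):
    """Return candidate resource URLs found in free text."""
    urls = []
    for m in URL_RE.finditer(text):
        urls.append(m.group().rstrip(".,;"))  # strip sentence punctuation
    return urls

text = ("Data are available at http://example.org/tooldb. "
        "See also https://scicrunch.org/resources, updated weekly.")
print(extract_urls(text))
# → ['http://example.org/tooldb', 'https://scicrunch.org/resources']
```

Each extracted URL would then become a resource candidate for curators to screen, and a target for the URL and content change trackers.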

  4. Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature

    PubMed Central

    Ozyurt, Ibrahim Burak; Grethe, Jeffrey S.; Martone, Maryann E.; Bandrowski, Anita E.

    2016-01-01

    The NIF Registry developed and maintained by the Neuroscience Information Framework is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the Web. RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor for papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. 
The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking. PMID:26730820

  5. Challenges for automatically extracting molecular interactions from full-text articles.

    PubMed

    McIntosh, Tara; Curran, James R

    2009-09-24

    The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.

  6. PHOXTRACK-a tool for interpreting comprehensive datasets of post-translational modifications of proteins.

    PubMed

    Weidner, Christopher; Fischer, Cornelius; Sauer, Sascha

    2014-12-01

    We introduce PHOXTRACK (PHOsphosite-X-TRacing Analysis of Causal Kinases), a user-friendly freely available software tool for analyzing large datasets of post-translational modifications of proteins, such as phosphorylation, which are commonly gained by mass spectrometry detection. In contrast to other currently applied data analysis approaches, PHOXTRACK uses full sets of quantitative proteomics data and applies non-parametric statistics to calculate whether defined kinase-specific sets of phosphosite sequences indicate statistically significant concordant differences between various biological conditions. PHOXTRACK is an efficient tool for extracting post-translational information of comprehensive proteomics datasets to decipher key regulatory proteins and to infer biologically relevant molecular pathways. PHOXTRACK will be maintained over the next years and is freely available as an online tool for non-commercial use at http://phoxtrack.molgen.mpg.de. Users will also find a tutorial at this Web site and can additionally give feedback at https://groups.google.com/d/forum/phoxtrack-discuss. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
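
The abstract does not name the exact non-parametric statistic PHOXTRACK applies, but a rank-sum (Mann-Whitney U) test is a representative choice for asking whether a kinase's substrate phosphosites shift concordantly relative to the background. The sketch below computes U in pure Python on toy fold-change values, assuming distinct values (no tie correction).

```python
def rank_sum_u(sample, rest):
    """Mann-Whitney U for `sample` vs `rest` (no tie correction).
    Assumes all values are distinct."""
    combined = sorted(sample + rest)
    ranks = {v: i + 1 for i, v in enumerate(combined)}
    r = sum(ranks[v] for v in sample)
    n1 = len(sample)
    return r - n1 * (n1 + 1) / 2

# Toy fold-changes: every substrate phosphosite of some kinase exceeds
# the background sites, so U is maximal (n1 * n2 = 3 * 4 = 12).
print(rank_sum_u([2.1, 2.5, 3.0], [0.1, 0.2, 0.3, 0.4]))  # → 12.0
```

An extreme U in either direction for a kinase-specific phosphosite set is the kind of signal that would flag that kinase as a causal candidate between conditions.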

  7. PubFinder: a tool for improving retrieval rate of relevant PubMed abstracts.

    PubMed

    Goetz, Thomas; von der Lieth, Claus-Wilhelm

    2005-07-01

    Since it is becoming increasingly laborious to manually extract useful information embedded in the ever-growing volumes of literature, automated intelligent text analysis tools are becoming more and more essential to assist in this task. PubFinder (www.glycosciences.de/tools/PubFinder) is a publicly available web tool designed to improve the retrieval rate of scientific abstracts relevant to a specific scientific topic. All that is required is the selection of a representative set of abstracts that are central to the topic; no special knowledge of the query syntax is necessary. Based on the selected abstracts, a list of discriminating words is automatically calculated and subsequently used to score PubMed abstracts for their probability of belonging to the defined scientific topic. This results in a hit-list of references in descending order of likelihood score. The algorithms and procedures implemented in PubFinder facilitate the perpetual task faced by every scientist of staying up to date with current publications on a specific subject in biomedicine.
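
The discriminating-words idea can be sketched as follows: weight words by how over-represented they are in the seed abstracts relative to a background corpus, then score new abstracts by summing weights. PubFinder's exact weighting scheme may differ; this sketch uses smoothed log-odds, and all documents below are toy examples.

```python
import math
from collections import Counter

def word_weights(seed_docs, background_docs):
    """Smoothed log-odds of each seed word vs the background corpus."""
    seed = Counter(w for d in seed_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    ns, nb = sum(seed.values()), sum(bg.values())
    return {w: math.log(((seed[w] + 1) / (ns + 1)) / ((bg[w] + 1) / (nb + 1)))
            for w in seed}

def score(doc, weights):
    """Sum of discriminating-word weights found in the document."""
    return sum(weights.get(w, 0.0) for w in doc.lower().split())

weights = word_weights(
    seed_docs=["glycan binding protein", "protein glycan structure"],
    background_docs=["clinical trial outcome", "trial protocol design"])
on_topic = score("glycan protein interactions", weights)
off_topic = score("clinical trial design", weights)
print(on_topic > off_topic)  # → True
```

Ranking all candidate abstracts by this score yields the descending hit-list the abstract describes, without the user ever writing a query.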

  8. Global Tsunami Database: Adding Geologic Deposits, Proxies, and Tools

    NASA Astrophysics Data System (ADS)

    Brocko, V. R.; Varner, J.

    2007-12-01

    A result of collaboration between NOAA's National Geophysical Data Center (NGDC) and the Cooperative Institute for Research in the Environmental Sciences (CIRES), the Global Tsunami Database includes instrumental records, human observations, and now, information inferred from the geologic record. Deep Ocean Assessment and Reporting of Tsunamis (DART) data, historical reports, and information gleaned from published tsunami deposit research build a multi-faceted view of tsunami hazards and their history around the world. Tsunami history provides clues to what might happen in the future, including frequency of occurrence and maximum wave heights. However, instrumental and written records commonly span too little time to reveal the full range of a region's tsunami hazard. The sedimentary deposits of tsunamis, identified with the aid of modern analogs, increasingly complement instrumental and human observations. By adding the component of tsunamis inferred from the geologic record, the Global Tsunami Database extends the record of tsunamis backward in time. Deposit locations, their estimated age and descriptions of the deposits themselves fill in the tsunami record. Tsunamis inferred from proxies, such as evidence for coseismic subsidence, are included to estimate recurrence intervals, but are flagged to highlight the absence of a physical deposit. Authors may submit their own descriptions and upload digital versions of publications. Users may sort by any populated field, including event, location, region, age of deposit, author, publication type (extract information from peer reviewed publications only, if you wish), grain size, composition, presence/absence of plant material. Users may find tsunami deposit references for a given location, event or author; search for particular properties of tsunami deposits; and even identify potential collaborators. Users may also download public-domain documents. 
Data and information may be viewed using tools designed to extract and display data from the Oracle database (selection forms, Web Map Services, and Web Feature Services). In addition, the historic tsunami archive (along with related earthquakes and volcanic eruptions) is available in KML (Keyhole Markup Language) format for use with Google Earth and similar geo-viewers.

  9. Polarization Observables T and F in the γp → π0p Reaction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Hao

    The theory that describes the interaction of quarks is Quantum Chromodynamics (QCD), but how quarks are bound inside a nucleon is not yet well understood. Pion photoproduction experiments reveal important information about the nucleon excited states and the dynamics of the quarks within it and thus provide a useful tool to study QCD. Detailed information about this reaction can be obtained in experiments that utilize polarized photon beams and polarized targets. Pion photoproduction in the γp → π0p reaction has been measured in the FROST experiment at the Thomas Jefferson National Accelerator Facility. In this experiment circularly polarized photons with electron-beam energies up to 3.082 GeV impinged on a transversely polarized frozen-spin target. Final-state protons were detected in the CEBAF Large Acceptance Spectrometer. Results for the polarization observables T and F have been extracted. The data generally agree with predictions of present partial wave analyses, but also show marked differences. The data will constrain further partial wave analyses and improve the extraction of proton resonance properties.

  10. Direct extraction of electron parameters from magnetoconductance analysis in mesoscopic ring array structures

    NASA Astrophysics Data System (ADS)

    Sawada, A.; Faniel, S.; Mineshige, S.; Kawabata, S.; Saito, K.; Kobayashi, K.; Sekine, Y.; Sugiyama, H.; Koga, T.

    2018-05-01

    We report an approach for examining electron properties using information about the shape and size of a nanostructure as a measurement reference. This approach quantifies the spin precession angles per unit length directly by considering the time-reversal interferences on chaotic return trajectories within mesoscopic ring arrays (MRAs). Experimentally, we fabricated MRAs using nanolithography in InGaAs quantum wells which had a gate-controllable spin-orbit interaction (SOI). As a result, we observed an Onsager symmetry related to relativistic magnetic fields, which provided us with indispensable information for the semiclassical billiard ball simulation. Our simulations, developed based on the real-space formalism of the weak localization/antilocalization effect including the degree of freedom for electronic spin, reproduced the experimental magnetoconductivity (MC) curves with high fidelity. The values of five distinct electron parameters (Fermi wavelength, spin precession angles per unit length for two different SOIs, impurity scattering length, and phase coherence length) were thereby extracted from a single MC curve. The methodology developed here is applicable to wide ranges of nanomaterials and devices, providing a diagnostic tool for exotic properties of two-dimensional electron systems.

  11. Aggregation of Electric Current Consumption Features to Extract Maintenance KPIs

    NASA Astrophysics Data System (ADS)

    Simon, Victor; Johansson, Carl-Anders; Galar, Diego

    2017-09-01

    All electric powered machines offer the possibility of extracting information and calculating Key Performance Indicators (KPIs) from the electric current signal. Depending on the time window, sampling frequency and type of analysis, different indicators from the micro to macro level can be calculated for such aspects as maintenance, production, energy consumption etc. On the micro-level, the indicators are generally used for condition monitoring and diagnostics and are normally based on a short time window and a high sampling frequency. The macro indicators are normally based on a longer time window with a slower sampling frequency and are used as indicators for overall performance, cost or consumption. The indicators can be calculated directly from the current signal but can also be based on a combination of information from the current signal and operational data like rpm, position etc. One or several of those indicators can be used for prediction and prognostics of a machine's future behavior. This paper uses this technique to calculate indicators for maintenance and energy optimization in electric powered machines and fleets of machines, especially machine tools.
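The micro/macro distinction above comes down to the window length used when summarizing the current signal. A minimal sketch, assuming a uniformly sampled current array and a constant supply voltage; the indicator definitions are illustrative, not the authors':

```python
import numpy as np

def windowed_rms(current, fs, window_s):
    """RMS of a current signal over consecutive windows.  Short windows
    give micro-level (condition-monitoring) indicators, long windows
    macro-level KPIs; this is the windowing idea, not a specific KPI."""
    n = int(fs * window_s)
    trimmed = current[: len(current) // n * n]   # drop the partial tail window
    frames = trimmed.reshape(-1, n)
    return np.sqrt((frames ** 2).mean(axis=1))

def energy_kwh(current, voltage, fs):
    """Approximate energy drawn, assuming a constant supply voltage
    (a crude DC approximation for illustration only)."""
    power_w = voltage * np.abs(current)
    return power_w.sum() / fs / 3600.0 / 1000.0  # W·samples -> kWh
```

One RMS series per window length can then feed trending or prognostic models, as the record suggests.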

  12. A data-driven approach for quality assessment of radiologic interpretations.

    PubMed

    Hsu, William; Han, Simon X; Arnold, Corey W; Bui, Alex At; Enzmann, Dieter R

    2016-04-01

    Given the increasing emphasis on delivering high-quality, cost-efficient healthcare, improved methodologies are needed to measure the accuracy and utility of ordered diagnostic examinations in achieving the appropriate diagnosis. Here, we present a data-driven approach for performing automated quality assessment of radiologic interpretations using other clinical information (e.g., pathology) as a reference standard for individual radiologists, subspecialty sections, imaging modalities, and entire departments. Downstream diagnostic conclusions from the electronic medical record are utilized as "truth" to which upstream diagnoses generated by radiology are compared. The described system automatically extracts and compares patient medical data to characterize concordance between clinical sources. Initial results are presented in the context of breast imaging, matching 18 101 radiologic interpretations with 301 pathology diagnoses and achieving a precision and recall of 84% and 92%, respectively. The presented data-driven method highlights the challenges of integrating multiple data sources and the application of information extraction tools to facilitate healthcare quality improvement. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
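At evaluation time, concordance scoring of this kind reduces to comparing the automatically extracted matches against a verified reference set. A minimal sketch, with hypothetical match identifiers standing in for the paper's radiology-pathology pairs:

```python
def precision_recall(predicted_matches, true_matches):
    """Precision and recall of extracted matches against a reference
    standard (illustrative only; the paper's matching pipeline that
    produces these sets is far more involved)."""
    predicted, truth = set(predicted_matches), set(true_matches)
    tp = len(predicted & truth)                     # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```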

  13. A fast algorithm for vertex-frequency representations of signals on graphs

    PubMed Central

    Jestrović, Iva; Coyle, James L.; Sejdić, Ervin

    2016-01-01

    The windowed Fourier transform (short time Fourier transform) and the S-transform are widely used signal processing tools for extracting frequency information from non-stationary signals. Previously, the windowed Fourier transform had been adopted for signals on graphs and has been shown to be very useful for extracting vertex-frequency information from graphs. However, high computational complexity makes these algorithms impractical. We sought to develop a fast windowed graph Fourier transform and a fast graph S-transform requiring significantly shorter computation time. The proposed schemes have been tested with synthetic test graph signals and real graph signals derived from electroencephalography recordings made during swallowing. The results showed that the proposed schemes provide significantly lower computation time in comparison with the standard windowed graph Fourier transform and the fast graph S-transform. Also, the results showed that noise has no effect on the results of the algorithm for the fast windowed graph Fourier transform or on the graph S-transform. Finally, we showed that graphs can be reconstructed from the vertex-frequency representations obtained with the proposed algorithms. PMID:28479645
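The baseline these fast algorithms accelerate is the graph Fourier transform, which expands a graph signal in the eigenbasis of the graph Laplacian. A minimal numpy sketch of that baseline (the proposed fast schemes avoid the full eigendecomposition, which is what dominates the cost):

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Eigenbasis of the combinatorial graph Laplacian L = D - A.
    The full eigendecomposition computed here is the expensive step
    that fast graph transforms are designed to avoid."""
    degrees = np.diag(adjacency.sum(axis=1))
    laplacian = degrees - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
    return eigvals, eigvecs

def gft(signal, eigvecs):
    """Forward graph Fourier transform of a vertex signal."""
    return eigvecs.T @ signal

def igft(coeffs, eigvecs):
    """Inverse transform, recovering the vertex signal."""
    return eigvecs @ coeffs
```

Windowing this transform around each vertex yields the vertex-frequency representations the record describes.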

  14. Dynamic analysis and pattern visualization of forest fires.

    PubMed

    Lopes, António M; Tenreiro Machado, J A

    2014-01-01

    This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue containing information on events in Portugal during the period from 1980 up to 2012. The data is analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed on the map forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
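The MDS step can be sketched with the classical (Torgerson) algorithm, which embeds objects so that map distances approximate the given dissimilarities. The paper does not state which MDS variant was used, so this is only an illustration of the idea:

```python
import numpy as np

def classical_mds(distances, dims=2):
    """Classical (Torgerson) MDS: from a pairwise dissimilarity matrix
    to coordinates whose Euclidean distances approximate it."""
    d2 = np.asarray(distances, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dims]     # keep largest eigenvalues
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```

Each row of the result is one object's position on the map; nearby rows form the clusters the record mentions.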

  15. Dynamic Analysis and Pattern Visualization of Forest Fires

    PubMed Central

    Lopes, António M.; Tenreiro Machado, J. A.

    2014-01-01

    This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue containing information on events in Portugal during the period from 1980 up to 2012. The data is analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed on the map forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns. PMID:25137393

  16. MsViz: A Graphical Software Tool for In-Depth Manual Validation and Quantitation of Post-translational Modifications.

    PubMed

    Martín-Campos, Trinidad; Mylonas, Roman; Masselot, Alexandre; Waridel, Patrice; Petricevic, Tanja; Xenarios, Ioannis; Quadroni, Manfredo

    2017-08-04

    Mass spectrometry (MS) has become the tool of choice for the large scale identification and quantitation of proteins and their post-translational modifications (PTMs). This development has been enabled by powerful software packages for the automated analysis of MS data. While data on PTMs of thousands of proteins can nowadays be readily obtained, fully deciphering the complexity and combinatorics of modification patterns even on a single protein often remains challenging. Moreover, functional investigation of PTMs on a protein of interest requires validation of the localization and the accurate quantitation of its changes across several conditions, tasks that often still require human evaluation. Software tools for large scale analyses are highly efficient but are rarely conceived for interactive, in-depth exploration of data on individual proteins. We here describe MsViz, a web-based and interactive software tool that supports manual validation of PTMs and their relative quantitation in small- and medium-size experiments. The tool displays sequence coverage information, peptide-spectrum matches, tandem MS spectra and extracted ion chromatograms through a single, highly intuitive interface. We found that MsViz greatly facilitates manual data inspection to validate PTM location and quantitate modified species across multiple samples.

  17. [Application of regular expression in extracting key information from Chinese medicine literatures about re-evaluation of post-marketing surveillance].

    PubMed

    Wang, Zhifei; Xie, Yanming; Wang, Yongyan

    2011-10-01

    Extracting information from the Chinese medicine literature by computer is more convenient than hand searching: it can simplify the search process and improve accuracy. Among the many automated extraction methods now in use, regular expressions stand out as an efficient way to extract useful information in research. This article focuses on the application of regular expressions to information extraction from the Chinese medicine literature. Two practical examples are reported, in which regular expressions extract "case number" (non-terminology) and "efficacy rate" (subgroups for related information identification), illustrating how information can be extracted from the Chinese medicine literature by this method.
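In Python, the same idea looks like the sketch below. The two patterns are hypothetical English stand-ins for the paper's Chinese-language expressions, targeting "case number" and "efficacy rate" style figures:

```python
import re

# Hypothetical patterns in the spirit of the paper; the actual work
# matches Chinese-language phrasing, not these English forms.
CASES = re.compile(r"(\d+)\s*(?:cases|patients)")
RATE = re.compile(r"(?:efficacy|effective)\s*rate[^\d]{0,10}(\d+(?:\.\d+)?)\s*%")

def extract_trial_figures(text):
    """Pull case counts and efficacy-rate percentages out of free text."""
    cases = [int(m) for m in CASES.findall(text)]
    rates = [float(m) for m in RATE.findall(text)]
    return cases, rates
```

Running the extractor over a corpus of abstracts yields structured fields for re-evaluation studies without hand searching.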

  18. A novel framework for change detection in bi-temporal polarimetric SAR images

    NASA Astrophysics Data System (ADS)

    Pirrone, Davide; Bovolo, Francesca; Bruzzone, Lorenzo

    2016-10-01

    Recent years have seen a relevant increase in the availability of polarimetric Synthetic Aperture Radar (SAR) data, thanks to satellite sensors like Sentinel-1 or ALOS-2 PALSAR-2. The augmented information lying in the additional polarimetric channels offers the possibility of better discriminating different classes of changes in change detection (CD) applications. This work proposes a framework for CD in multi-temporal multi-polarization SAR data. The framework includes both a tool for an effective visual representation of the change information and a method for extracting the multiple-change information. Both components are designed to effectively handle the multi-dimensionality of polarimetric data. In the novel representation, multi-temporal intensity SAR data are employed to compute a polarimetric log-ratio. The multitemporal information of the polarimetric log-ratio image is represented in a multi-dimensional feature space, where changes are highlighted in terms of magnitude and direction. This representation is employed to design a novel unsupervised multi-class CD approach, which considers a sequential two-step analysis of the magnitude and the direction information for separating non-changed and changed samples. The proposed approach has been validated on a pair of Sentinel-1 data acquired before and after the flood in Tamil Nadu in 2015. Preliminary results demonstrate that the representation tool is effective and that the use of polarimetric SAR data is promising in multi-class change detection applications.
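The magnitude/direction representation of the polarimetric log-ratio can be sketched as follows. This is an illustrative two-channel version, not the authors' exact formulation:

```python
import numpy as np

def change_magnitude_direction(intensity_t1, intensity_t2, eps=1e-10):
    """Per-pixel log-ratio across polarimetric channels, summarized as a
    magnitude (how much changed) and a direction (what kind of change).
    Inputs: intensity arrays of shape (channels, H, W); eps guards the log."""
    log_ratio = np.log((intensity_t2 + eps) / (intensity_t1 + eps))
    magnitude = np.linalg.norm(log_ratio, axis=0)
    # direction for the 2-channel case: angle in the (ch0, ch1) plane
    direction = np.arctan2(log_ratio[1], log_ratio[0])
    return magnitude, direction
```

Thresholding the magnitude separates changed from unchanged pixels, and clustering the direction among changed pixels separates change classes, mirroring the two-step analysis described above.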

  19. Automated Interpretation and Extraction of Topographic Information from Time of Flight Secondary Ion Mass Spectrometry Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ievlev, Anton V.; Belianinov, Alexei; Jesse, Stephen

    Time of flight secondary ion mass spectrometry (ToF SIMS) is one of the most powerful characterization tools allowing imaging of the chemical properties of various systems and materials. It allows precise studies of the chemical composition with sub-100-nm lateral and nanometer depth spatial resolution. However, comprehensive interpretation of ToF SIMS results is challenging because of the data volume and its multidimensionality. Furthermore, investigation of samples with pronounced topographical features is complicated by the spectral shift. In this work we developed an approach for comprehensive ToF SIMS data interpretation based on data analytics and automated extraction of the sample topography from the time of flight shift. We further applied this approach to investigate the correlation between biological function and chemical composition in Arabidopsis roots.

  20. Automated Interpretation and Extraction of Topographic Information from Time of Flight Secondary Ion Mass Spectrometry Data

    DOE PAGES

    Ievlev, Anton V.; Belianinov, Alexei; Jesse, Stephen; ...

    2017-12-06

    Time of flight secondary ion mass spectrometry (ToF SIMS) is one of the most powerful characterization tools allowing imaging of the chemical properties of various systems and materials. It allows precise studies of the chemical composition with sub-100-nm lateral and nanometer depth spatial resolution. However, comprehensive interpretation of ToF SIMS results is challenging because of the data volume and its multidimensionality. Furthermore, investigation of samples with pronounced topographical features is complicated by the spectral shift. In this work we developed an approach for comprehensive ToF SIMS data interpretation based on data analytics and automated extraction of the sample topography from the time of flight shift. We further applied this approach to investigate the correlation between biological function and chemical composition in Arabidopsis roots.

  1. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    PubMed

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
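The pre-processing step common to most clustering methods surveyed here, TF-IDF weighting with cosine similarity, can be sketched in a few lines (a minimal dependency-free version; production pipelines add stemming, stop-word removal, and sparse matrices):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Term frequency-inverse document frequency vectors, one sparse
    dict per document.  Rare, document-specific words get high weight."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({w: (c / len(toks)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Any clustering algorithm (k-means, hierarchical) can then operate on these similarities to group related abstracts.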

  2. Key Relation Extraction from Biomedical Publications.

    PubMed

    Huang, Lan; Wang, Ye; Gong, Leiguang; Kulikowski, Casimir; Bai, Tian

    2017-01-01

    Within the large body of biomedical knowledge, recent findings and discoveries are most often presented as research articles. Their number has been increasing sharply since the turn of the century, presenting ever-growing challenges for search and discovery of knowledge and information related to specific topics of interest, even with the help of advanced online search tools. This is especially true when the goal of a search is to find or discover key relations between important concepts or topic words. We have developed an innovative method for extracting key relations between concepts from abstracts of articles. The method focuses on relations between keywords or topic words in the articles. Early experiments with the method on PubMed publications have shown promising results in searching and discovering keywords and their relationships that are strongly related to the main topic of an article.
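A minimal stand-in for the co-occurrence side of this idea is shown below; the published method goes well beyond raw co-occurrence counting, so this only illustrates the starting point:

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(abstracts, keywords):
    """Count keyword pairs that co-occur within the same abstract,
    a crude proxy for candidate relations between topic words."""
    pairs = Counter()
    kws = set(keywords)
    for text in abstracts:
        present = sorted(kws & set(text.lower().split()))
        pairs.update(combinations(present, 2))   # sorted -> canonical pair order
    return pairs
```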

  3. Improving the automated detection of refugee/IDP dwellings using the multispectral bands of the WorldView-2 satellite

    NASA Astrophysics Data System (ADS)

    Kemper, Thomas; Gueguen, Lionel; Soille, Pierre

    2012-06-01

    The enumeration of the population remains a critical task in the management of refugee/IDP camps. Analysis of very high spatial resolution satellite data has proved to be an efficient and secure approach for the estimation of dwellings and the monitoring of the camp over time. In this paper we propose a new methodology for the automated extraction of features based on differential morphological decomposition segmentation for feature extraction and interactive training sample selection from the max-tree and min-tree structures. This feature extraction methodology is tested on a WorldView-2 scene of an IDP camp in Darfur, Sudan. Special emphasis is given to the additional available bands of the WorldView-2 sensor. The results obtained show that the interactive image information tool performs very well by tuning the feature extraction to the local conditions. The analysis of different spectral subsets shows that it is possible to obtain good results already with an RGB combination, but increasing the number of spectral bands makes the detection of dwellings more accurate. Best results were obtained using all eight bands of the WorldView-2 satellite.

  4. Seizure classification in EEG signals utilizing Hilbert-Huang transform

    PubMed Central

    2011-01-01

    Background Classification methods capable of recognizing abnormal brain activity rely on either brain imaging or brain signal analysis. The abnormal activity of interest in this study is characterized by a disturbance caused by changes in neuronal electrochemical activity that results in abnormal synchronous discharges. The method aims at helping physicians discriminate between healthy and seizure electroencephalographic (EEG) signals. Method Discrimination in this work is achieved by analyzing EEG signals obtained from freely accessible databases. MATLAB has been used to implement and test the proposed classification algorithm. The analysis in question presents a classification of normal and ictal activities using a feature based on the Hilbert-Huang Transform. Through this method, information related to the intrinsic functions contained in the EEG signal has been extracted to track the local amplitude and the frequency of the signal. Based on this local information, weighted frequencies are calculated and a comparison between ictal and seizure-free determinant intrinsic functions is then performed. The methods of comparison used are the t-test and Euclidean clustering. Results The t-test results in a P-value < 0.02 and the clustering leads to accurate (94%) and specific (96%) results. The proposed method is also contrasted against Multivariate Empirical Mode Decomposition, which reaches 80% accuracy. Comparison results strengthen the contribution of this paper not only in terms of accuracy but also with respect to its fast response and ease of use. Conclusion An original tool for EEG signal processing giving physicians the possibility to diagnose brain functionality abnormalities is presented in this paper. The proposed system bears the potential of providing several credible benefits such as fast diagnosis, high accuracy, good sensitivity and specificity, time saving and user friendliness. 
Furthermore, the classification of mode mixing can be achieved using the extracted instantaneous information of every IMF, but it would most likely be a hard task if only the average value is used. Extra benefits of this proposed system include low cost and ease of interface. All of this indicates the usefulness of the tool and its value as an efficient diagnostic tool. PMID:21609459
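The Hilbert step of this pipeline, extracting instantaneous amplitude and frequency from the analytic signal, can be sketched with an FFT-based Hilbert transform. The empirical mode decomposition into IMFs, which precedes this step in the full Hilbert-Huang method, is omitted here:

```python
import numpy as np

def instantaneous_amplitude_frequency(x, fs):
    """Analytic-signal estimate of instantaneous amplitude and frequency.
    Uses the standard FFT construction of the Hilbert transform: keep DC,
    double positive frequencies, zero negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    amplitude = np.abs(analytic)                     # local envelope
    phase = np.unwrap(np.angle(analytic))
    freq = np.diff(phase) * fs / (2 * np.pi)         # instantaneous frequency, Hz
    return amplitude, freq
```

Applied to each IMF of an EEG segment, the resulting amplitudes and frequencies are exactly the "local information" from which the weighted-frequency feature above is computed.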

  5. Seizure classification in EEG signals utilizing Hilbert-Huang transform.

    PubMed

    Oweis, Rami J; Abdulhay, Enas W

    2011-05-24

    Classification methods capable of recognizing abnormal brain activity rely on either brain imaging or brain signal analysis. The abnormal activity of interest in this study is characterized by a disturbance caused by changes in neuronal electrochemical activity that results in abnormal synchronous discharges. The method aims at helping physicians discriminate between healthy and seizure electroencephalographic (EEG) signals. Discrimination in this work is achieved by analyzing EEG signals obtained from freely accessible databases. MATLAB has been used to implement and test the proposed classification algorithm. The analysis in question presents a classification of normal and ictal activities using a feature based on the Hilbert-Huang Transform. Through this method, information related to the intrinsic functions contained in the EEG signal has been extracted to track the local amplitude and the frequency of the signal. Based on this local information, weighted frequencies are calculated and a comparison between ictal and seizure-free determinant intrinsic functions is then performed. The methods of comparison used are the t-test and Euclidean clustering. The t-test results in a P-value < 0.02 and the clustering leads to accurate (94%) and specific (96%) results. The proposed method is also contrasted against Multivariate Empirical Mode Decomposition, which reaches 80% accuracy. Comparison results strengthen the contribution of this paper not only in terms of accuracy but also with respect to its fast response and ease of use. An original tool for EEG signal processing giving physicians the possibility to diagnose brain functionality abnormalities is presented in this paper. The proposed system bears the potential of providing several credible benefits such as fast diagnosis, high accuracy, good sensitivity and specificity, time saving and user friendliness. 
Furthermore, the classification of mode mixing can be achieved using the extracted instantaneous information of every IMF, but it would most likely be a hard task if only the average value is used. Extra benefits of this proposed system include low cost and ease of interface. All of this indicates the usefulness of the tool and its value as an efficient diagnostic tool.

  6. Image processing tools dedicated to quantification in 3D fluorescence microscopy

    NASA Astrophysics Data System (ADS)

    Dieterlen, A.; De Meyer, A.; Colicchio, B.; Le Calvez, S.; Haeberlé, O.; Jacquey, S.

    2006-05-01

    3-D optical fluorescence microscopy has become an efficient tool for the volume investigation of living biological samples. Developments in instrumentation have made it possible to beat the conventional Abbe limit. In any case, the recorded image can be described by the convolution equation between the original object and the Point Spread Function (PSF) of the acquisition system. Due to the finite resolution of the instrument, the original object is recorded with distortions and blurring, and contaminated by noise. As a consequence, relevant biological information cannot be extracted directly from raw data stacks. If the goal is 3-D quantitative analysis, then to assess optimal performance of the instrument and to ensure the reproducibility of data acquisition, characterization of the system is mandatory. The PSF represents the properties of the image acquisition system; we have proposed the use of statistical tools and Zernike moments to describe a 3-D PSF and to quantify its variation. This first step toward standardization helps define an acquisition protocol that optimizes exploitation of the microscope for the biological sample under study. Before the extraction of geometrical information and/or quantification of intensities, data restoration is mandatory. Reduction of out-of-focus light is carried out computationally by a deconvolution process. But other phenomena occur during acquisition, like the fluorescence photodegradation named "bleaching", inducing an alteration of the information needed for restoration. Therefore, we have developed a protocol to pre-process data before the application of deconvolution algorithms. A large number of deconvolution methods have been described and are now available in commercial packages. One major difficulty in using this software is that the user must supply the "best" regularization parameters. 
We have pointed out that automating the choice of the regularization level greatly improves the reliability of the measurements while facilitating use. Furthermore, to increase the quality and repeatability of quantitative measurements, pre-filtering the images improves the stability of the deconvolution process; in the same way, pre-filtering the PSF stabilizes deconvolution. We have shown that Zernike polynomials can be used to reconstruct the experimental PSF, preserving system characteristics and removing the noise contained in the PSF.
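As one concrete example of the deconvolution step, here is a 1-D Richardson-Lucy iteration, a standard scheme of the kind found in commercial packages. It is illustrative only and is not one of the specific algorithms the authors evaluated:

```python
import numpy as np

def richardson_lucy(observed, psf, iterations=30):
    """1-D Richardson-Lucy deconvolution.  Multiplicatively updates the
    estimate so that re-blurring it with the PSF matches the observation;
    the iteration count plays the role of a regularization parameter."""
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    psf_mirror = psf[::-1]
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)   # avoid divide-by-zero
        estimate *= np.convolve(ratio, psf_mirror, mode="same")
    return estimate
```

Stopping too late amplifies noise and stopping too early leaves blur, which is exactly why automating the choice of regularization level, as argued above, matters for reproducible measurements.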

  7. Open Source Software Tool Skyline Reaches Key Agreement with Mass Spectrometer Vendors | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    The full proteomics analysis of a small tumor sample (similar in mass to a few grains of rice) produces well over 500 megabytes of unprocessed "raw" data when analyzed on a mass spectrometer (MS). Thus, for every proteomics experiment there is a vast amount of raw data that must be analyzed and interrogated in order to extract biological information. Moreover, the raw data output from different MS vendors are generally in different formats inhibiting the ability of labs to productively work together.

  8. Automatic design of conformal cooling channels in injection molding tooling

    NASA Astrophysics Data System (ADS)

    Zhang, Yingming; Hou, Binkui; Wang, Qian; Li, Yang; Huang, Zhigao

    2018-02-01

    The generation of the cooling system plays an important role in injection molding design. A conformal cooling system can effectively improve molding efficiency and product quality. This paper provides a generic approach for building conformal cooling channels. The centrelines of these channels are generated in two steps. First, we extract conformal loops based on the geometric information of the product. Second, centrelines in a spiral shape are built by blending these loops. We devise algorithms to implement the entire design process. A case study verifies the feasibility of this approach.
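The loop-blending step can be sketched as a linear interpolation between two extracted loops. This is a toy version under the assumption that both loops are sampled with the same point count; the paper's blending is more sophisticated:

```python
import numpy as np

def blend_loops(loop_a, loop_b, turns):
    """Spiral centreline that winds `turns` times while blending from
    loop_a to loop_b (linear blend factor; illustrative only)."""
    n = len(loop_a)
    pts = []
    for k in range(turns * n + 1):
        t = k / (turns * n)                 # 0 at loop_a, 1 at loop_b
        base = np.asarray(loop_a[k % n])
        target = np.asarray(loop_b[k % n])
        pts.append((1 - t) * base + t * target)
    return np.array(pts)
```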

  9. Gateways to the FANTOM5 promoter level mammalian expression atlas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lizio, Marina; Harshbarger, Jayson; Shimoji, Hisashi

    The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). In conclusion, this resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

  10. Gateways to the FANTOM5 promoter level mammalian expression atlas

    DOE PAGES

    Lizio, Marina; Harshbarger, Jayson; Shimoji, Hisashi; ...

    2015-01-05

    The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). In conclusion, this resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

  11. A systematic review on popularity, application and characteristics of protein secondary structure prediction tools.

    PubMed

    Kashani-Amin, Elaheh; Tabatabaei-Malazy, Ozra; Sakhteman, Amirhossein; Larijani, Bagher; Ebrahim-Habibi, Azadeh

    2018-02-27

    Prediction of proteins' secondary structure is one of the major steps in the generation of homology models. These models provide structural information which is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool among multiple secondary structure prediction (SSP) options is challenging. The current study offers insight into currently favored methods and tools, within various contexts. A systematic review was performed for comprehensive access to recent (2013-2016) studies which used or recommended protein SSP tools. Three databases, Web of Science, PubMed and Scopus, were systematically searched, and 99 out of 209 studies were finally found eligible for data extraction. Four categories of applications for the 59 retrieved SSP tools were: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method and (IV) integrating an SSP tool as a component of a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools utilizing the PHD (Profile network from HeiDelberg) method occupied the second and third places of popularity in categories I and II. JPred was found only in the first two categories, while PHD was present in three. This study provides a comprehensive insight into the recent usage of SSP tools, which could be helpful for selecting a proper tool. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. ACME, a GIS tool for Automated Cirque Metric Extraction

    NASA Astrophysics Data System (ADS)

    Spagnolo, Matteo; Pellitero, Ramon; Barr, Iestyn D.; Ely, Jeremy C.; Pellicer, Xavier M.; Rea, Brice R.

    2017-02-01

    Regional scale studies of glacial cirque metrics provide key insights into the (palaeo)environment related to the formation of these erosional landforms. The growing availability of high resolution terrain models means that more glacial cirques can be identified and mapped in the future. However, the extraction of their metrics still largely relies on time-consuming manual techniques or on combinations of more or less obsolete GIS tools. In this paper, a newly coded toolbox is provided for the automated, and comparatively quick, extraction of 16 key glacial cirque metrics, including length, width, circularity, planar and 3D area, elevation, slope, aspect, plan closure and hypsometry. The set of tools, named ACME (Automated Cirque Metric Extraction), is coded in Python, runs in one of the most commonly used GIS packages (ArcGIS) and has a user-friendly interface. A polygon layer of mapped cirques is required for all metrics, while a Digital Terrain Model and a point layer of cirque threshold midpoints are needed to run some of the tools. Results from ACME are comparable to those from other techniques and can be obtained rapidly, allowing large cirque datasets to be analysed and potentially important regional trends highlighted.
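    One of the simpler metrics in this family, circularity, can be illustrated with a plain-Python sketch. The formula 4πA/P² (A the planar area, P the perimeter) is a standard definition assumed here for illustration; ACME's actual implementation runs as a toolbox inside ArcGIS:

    ```python
    import math

    def circularity(vertices):
        """Planar circularity (4*pi*A / P^2) of a closed polygon.

        `vertices` is a list of (x, y) tuples; a perfect circle gives 1.0.
        Illustrative only -- not ACME's own code.
        """
        n = len(vertices)
        area = 0.0
        perimeter = 0.0
        for i in range(n):
            x1, y1 = vertices[i]
            x2, y2 = vertices[(i + 1) % n]
            area += x1 * y2 - x2 * y1          # shoelace formula
            perimeter += math.hypot(x2 - x1, y2 - y1)
        area = abs(area) / 2.0
        return 4.0 * math.pi * area / perimeter ** 2

    # A square is noticeably less circular than a circle (pi/4 ~ 0.785):
    print(round(circularity([(0, 0), (1, 0), (1, 1), (0, 1)]), 3))  # → 0.785
    ```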

  13. SU-F-T-458: Tracking Trends of TG-142 Parameters Via Analysis of Data Recorded by 2D Chamber Array

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alexandrian, A; Kabat, C; Defoor, D

    Purpose: With the increasing QA demands on medical physicists in clinical radiation oncology, an effective method of tracking clinical data has become paramount. A tool was produced which scans data automatically recorded by a 2D chamber array and extracts relevant information recommended by TG-142. Using this extracted information, a timely and comprehensive analysis of QA parameters can easily be performed, enabling efficient monthly checks on multiple linear accelerators simultaneously. Methods: A PTW STARCHECK chamber array was used to record several months of beam outputs from two Varian 2100 series linear accelerators and a Varian NovalisTx. In conjunction with the chamber array, a beam quality phantom was used simultaneously to determine beam quality. A minimalist GUI was created in MatLab that allows a user to set the file path of the data for each modality to be analyzed. These file paths are recorded to a MatLab structure and subsequently accessed by a script written in Python (version 3.5.1), which extracts the values required to perform monthly checks as outlined by TG-142 recommendations. The script incorporates calculations to determine whether the values recorded by the chamber array fall within an acceptable threshold. Results: Values obtained by the script are written to a spreadsheet where results can be easily viewed, annotated with a "pass" or "fail", and saved for further analysis. In addition to creating a new scheme for reviewing monthly checks, this application succinctly stores data for follow-up analysis. Conclusion: By utilizing this tool, parameters recommended by TG-142 for multiple linear accelerators can be rapidly obtained and analyzed for the evaluation of monthly checks.
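    The pass/fail threshold logic such a script performs can be sketched as follows. The 2% default reflects the common TG-142 monthly output-constancy recommendation; the function name and data shapes are illustrative, not taken from the actual tool:

    ```python
    def check_output(measured, baseline, tolerance=0.02):
        """Flag a monthly output measurement as 'pass' or 'fail'.

        `tolerance` is the allowed fractional deviation from baseline
        (0.02, i.e. 2%, is a typical monthly output-constancy limit;
        adjust per clinical policy). Returns (status, deviation).
        """
        deviation = (measured - baseline) / baseline
        status = "pass" if abs(deviation) <= tolerance else "fail"
        return status, deviation

    status, dev = check_output(measured=101.5, baseline=100.0)
    print(status, f"{dev:+.1%}")  # → pass +1.5%
    ```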

  14. Image edge detection based tool condition monitoring with morphological component analysis.

    PubMed

    Yu, Xiaolong; Lin, Xin; Dai, Yiquan; Zhu, Kunpeng

    2017-07-01

    The measurement and monitoring of tool condition are key to product precision in automated manufacturing. To meet this need, this study proposes a novel tool wear monitoring approach based on edge detection in the monitored image. Image edge detection is a fundamental tool for obtaining features of images. The approach extracts the tool edge with morphological component analysis. Through the decomposition of the original tool wear image, the approach reduces the influence of texture and noise on edge measurement. Based on sparse representation of the target image and edge detection, the approach can accurately extract the tool wear edge with a continuous and complete contour, and is convenient for characterizing tool conditions. Compared to the celebrated algorithms developed in the literature, this approach improves the integrity and connectivity of edges, and the results have shown that it achieves better geometric accuracy and a lower error rate in the estimation of tool conditions. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
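    As a baseline for the edge-detection step such approaches build on, a minimal Sobel edge detector can be sketched in pure Python. This is an illustrative stand-in only; the paper's contribution is the morphological component analysis that decomposes the image first to suppress texture and noise:

    ```python
    def sobel_edges(img, threshold=1.0):
        """Binary edge map from Sobel gradient magnitude.

        `img` is a 2-D list of grayscale values. Border pixels are left
        as 0 since the 3x3 kernels need a full neighborhood.
        """
        h, w = len(img), len(img[0])
        kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
        ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
        edges = [[0] * w for _ in range(h)]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                         for j in range(3) for i in range(3))
                gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                         for j in range(3) for i in range(3))
                edges[y][x] = 1 if (gx * gx + gy * gy) ** 0.5 >= threshold else 0
        return edges

    # Vertical step edge: left half dark, right half bright.
    img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in sobel_edges(img):
        print(row)
    ```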

  15. Challenges in Managing Information Extraction

    ERIC Educational Resources Information Center

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…

  16. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

    PubMed

    Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

    2010-07-02

    The data produced by an Illumina flow cell with all eight lanes occupied amounts to well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, enables INDEL detection, SNP information, and allele calling. Extracting from such analysis not only a measure of gene expression in the form of tag counts, but also annotation of the reads, is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool that facilitates the annotation of a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data in a CASAVA build into functional annotations while producing corresponding gene expression measurements. Such analysis is executed in an ultrafast and highly efficient manner, whether it is a single-read or paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease.
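    The core tag-counting idea, counting reads whose alignments fall within annotated gene ranges, can be sketched as below. The data model here is hypothetical; TASE itself is a Java application backed by SQL Server:

    ```python
    from collections import Counter

    def count_tags(read_positions, annotations):
        """Count reads whose start position falls inside a gene's range.

        `annotations` maps gene name -> (start, end), inclusive;
        `read_positions` is a list of alignment start coordinates on
        the same reference. Illustrative sketch only.
        """
        counts = Counter()
        for pos in read_positions:
            for gene, (start, end) in annotations.items():
                if start <= pos <= end:
                    counts[gene] += 1
        return counts

    genes = {"geneA": (100, 200), "geneB": (300, 400)}
    reads = [110, 150, 199, 350, 500]          # 500 falls in no gene
    print(count_tags(reads, genes))  # → Counter({'geneA': 3, 'geneB': 1})
    ```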

  17. Harnessing Biomedical Natural Language Processing Tools to Identify Medicinal Plant Knowledge from Historical Texts.

    PubMed

    Sharma, Vivekanand; Law, Wayne; Balick, Michael J; Sarkar, Indra Neil

    2017-01-01

    The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches to extract relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that informatics methods have potential utility for identifying medicinal plant knowledge from digitized resources, and they highlight opportunities for improvement.

  18. Harnessing Biomedical Natural Language Processing Tools to Identify Medicinal Plant Knowledge from Historical Texts

    PubMed Central

    Sharma, Vivekanand; Law, Wayne; Balick, Michael J.; Sarkar, Indra Neil

    2017-01-01

    The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches to extract relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that informatics methods have potential utility for identifying medicinal plant knowledge from digitized resources, and they highlight opportunities for improvement. PMID:29854223

  19. FCDD: A Database for Fruit Crops Diseases.

    PubMed

    Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal

    2014-01-01

    The Fruit Crops Diseases Database (FCDD) brings together a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles details on 162 fruit crop diseases, covering disease type, causal organism, images, symptoms and their control. The FCDD contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing possible information on fruit crop diseases, which will help in the discovery of potential drugs from one of the common bioresources: fruits. The database was developed using MySQL. The database interface is developed in PHP, HTML and JAVA. FCDD is freely available at http://www.fruitcropsdd.com/

  20. Versatile electrophoresis-based self-test platform.

    PubMed

    Guijt, Rosanne M

    2015-03-01

    Lab on a Chip technology offers the possibility to extract chemical information from a complex sample in a simple, automated way without the need for a laboratory setting. In the health care sector, this chemical information could be used as a diagnostic tool, for example to inform dosing. In this issue, the research underpinning a family of electrophoresis-based point-of-care devices for self-testing of ionic analytes in various sample matrices is described [Electrophoresis 2015, 36, 712-721]. Hardware, software, and methodological changes made to improve the overall analytical performance in terms of accuracy, precision, detection limit, and reliability are discussed. In addition to the main focus of lithium monitoring, new applications of the platform are included, for veterinary purposes, sodium, and creatinine measurements. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013

    PubMed Central

    2015-01-01

    Background Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two event extraction tasks introduced in the BioNLP Shared Task 2013. The CG task focuses on cancer, emphasizing the extraction of physiological and pathological processes at various levels of biological organization, and the PC task targets reactions relevant to the development of biomolecular pathway models, defining its extraction targets on the basis of established pathway representations and ontologies. Results Six groups participated in the CG task and two groups in the PC task, together applying a wide range of extraction approaches including both established state-of-the-art systems and newly introduced extraction methods. The best-performing systems achieved F-scores of 55% on the CG task and 53% on the PC task, demonstrating a level of performance comparable to the best results achieved in similar previously proposed tasks. Conclusions The results indicate that existing event extraction technology can generalize to meet the novel challenges represented by the CG and PC task settings, suggesting that extraction methods are capable of supporting the construction of knowledge bases on the molecular mechanisms of cancer and the curation of biomolecular pathway models. The CG and PC tasks continue as open challenges for all interested parties, with data, tools and resources available from the shared task homepage. PMID:26202570

  2. Fractal characteristic in the wearing of cutting tool

    NASA Astrophysics Data System (ADS)

    Mei, Anhua; Wang, Jinghui

    1995-11-01

    This paper studies cutting tool wear with fractal geometry. The wear image of the flank has been collected by a machine vision system consisting of a CCD camera and a personal computer. After processing by means of edge-preserving smoothing, binarization and edge extraction, a clear boundary enclosing the worn area has been obtained. The fractal dimension of the worn surface is calculated by the methods called `Slit Island' and `Profile'. The experiments and calculations lead to the conclusion that the worn surface is enclosed by an irregular boundary curve with some fractal dimension and characteristics of self-similarity. Furthermore, the relation between the cutting velocity and the fractal dimension of the worn region has been established. This paper presents a series of methods for processing and analyzing the fractal information in flank wear, which can be applied to study the relation between the fractal structure and the wear state, and to establish a fractal model of cutting tool wear.
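    How a fractal dimension is estimated from such boundary data can be illustrated with the standard box-counting estimator, shown here as a simpler stand-in for the paper's `Slit Island' and `Profile' methods:

    ```python
    import math

    def box_counting_dimension(points, sizes):
        """Estimate the fractal dimension of a 2-D point set by box counting.

        For each box size s, count the occupied boxes N(s), then fit
        log N(s) against log(1/s) by least squares; the slope is the
        dimension estimate. Illustrative stand-in for the paper's methods.
        """
        logs, logn = [], []
        for s in sizes:
            boxes = {(int(x // s), int(y // s)) for x, y in points}
            logs.append(math.log(1.0 / s))
            logn.append(math.log(len(boxes)))
        n = len(sizes)
        mean_x = sum(logs) / n
        mean_y = sum(logn) / n
        slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(logs, logn))
                 / sum((x - mean_x) ** 2 for x in logs))
        return slope

    # A densely sampled straight line should give a dimension near 1:
    line = [(i / 1000.0, 0.0) for i in range(1000)]
    print(round(box_counting_dimension(line, [0.1, 0.05, 0.025, 0.0125]), 2))  # → 1.0
    ```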

  3. Microchip-Based Single-Cell Functional Proteomics for Biomedical Applications

    PubMed Central

    Lu, Yao; Yang, Liu; Wei, Wei; Shi, Qihui

    2017-01-01

    Cellular heterogeneity has been widely recognized, but only recently have single-cell tools become available that allow characterizing heterogeneity at the genomic and proteomic levels. We review the technological advances in microchip-based toolkits for single-cell functional proteomics. Each of these tools has distinct advantages and limitations, and a few have advanced toward being applied to address biological or clinical problems that fail to be addressed by traditional population-based methods. High-throughput single-cell proteomic assays generate high-dimensional data sets that contain new information and thus require developing new analytical frameworks to extract new biology. In this review article, we highlight a few biological and clinical applications in which microchip-based single-cell proteomic tools provide unique advantages. The examples include resolving functional heterogeneity and dynamics of immune cells, dissecting cell-cell interaction by creating a well-controlled on-chip microenvironment, capturing high-resolution snapshots of immune system functions in patients for better immunotherapy, and elucidating phosphoprotein signaling networks in cancer cells for guiding effective molecularly targeted therapies. PMID:28280819

  4. bwtool: a tool for bigWig files

    PubMed Central

    Pohl, Andy; Beato, Miguel

    2014-01-01

    BigWig files are a compressed, indexed, binary format for genome-wide signal data for calculations (e.g. GC percent) or experiments (e.g. ChIP-seq/RNA-seq read depth). bwtool is a tool designed to read bigWig files rapidly and efficiently, providing functionality for extracting data and summarizing it in several ways, globally or at specific regions. Additionally, the tool enables the conversion of the positions of signal data from one genome assembly to another, also known as ‘lifting’. We believe bwtool can be useful for the analyst frequently working with bigWig data, which is becoming a standard format to represent functional signals along genomes. The article includes supplementary examples of running the software. Availability and implementation: The C source code is freely available under the GNU public license v3 at http://cromatina.crg.eu/bwtool. Contact: andrew.pohl@crg.eu, andypohl@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24489365
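    The kind of per-region signal summarization bwtool performs can be illustrated in plain Python over a generic position-to-value mapping. This is not bwtool's API; the real tool operates directly on compressed, indexed bigWig files in C:

    ```python
    def summarize_regions(signal, regions):
        """Mean signal per region, the sort of summary bwtool produces.

        `signal` maps genomic position -> value (positions absent from
        the map are treated as 0.0); `regions` is a list of
        (name, start, end) half-open intervals. Illustrative only.
        """
        out = {}
        for name, start, end in regions:
            values = [signal.get(p, 0.0) for p in range(start, end)]
            out[name] = sum(values) / len(values)
        return out

    signal = {10: 1.0, 11: 3.0, 12: 2.0}
    print(summarize_regions(signal, [("peak1", 10, 13), ("flank", 13, 15)]))
    # → {'peak1': 2.0, 'flank': 0.0}
    ```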

  5. Virus-Clip: a fast and memory-efficient viral integration site detection tool at single-base resolution with annotation capability.

    PubMed

    Ho, Daniel W H; Sze, Karen M F; Ng, Irene O L

    2015-08-28

    Viral integration into the human genome upon infection is an important risk factor for various human malignancies. We developed a viral integration site detection tool called Virus-Clip, which makes use of information extracted from soft-clipped sequencing reads to identify the exact positions of human and virus breakpoints of integration events. With initial read alignment to the virus reference genome and streamlined procedures, Virus-Clip delivers a simple, fast and memory-efficient solution to viral integration site detection. Moreover, it can automatically annotate the integration events with the corresponding affected human genes. Virus-Clip has been verified using whole-transcriptome sequencing data and its detection was validated to have satisfactory sensitivity and specificity. Marked advancement in performance was detected compared to existing tools. It is applicable to versatile types of data including whole-genome sequencing, whole-transcriptome sequencing and targeted sequencing. Virus-Clip is available at http://web.hku.hk/~dwhho/Virus-Clip.zip.
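    The soft-clip idea behind this class of tools can be sketched by parsing CIGAR strings: a soft-clipped (`S`) segment at either end of a read aligned to the virus genome is the candidate human-genome portion that marks the integration breakpoint. A simplified illustration, not Virus-Clip's actual parser:

    ```python
    import re

    def soft_clips(cigar):
        """Return (leading, trailing) soft-clip lengths from a CIGAR string.

        A read aligned to the virus reference with a long soft clip on
        one end suggests that clipped portion maps to the human genome,
        pinpointing the breakpoint at single-base resolution.
        """
        ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        leading = int(ops[0][0]) if ops and ops[0][1] == "S" else 0
        trailing = int(ops[-1][0]) if ops and ops[-1][1] == "S" else 0
        return leading, trailing

    print(soft_clips("20S56M"))  # → (20, 0): first 20 bases are clipped
    print(soft_clips("60M16S"))  # → (0, 16): last 16 bases are clipped
    ```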

  6. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes

    PubMed Central

    Cañada, Andres; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso

    2017-01-01

    Abstract A considerable effort has been devoted to systematically retrieving information on genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as central bio-entities in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term-lookup strategies. This system processes scientific abstracts, a set of full-text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ-level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes—CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es PMID:28531339

  7. Systematic review to inform prevention and management of chronic disease for Indigenous Australians: overview and priorities.

    PubMed

    Gomersall, Judith Streak; Canuto, Karla; Aromataris, Edoardo; Braunack-Mayer, Annette; Brown, Alex

    2016-02-01

    To describe the main characteristics of systematic reviews addressing questions of chronic disease and related risk factors for Indigenous Australians. We searched databases for systematic reviews meeting inclusion criteria. Two reviewers assessed quality and extracted characteristics using pre-defined tools. We identified 14 systematic reviews. Seven synthesised evidence about health intervention effectiveness; four addressed chronic disease or risk factor prevalence; and six conducted critical appraisal as per current best practice. Only three reported steps to align the review with standards for ethical research with Indigenous Australians and/or capture Indigenous-specific knowledge. Most called for more high-quality research. Systematic review is an under-utilised method for gathering evidence to inform chronic disease prevention and management for Indigenous Australians. Relevance of future systematic reviews could be improved by: 1) aligning questions with community priorities as well as decision-maker needs; 2) involvement of, and leadership by, Indigenous researchers with relevant cultural and contextual knowledge; 3) use of critical appraisal tools that include traditional risk of bias assessment criteria and criteria that reflect Indigenous standards of appropriate research. Systematic review method guidance, tools and reporting standards are required to ensure alignment with ethical obligations and promote rigor and relevance. © 2015 Public Health Association of Australia.

  8. MitoRes: a resource of nuclear-encoded mitochondrial genes and their products in Metazoa.

    PubMed

    Catalano, Domenico; Licciulli, Flavio; Turi, Antonio; Grillo, Giorgio; Saccone, Cecilia; D'Elia, Domenica

    2006-01-24

    Mitochondria are sub-cellular organelles that have a central role in energy production and in other metabolic pathways of all eukaryotic respiring cells. In the last few years, with more and more genomes being sequenced, a huge amount of data has been generated, providing an unprecedented opportunity to use the comparative analysis approach in studies of evolution and functional genomics, with the aim of shedding light on the molecular mechanisms regulating mitochondrial biogenesis and metabolism. In this context, the problem of the optimal extraction of representative datasets of genomic and proteomic data assumes a crucial importance. Specialised resources for nuclear-encoded mitochondria-related proteins already exist; however, no mitochondrial database is currently available with the same features as MitoRes, which is an update of the MitoNuc database extensively modified in its structure, data sources and graphical interface. It contains data on nuclear-encoded mitochondria-related products for any metazoan species for which this type of data is available and also provides comprehensive sequence datasets (gene, transcript and protein) as well as useful tools for their extraction and export. MitoRes http://www2.ba.itb.cnr.it/MitoRes/ consolidates information from external public sources and automatically annotates it into a relational database. Additionally, it also clusters proteins on the basis of their sequence similarity and interconnects them with genomic data. The search engine and sequence management tools allow the query/retrieval of the database content and the extraction and export of sequences (gene, transcript, protein) and related sub-sequences (intron, exon, UTR, CDS, signal peptide and gene flanking regions) ready to be used for in silico analysis. The tool we describe here has been developed to support lab scientists and bioinformaticians alike in the characterization of molecular features and evolution of mitochondrial targeting sequences. 
The way it provides for the retrieval and extraction of sequences allows the user to overcome the obstacles encountered in the integrative use of different bioinformatic resources and the completeness of the sequence collection allows intra- and interspecies comparison at different biological levels (gene, transcript and protein).

  9. Visualization of Documents and Concepts in Neuroinformatics with the 3D-SE Viewer

    PubMed Central

    Naud, Antoine; Usui, Shiro; Ueda, Naonori; Taniguchi, Tatsuki

    2007-01-01

    A new interactive visualization tool is proposed for mining text data from various fields of neuroscience. Applications to several text datasets are presented to demonstrate the capability of the proposed interactive tool to visualize complex relationships between pairs of lexical entities (with some semantic contents) such as terms, keywords, posters, or papers' abstracts. Implemented as a Java applet, this tool is based on the spherical embedding (SE) algorithm, which was designed for the visualization of bipartite graphs. Items such as words and documents are linked on the basis of occurrence relationships, which can be represented in a bipartite graph. These items are visualized by embedding the vertices of the bipartite graph on spheres in a three-dimensional (3-D) space. The main advantage of the proposed visualization tool is that 3-D layouts can convey more information than planar or linear displays of items or graphs. Different kinds of information extracted from texts, such as keywords, indexing terms, or topics are visualized, allowing interactive browsing of various fields of research featured by keywords, topics, or research teams. A typical use of the 3D-SE viewer is quick browsing of topics displayed on a sphere, then selecting one or several item(s) displays links to related terms on another sphere representing, e.g., documents or abstracts, and provides direct online access to the document source in a database, such as the Visiome Platform or the SfN Annual Meeting. Developed as a Java applet, it operates as a tool on top of existing resources. PMID:18974802

  10. Visualization of Documents and Concepts in Neuroinformatics with the 3D-SE Viewer.

    PubMed

    Naud, Antoine; Usui, Shiro; Ueda, Naonori; Taniguchi, Tatsuki

    2007-01-01

    A new interactive visualization tool is proposed for mining text data from various fields of neuroscience. Applications to several text datasets are presented to demonstrate the capability of the proposed interactive tool to visualize complex relationships between pairs of lexical entities (with some semantic contents) such as terms, keywords, posters, or papers' abstracts. Implemented as a Java applet, this tool is based on the spherical embedding (SE) algorithm, which was designed for the visualization of bipartite graphs. Items such as words and documents are linked on the basis of occurrence relationships, which can be represented in a bipartite graph. These items are visualized by embedding the vertices of the bipartite graph on spheres in a three-dimensional (3-D) space. The main advantage of the proposed visualization tool is that 3-D layouts can convey more information than planar or linear displays of items or graphs. Different kinds of information extracted from texts, such as keywords, indexing terms, or topics are visualized, allowing interactive browsing of various fields of research featured by keywords, topics, or research teams. A typical use of the 3D-SE viewer is quick browsing of topics displayed on a sphere, then selecting one or several item(s) displays links to related terms on another sphere representing, e.g., documents or abstracts, and provides direct online access to the document source in a database, such as the Visiome Platform or the SfN Annual Meeting. Developed as a Java applet, it operates as a tool on top of existing resources.

  11. Cancer patients on Twitter: a novel patient community on social media.

    PubMed

    Sugawara, Yuya; Narimatsu, Hiroto; Hozawa, Atsushi; Shao, Li; Otani, Katsumi; Fukao, Akira

    2012-12-27

    Patients increasingly turn to the Internet for information on medical conditions, including clinical news and treatment options. In recent years, an online patient community has arisen alongside the rapidly expanding world of social media, or "Web 2.0." Twitter provides real-time dissemination of news, information, personal accounts and other details via a highly interactive form of social media, and has become an important online tool for patients. This medium is now considered to play an important role in the modern social community of online, "wired" cancer patients. Fifty-one highly influential "power accounts" belonging to cancer patients were extracted from a dataset of 731 Twitter accounts with cancer terminology in their profiles. In accordance with previously established methodology, "power accounts" were defined as those Twitter accounts with 500 or more followers. We extracted data on the cancer patient (female) with the most followers to study the specific relationships that existed between the user and her followers, and found that the majority of the examined tweets focused on greetings, treatment discussions, and other instances of psychological support. These findings went against our hypothesis that cancer patients' tweets would be centered on the dissemination of medical information and similar "newsy" details. At present, there exists a rapidly evolving network of cancer patients engaged in information exchange via Twitter. This network is valuable in the sharing of psychological support among the cancer community.

  12. A tool for filtering information in complex systems

    PubMed Central

    Tumminello, M.; Aste, T.; Di Matteo, T.; Mantegna, R. N.

    2005-01-01

    We introduce a technique to filter complex data sets by extracting a subgraph of representative links. The filtering can be tuned to any desired level by controlling the genus of the resulting graph. We show that this technique is especially suitable for correlation-based graphs, giving filtered graphs that preserve the hierarchical organization of the minimum spanning tree while containing a larger amount of information in their internal structure. In particular, in the case of planar filtered graphs (genus equal to 0), triangular loops and four-element cliques are formed. The application of this filtering procedure to 100 stocks in the U.S. equity markets shows that such loops and cliques have important and significant relationships with the market structure and properties. PMID:16027373

  13. A tool for filtering information in complex systems.

    PubMed

    Tumminello, M; Aste, T; Di Matteo, T; Mantegna, R N

    2005-07-26

    We introduce a technique to filter complex data sets by extracting a subgraph of representative links. The filtering can be tuned to any desired level by controlling the genus of the resulting graph. We show that this technique is especially suitable for correlation-based graphs, giving filtered graphs that preserve the hierarchical organization of the minimum spanning tree while containing a larger amount of information in their internal structure. In particular, in the case of planar filtered graphs (genus equal to 0), triangular loops and four-element cliques are formed. The application of this filtering procedure to 100 stocks in the U.S. equity markets shows that such loops and cliques have important and significant relationships with the market structure and properties.
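    The minimum spanning tree whose hierarchy the filtered graphs preserve can be built directly from a correlation matrix with Kruskal's algorithm, using the standard correlation distance d_ij = sqrt(2(1 - rho_ij)). The sketch below covers only this MST stage; the full filtering additionally inserts the strongest remaining links subject to a genus constraint (e.g., planarity), which is not implemented here. The four-"stock" correlation matrix is invented.

```python
import math

def correlation_mst(labels, corr):
    """Kruskal MST on the distance d_ij = sqrt(2 * (1 - rho_ij))
    derived from a correlation matrix."""
    n = len(labels)
    edges = sorted(
        (math.sqrt(2.0 * (1.0 - corr[i][j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))          # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    mst = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # keep only links that join components
            parent[ri] = rj
            mst.append((labels[i], labels[j], round(d, 3)))
    return mst

# Toy 4-"stock" correlation matrix (symmetric, ones on the diagonal).
labels = ["A", "B", "C", "D"]
corr = [[1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0]]
print(correlation_mst(labels, corr))  # 3 links spanning the 4 items
```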

  14. Building a standards-based and collaborative e-prescribing tool: MyRxPad.

    PubMed

    Nelson, Stuart J; Zeng, Kelly; Kilbourne, John

    2011-01-01

    MyRxPad (rxp.nlm.nih.gov) is a prototype application intended to enable a practitioner-patient collaborative approach to e-prescribing: patients play an active role by maintaining up-to-date and accurate medication lists, and prescribers make well-informed, safe prescribing decisions based on the personal medication records contributed by patients. MyRxPad is thus the vehicle for collaboration with patients using MyMedicationList (MML), enabling the integration of personal medication records in the context of e-prescribing. We present our experience applying RxNorm in an e-prescribing setting: using standard names and codes to capture prescribed medications, and extracting information from RxNorm to support medication-related clinical decisions.

  15. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    PubMed Central

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi

    2018-01-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all of the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625

  16. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.

    PubMed

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi

    2018-02-27

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all of the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  17. Quantitative X-ray Map Analyser (Q-XRMA): A new GIS-based statistical approach to Mineral Image Analysis

    NASA Astrophysics Data System (ADS)

    Ortolano, Gaetano; Visalli, Roberto; Godard, Gaston; Cirrincione, Rosolino

    2018-06-01

    We present a new ArcGIS®-based tool developed in the Python programming language for calibrating EDS/WDS X-ray element maps, with the aim of acquiring quantitative information of petrological interest. The calibration procedure is based on a multiple linear regression technique that takes into account interdependence among elements and is constrained by the stoichiometry of minerals. The procedure requires an appropriate number of spot analyses for use as internal standards and provides several test indexes for a rapid check of calibration accuracy. The code is based on an earlier image-processing tool designed primarily for classifying minerals in X-ray element maps; the original Python code has now been enhanced to yield calibrated maps of mineral end-members or the chemical parameters of each classified mineral. The semi-automated procedure can be used to extract a dataset that is automatically stored within queryable tables. As a case study, the software was applied to an amphibolite-facies garnet-bearing micaschist. The calibrated images obtained for both anhydrous (i.e., garnet and plagioclase) and hydrous (i.e., biotite) phases show a good fit with corresponding electron microprobe analyses. This new GIS-based tool package can thus find useful application in petrology and materials science research. Moreover, the huge quantity of data extracted opens new opportunities for the development of a thin-section microchemical database that, using a GIS platform, can be linked with other major global geoscience databases.
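    The calibration idea above, spot analyses serving as internal standards for a regression from raw counts to concentrations, can be sketched in miniature. The published procedure uses a multiple linear regression that accounts for interdependence among elements and is constrained by mineral stoichiometry; the sketch below shows only a single-element least-squares fit, and the counts and wt% values are hypothetical.

```python
def linear_calibration(counts, wt_pct):
    """Least-squares fit wt% = a * counts + b from spot analyses used as
    internal standards. Single-element case only; the published tool uses
    a multiple linear regression with stoichiometric constraints."""
    n = len(counts)
    mx = sum(counts) / n
    my = sum(wt_pct) / n
    sxx = sum((x - mx) ** 2 for x in counts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(counts, wt_pct))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Hypothetical spot analyses: raw Fe counts vs. microprobe FeO wt%.
spots_counts = [120.0, 250.0, 400.0, 610.0]
spots_wtpct = [3.1, 6.4, 10.2, 15.5]
a, b = linear_calibration(spots_counts, spots_wtpct)

# Apply the calibration to every pixel of a (tiny) raw element map.
raw_map = [[130.0, 300.0], [500.0, 220.0]]
calibrated = [[a * px + b for px in row] for row in raw_map]
```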

  18. PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data.

    PubMed

    Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun

    2006-08-25

    We recently developed the Paired-End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads and how to map PETs to reference genome sequences correctly yet efficiently. To accommodate and streamline data analysis of the large volume of PET sequences generated from each PET experiment, an automated PET data processing pipeline is desirable. We designed an integrated computational package, PET-Tool, to automatically process PET sequences and map them to genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analysis of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. On a 2.4 GHz Linux machine, it takes approximately six hours to process one million PETs from extraction to mapping. Its speed, accuracy, and comprehensiveness make PET-Tool an important and useful component in PET experiments, and it can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.
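    The Extractor module's task, pulling the 5' and 3' tags of a ditag out of a raw read, can be sketched as a simple adapter search. The adapter sequences, tag length, and rejection rules below are illustrative assumptions, not PET-Tool's actual Extractor parameters.

```python
def extract_pets(read, left_adapter, right_adapter, tag_len=18):
    """Pull the 5' and 3' tags of a paired-end ditag out of a raw read.
    Adapter sequences and tag length are illustrative only."""
    start = read.find(left_adapter)
    end = read.find(right_adapter)
    if start == -1 or end == -1:
        return None  # adapters not found: reject the read as an artifact
    ditag = read[start + len(left_adapter):end]
    if len(ditag) < 2 * tag_len:
        return None  # ditag too short to contain both tags
    return ditag[:tag_len], ditag[-tag_len:]

# A synthetic read: left adapter + 5' tag + 3' tag + right adapter.
read = "AAGGCC" + "ACGTACGTACGTACGTAC" + "TTGGAACCTTGGAACCTT" + "CCGGTT"
print(extract_pets(read, "AAGGCC", "CCGGTT"))
```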

  19. A Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information

    NASA Astrophysics Data System (ADS)

    Lian, Shizhong; Chen, Jiangping; Luo, Minghai

    2016-06-01

    Water information cannot be accurately extracted from TM images in which true information is lost to cloud cover and missing data stripes. Because water is continuously distributed under natural conditions, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different kinds of disturbing information from clouds and missing data stripes are simulated, and water information is then extracted from the simulated images using global histogram matching, local histogram matching, and the probability-based statistical method. Experiments show that smaller Areal Error and higher Boundary Recall are obtained using this method compared with the conventional methods.
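    The premise that water is continuously distributed suggests a neighbourhood-probability rule for pixels lost to clouds or stripes. The sketch below is only an illustration of that idea (a majority vote over valid 8-neighbours), not the paper's actual statistical method; the mask values are invented.

```python
def fill_missing_by_neighbors(mask, threshold=0.5):
    """Classify missing pixels (None) as water (1) or land (0) from the
    fraction of valid water pixels among their 8 neighbours, exploiting
    the spatial continuity of water. Illustrative sketch only."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] is not None:
                continue  # valid pixel: keep the observed class
            vals = [mask[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc)
                    and 0 <= r + dr < rows and 0 <= c + dc < cols
                    and mask[r + dr][c + dc] is not None]
            if vals:
                out[r][c] = 1 if sum(vals) / len(vals) >= threshold else 0
    return out

# A cloud gap (None) inside a lake should be recovered as water.
mask = [[1, 1, 1],
        [1, None, 1],
        [1, 1, 0]]
print(fill_missing_by_neighbors(mask))
```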

  20. Global catalogue of microorganisms (gcm): a comprehensive database and information retrieval, analysis, and visualization system for microbial resources

    PubMed Central

    2013-01-01

    Background Throughout the long history of industrial and academic research, many microbes have been isolated, characterized and preserved (whenever possible) in culture collections. With the steady accumulation of observational biodiversity data as well as microbial sequencing data, bio-resource centers have to function as data and information repositories to serve academia, industry, and regulators on behalf of and for the general public. Hence, the World Data Centre for Microorganisms (WDCM) started to take responsibility for constructing an effective information environment that would promote and sustain microbial research data activities, and bridge the gaps currently present within and outside the microbiology communities. Description Strain catalogue information was collected from collections by online submission. We developed tools for automatic extraction of strain numbers and species names from various sources, including GenBank, PubMed, and Swiss-Prot. These new tools connect strain catalogue information with the corresponding nucleotide and protein sequences, as well as with genome sequences and references citing a particular strain. All information has been processed and compiled to create a comprehensive database of microbial resources, named the Global Catalogue of Microorganisms (GCM). The current version of GCM contains information on over 273,933 strains, covering 43,436 bacterial, fungal and archaeal species from 52 collections in 25 countries and regions. A number of online analysis and statistical tools have been integrated, together with advanced search functions, which should greatly facilitate the exploration of the content of GCM.
Conclusion A comprehensive dynamic database of microbial resources has been created, unveiling the resources preserved in culture collections, especially those whose informatics infrastructures are still under development. It should foster cumulative research and facilitate the activities of microbiologists worldwide, in both public and industrial research centres. This database is available from http://gcm.wfcc.info. PMID:24377417
