Sample records for automatically extracting information

  1. Analysis of Technique to Extract Data from the Web for Improved Performance

    NASA Astrophysics Data System (ADS)

    Gupta, Neena; Singh, Manish

    2010-11-01

    The World Wide Web has rapidly become a vast electronic space where anyone can publish content and nearly any information can be retrieved. Extracting information from semi-structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, pulls records out of HTML files automatically, and ontologies can achieve a high degree of accuracy in this process. We analyze OBDE (Ontology-Based Data Extraction), a method that automatically extracts query result records from the web with the help of agents. OBDE first constructs a domain ontology by matching information between the query interfaces and query result pages of different web sites within the same domain. The constructed domain ontology is then used during data extraction to identify the query result section in a result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.

  2. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  3. Automatic Extraction of Urban Built-Up Area Based on Object-Oriented Method and Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Li, L.; Zhou, H.; Wen, Q.; Chen, T.; Guan, F.; Ren, B.; Yu, H.; Wang, Z.

    2018-04-01

    A built-up area marks the use of urban construction land during different periods of development, and its accurate extraction is key to studying urban expansion. This paper presents a technique for automatic extraction of the urban built-up area based on an object-oriented method and remote sensing data, realizing automatic extraction of a city's main built-up area and greatly reducing manual effort. First, construction land is extracted with an object-oriented method whose main steps are: (1) multi-resolution segmentation; (2) feature construction and selection; (3) rule-based extraction of construction land. The characteristic parameters used in the rule set include the mean of the red band (Mean R), the Normalized Difference Vegetation Index (NDVI), the Ratio of Residential Index (RRI), and the mean of the blue band (Mean B); combining these parameters allows construction land information to be extracted. Then, based on the adaptability, distance, and area of the object domain, the urban built-up area can be quickly and accurately delineated from the construction land information without depending on other data or expert knowledge, achieving automatic extraction of the urban built-up area. Using Beijing as an experimental area, the results show that the built-up area is extracted automatically with a boundary accuracy of 2359.65 m, meeting the requirements. The automatic extraction of the urban built-up area is highly practical and can be applied to monitoring changes in a city's main built-up area.
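    The rule-set step above combines band means with spectral indices such as NDVI = (NIR - Red) / (NIR + Red). Below is a minimal sketch of that kind of thresholding, assuming per-pixel NumPy reflectance arrays and illustrative threshold values; the paper's actual per-object features and parameters are not reproduced here.

```python
# A minimal sketch of rule-based construction-land masking, assuming 0-1
# reflectance arrays for the red, blue and near-infrared bands.
# Thresholds are illustrative placeholders, not the paper's values.
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + eps)

def construction_land_mask(red, blue, nir,
                           mean_r_min=0.15, ndvi_max=0.2, blue_min=0.1):
    """Combine band statistics and NDVI (per pixel here, for brevity)."""
    v = ndvi(nir, red)
    return (red > mean_r_min) & (v < ndvi_max) & (blue > blue_min)

# toy example on random reflectance values
red  = np.random.rand(100, 100)
blue = np.random.rand(100, 100)
nir  = np.random.rand(100, 100)
mask = construction_land_mask(red, blue, nir)
print("construction-land pixels:", int(mask.sum()))
```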

  4. Feasibility of Extracting Key Elements from ClinicalTrials.gov to Support Clinicians' Patient Care Decisions.

    PubMed

    Kim, Heejun; Bian, Jiantao; Mostafa, Javed; Jonnalagadda, Siddhartha; Del Fiol, Guilherme

    2016-01-01

    Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort. Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov. Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information. Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians.

  5. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Text mining can help us mine this information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation led us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We describe our experience and implementation of the system, share lessons learned from its development, and discuss ways in which it could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org

  6. Information retrieval and terminology extraction in online resources for patients with diabetes.

    PubMed

    Seljan, Sanja; Baretić, Maja; Kucis, Vlasta

    2014-06-01

    Terminology use, as a means of information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes, need access to various online resources (in foreign and/or native languages) when searching for information on self-education in basic diabetic knowledge, on self-care activities regarding the importance of dietetic food, medications, and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals, or professional papers can help in building terminology lists or lists of "browsing phrases" useful in information retrieval or document indexing. Specific terminology lists represent an intermediate step between free-text search and controlled vocabulary, between users' demands and existing online resources in native and foreign languages. The research, which aims to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and is divided into three interrelated parts: i) comparison of professional and popular terminology use; ii) evaluation of automatic statistically based terminology extraction on English and Croatian texts; iii) comparison and evaluation of terminology extraction performed on an English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a medical professional, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, formed as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods on English text. Evaluation of the automatic and semi-automatic terminology extraction methods is performed using recall, precision, and f-measure.
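    The evaluation described above comes down to comparing automatically extracted term candidates against reference lists via precision, recall and f-measure. A minimal sketch follows, with made-up term sets standing in for the study's lists.

```python
# A minimal sketch of evaluating extracted terminology candidates against a
# reference list with precision, recall and F-measure. The term sets below
# are illustrative only, not the study's data.
def evaluate_terms(extracted, reference):
    extracted, reference = set(extracted), set(reference)
    tp = len(extracted & reference)                      # true positives
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

extracted = {"insulin pump", "blood glucose", "dietetic food", "exercise plan"}
reference = {"insulin pump", "blood glucose", "hba1c", "dietetic food"}
print("P=%.2f R=%.2f F=%.2f" % evaluate_terms(extracted, reference))
```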

  7. Automatic extraction of road features in urban environments using dense ALS data

    NASA Astrophysics Data System (ADS)

    Soilán, Mario; Truong-Hong, Linh; Riveiro, Belén; Laefer, Debra

    2018-02-01

    This paper describes a methodology that automatically extracts semantic information from urban ALS data for urban parameterization and road network definition. First, building façades are segmented from the ground surface by combining knowledge-based information with both voxel and raster data. Next, heuristic rules and unsupervised learning are applied to the ground surface data to distinguish sidewalk and pavement points as a means for curb detection. Then, radiometric information is employed for road marking extraction. Using high-density ALS data from Dublin, Ireland, this fully automatic workflow was able to generate an F-score close to 95% for pavement and sidewalk identification at a resolution of 20 cm, and better than 80% for road marking detection.

  8. Automatic Extraction of Metadata from Scientific Publications for CRIS Systems

    ERIC Educational Resources Information Center

    Kovacevic, Aleksandar; Ivanovic, Dragan; Milosavljevic, Branko; Konjovic, Zora; Surla, Dusan

    2011-01-01

    Purpose: The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS). Design/methodology/approach: The system is based on machine learning and performs automatic extraction…

  9. Integrating Information Extraction Agents into a Tourism Recommender System

    NASA Astrophysics Data System (ADS)

    Esparcia, Sergio; Sánchez-Anguix, Víctor; Argente, Estefanía; García-Fornes, Ana; Julián, Vicente

    Recommender systems face several problems. On the one hand, their information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be interesting to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques to automatically extract and classify information from the Web. Its goal is to keep the system updated and to obtain information about third-party services that are not offered by service providers inside the system.

  10. Feasibility of Extracting Key Elements from ClinicalTrials.gov to Support Clinicians’ Patient Care Decisions

    PubMed Central

    Kim, Heejun; Bian, Jiantao; Mostafa, Javed; Jonnalagadda, Siddhartha; Del Fiol, Guilherme

    2016-01-01

    Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort. Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov. Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information. Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians. PMID:28269867

  11. System for definition of the central-chest vasculature

    NASA Astrophysics Data System (ADS)

    Taeprasartsit, Pinyo; Higgins, William E.

    2009-02-01

    Accurate definition of the central-chest vasculature from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. For instance, the aorta and pulmonary artery help in automatic definition of the Mountain lymph-node stations for lung-cancer staging. This work presents a system for defining major vascular structures in the central chest. The system provides automatic methods for extracting the aorta and pulmonary artery and semi-automatic methods for extracting the other major central chest arteries/veins, such as the superior vena cava and azygos vein. Automatic aorta and pulmonary artery extraction are performed by model fitting and selection. The system also extracts certain vascular structure information to validate outputs. A semi-automatic method extracts vasculature by finding the medial axes between provided important sites. Results of the system are applied to lymph-node station definition and guidance of bronchoscopic biopsy.

  12. Automatic updating and 3D modeling of airport information from high resolution images using GIS and LIDAR data

    NASA Astrophysics Data System (ADS)

    Lv, Zheng; Sui, Haigang; Zhang, Xilin; Huang, Xianfeng

    2007-11-01

    As one of the most important geospatial objects and military establishments, an airport is always a key target in the fields of transportation and military affairs. Therefore, automatic recognition and extraction of airports from remote sensing images is important and urgent for civil aviation updating and military applications. In this paper, a new multi-source data fusion approach to automatic airport information extraction, updating and 3D modeling is presented. The corresponding key technologies are discussed in detail, including feature extraction of airport information based on a modified Otsu algorithm, automatic change detection based on a new parallel-lines-based buffer detection algorithm, 3D modeling based on a gradual elimination of non-building points algorithm, 3D change detection between the old airport model and LIDAR data, and the import of typical CAD models. Finally, based on these technologies, we develop a prototype system, and the results show that our method achieves good results.

  13. Automatic Molar Extraction from Dental Panoramic Radiographs for Forensic Personal Identification

    NASA Astrophysics Data System (ADS)

    Samopa, Febriliyan; Asano, Akira; Taguchi, Akira

    Measurement of an individual molar provides rich information for forensic personal identification. We propose a computer-based system for extracting an individual molar from dental panoramic radiographs. A molar is obtained by extracting the region of interest, separating the maxilla and mandible, and extracting the boundaries between teeth. The proposed system is almost fully automatic; all the user has to do is click three points on the boundary between the maxilla and the mandible.

  14. Impact of translation on named-entity recognition in radiology texts

    PubMed Central

    Pedro, Vasco

    2017-01-01

    Abstract Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455

  15. Research on Automatic Classification, Indexing and Extracting. Annual Progress Report.

    ERIC Educational Resources Information Center

    Baker, F.T.; And Others

    In order to contribute to the success of several studies for automatic classification, indexing and extracting currently in progress, as well as to further the theoretical and practical understanding of textual item distributions, the development of a frequency program capable of supplying these types of information was undertaken. The program…

  16. An automatic method for retrieving and indexing catalogues of biomedical courses.

    PubMed

    Maojo, Victor; de la Calle, Guillermo; García-Remesal, Miguel; Bankauskaite, Vaida; Crespo, Jose

    2008-11-06

    Although there is extensive information about Biomedical Informatics education and courses on different websites, that information is usually not exhaustive and is difficult to keep up to date. We propose a new methodology based on information retrieval techniques for automatically extracting, indexing, and retrieving information about educational offers. A web application has been developed to make such information available in an inventory of courses and educational offers.

  17. Research on Remote Sensing Geological Information Extraction Based on Object Oriented Classification

    NASA Astrophysics Data System (ADS)

    Gao, Hui

    2018-04-01

    Northern Tibet lies in the sub-cold arid climate zone of the plateau. It is rarely visited, and the geological working conditions are very poor; however, stratum exposures are good and human interference is very small. Therefore, research on the automatic classification and extraction of remote sensing geological information has typical significance and good application prospects. Based on object-oriented classification in northern Tibet, using Worldview-2 high-resolution remote sensing data combined with tectonic information and image enhancement, the lithological spectral features, shape features, spatial locations and topological relations of various geological information are mined. By setting thresholds within a hierarchical classification, eight kinds of geological information were classified and extracted. Compared with existing geological maps, the accuracy analysis shows that the overall accuracy reached 87.8561 %, indicating that the object-oriented classification method is effective and feasible for this study area and provides a new idea for the automatic extraction of remote sensing geological information.

  18. A Risk Assessment System with Automatic Extraction of Event Types

    NASA Astrophysics Data System (ADS)

    Capet, Philippe; Delavallade, Thomas; Nakamura, Takuya; Sandor, Agnes; Tarsitano, Cedric; Voyatzi, Stavroula

    In this article we describe the joint effort of experts in linguistics, information extraction and risk assessment to integrate EventSpotter, an automatic event extraction engine, into ADAC, an automated early warning system. By detecting as early as possible weak signals of emerging risks ADAC provides a dynamic synthetic picture of situations involving risk. The ADAC system calculates risk on the basis of fuzzy logic rules operated on a template graph whose leaves are event types. EventSpotter is based on a general purpose natural language dependency parser, XIP, enhanced with domain-specific lexical resources (Lexicon-Grammar). Its role is to automatically feed the leaves with input data.

  19. Rapid automatic keyword extraction for information retrieval and analysis

    DOEpatents

    Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]

    2012-03-06

    Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
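    The patent abstract spells out the scoring scheme: candidate keywords are delimited by stop words and punctuation, word scores come from co-occurrence degree and frequency, and each candidate's score is the sum of its word scores. Below is a minimal Python sketch of that scheme; the toy stop-word list and example sentence are assumptions, since the patent does not prescribe a particular list.

```python
# A minimal sketch of RAKE-style keyword scoring as described in the abstract:
# split on stop words/punctuation, score words by degree/frequency, and score
# each candidate phrase as the sum of its word scores.
import re
from collections import defaultdict

STOP_WORDS = {"a", "and", "are", "by", "for", "in", "is", "of", "on", "or", "over", "the", "to"}

def candidate_keywords(text):
    """Split text into candidate phrases at stop words and punctuation delimiters."""
    phrases, current = [], []
    for token in re.findall(r"[A-Za-z0-9'-]+|[.,;:!?()]", text.lower()):
        if token in STOP_WORDS or not token[0].isalnum():
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases

def keyword_scores(text):
    """Word score = co-occurrence degree / frequency; phrase score = sum of word scores."""
    phrases = candidate_keywords(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)      # words co-occurring within the candidate
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    text = ("Compatibility of systems of linear constraints over the set of "
            "natural numbers and criteria of compatibility.")
    for phrase, score in keyword_scores(text)[:5]:
        print(f"{score:5.2f}  {phrase}")
```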

  20. Construction of Green Tide Monitoring System and Research on its Key Techniques

    NASA Astrophysics Data System (ADS)

    Xing, B.; Li, J.; Zhu, H.; Wei, P.; Zhao, Y.

    2018-04-01

    As a marine natural disaster, green tide has appeared every year along the Qingdao coast since the large-scale bloom in 2008, bringing great losses to the region. It is therefore of great value to obtain real-time dynamic information about green tide distribution. In this study, optical and microwave remote sensing methods are employed for green tide monitoring. A specific remote sensing data processing flow and a green tide information extraction algorithm are designed according to the different characteristics of optical and microwave data. For the extraction of green tide spatial distribution information, an automatic algorithm for delineating green tide distribution boundaries is designed based on the principle of mathematical morphology dilation/erosion, and key issues in the extraction, including the division of green tide regions, the derivation of basic distributions, the limitation of distribution boundaries, and the elimination of islands, are solved. The automatic generation of green tide distribution boundaries from the results of remote sensing information extraction is realized. Finally, a green tide monitoring system is built based on IDL/GIS secondary development in an integrated RS and GIS environment, achieving the integration of RS monitoring and information extraction.
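    The boundary-delineation step above rests on mathematical morphology. Below is a minimal sketch assuming a binary green-tide mask as input, using scipy.ndimage on synthetic data rather than the system's actual IDL implementation.

```python
# A minimal sketch of deriving a distribution boundary from a binary green-tide
# mask with morphological dilation/erosion; the synthetic mask and iteration
# count are assumptions, not the paper's data or parameters.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def distribution_boundary(mask, close_iters=2):
    """Close small gaps (dilate then erode), then take the morphological gradient as the boundary."""
    closed = binary_erosion(binary_dilation(mask, iterations=close_iters),
                            iterations=close_iters)
    return closed & ~binary_erosion(closed)

mask = np.zeros((50, 50), dtype=bool)
mask[10:30, 15:40] = True              # synthetic green-tide region
boundary = distribution_boundary(mask)
print("boundary pixels:", int(boundary.sum()))
```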

  1. An automatic rat brain extraction method based on a deformable surface model.

    PubMed

    Li, Jiehua; Liu, Xiaofeng; Zhuo, Jiachen; Gullapalli, Rao P; Zara, Jason M

    2013-08-15

    The extraction of the brain from the skull in medical images is a necessary first step before image registration or segmentation. While pre-clinical MR imaging studies on small animals, such as rats, are increasing, fully automatic image processing techniques specific to small animal studies remain lacking. In this paper, we present an automatic rat brain extraction method, the Rat Brain Deformable model method (RBD), which adapts the popular human brain extraction tool (BET) through the incorporation of information on the brain geometry and MR image characteristics of the rat brain. The robustness of the method was demonstrated on T2-weighted MR images of 64 rats and compared with other brain extraction methods (BET, PCNN, PCNN-3D). The results demonstrate that RBD reliably extracts the rat brain with high accuracy (>92% volume overlap) and is robust against signal inhomogeneity in the images. Copyright © 2013 Elsevier B.V. All rights reserved.

  2. A novel murmur-based heart sound feature extraction technique using envelope-morphological analysis

    NASA Astrophysics Data System (ADS)

    Yao, Hao-Dong; Ma, Jia-Li; Fu, Bin-Bin; Wang, Hai-Yang; Dong, Ming-Chui

    2015-07-01

    Auscultation of heart sound (HS) signals has served as an important primary approach to diagnosing cardiovascular diseases (CVDs) for centuries. Confronting the intrinsic drawbacks of traditional HS auscultation, computer-aided automatic HS auscultation based on feature extraction techniques has witnessed explosive development. Yet most existing HS feature extraction methods adopt acoustic or time-frequency features that exhibit a poor relationship with diagnostic information, thus restricting the performance of further interpretation and analysis. Tackling this bottleneck, this paper proposes a novel murmur-based HS feature extraction method, since murmurs contain massive pathological information and are regarded as the first indications of pathological changes of the heart valves. Adapting the discrete wavelet transform (DWT) and the Shannon envelope, the envelope-morphological characteristics of murmurs are obtained and three features are extracted accordingly. Validated by discriminating normal HS from five kinds of abnormal HS signals with the extracted features, the proposed method provides an attractive candidate for automatic HS auscultation.
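    The pipeline named above combines a discrete wavelet transform with the Shannon envelope, -x²·log(x²). Below is a minimal sketch under assumed choices of wavelet, decomposition level and retained band; the paper's three morphological features are not reproduced.

```python
# A minimal sketch of a DWT + Shannon-envelope step using PyWavelets.
# The wavelet ('db6'), level (4), retained detail band, and the toy signal
# are assumptions, not the paper's settings.
import numpy as np
import pywt

def shannon_envelope(x, eps=1e-12):
    x = x / (np.max(np.abs(x)) + eps)          # normalize to [-1, 1]
    return -x**2 * np.log(x**2 + eps)

def murmur_band(signal, wavelet="db6", level=4):
    """Keep one mid-frequency detail band (assumed choice) and reconstruct it."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    keep = [np.zeros_like(c) for c in coeffs]
    keep[2] = coeffs[2]                         # retain a single detail band
    return pywt.waverec(keep, wavelet)

fs = 2000
t = np.arange(0, 1.0, 1 / fs)
hs = np.sin(2 * np.pi * 30 * t) + 0.3 * np.random.randn(t.size)  # toy heart-sound-like signal
env = shannon_envelope(murmur_band(hs))
print("envelope length:", env.size)
```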

  3. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information.

    PubMed

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Yan, Bin; Li, Jianxin

    2015-01-01

    The electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activity. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely motor imagery and emotion recognition.
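    The key idea above is to flag independent components that match previously acquired artifact information and reconstruct the signal without them. The sketch below skips the wavelet stage for brevity and uses plain ICA, a correlation threshold and synthetic data; all of these are assumptions rather than the paper's settings.

```python
# A minimal sketch of a-priori-guided artifact removal: run ICA on multichannel
# EEG, drop components that correlate with a stored artifact template, and
# reconstruct. Channel count, threshold, and data are illustrative assumptions.
import numpy as np
from sklearn.decomposition import FastICA

def remove_artifacts(eeg, artifact_template, corr_threshold=0.6):
    """eeg: (n_samples, n_channels); artifact_template: (n_samples,)."""
    ica = FastICA(n_components=eeg.shape[1], random_state=0)
    sources = ica.fit_transform(eeg)                        # (n_samples, n_components)
    corr = np.array([abs(np.corrcoef(s, artifact_template)[0, 1]) for s in sources.T])
    sources[:, corr > corr_threshold] = 0.0                 # zero out flagged components
    return ica.inverse_transform(sources)

rng = np.random.default_rng(0)
blink = np.exp(-((np.arange(1000) - 500) ** 2) / 2000.0)    # a priori artifact shape
eeg = rng.standard_normal((1000, 8)) + np.outer(blink, rng.standard_normal(8))
clean = remove_artifacts(eeg, blink)
print(clean.shape)
```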

  4. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information

    PubMed Central

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Li, Jianxin

    2015-01-01

    The electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activity. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely motor imagery and emotion recognition. PMID:26380294

  5. Automatic information extraction from unstructured mammography reports using distributed semantics.

    PubMed

    Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L

    2018-02-01

    To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines a dependency-based parse tree with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Automatic River Network Extraction from LIDAR Data

    NASA Astrophysics Data System (ADS)

    Maderal, E. N.; Valcarcel, N.; Delgado, J.; Sevilla, C.; Ojeda, J. C.

    2016-06-01

    The National Geographic Institute of Spain (IGN-ES) has launched a new production system for automatic river network extraction for the Geospatial Reference Information (GRI) hydrography theme. The goal is an accurate and updated river network, extracted as automatically as possible. To this end, IGN-ES has full LiDAR coverage of the whole Spanish territory with a density of 0.5 points per square meter. To implement this work, the technical feasibility was validated, a methodology was developed to automate each production phase, namely generation of hydrological terrain models with a 2 meter grid size and river network extraction combining hydrographic criteria (topographic network) and hydrological criteria (flow accumulation river network), and finally production was launched. The key points of this work have been managing a big data environment of more than 160,000 LiDAR data files; the infrastructure to store (up to 40 TB between results and intermediate files) and process the data, using local virtualization and Amazon Web Services (AWS), which allowed this automatic production to be completed within 6 months; software stability (TerraScan-TerraSolid, GlobalMapper-Blue Marble, FME-Safe, ArcGIS-Esri); and, finally, human resources management. The result of this production is an accurate automatic river network extraction for the whole country with a significant improvement in the altimetric component of the 3D linear vector. This article presents the technical feasibility, the production methodology, the automatic river network extraction production, and its advantages over traditional vector extraction systems.

  7. Clinical Assistant Diagnosis for Electronic Medical Record Based on Convolutional Neural Network.

    PubMed

    Yang, Zhongliang; Huang, Yongfeng; Jiang, Yiran; Sun, Yuxi; Zhang, Yu-Jin; Luo, Pengcheng

    2018-04-20

    Automatically extracting useful information from electronic medical records and conducting disease diagnosis is a promising task for both clinical decision support (CDS) and natural language processing (NLP). Most existing systems are based on artificially constructed knowledge bases, with auxiliary diagnosis performed by rule matching. In this study, we present a clinical intelligent decision approach based on Convolutional Neural Networks (CNN), which can automatically extract high-level semantic information from electronic medical records and then perform automatic diagnosis without artificial construction of rules or knowledge bases. We used 18,590 collected real-world clinical electronic medical records to train and test the proposed model. Experimental results show that the proposed model can achieve 98.67% accuracy and 96.02% recall, which strongly supports that using a convolutional neural network to automatically learn high-level semantic features of electronic medical records and then conduct assisted diagnosis is feasible and effective.
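    The approach above is, in essence, a convolutional text classifier over medical-record tokens. Below is a minimal sketch of such a model in PyTorch; the vocabulary size, dimensions and label count are placeholders, not the paper's actual architecture or data.

```python
# A minimal sketch of a 1-D convolutional text classifier (token embeddings,
# parallel convolutions, max-pooling, linear output). All sizes are placeholders.
import torch
import torch.nn as nn

class RecordCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, num_classes=10,
                 kernel_sizes=(3, 4, 5), num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))      # logits over diagnosis labels

model = RecordCNN()
dummy_batch = torch.randint(1, 5000, (4, 200))        # 4 records, 200 tokens each
print(model(dummy_batch).shape)                       # torch.Size([4, 10])
```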

  8. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry

    PubMed Central

    AAlAbdulsalam, Abdulrahman K.; Garvin, Jennifer H.; Redd, Andrew; Carter, Marjorie E.; Sweeny, Carol; Meystre, Stephane M.

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). PMID:29888032
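    A simple way to picture the extraction half of this task is a pattern that finds TNM mentions (e.g. pT2 N1 M0) in free text. The sketch below uses a hand-written regular expression and an invented note; it is not the registry data or the study's machine-learning classifier.

```python
# A minimal sketch of spotting TNM stage mentions with a regular expression.
# The pattern and the sample note are illustrative assumptions.
import re

TNM_PATTERN = re.compile(
    r"\b(?:c|p|y|r)?T(?:[0-4][a-d]?|is|x)\s*N(?:[0-3][a-c]?|x)\s*M(?:[01][a-c]?|x)\b",
    re.IGNORECASE)

note = ("Pathology consistent with invasive ductal carcinoma, "
        "staged pT2 N1 M0 after resection; prior note listed cT3N1M0.")
for match in TNM_PATTERN.finditer(note):
    print(match.group(0))
```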

  9. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry.

    PubMed

    AAlAbdulsalam, Abdulrahman K; Garvin, Jennifer H; Redd, Andrew; Carter, Marjorie E; Sweeny, Carol; Meystre, Stephane M

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%-98.4% and classification sensitivity: 83.5%-87%).

  10. Place in Perspective: Extracting Online Information about Points of Interest

    NASA Astrophysics Data System (ADS)

    Alves, Ana O.; Pereira, Francisco C.; Rodrigues, Filipe; Oliveirinha, João

    During the last few years, the amount of online descriptive information about places has reached a reasonable dimension for many cities in the world. Since such information is mostly natural language text, Information Extraction techniques are needed to obtain the meaning of places that underlies these massive amounts of commonsense and user-generated sources. In this article, we show how we automatically label places using Information Extraction techniques applied to online resources such as Wikipedia, Yellow Pages and Yahoo!.

  11. Extracting the Textual and Temporal Structure of Supercomputing Logs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jain, S; Singh, I; Chandra, A

    2009-05-26

    Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
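    The first step described above, finding the syntactic structure of log messages, can be pictured as masking the variable fields of each message and grouping messages that share the resulting template. The sketch below uses invented log lines and masking rules; the paper's actual clustering algorithm is more involved.

```python
# A minimal sketch of grouping log messages by syntactic template: numbers and
# hex identifiers are masked, and messages sharing a masked template fall into
# the same group. The sample lines and rules are illustrative assumptions.
import re
from collections import defaultdict

def template(message):
    msg = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    msg = re.sub(r"\d+", "<NUM>", msg)
    return msg

logs = [
    "node 17 temperature 81C exceeds threshold",
    "node 42 temperature 85C exceeds threshold",
    "link error on interface 0x1f3a",
    "link error on interface 0x0b77",
]
groups = defaultdict(list)
for line in logs:
    groups[template(line)].append(line)
for tmpl, members in groups.items():
    print(len(members), "x", tmpl)
```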

  12. An Overview of Biomolecular Event Extraction from Scientific Documents

    PubMed Central

    Vanegas, Jorge A.; Matos, Sérgio; González, Fabio; Oliveira, José L.

    2015-01-01

    This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed. PMID:26587051

  13. Shape and texture fused recognition of flying targets

    NASA Astrophysics Data System (ADS)

    Kovács, Levente; Utasi, Ákos; Kovács, Andrea; Szirányi, Tamás

    2011-06-01

    This paper presents visual detection and recognition of flying targets (e.g. planes, missiles) based on automatically extracted shape and object texture information, for application areas like alerting, recognition and tracking. Targets are extracted based on robust background modeling and a novel contour extraction approach, and object recognition is done by comparisons to shape and texture based query results on a previously gathered real life object dataset. Application areas involve passive defense scenarios, including automatic object detection and tracking with cheap commodity hardware components (CPU, camera and GPS).

  14. A prototype system to support evidence-based practice.

    PubMed

    Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E; Clayton, Jennifer; Thoma, George R

    2008-11-06

    Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients' problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically and proactively delivered evidence and its usefulness in development of care plans.

  15. A Prototype System to Support Evidence-based Practice

    PubMed Central

    Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E.; Clayton, Jennifer; Thoma, George R.

    2008-01-01

    Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients’ problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically and proactively delivered evidence and its usefulness in development of care plans. PMID:18998835

  16. A 3D THz image processing methodology for a fully integrated, semi-automatic and near real-time operational system

    NASA Astrophysics Data System (ADS)

    Brook, A.; Cristofani, E.; Vandewal, M.; Matheis, C.; Jonuscheit, J.; Beigang, R.

    2012-05-01

    The present study proposes a fully integrated, semi-automatic image processing methodology operated in near real-time mode, developed for Frequency-Modulated Continuous-Wave (FMCW) THz images with center frequencies around 100 GHz and 300 GHz. The quality control of aeronautics composite multi-layered materials and structures using Non-Destructive Testing is the main focus of this work. Image processing is applied to the 3-D images to extract useful information. The data is processed by extracting areas of interest; the detected areas are then subjected to image analysis for more detailed investigation managed by a spatial model. Finally, the post-processing stage examines and evaluates the spatial accuracy of the extracted information.

  17. A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

    PubMed

    García-Remesal, Miguel; Maojo, Victor; Crespo, José

    2010-01-01

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. For this set, we achieved 87.76% precision and 97.70% recall. This method can facilitate different research tasks, including text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
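    The preliminary recognizer described above is essentially a finite-state acceptor for runs of nucleotide letters. Below is a minimal sketch using a regular expression over IUPAC codes with an assumed minimum length of 12; the paper's knowledge-based filtering stage is not reproduced.

```python
# A minimal sketch of a candidate DNA/RNA sequence recognizer: an uppercase run
# of nucleotide and IUPAC ambiguity codes above a minimum length. The length
# threshold and example paragraph are assumptions.
import re

NUCLEOTIDE_RUN = re.compile(r"\b[ACGTUNRYKMSWBDHV]{12,}\b")

paragraph = ("The primer sequence ATGCGTACCTTGAGGACCTA was used, whereas the "
             "abbreviation NASA and the word EXTRACTION should not be matched.")
candidates = NUCLEOTIDE_RUN.findall(paragraph)
print(candidates)   # ['ATGCGTACCTTGAGGACCTA']
```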

  18. Automatic Extraction of JPF Options and Documentation

    NASA Technical Reports Server (NTRS)

    Luks, Wojciech; Tkachuk, Oksana; Buschnell, David

    2011-01-01

    Documenting existing Java PathFinder (JPF) projects or developing new extensions is a challenging task. JPF provides a platform for creating new extensions and relies on key-value properties for their configuration. Keeping track of all possible options and extension mechanisms in JPF can be difficult. This paper presents jpf-autodoc-options, a tool that automatically extracts JPF project options and other documentation-related information, which can greatly help both JPF users and developers of JPF extensions.

  19. Collaborative human-machine analysis to disambiguate entities in unstructured text and structured datasets

    NASA Astrophysics Data System (ADS)

    Davenport, Jack H.

    2016-05-01

    Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.

  20. Automatic Short Essay Scoring Using Natural Language Processing to Extract Semantic Information in the Form of Propositions. CRESST Report 831

    ERIC Educational Resources Information Center

    Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.

    2013-01-01

    The Common Core assessments emphasize short essay constructed-response items over multiple-choice items because they are more precise measures of understanding. However, such items are too costly and time consuming to be used in national assessments unless a way to score them automatically can be found. Current automatic essay-scoring techniques…

  1. CMS-2 Reverse Engineering and ENCORE/MODEL Integration

    DTIC Science & Technology

    1992-05-01

    Automated extraction of design information from an existing software system written in CMS-2 can be used to document that system as-built. The extracted information is provided by a commercially available CASE tool. Information describing software system design is automatically extracted ... the displays in Figures 1, 2, and 3.

  2. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  3. Automatic recognition of seismic intensity based on RS and GIS: a case study in Wenchuan Ms8.0 earthquake of China.

    PubMed

    Zhang, Qiuwen; Zhang, Yan; Yang, Xiaohong; Su, Bin

    2014-01-01

    In recent years, earthquakes have frequently occurred all over the world, causing huge casualties and economic losses. It is necessary and urgent to obtain the seismic intensity map in a timely manner so as to grasp the distribution of the disaster and provide support for quick earthquake relief. Compared with traditional methods of drawing seismic intensity maps, which require extensive field investigation in the earthquake area or depend heavily on empirical formulas, spatial information technologies such as Remote Sensing (RS) and Geographical Information Systems (GIS) provide a fast and economical way to automatically recognize seismic intensity. With the integrated application of RS and GIS, this paper proposes an RS/GIS-based approach for automatic recognition of seismic intensity, in which RS is used to retrieve and extract information on damage caused by the earthquake, and GIS is applied to manage and display the seismic intensity data. The case study of the Wenchuan Ms8.0 earthquake in China shows that information on seismic intensity can be automatically extracted from remotely sensed images as quickly as possible after earthquake occurrence, and that the Digital Intensity Model (DIM) can be used to visually query and display the distribution of seismic intensity.

  4. Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

    Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.

  5. Automatic identification of comparative effectiveness research from Medline citations to support clinicians’ treatment information needs

    PubMed Central

    Zhang, Mingyuan; Fiol, Guilherme Del; Grout, Randall W.; Jonnalagadda, Siddhartha; Medlin, Richard; Mishra, Rashmi; Weir, Charlene; Liu, Hongfang; Mostafa, Javed; Fiszman, Marcelo

    2014-01-01

    Online knowledge resources such as Medline can address most clinicians’ patient care information needs. Yet significant barriers, notably lack of time, limit the use of these sources at the point of care. The most common information needs raised by clinicians are treatment-related, and comparative effectiveness studies allow clinicians to consider multiple treatment alternatives for a particular problem. Still, solutions are needed to enable efficient and effective consumption of comparative effectiveness research at the point of care. Objective: Design and assess an algorithm for automatically identifying comparative effectiveness studies and extracting the interventions investigated in these studies. Methods: The algorithm combines semantic natural language processing, Medline citation metadata, and machine learning techniques. We assessed the algorithm in a case study of treatment alternatives for depression. Results: Both precision and recall for identifying comparative studies were 0.83. A total of 86% of the interventions extracted perfectly or partially matched the gold standard. Conclusion: Overall, the algorithm achieved reasonable performance. The method provides building blocks for the automatic summarization of comparative effectiveness research to inform point-of-care decision-making. PMID:23920677

  6. Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.

    PubMed

    Ng; Wong

    1999-01-01

    We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace in genomics and proteomics research. The race to the discovery of a gene or a drug has now become increasingly dependent on how quickly a scientist can scan through the voluminous amount of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from online text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.

  7. Natural language processing and visualization in the molecular imaging domain.

    PubMed

    Tulipano, P Karina; Tao, Ying; Millar, William S; Zanzonico, Pat; Kolbert, Katherine; Xu, Hua; Yu, Hong; Chen, Lifeng; Lussier, Yves A; Friedman, Carol

    2007-06-01

    Molecular imaging is at the crossroads of genomic sciences and medical imaging. Information within the molecular imaging literature could be used to link to genomic and imaging information resources and to organize and index images in a way that is potentially useful to researchers. A number of natural language processing (NLP) systems are available to automatically extract information from genomic literature. One existing NLP system, known as BioMedLEE, automatically extracts biological information consisting of biomolecular substances and phenotypic data. This paper focuses on the adaptation, evaluation, and application of BioMedLEE to the molecular imaging domain. In order to adapt BioMedLEE for this domain, we extend an existing molecular imaging terminology and incorporate it into BioMedLEE. BioMedLEE's performance is assessed with a formal evaluation study. The system's performance, measured as recall and precision, is 0.74 (95% CI: 0.70-0.76) and 0.70 (95% CI: 0.63-0.76), respectively. We adapt a Java viewer known as PGviewer for the simultaneous visualization of images with NLP-extracted information.

  8. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction.

    PubMed

    Jonnalagadda, Siddhartha; Gonzalez, Graciela

    2010-11-13

    BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested our tool's impact on the task of protein-protein interaction (PPI) extraction: it improved the F-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net.

  9. Radiometric and geometric evaluation of GeoEye-1, WorldView-2 and Pléiades-1A stereo images for 3D information extraction

    NASA Astrophysics Data System (ADS)

    Poli, D.; Remondino, F.; Angiuli, E.; Agugiaro, G.

    2015-02-01

    Today the use of spaceborne Very High Resolution (VHR) optical sensors for automatic 3D information extraction is increasing in the scientific and civil communities. The 3D Optical Metrology (3DOM) unit of the Bruno Kessler Foundation (FBK) in Trento (Italy) has collected VHR satellite imagery, as well as aerial and terrestrial data over Trento for creating a complete testfield for investigations on image radiometry, geometric accuracy, automatic digital surface model (DSM) generation, 2D/3D feature extraction, city modelling and data fusion. This paper addresses the radiometric and the geometric aspects of the VHR spaceborne imagery included in the Trento testfield and their potential for 3D information extraction. The dataset consists of two stereo pairs acquired by WorldView-2 and by GeoEye-1 in panchromatic and multispectral mode, and a triplet from Pléiades-1A. For reference and validation, a DSM from airborne LiDAR acquisition is used. The paper gives details on the project, dataset characteristics and achieved results.

  10. Framework for automatic information extraction from research papers on nanocrystal devices

    PubMed Central

    Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

    2015-01-01

    Summary To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%); however, precision is better (75–97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system. PMID:26665057

  11. Framework for automatic information extraction from research papers on nanocrystal devices.

    PubMed

    Dieb, Thaer M; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

    2015-01-01

    To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called "NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.
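
    The strict versus loose (overlap-based) agreement described above can be illustrated with a small sketch. The span offsets, category names and matching rule below are illustrative assumptions, not the NaDevEx evaluation code.

        # Minimal sketch: strict versus loose matching of extracted terms against gold
        # terms using character offsets. Loose agreement accepts any overlapping span
        # with the same information category. All spans below are illustrative.
        def overlaps(a, b):
            return a[0] < b[1] and b[0] < a[1]

        gold = [(10, 25, "source_material"), (40, 52, "physical_quantity")]
        pred = [(12, 25, "source_material"), (40, 52, "physical_quantity"), (60, 70, "process")]

        strict_tp = sum(p in gold for p in pred)
        loose_tp = sum(any(p[2] == g[2] and overlaps(p[:2], g[:2]) for g in gold) for p in pred)

        precision_strict, recall_strict = strict_tp / len(pred), strict_tp / len(gold)
        precision_loose, recall_loose = loose_tp / len(pred), loose_tp / len(gold)
        print(f"strict P/R: {precision_strict:.2f}/{recall_strict:.2f}")
        print(f"loose  P/R: {precision_loose:.2f}/{recall_loose:.2f}")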

  12. Automatic lip reading by using multimodal visual features

    NASA Astrophysics Data System (ADS)

    Takahashi, Shohei; Ohya, Jun

    2013-12-01

    Speech recognition has been studied for a long time, but it does not work well in noisy places such as cars or trains. In addition, people who are hearing-impaired or have difficulty hearing cannot benefit from audio-based speech recognition. To recognize speech automatically, visual information is also important: people understand speech not only from audio information but also from visual cues such as temporal changes in lip shape. A vision-based speech recognition method could work well in noisy places and could also be useful for people with hearing disabilities. In this paper, we propose an automatic lip-reading method that recognizes speech using multimodal visual information, without any audio information. First, the Active Shape Model (ASM) is used to track and detect the face and lips in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip region are extracted from the lips detected by the ASM. Next, the extracted multimodal features are ordered chronologically and a Support Vector Machine is trained to classify the spoken words. Experiments on classifying several words show promising results for the proposed method.
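
    The final classification step can be sketched as follows: per-frame multimodal features are kept in chronological order by flattening them into one vector per clip and feeding them to an SVM. The feature arrays, class count and sizes below are synthetic placeholders, not the paper's ASM/optical-flow features.

        # Minimal sketch: classify spoken words from chronologically ordered multimodal
        # lip features with an SVM. All data here is synthetic.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        n_clips, n_frames, n_feats = 200, 30, 12            # hypothetical sizes
        X = rng.normal(size=(n_clips, n_frames, n_feats))    # per-frame shape/flow/frequency features
        y = rng.integers(0, 5, size=n_clips)                 # five hypothetical word classes

        # Keep the chronological order by flattening the frame axis into one vector per clip.
        X_flat = X.reshape(n_clips, n_frames * n_feats)

        X_tr, X_te, y_tr, y_te = train_test_split(X_flat, y, test_size=0.3, random_state=0)
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        clf.fit(X_tr, y_tr)
        print("word classification accuracy:", clf.score(X_te, y_te))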

  13. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning.

    PubMed

    Norouzzadeh, Mohammad Sadegh; Nguyen, Anh; Kosmala, Margaret; Swanson, Alexandra; Palmer, Meredith S; Packer, Craig; Clune, Jeff

    2018-06-19

    Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would improve our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into "big data" sciences. Motion-sensor "camera traps" enable collecting wildlife pictures inexpensively, unobtrusively, and frequently. However, extracting information from these pictures remains an expensive, time-consuming, manual task. We demonstrate that such information can be automatically extracted by deep learning, a cutting-edge type of artificial intelligence. We train deep convolutional neural networks to identify, count, and describe the behaviors of 48 species in the 3.2 million-image Snapshot Serengeti dataset. Our deep neural networks automatically identify animals with >93.8% accuracy, and we expect that number to improve rapidly in years to come. More importantly, if our system classifies only images it is confident about, our system can automate animal identification for 99.3% of the data while still performing at the same 96.6% accuracy as that of crowdsourced teams of human volunteers, saving >8.4 y (i.e., >17,000 h at 40 h/wk) of human labeling effort on this 3.2 million-image dataset. Those efficiency gains highlight the importance of using deep neural networks to automate data extraction from camera-trap images, reducing a roadblock for this widely used technology. Our results suggest that deep learning could enable the inexpensive, unobtrusive, high-volume, and even real-time collection of a wealth of information about vast numbers of animals in the wild. Copyright © 2018 the Author(s). Published by PNAS.
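
    The "classify only images the model is confident about" strategy reported above amounts to thresholding the network's top softmax probability and routing the remaining images to human volunteers. The sketch below illustrates the idea with synthetic probabilities; the threshold value and array sizes are assumptions, not the paper's settings.

        # Minimal sketch of confidence-thresholded automation: accept the model's label
        # only when its top softmax probability clears a threshold.
        import numpy as np

        rng = np.random.default_rng(1)
        n_images, n_species = 10_000, 48
        logits = rng.normal(size=(n_images, n_species))
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax
        true_labels = rng.integers(0, n_species, size=n_images)

        threshold = 0.9                        # hypothetical confidence cut-off
        top_prob = probs.max(axis=1)
        top_label = probs.argmax(axis=1)
        confident = top_prob >= threshold

        automated_fraction = confident.mean()
        acc_confident = (top_label[confident] == true_labels[confident]).mean() if confident.any() else float("nan")
        print(f"automated {automated_fraction:.1%} of images; accuracy on those: {acc_confident:.1%}")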

  14. Semantic Information Extraction of Lanes Based on Onboard Camera Videos

    NASA Astrophysics Data System (ADS)

    Tang, L.; Deng, T.; Ren, C.

    2018-04-01

    In the field of autonomous driving, semantic information of lanes is very important. This paper proposes a method of automatic detection of lanes and extraction of semantic information from onboard camera videos. The proposed method firstly detects the edges of lanes by the grayscale gradient direction, and improves the Probabilistic Hough transform to fit them; then, it uses the vanishing point principle to calculate the lane geometrical position, and uses lane characteristics to extract lane semantic information by the classification of decision trees. In the experiment, 216 road video images captured by a camera mounted onboard a moving vehicle were used to detect lanes and extract lane semantic information. The results show that the proposed method can accurately identify lane semantics from video images.
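
    As a point of reference for the edge-detection and Hough-fitting steps, the sketch below shows the standard OpenCV pipeline (Canny edges followed by the probabilistic Hough transform). It is not the paper's improved Hough variant or its vanishing-point and decision-tree stages, and the image path and parameter values are placeholders.

        # Minimal sketch: edge detection plus probabilistic Hough line fitting with OpenCV.
        import cv2
        import numpy as np

        frame = cv2.imread("road_frame.png")              # placeholder path
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)                  # gradient-based edge map

        lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                                minLineLength=40, maxLineGap=10)
        if lines is not None:
            for x1, y1, x2, y2 in lines[:, 0]:
                cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
        cv2.imwrite("road_frame_lanes.png", frame)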

  15. An information extraction framework for cohort identification using electronic health records.

    PubMed

    Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G

    2013-01-01

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.

  16. Developing a Satellite Based Automatic System for Crop Monitoring: Kenya's Great Rift Valley, A Case Study

    NASA Astrophysics Data System (ADS)

    Lucciani, Roberto; Laneve, Giovanni; Jahjah, Munzer; Mito, Collins

    2016-08-01

    The crop growth stage represents essential information for the management of agricultural areas. In this study we investigate the feasibility of a tool based on remotely sensed satellite (Landsat 8) imagery that is capable of automatically classifying crop fields, and we examine how much resolution enhancement based on pan-sharpening techniques and phenological information extraction, which is useful for creating decision rules that identify the semantic class to assign to an object, can effectively support the classification process. Moreover, we investigate the opportunity to extract vegetation health status information from a remotely sensed assessment of the equivalent water thickness (EWT). Our case study is Kenya's Great Rift Valley, where a ground-truth campaign was conducted during August 2015 to collect GPS measurements of crop fields, leaf area index (LAI) and chlorophyll samples.
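
    Phenological decision rules of this kind are commonly built on vegetation indices; the abstract does not name a specific index, so the sketch below uses NDVI from Landsat 8 band 4 (red) and band 5 (NIR) purely as an illustration. The file names and the threshold are placeholders.

        # Minimal sketch: NDVI from Landsat 8 red (band 4) and near-infrared (band 5) rasters,
        # plus a simple illustrative decision rule.
        import numpy as np
        import rasterio

        with rasterio.open("LC08_B4.TIF") as red_src, rasterio.open("LC08_B5.TIF") as nir_src:
            red = red_src.read(1).astype("float64")
            nir = nir_src.read(1).astype("float64")

        ndvi = np.where(nir + red > 0, (nir - red) / (nir + red), 0.0)

        # Flag pixels as actively growing vegetation when NDVI exceeds an illustrative threshold.
        growing = ndvi > 0.5
        print("fraction of pixels flagged as vigorous vegetation:", growing.mean())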

  17. Automatic extraction of building boundaries using aerial LiDAR data

    NASA Astrophysics Data System (ADS)

    Wang, Ruisheng; Hu, Yong; Wu, Huayi; Wang, Jian

    2016-01-01

    Building extraction is one of the main research topics of the photogrammetry community. This paper presents automatic algorithms for building boundary extraction from aerial LiDAR data. First, height information generated from the LiDAR data is segmented so that the outer boundaries of above-ground objects are expressed as closed chains of oriented edge pixels. Then, building boundaries are distinguished from non-building ones by evaluating their shapes. The candidate building boundaries are reconstructed as rectangles or regular polygons by applying new algorithms, following the hypothesis-verification paradigm. These algorithms include constrained searching in Hough space, an enhanced Hough transformation, and a sequential linking technique. The experimental results show that the proposed algorithms successfully extract building boundaries at rates of 97%, 85%, and 92% for three LiDAR datasets with varying scene complexities.

  18. Doppler extraction with a digital VCO

    NASA Technical Reports Server (NTRS)

    Starner, E. R.; Nossen, E. J.

    1977-01-01

    Digitally controlled oscillator in phase-locked loop may be useful for data communications systems, or may be modified to serve as information extraction component of microwave or optical system for collision avoidance or automatic braking. Instrument is frequency-synthesizing device with output specified precisely by digital number programmed into frequency register.

  19. Pattern-Based Extraction of Argumentation from the Scientific Literature

    ERIC Educational Resources Information Center

    White, Elizabeth K.

    2010-01-01

    As the number of publications in the biomedical field continues its exponential increase, techniques for automatically summarizing information from this body of literature have become more diverse. In addition, the targets of summarization have become more subtle; initial work focused on extracting the factual assertions from full-text papers,…

  20. Longitudinal Analysis of New Information Types in Clinical Notes

    PubMed Central

    Zhang, Rui; Pakhomov, Serguei; Melton, Genevieve B.

    2014-01-01

    It is increasingly recognized that redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous, significant, and may negatively impact the secondary use of these notes for research and patient care. We investigated several automated methods to identify redundant versus relevant new information in clinical reports. These methods may provide a valuable approach to extract clinically pertinent information and further improve the accuracy of clinical information extraction systems. In this study, we used UMLS semantic types to extract several types of new information, including problems, medications, and laboratory information. Automatically identified new information highly correlated with manual reference standard annotations. Methods to identify different types of new information can potentially help to build up more robust information extraction systems for clinical researchers as well as aid clinicians and researchers in navigating clinical notes more effectively and quickly identify information pertaining to changes in health states. PMID:25717418
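
    At its core, distinguishing new from redundant information can be framed as a set comparison over the concepts extracted from prior and current notes. The sketch below assumes that a UMLS-based extractor has already produced (concept, semantic type) pairs; the CUIs and note contents are hypothetical.

        # Minimal sketch: flag concepts in the current note that did not appear in any
        # prior note, grouped by semantic type. Concept extraction itself is assumed.
        prior_notes = [
            {("C0011849", "Disease"), ("C0021641", "Medication")},      # hypothetical concept sets
            {("C0011849", "Disease"), ("C0202098", "Laboratory")},
        ]
        current_note = {("C0011849", "Disease"), ("C0004057", "Medication"),
                        ("C0202098", "Laboratory")}

        seen_before = set().union(*prior_notes)
        new_info = current_note - seen_before

        by_type = {}
        for cui, sem_type in new_info:
            by_type.setdefault(sem_type, []).append(cui)
        print("new information by semantic type:", by_type)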

  1. Automatic segmentation of the bone and extraction of the bone cartilage interface from magnetic resonance images of the knee

    NASA Astrophysics Data System (ADS)

    Fripp, Jurgen; Crozier, Stuart; Warfield, Simon K.; Ourselin, Sébastien

    2007-03-01

    The accurate segmentation of the articular cartilages from magnetic resonance (MR) images of the knee is important for clinical studies and drug trials into conditions like osteoarthritis. Currently, segmentations are obtained using time-consuming manual or semi-automatic algorithms which have high inter- and intra-observer variabilities. This paper presents an important step towards obtaining automatic and accurate segmentations of the cartilages, namely an approach to automatically segment the bones and extract the bone-cartilage interfaces (BCI) in the knee. The segmentation is performed using three-dimensional active shape models, which are initialized using an affine registration to an atlas. The BCI are then extracted using image information and prior knowledge about the likelihood of each point belonging to the interface. The accuracy and robustness of the approach was experimentally validated using an MR database of fat suppressed spoiled gradient recall images. The (femur, tibia, patella) bone segmentation had a median Dice similarity coefficient of (0.96, 0.96, 0.89) and an average point-to-surface error of 0.16 mm on the BCI. The extracted BCI had a median surface overlap of 0.94 with the real interface, demonstrating its usefulness for subsequent cartilage segmentation or quantitative analysis.
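
    The Dice similarity coefficient used to report the bone segmentation accuracy is a standard overlap measure between two binary masks. Below is a minimal NumPy sketch; the toy volumes stand in for real femur/tibia/patella labels and are not the paper's data.

        # Minimal sketch: Dice similarity coefficient between a predicted and a reference
        # binary segmentation mask.
        import numpy as np

        def dice(pred: np.ndarray, ref: np.ndarray) -> float:
            pred = pred.astype(bool)
            ref = ref.astype(bool)
            denom = pred.sum() + ref.sum()
            return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

        # Toy 3D masks standing in for bone segmentations.
        rng = np.random.default_rng(2)
        reference = rng.random((32, 32, 32)) > 0.5
        predicted = reference.copy()
        predicted[:2] = ~predicted[:2]        # perturb a few slices
        print("Dice:", round(dice(predicted, reference), 3))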

  2. Extracting semantically enriched events from biomedical literature

    PubMed Central

    2012-01-01

    Background Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Results Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. Conclusions We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare. PMID:22621266

  3. Extracting semantically enriched events from biomedical literature.

    PubMed

    Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia

    2012-05-23

    Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
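
    The macro-averaged F-score reported for the meta-knowledge assignment module averages the per-class F1 scores so that every category counts equally. A minimal scikit-learn sketch follows; the gold and predicted labels are synthetic examples, not EventMine-MK output.

        # Minimal sketch: macro-averaged F-score over meta-knowledge categories.
        from sklearn.metrics import f1_score

        gold = ["Fact", "Analysis", "Negated", "Fact", "Speculated", "Analysis"]
        pred = ["Fact", "Fact",     "Negated", "Fact", "Analysis",   "Analysis"]

        # 'macro' averages the per-class F1 scores, weighting every class equally
        # regardless of how often it occurs.
        print("macro-averaged F1:", round(f1_score(gold, pred, average="macro"), 3))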

  4. An Information Extraction Framework for Cohort Identification Using Electronic Health Records

    PubMed Central

    Liu, Hongfang; Bielinski, Suzette J.; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B.; Jonnalagadda, Siddhartha R.; Ravikumar, K.E.; Wu, Stephen T.; Kullo, Iftikhar J.; Chute, Christopher G

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework. PMID:24303255

  5. Automated generation of individually customized visualizations of diagnosis-specific medical information using novel techniques of information extraction

    NASA Astrophysics Data System (ADS)

    Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang

    2005-04-01

    Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient's medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and ultimately, outcomes.

  6. Automatic segmentation of right ventricle on ultrasound images using sparse matrix transform and level set

    NASA Astrophysics Data System (ADS)

    Qin, Xulei; Cong, Zhibin; Halig, Luma V.; Fei, Baowei

    2013-03-01

    An automatic framework is proposed to segment the right ventricle in ultrasound images. This method can automatically segment both epicardial and endocardial boundaries from a continuous echocardiography series by combining sparse matrix transform (SMT), a training model, and a localized region based level set. First, the sparse matrix transform extracts main motion regions of the myocardium as eigenimages by analyzing statistical information of these images. Second, a training model of the right ventricle is registered to the extracted eigenimages in order to automatically detect the main location of the right ventricle and the corresponding transform relationship between the training model and the SMT-extracted results in the series. Third, the training model is then adjusted as an adapted initialization for the segmentation of each image in the series. Finally, based on the adapted initializations, a localized region based level set algorithm is applied to segment both epicardial and endocardial boundaries of the right ventricle from the whole series. Experimental results from real subject data validated the performance of the proposed framework in segmenting the right ventricle from echocardiography. The mean Dice scores for the epicardial and endocardial boundaries are 89.1%+/-2.3% and 83.6%+/-7.3%, respectively. The automatic segmentation method based on sparse matrix transform and level set can provide a useful tool for quantitative cardiac imaging.

  7. A method for automatically extracting infectious disease-related primers and probes from the literature

    PubMed Central

    2010-01-01

    Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and treatment of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose different infectious diseases and prescribe treatments. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
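
    The candidate-detection phase relies on finite state machine recognizers; a regular expression is a compact way to express such a recognizer. The sketch below spots primer-like DNA subsequences in free text; the length bounds and the 5'/3' decoration handling are illustrative assumptions, not the paper's recognizers.

        # Minimal sketch: spot candidate primer-like DNA subsequences with a regular
        # expression (a compact finite-state recognizer).
        import re

        CANDIDATE = re.compile(r"\b5'?-?([ACGT]{15,30})-?3'?\b")

        text = ("The forward primer 5'-ATGCGTACGTTAGCATCGA-3' and reverse primer "
                "5'-CGATCGATTGCAATGCCGT-3' were used for amplification.")

        for match in CANDIDATE.finditer(text):
            print("candidate sequence:", match.group(1))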

  8. Automatic indexing of compound words based on mutual information for Korean text retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pan Koo Kim; Yoo Kun Cho

    In this paper, we present an automatic indexing technique for compound words suitable for an agglutinative language, specifically Korean. First, we present the construction conditions for composing compound words as indexing terms. We also present the decomposition rules applicable to consecutive nouns in order to extract the full content of a text. Finally, we propose a measure for estimating the usefulness of a term, mutual information, which calculates the degree of word association of compound words based on information-theoretic notions. By applying this method, our system has raised the precision rate of compound words from 72% to 87%.
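
    The association score can be sketched as pointwise mutual information (PMI) estimated from corpus counts; the exact estimator in the paper may differ, and the counts and threshold below are toy figures.

        # Minimal sketch: pointwise mutual information between two adjacent nouns as a
        # score for indexing them as a single compound term. All counts are toy figures.
        import math

        N = 1_000_000        # total count in the corpus
        count_x = 1_200      # occurrences of the first noun
        count_y = 900        # occurrences of the second noun
        count_xy = 400       # occurrences of the two nouns as a compound

        p_x, p_y, p_xy = count_x / N, count_y / N, count_xy / N
        pmi = math.log2(p_xy / (p_x * p_y))
        print("PMI:", round(pmi, 2))

        # Simple rule: keep the pair as one index term when PMI clears a threshold.
        print("index as compound:", pmi > 3.0)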

  9. Visual content highlighting via automatic extraction of embedded captions on MPEG compressed video

    NASA Astrophysics Data System (ADS)

    Yeo, Boon-Lock; Liu, Bede

    1996-03-01

    Embedded captions in TV programs such as news broadcasts, documentaries and coverage of sports events provide important information on the underlying events. In digital video libraries, such captions represent a highly condensed form of key information on the contents of the video. In this paper we propose a scheme to automatically detect the presence of captions embedded in video frames. The proposed method operates on reduced image sequences which are efficiently reconstructed from compressed MPEG video and thus does not require full frame decompression. The detection, extraction and analysis of embedded captions help to capture the highlights of visual contents in video documents for better organization of video, to present succinctly the important messages embedded in the images, and to facilitate browsing, searching and retrieval of relevant clips.

  10. A quality score for coronary artery tree extraction results

    NASA Astrophysics Data System (ADS)

    Cao, Qing; Broersen, Alexander; Kitslaar, Pieter H.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke

    2018-02-01

    Coronary artery trees (CATs) are often extracted to aid the fully automatic analysis of coronary artery disease on coronary computed tomography angiography (CCTA) images. Automatically extracted CATs often miss some arteries or include wrong extractions which require manual corrections before performing successive steps. For analyzing a large number of datasets, a manual quality check of the extraction results is time-consuming. This paper presents a method to automatically calculate quality scores for extracted CATs in terms of clinical significance of the extracted arteries and the completeness of the extracted CAT. Both right dominant (RD) and left dominant (LD) anatomical statistical models are generated and exploited in developing the quality score. To automatically determine which model should be used, a dominance type detection method is also designed. Experiments are performed on the automatically extracted and manually refined CATs from 42 datasets to evaluate the proposed quality score. In 39 (92.9%) cases, the proposed method is able to measure the quality of the manually refined CATs with higher scores than the automatically extracted CATs. In a 100-point scale system, the average scores for automatically and manually refined CATs are 82.0 (+/-15.8) and 88.9 (+/-5.4) respectively. The proposed quality score will assist the automatic processing of the CAT extractions for large cohorts which contain both RD and LD cases. To the best of our knowledge, this is the first time that a general quality score for an extracted CAT is presented.

  11. Extracting the information of coastline shape and its multiple representations

    NASA Astrophysics Data System (ADS)

    Liu, Ying; Li, Shujun; Tian, Zhen; Chen, Huirong

    2007-06-01

    This paper puts forward a new approach to the multiple representation of coastlines. The idea is to simulate the way humans think when they generalize: build an appropriate mathematical model, describe the coastline graphically, and extract various kinds of coastline shape information. Coastline generalization is then carried out automatically based on knowledge rules and arithmetic operators. Representing coastline shape by building a Douglas binary tree over the curve reveals the shape characteristics of the coastline both microscopically and macroscopically. The extracted coastline information includes the local characteristic points and their orientation, the curve structure and the topological traits; the curve structure can be divided into single curves and curve clusters. By fixing the knowledge rules of coastline generalization, the generalization scale and the shape parameters, the automatic coastline generalization model is finally established. The proposed method of multi-scale coastline representation has several strong points: it follows the human mode of thinking and preserves the natural character of the original curve, while the binary tree structure controls the similarity of the coastline, avoids self-intersection and maintains consistent topological relationships.
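
    The "Douglas binary tree" is built from the recursive split points of Douglas-Peucker-style simplification. The sketch below shows the classic Douglas-Peucker recursion, whose split points would become the tree nodes; the sample coastline and tolerance are illustrative, and the paper's knowledge rules are not modeled.

        # Minimal sketch: classic Douglas-Peucker simplification of a coastline polyline.
        import math

        def point_line_distance(p, a, b):
            """Perpendicular distance from point p to the line through a and b."""
            (px, py), (ax, ay), (bx, by) = p, a, b
            if (ax, ay) == (bx, by):
                return math.hypot(px - ax, py - ay)
            num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
            return num / math.hypot(bx - ax, by - ay)

        def douglas_peucker(points, tolerance):
            if len(points) < 3:
                return list(points)
            # Find the point farthest from the chord joining the endpoints.
            idx, dmax = 0, 0.0
            for i in range(1, len(points) - 1):
                d = point_line_distance(points[i], points[0], points[-1])
                if d > dmax:
                    idx, dmax = i, d
            if dmax <= tolerance:
                return [points[0], points[-1]]
            left = douglas_peucker(points[: idx + 1], tolerance)
            right = douglas_peucker(points[idx:], tolerance)
            return left[:-1] + right      # avoid duplicating the split point

        coastline = [(0, 0), (1, 0.4), (2, -0.3), (3, 1.2), (4, 0.9), (5, 0)]
        print(douglas_peucker(coastline, tolerance=0.5))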

  12. Thermal feature extraction of servers in a datacenter using thermal image registration

    NASA Astrophysics Data System (ADS)

    Liu, Hang; Ran, Jian; Xie, Ting; Gao, Shan

    2017-09-01

    Thermal cameras provide fine-grained thermal information that enhances monitoring and enables automatic thermal management in large datacenters. Recent approaches employing mobile robots or thermal camera networks can already identify the physical locations of hot spots. Other distribution information used to optimize datacenter management can also be obtained automatically using pattern recognition technology. However, most of the features extracted from thermal images, such as shape and gradient, may be affected by changes in the position and direction of the thermal camera. This paper presents a method for extracting the thermal features of a hot spot or a server in a container datacenter. First, thermal and visual images are registered based on textural characteristics extracted from images acquired in datacenters. Then, the thermal distribution of each server is standardized. The features of a hot spot or server extracted from the standard distribution can reduce the impact of camera position and direction. The results of experiments show that image registration is efficient for aligning the corresponding visual and thermal images in the datacenter, and the standardization procedure reduces the impacts of camera position and direction on hot spot or server features.

  13. Text feature extraction based on deep learning: a review.

    PubMed

    Liang, Hong; Sun, Xiao; Sun, Yunlei; Gao, Yuan

    2017-01-01

    The selection of text feature items is a basic and important matter for text mining and information retrieval. Traditional methods of feature extraction require handcrafted features, and hand-designing an effective feature is a lengthy process; deep learning, by contrast, makes it possible to acquire effective new feature representations from training data for new applications. As a feature extraction method, deep learning has made notable achievements in text mining. The major difference between deep learning and conventional methods is that deep learning automatically learns features from big data instead of adopting handcrafted features, which depend mainly on the designers' prior knowledge and cannot fully exploit big data. Deep learning can automatically learn feature representations from big data using models with millions of parameters. This article first outlines the common methods used in text feature extraction, then surveys frequently used deep learning methods for text feature extraction and their applications, and finally forecasts the application of deep learning in feature extraction.

  14. MPEG-7 based video annotation and browsing

    NASA Astrophysics Data System (ADS)

    Hoeynck, Michael; Auweiler, Thorsten; Wellhausen, Jens

    2003-11-01

    The huge amount of multimedia data produced worldwide requires annotation in order to enable universal content access and to provide content-based search-and-retrieval functionalities. Since manual video annotation can be time-consuming, automatic annotation systems are required. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports and describe our approach to automatic annotation of equestrian sports videos. We especially concentrate on MPEG-7 based feature extraction and content description, where we apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information. Having determined single shot positions as well as the visual highlights, the information is jointly stored with meta-textual information in an MPEG-7 description scheme. Based on this information, we generate content summaries which can be utilized in a user-interface in order to provide content-based access to the video stream, but further for media browsing on a streaming server.

  15. Automatic extraction of pavement markings on streets from point cloud data of mobile LiDAR

    NASA Astrophysics Data System (ADS)

    Gao, Yang; Zhong, Ruofei; Tang, Tao; Wang, Liuzhao; Liu, Xianlin

    2017-08-01

    Pavement markings provide an important foundation as they help to keep road users safe. Accurate and comprehensive information about pavement markings assists road regulators and is useful for developing driverless technology. Mobile light detection and ranging (LiDAR) systems offer new opportunities to collect and process accurate pavement marking information. Mobile LiDAR systems can directly obtain the three-dimensional (3D) coordinates of an object, thus defining the spatial data and intensity of 3D objects in a fast and efficient way. The RGB attribute information of data points can be obtained from the panoramic camera in the system. In this paper, we present a novel method to automatically extract pavement markings using multiple attributes of the laser scanning point cloud from mobile LiDAR data. The method uses a differential grayscale of the RGB color, the laser pulse reflection intensity, and the differential intensity to identify and extract pavement markings. We used point cloud density to remove noise and morphological operations to eliminate errors. We tested our method on different sections of roads in Beijing, China, and Buffalo, NY, USA. The results indicated that both correctness (p) and completeness (r) were higher than 90%. The method can be applied to extract pavement markings from the huge point clouds produced by mobile LiDAR.
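
    A simplified version of the attribute-based filtering can be sketched as boolean masking over intensity and an RGB grayscale value; the column layout, thresholds and random data below are illustrative and omit the paper's differential grayscale/intensity terms and morphological clean-up.

        # Minimal sketch: keep candidate pavement-marking points by combining a
        # laser-intensity threshold with a grayscale check on the RGB attributes.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 100_000
        cloud = np.column_stack([
            rng.uniform(0, 50, size=(n, 3)),       # x, y, z
            rng.uniform(0, 255, size=n),           # intensity
            rng.uniform(0, 255, size=(n, 3)),      # r, g, b
        ])

        intensity = cloud[:, 3]
        gray = cloud[:, 4:7].mean(axis=1)          # simple RGB-to-grayscale

        # Markings are retroreflective (high intensity) and bright/near-white (high gray value).
        candidates = cloud[(intensity > 180) & (gray > 200)]
        print("candidate marking points:", len(candidates))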

  16. Identification of Cichlid Fishes from Lake Malawi Using Computer Vision

    PubMed Central

    Joo, Deokjin; Kwan, Ye-seul; Song, Jongwoo; Pinho, Catarina; Hey, Jody; Won, Yong-Jin

    2013-01-01

    Background: The explosively radiating evolution of cichlid fishes of Lake Malawi has yielded an amazing number of haplochromine species, estimated at as many as 500 to 800, with a surprising degree of diversity among them not only in color and stripe pattern but also in the shape of jaw and body. As these morphological diversities have been a central subject of adaptive speciation and taxonomic classification, such high diversity could serve as a foundation for automating species identification of cichlids. Methodology/Principal Findings: Here we demonstrate a method for automatic classification of the Lake Malawi cichlids based on computer vision and geometric morphometrics. To this end we developed a pipeline that integrates multiple image processing tools to automatically extract informative features of color and stripe patterns from a large set of photographic images of wild cichlids. The extracted information was evaluated by the statistical classifiers Support Vector Machine and Random Forests. Both classifiers performed better when body shape information was added to the color and stripe features. Besides the coloration and stripe pattern, body shape variables boosted the accuracy of classification by about 10%. The programs were able to classify 594 live cichlid individuals belonging to 12 different classes (species and sexes) with an average accuracy of 78%, in contrast to a mere 42% success rate by human eyes. The variables that contributed most to the accuracy were body height and the hue of the most frequent color. Conclusions: Computer vision showed notable performance in extracting information from the color and stripe patterns of Lake Malawi cichlids, although the information was not enough for errorless species identification. Our results indicate that there appears to be an unavoidable difficulty in automatic species identification of cichlid fishes, which may arise from short divergence times and gene flow between closely related species. PMID:24204918

  17. Processing of Crawled Urban Imagery for Building Use Classification

    NASA Astrophysics Data System (ADS)

    Tutzauer, P.; Haala, N.

    2017-05-01

    Recent years have shown a shift from purely geometric 3D city models to data with semantics. This shift is induced by new applications (e.g. Virtual/Augmented Reality) and also by requirements of concepts like Smart Cities. However, essential urban semantic data such as building use categories is often not available. We present a first step toward bridging this gap by proposing a pipeline that uses crawled urban imagery and links it with ground-truth cadastral data as input for automatic building use classification. We aim to extract this city-relevant semantic information automatically from Street View (SV) imagery. Convolutional Neural Networks (CNNs) have proved to be extremely successful for image interpretation; however, they require a huge amount of training data. The main contribution of the paper is the automatic provision of such training datasets by linking semantic information already available from databases provided by national mapping agencies or city administrations to the corresponding façade images extracted from SV. Finally, we present first investigations with a CNN and an alternative classifier as a proof of concept.

  18. Measurement of Hydrologic Resource Parameters Through Remote Sensing in the Feather River Headwaters Area

    NASA Technical Reports Server (NTRS)

    Thorley, G. A.; Draeger, W. C.; Lauer, D. T.; Lent, J.; Roberts, E.

    1971-01-01

    The four problem areas being investigated are: (1) determination of the feasibility of providing the resource manager with operationally useful information through the use of remote sensing techniques; (2) definition of the spectral characteristics of earth resources and the optimum procedures for calibrating tone and color characteristics of multispectral imagery; (3) determination of the extent to which humans can extract useful earth resource information through remote sensing imagery; (4) determination of the extent to which automatic classification and data processing can extract useful information from remote sensing data.

  19. Extraction of linear features on SAR imagery

    NASA Astrophysics Data System (ADS)

    Liu, Junyi; Li, Deren; Mei, Xin

    2006-10-01

    Linear features are usually extracted from SAR imagery by a few edge detectors derived from the contrast-ratio edge detector with a constant probability of false alarm. On the other hand, the Hough Transform (HT) is an elegant way of extracting global features, such as curve segments, from binary edge images. The Randomized Hough Transform can drastically reduce the computation time and memory usage of the HT, but it leaves a great number of accumulator cells invalid during random sampling. In this paper, we propose a new approach to extract linear features from SAR imagery: an almost automatic algorithm based on edge detection and the Randomized Hough Transform. The improved method makes full use of the directional information of each candidate edge point in order to avoid the problem of invalid accumulation. The applied results are in good agreement with the theoretical study, and the main linear features on the SAR imagery were extracted automatically. The method saves storage space and computational time, which shows its effectiveness and applicability.

  20. Proactive Response to Potential Material Shortages Arising from Environmental Restrictions Using Automatic Discovery and Extraction of Information from Technical Documents

    DTIC Science & Technology

    2012-12-21

    material data and other key information in a UIMA environment. In the course of this project, the tools and methods developed were used to extract and...Architecture (UIMA) library from the Apache Software Foundation. Using this architecture, a given document is run through several “annotators” to...material taxonomy developed for the XSB, Inc. Coherent View™ database. In order to integrate this technology into the Java-based UIMA annotation

  1. Sample processor for the automatic extraction of families of compounds from liquid samples and/or homogenized solid samples suspended in a liquid

    NASA Technical Reports Server (NTRS)

    Jahnsen, Vilhelm J. (Inventor); Campen, Jr., Charles F. (Inventor)

    1980-01-01

    A sample processor and method for the automatic extraction of families of compounds, known as extracts, from liquid and/or homogenized solid samples are disclosed. The sample processor includes a tube support structure which supports a plurality of extraction tubes, each containing a sample from which families of compounds are to be extracted. The support structure is moveable automatically with respect to one or more extraction stations, so that as each tube is at each station a solvent system, consisting of a solvent and reagents, is introduced therein. As a result an extract is automatically extracted from the tube. The sample processor includes an arrangement for directing the different extracts from each tube to different containers, or to direct similar extracts from different tubes to the same utilization device.

  2. Information Retrieval and Text Mining Technologies for Chemistry.

    PubMed

    Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

    2017-06-28

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

  3. Brain extraction in partial volumes T2*@7T by using a quasi-anatomic segmentation with bias field correction.

    PubMed

    Valente, João; Vieira, Pedro M; Couto, Carlos; Lima, Carlos S

    2018-02-01

    Poor brain extraction in Magnetic Resonance Imaging (MRI) has negative consequences for several types of post-extraction processing, such as tissue segmentation and related statistical measures or pattern recognition algorithms. Current state-of-the-art algorithms for brain extraction work on T1- and T2-weighted images and are not adequate for non-whole-brain images such as T2*FLASH@7T partial volumes. This paper proposes two new methods that work directly on T2*FLASH@7T partial volumes. The first is an improvement of the semi-automatic threshold-with-morphology approach adapted to incomplete volumes. The second method uses an improved version of a current implementation of the fuzzy c-means algorithm with bias correction for brain segmentation. Under high inhomogeneity conditions the performance of the first method degrades, requiring user intervention, which is unacceptable. The second method performed well for all volumes and is entirely automatic. State-of-the-art algorithms for brain extraction are mainly semi-automatic, requiring a correct initialization by the user and knowledge of the software; these methods cannot deal with partial volumes and/or need information from an atlas, which is not available for T2*FLASH@7T. Also, combined volumes suffer from manipulations such as re-sampling, which significantly deteriorates voxel intensity structures and makes segmentation tasks difficult. The proposed method can overcome all these difficulties, reaching good results for brain extraction using only T2*FLASH@7T volumes. The development of this work will lead to an improvement of automatic brain lesion segmentation in T2*FLASH@7T volumes, which becomes more important when lesions such as those of cortical multiple sclerosis need to be detected. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. An effective self-assessment based on concept map extraction from test-sheet for personalized learning

    NASA Astrophysics Data System (ADS)

    Liew, Keng-Hou; Lin, Yu-Shih; Chang, Yi-Chun; Chu, Chih-Ping

    2013-12-01

    Examination is a traditional way to assess learners' learning status, progress and performance after a learning activity. Beyond the test grade, a test sheet contains implicit information such as the test concepts, their relationships, importance and prerequisites. This implicit information can be extracted to construct a concept map, based on the observations that (1) test concepts covered by the same question have strong relationships, and (2) concepts appearing in the same test sheet are related. Concept maps have been successfully employed in many studies to help instructors and learners organize relationships among concepts. However, concept map construction depends on experts, who need considerable effort and time to organize the domain knowledge. In addition, previous research on automatic concept map construction has considered all learners of a class rather than personalized learning. To cope with this problem, this paper proposes a new approach that automatically extracts and constructs a concept map from the implicit information in a test sheet. Furthermore, the proposed approach can also help learners with self-assessment and self-diagnosis. Finally, an example is given to illustrate the effectiveness of the proposed approach.
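
    The co-occurrence idea behind the construction can be sketched as a weighted graph whose edges link concepts tested by the same question. The question-to-concept mapping below is a made-up example; the paper's rules for importance and prerequisites are not modeled.

        # Minimal sketch: build a weighted concept co-occurrence graph from a test sheet.
        # Concepts tested by the same question are linked; edge weight counts co-occurrences.
        from collections import defaultdict
        from itertools import combinations

        question_concepts = {
            "Q1": ["fractions", "common denominator"],
            "Q2": ["fractions", "decimals"],
            "Q3": ["decimals", "percentages", "fractions"],
        }

        edges = defaultdict(int)
        for concepts in question_concepts.values():
            for a, b in combinations(sorted(set(concepts)), 2):
                edges[(a, b)] += 1

        for (a, b), w in sorted(edges.items(), key=lambda kv: -kv[1]):
            print(f"{a} -- {b} (weight {w})")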

  5. Automatic Rooftop Extraction in Stereo Imagery Using Distance and Building Shape Regularized Level Set Evolution

    NASA Astrophysics Data System (ADS)

    Tian, J.; Krauß, T.; d'Angelo, P.

    2017-05-01

    Automatic rooftop extraction is one of the most challenging problems in remote sensing image analysis. Classical 2D image processing techniques are expensive due to the large number of features required to locate buildings. This problem can be avoided when 3D information is available. In this paper, we show how to fuse the spectral and height information of stereo imagery to achieve efficient and robust rooftop extraction. In the first step, the digital terrain model (DTM), and in turn the normalized digital surface model (nDSM), is generated using a new step-edge approach. In the second step, initial building locations and rooftop boundaries are derived by removing low-level pixels and those high-level pixels with a higher probability of being trees or shadows. This boundary then serves as the initial level set function, which is further refined to fit the best possible boundaries through distance-regularized level-set curve evolution. During the fitting procedure, an edge-based active contour model is adopted and implemented using edge indicators extracted from the panchromatic image. The performance of the proposed approach is tested using WorldView-2 satellite data captured over Munich.
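
    The height-based candidate step can be sketched in a few lines of NumPy; this is only a simplified illustration of the general nDSM idea (DSM minus DTM, followed by a height threshold), with the threshold value chosen arbitrarily rather than taken from the paper.

        import numpy as np

        def building_candidates(dsm, dtm, min_height=2.5):
            """Return a boolean mask of above-ground objects from stereo height data.

            dsm, dtm : 2-D arrays of the digital surface and terrain models (metres).
            min_height : objects lower than this are treated as ground-level clutter.
            """
            ndsm = dsm - dtm                 # normalized DSM: height above ground
            return ndsm > min_height         # rough building/tree candidate mask

        # In the paper, the candidate mask is further cleaned by removing pixels
        # likely to be trees or shadows before level-set refinement; a spectral
        # vegetation index (e.g. NDVI) would typically be used for that step.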

  6. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques require, as a prerequisite, the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy to use, fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.
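
    The metadata-extraction part of such a tool (the authors used Matlab) can be approximated in Python with pydicom; the sketch below is a minimal, assumed workflow that dumps a few common tags from a folder of slices to a CSV file, with the directory path as a placeholder.

        import csv
        from pathlib import Path
        import pydicom

        def dump_dicom_metadata(dicom_dir, out_csv="metadata.csv"):
            """Write selected DICOM tags of every slice in a directory to a CSV file."""
            rows = []
            for path in sorted(Path(dicom_dir).glob("*.dcm")):
                ds = pydicom.dcmread(path, stop_before_pixels=True)
                rows.append({
                    "file": path.name,
                    "PatientID": getattr(ds, "PatientID", ""),
                    "Modality": getattr(ds, "Modality", ""),
                    "SliceThickness": getattr(ds, "SliceThickness", ""),
                    "KVP": getattr(ds, "KVP", ""),
                    "ExposureTime": getattr(ds, "ExposureTime", ""),
                })
            if not rows:
                return
            with open(out_csv, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=rows[0].keys())
                writer.writeheader()
                writer.writerows(rows)

        # dump_dicom_metadata("/path/to/series")   # path is a placeholder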

  7. Using Information from the Electronic Health Record to Improve Measurement of Unemployment in Service Members and Veterans with mTBI and Post-Deployment Stress

    PubMed Central

    Dillahunt-Aspillaga, Christina; Finch, Dezon; Massengale, Jill; Kretzmer, Tracy; Luther, Stephen L.; McCart, James A.

    2014-01-01

    Objective The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). Design Retrospective cohort study using data from selected progress notes stored in the EHR. Setting Post-deployment Rehabilitation and Evaluation Program (PREP), an in-patient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. Participants Service members and Veterans with TBI who participated in the PREP program (N = 60). Main Outcome Measures Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. Results Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals along with information about cognitive, physical, and behavioral symptoms that may affect return-to-work were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. Conclusions Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort. PMID:25541956

  8. BioCreative V track 4: a shared task for the extraction of causal network information using the Biological Expression Language.

    PubMed

    Rinaldi, Fabio; Ellendorff, Tilia Renate; Madan, Sumit; Clematide, Simon; van der Lek, Adrian; Mevissen, Theo; Fluck, Juliane

    2016-01-01

    Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The specific goal of track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology which gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions, relations, and introduced an evaluation framework which rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text. © The Author(s) 2016. Published by Oxford University Press.

  9. The development of an automatically produced cholangiography procedure using the reconstruction of portal-phase multidetector-row computed tomography images: preliminary experience.

    PubMed

    Hirose, Tomoaki; Igami, Tsuyoshi; Koga, Kusuto; Hayashi, Yuichiro; Ebata, Tomoki; Yokoyama, Yukihiro; Sugawara, Gen; Mizuno, Takashi; Yamaguchi, Junpei; Mori, Kensaku; Nagino, Masato

    2017-03-01

    Fusion angiography using reconstructed multidetector-row computed tomography (MDCT) images and cholangiography using reconstructed MDCT images with a cholangiographic agent include an anatomical gap due to the different periods of MDCT scanning. To overcome such gaps, we attempted to develop a cholangiography procedure that automatically reconstructs a cholangiogram from portal-phase MDCT images. The automatically produced cholangiography procedure utilized an original software program developed by the Graduate School of Information Science, Nagoya University. This program constructed 5 candidate biliary tracts and automatically selected one as the candidate cholangiogram. The clinical value of the automatically produced cholangiography procedure was estimated by comparison with manually produced cholangiography. Automatically produced cholangiograms were reconstructed for 20 patients who underwent MDCT scanning before biliary drainage for distal biliary obstruction. The procedure was able to extract the 5 main biliary branches and the 21 subsegmental biliary branches in 55% and 25% of the cases, respectively. The extent of aberrant connections and aberrant extractions outside the biliary tract was acceptable. Among all of the cholangiograms, 5 were clinically applied with no correction, 8 were applied with modest improvements, and 3 produced a correct cholangiography before automatic selection. Although our procedure requires further improvement based on the analysis of additional patient data, it may represent an alternative to direct cholangiography in the future.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Casey, Cameron P.; Zheng, Xueyun

    Motivation: Drift tube ion mobility spectrometry (DTIMS) is increasingly implemented in high-throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS coupled with mass spectrometry and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of information necessary to create a reference library containing accurate masses, DTIMS arrival times and CCSs for use in high-throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were identical to those calculated by hand and within error of those calculated using commercially available instrument vendor software.
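
    For context, a CCS can be computed from a measured reduced mobility with the Mason-Schamp relation; the snippet below is a generic physics sketch (not PIXiE's implementation), assuming a nitrogen buffer gas, and the example mass and mobility values are made up.

        import math

        E = 1.602176634e-19      # elementary charge (C)
        KB = 1.380649e-23        # Boltzmann constant (J/K)
        N0 = 2.6867811e25        # Loschmidt number density at 0 degC, 1 atm (1/m^3)
        AMU = 1.66053906660e-27  # atomic mass unit (kg)

        def mason_schamp_ccs(k0, m_ion_da, m_gas_da=28.0134, z=1, t_kelvin=298.0):
            """Collision cross section (Angstrom^2) from reduced mobility K0 (m^2/V/s)."""
            mu = (m_ion_da * m_gas_da) / (m_ion_da + m_gas_da) * AMU   # reduced mass (kg)
            ccs_m2 = (3.0 * z * E) / (16.0 * N0) \
                * math.sqrt(2.0 * math.pi / (mu * KB * t_kelvin)) / k0
            return ccs_m2 * 1e20                                        # m^2 -> Angstrom^2

        # Example with made-up numbers: a singly charged 180 Da ion with K0 = 1.4e-4 m^2/V/s.
        print(round(mason_schamp_ccs(1.4e-4, 180.0), 1), "A^2")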

  11. Visual perception system and method for a humanoid robot

    NASA Technical Reports Server (NTRS)

    Chelian, Suhas E. (Inventor); Linn, Douglas Martin (Inventor); Wampler, II, Charles W. (Inventor); Bridgwater, Lyndon (Inventor); Wells, James W. (Inventor); Mc Kay, Neil David (Inventor)

    2012-01-01

    A robotic system includes a humanoid robot with robotic joints each moveable using an actuator(s), and a distributed controller for controlling the movement of each of the robotic joints. The controller includes a visual perception module (VPM) for visually identifying and tracking an object in the field of view of the robot under threshold lighting conditions. The VPM includes optical devices for collecting an image of the object, a positional extraction device, and a host machine having an algorithm for processing the image and positional information. The algorithm visually identifies and tracks the object, and automatically adapts an exposure time of the optical devices to prevent feature data loss of the image under the threshold lighting conditions. A method of identifying and tracking the object includes collecting the image, extracting positional information of the object, and automatically adapting the exposure time to thereby prevent feature data loss of the image.

  12. WRIS: a resource information system for wildland management

    Treesearch

    Robert M. Russell; David A. Sharpnack; Elliot Amidon

    1975-01-01

    WRIS (Wildland Resource Information System) is a computer system for processing, storing, retrieving, updating, and displaying geographic data. The polygon, representing a land area boundary, forms the building block of WRIS. Polygons form a map. Maps are digitized manually or by automatic scanning. Computer programs can extract and produce polygon maps and can overlay...

  13. A Graph-Based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications

    PubMed Central

    Cameron, Delroy; Bodenreider, Olivier; Yalamanchili, Hima; Danh, Tu; Vallabhaneni, Sreeram; Thirunarayan, Krishnaprasad; Sheth, Amit P.; Rindflesch, Thomas C.

    2014-01-01

    Objectives This paper presents a methodology for recovering and decomposing Swanson's Raynaud Syndrome–Fish Oil Hypothesis semi-automatically. The methodology leverages the semantics of assertions extracted from biomedical literature (called semantic predications) along with structured background knowledge and graph-based algorithms to semi-automatically capture the informative associations originally discovered manually by Swanson. Demonstrating that Swanson's manually intensive techniques can be undertaken semi-automatically paves the way for fully automatic semantics-based hypothesis generation from scientific literature. Methods Semantic predications obtained from biomedical literature allow the construction of labeled directed graphs which contain various associations among concepts from the literature. By aggregating such associations into informative subgraphs, some of the relevant details originally articulated by Swanson have been uncovered. However, by leveraging background knowledge to bridge important knowledge gaps in the literature, a methodology for semi-automatically capturing the detailed associations originally explicated in natural language by Swanson has been developed. Results Our methodology not only recovered the 3 associations commonly recognized as Swanson's Hypothesis, but also decomposed them into an additional 16 detailed associations, formulated as chains of semantic predications. Altogether, 14 out of the 19 associations that can be attributed to Swanson were retrieved using our approach. To the best of our knowledge, such an in-depth recovery and decomposition of Swanson's Hypothesis has never been attempted. Conclusion In this work, therefore, we presented a methodology for semi-automatically recovering and decomposing Swanson's RS-DFO Hypothesis using semantic representations and graph algorithms. Our methodology provides new insights into potential prerequisites for semantics-driven Literature-Based Discovery (LBD). These suggest that three critical aspects of LBD include: 1) the need for more expressive representations beyond Swanson's ABC model; 2) an ability to accurately extract semantic information from text; and 3) the semantic integration of scientific literature with structured background knowledge. PMID:23026233

  14. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

    PubMed

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2016-10-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes.

  15. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora

    PubMed Central

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2017-01-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes. PMID:28210517

  16. Overview of machine vision methods in x-ray imaging and microtomography

    NASA Astrophysics Data System (ADS)

    Buzmakov, Alexey; Zolotov, Denis; Chukalina, Marina; Nikolaev, Dmitry; Gladkov, Andrey; Ingacheva, Anastasia; Yakimchuk, Ivan; Asadchikov, Victor

    2018-04-01

    Digital X-ray imaging has become widely used in science, medicine, and non-destructive testing. This allows modern digital image analysis to be used for automatic information extraction and interpretation. We give a short review of machine vision applications in scientific X-ray imaging and microtomography, including image processing, feature detection and extraction, image compression to increase camera throughput, microtomography reconstruction, visualization, and setup adjustment.

  17. Hybrid Automatic Building Interpretation System

    NASA Astrophysics Data System (ADS)

    Pakzad, K.; Klink, A.; Müterthies, A.; Gröger, G.; Stroh, V.; Plümer, L.

    2011-09-01

    HABIS (Hybrid Automatic Building Interpretation System) is a system for the automatic reconstruction of building roofs used in virtual 3D building models. Unlike most commercially available systems, HABIS is able to work with a high degree of automation. The hybrid method uses different data sources with the intention of exploiting the advantages of each source. 3D point clouds usually provide good height and surface data, whereas aerial images with high spatial resolution provide important edge information and detail information for roof objects like dormers or chimneys. The cadastral data provide important base information about the building ground plans. The approach used in HABIS is a multi-stage process, which starts with a coarse roof classification based on 3D point clouds. It then continues with an image-based verification of these predicted roofs. In a further step, a final classification and adjustment of the roofs is performed. In addition, roof objects like dormers and chimneys are also extracted based on aerial images and added to the models. In this paper the methods used are described and some results are presented.

  18. Finding geospatial pattern of unstructured data by clustering routes

    NASA Astrophysics Data System (ADS)

    Boustani, M.; Mattmann, C. A.; Ramirez, P.; Burke, W.

    2016-12-01

    Today the majority of generated data has a geospatial context, either in attribute form as a latitude or longitude or a location name, or cross-referenceable through other means such as an external gazetteer or location service. Our research is interested in exploiting geospatial location and context in unstructured data such as that found on the web in HTML pages, images, videos, documents, and other areas, and in structured information repositories found on intranets, in scientific environments, and elsewhere. We are working together on the DARPA MEMEX project to exploit open source software tools such as the Lucene Geo Gazetteer, Apache Tika, Apache Lucene, and Apache OpenNLP to automatically extract, and make meaning out of, geospatial information. In particular, we are interested in unstructured descriptors, e.g., a phone number or a named entity, and the ability to automatically learn geospatial paths related to these descriptors. For example, a particular phone number may represent an entity that travels on a monthly basis, according to patterns that are sometimes easily identifiable and sometimes more difficult to track. We will present a set of automatic techniques to extract descriptors and then to geospatially infer their paths across unstructured data.

  19. A Knowledge-Based Approach to Automatic Detection of Equipment Alarm Sounds in a Neonatal Intensive Care Unit Environment.

    PubMed

    Raboshchuk, Ganna; Nadeu, Climent; Jancovic, Peter; Lilja, Alex Peiro; Kokuer, Munevver; Munoz Mahamud, Blanca; Riverola De Veciana, Ana

    2018-01-01

    A large number of alarm sounds triggered by biomedical equipment occur frequently in the noisy environment of a neonatal intensive care unit (NICU) and play a key role in providing healthcare. In this paper, our work on the development of an automatic system for detection of acoustic alarms in that difficult environment is presented. Such an automatic detection system is needed for the investigation of how a preterm infant reacts to auditory stimuli of the NICU environment and for improved real-time patient monitoring. The approach presented in this paper consists of using the available knowledge about each alarm class in the design of the detection system. The information about the frequency structure is used in the feature extraction stage, and the time structure knowledge is incorporated at the post-processing stage. Several alternative methods are compared for feature extraction, modeling, and post-processing. The detection performance is evaluated with real data recorded in the NICU of the hospital, using both frame-level and period-level metrics. The experimental results show that the inclusion of both spectral and temporal information improves the baseline detection performance by more than 60%.
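
    As a loose illustration of combining spectral features with temporal post-processing (not the authors' exact pipeline), the sketch below computes band-energy features per frame and then median-smooths the frame-level decisions over time; the thresholds and band edges are arbitrary placeholders.

        import numpy as np
        from scipy.signal import stft, medfilt

        def band_energy_features(x, fs, bands=((1000, 2000), (2000, 4000), (4000, 8000))):
            """Log energy in a few frequency bands for each short-time frame."""
            f, _, z = stft(x, fs=fs, nperseg=1024)
            power = np.abs(z) ** 2
            feats = [power[(f >= lo) & (f < hi)].sum(axis=0) for lo, hi in bands]
            return np.log(np.stack(feats, axis=1) + 1e-12)

        def detect_alarm_frames(x, fs, threshold=-8.0, smooth=9):
            """Frame-level alarm decisions, median-smoothed for temporal continuity."""
            feats = band_energy_features(x, fs)
            frame_scores = feats.max(axis=1)                 # crude per-frame score
            decisions = (frame_scores > threshold).astype(float)
            return medfilt(decisions, kernel_size=smooth) > 0.5

        # fs, audio = 16000, np.random.randn(16000 * 5)      # stand-in for a NICU recording
        # print(detect_alarm_frames(audio, fs).sum(), "alarm frames")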

  20. Automatic video summarization driven by a spatio-temporal attention model

    NASA Astrophysics Data System (ADS)

    Barland, R.; Saadane, A.

    2008-02-01

    According to the literature, automatic video summarization techniques can be classified into two categories based on the nature of their output: "video skims", which are generated using portions of the original video, and "key-frame sets", which correspond to images selected from the original video that have significant semantic content. The difference between these two categories is reduced when we consider automatic procedures. Most published approaches are based on the image signal and use pixel characterization, histogram techniques, or block-based image decomposition. However, few of them integrate properties of the Human Visual System (HVS). In this paper, we propose to extract key-frames for video summarization by studying the variations of salient information between two consecutive frames. For each frame, a saliency map is produced, simulating human visual attention with a bottom-up (signal-dependent) approach. This approach includes three parallel channels for processing three early visual features: intensity, color, and temporal contrasts. For each channel, the variation of the salient information between two consecutive frames is computed. These outputs are then combined to produce the global saliency variation, which determines the key-frames. Psychophysical experiments have been defined and conducted to analyze the relevance of the proposed key-frame extraction algorithm.
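
    A very reduced sketch of the key-frame idea (saliency change between consecutive frames) might look like the following; the saliency model here is just an intensity center-surround difference, far simpler than the three-channel model of the paper, and the selection threshold is arbitrary.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def intensity_saliency(frame):
            """Crude bottom-up saliency: center-surround difference of the intensity image."""
            fine = gaussian_filter(frame.astype(float), sigma=2)
            coarse = gaussian_filter(frame.astype(float), sigma=16)
            return np.abs(fine - coarse)

        def select_key_frames(frames, threshold=0.15):
            """Pick frames whose saliency map changes strongly w.r.t. the previous frame."""
            keys, prev = [], intensity_saliency(frames[0])
            for i, frame in enumerate(frames[1:], start=1):
                cur = intensity_saliency(frame)
                variation = np.mean(np.abs(cur - prev)) / (np.mean(prev) + 1e-9)
                if variation > threshold:
                    keys.append(i)
                prev = cur
            return keys

        # frames = [np.random.rand(120, 160) for _ in range(50)]   # stand-in grayscale video
        # print(select_key_frames(frames))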

  1. A Knowledge-Based Approach to Automatic Detection of Equipment Alarm Sounds in a Neonatal Intensive Care Unit Environment

    PubMed Central

    Nadeu, Climent; Jančovič, Peter; Lilja, Alex Peiró; Köküer, Münevver; Muñoz Mahamud, Blanca; Riverola De Veciana, Ana

    2018-01-01

    A large number of alarm sounds triggered by biomedical equipment occur frequently in the noisy environment of a neonatal intensive care unit (NICU) and play a key role in providing healthcare. In this paper, our work on the development of an automatic system for detection of acoustic alarms in that difficult environment is presented. Such an automatic detection system is needed for the investigation of how a preterm infant reacts to auditory stimuli of the NICU environment and for improved real-time patient monitoring. The approach presented in this paper consists of using the available knowledge about each alarm class in the design of the detection system. The information about the frequency structure is used in the feature extraction stage, and the time structure knowledge is incorporated at the post-processing stage. Several alternative methods are compared for feature extraction, modeling, and post-processing. The detection performance is evaluated with real data recorded in the NICU of the hospital, using both frame-level and period-level metrics. The experimental results show that the inclusion of both spectral and temporal information improves the baseline detection performance by more than 60%. PMID:29404227

  2. Automatic detection of Martian dark slope streaks by machine learning using HiRISE images

    NASA Astrophysics Data System (ADS)

    Wang, Yexin; Di, Kaichang; Xin, Xin; Wan, Wenhui

    2017-07-01

    Dark slope streaks (DSSs) on the Martian surface are one of the active geologic features that can be observed on Mars nowadays. The detection of DSS is a prerequisite for studying its appearance, morphology, and distribution to reveal its underlying geological mechanisms. In addition, increasingly massive amounts of Mars high resolution data are now available. Hence, an automatic detection method for locating DSSs is highly desirable. In this research, we present an automatic DSS detection method by combining interest region extraction and machine learning techniques. The interest region extraction combines gradient and regional grayscale information. Moreover, a novel recognition strategy is proposed that takes the normalized minimum bounding rectangles (MBRs) of the extracted regions to calculate the Local Binary Pattern (LBP) feature and train a DSS classifier using the Adaboost machine learning algorithm. Comparative experiments using five different feature descriptors and three different machine learning algorithms show the superiority of the proposed method. Experimental results utilizing 888 extracted region samples from 28 HiRISE images show that the overall detection accuracy of our proposed method is 92.4%, with a true positive rate of 79.1% and false positive rate of 3.7%, which in particular indicates great performance of the method at eliminating non-DSS regions.
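
    A minimal sketch of the classification stage (LBP features on candidate-region crops, fed to an AdaBoost classifier) is shown below using scikit-image and scikit-learn; the crop size, LBP parameters, and training data are placeholders rather than the paper's settings.

        import numpy as np
        from skimage.feature import local_binary_pattern
        from sklearn.ensemble import AdaBoostClassifier

        def lbp_histogram(patch, p=8, r=1):
            """Uniform LBP histogram of a grayscale patch (e.g. a normalized MBR crop)."""
            codes = local_binary_pattern(patch, P=p, R=r, method="uniform")
            hist, _ = np.histogram(codes, bins=np.arange(p + 3), density=True)
            return hist

        # Placeholder training data: 64x64 crops labelled 1 (DSS) or 0 (non-DSS).
        rng = np.random.default_rng(0)
        crops = (rng.random((200, 64, 64)) * 255).astype(np.uint8)
        labels = rng.integers(0, 2, size=200)

        X = np.array([lbp_histogram(c) for c in crops])
        clf = AdaBoostClassifier(n_estimators=100).fit(X, labels)
        # clf.predict(lbp_histogram(new_crop)[None, :]) would score a new candidate region.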

  3. A Method for Extracting Suspected Parotid Lesions in CT Images using Feature-based Segmentation and Active Contours based on Stationary Wavelet Transform

    NASA Astrophysics Data System (ADS)

    Wu, T. Y.; Lin, S. F.

    2013-10-01

    Automatic extraction of suspected lesions is an important application in computer-aided diagnosis (CAD). In this paper, we propose a method to automatically extract suspected parotid regions for clinical evaluation in head and neck CT images. Suspected lesion tissues in low-contrast tissue regions can be localized with feature-based segmentation (FBS) based on local texture features, and can be delineated accurately by modified active contour models (ACM). First, the stationary wavelet transform (SWT) is introduced. The derived wavelet coefficients are used to compute the local features for FBS and to generate enhanced energy maps for ACM computation. Geometric shape features (GSFs) are proposed to analyze each soft tissue region segmented by FBS; the regions whose GSFs are most similar to those of lesions are extracted, and this information is also used as the initial condition for the fine delineation computation. Consequently, suspected lesions can be automatically localized and accurately delineated to aid clinical diagnosis. The performance of the proposed method is evaluated by comparison with results outlined by clinical experts. Experiments on 20 pathological CT data sets show that the true-positive (TP) rate for recognizing parotid lesions is about 94%, and the dimensional accuracy of the delineation results approaches over 93%.

  4. A semi-automatic method for extracting thin line structures in images as rooted tree network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brazzini, Jacopo; Dillard, Scott; Soille, Pierre

    2010-01-01

    This paper addresses the problem of semi-automatic extraction of line networks in digital images, e.g., road or hydrographic networks in satellite images or blood vessels in medical images. For that purpose, we improve a generic method derived from morphological and hydrological concepts, consisting of minimum cost path estimation and flow simulation. While this approach fully exploits the local contrast and shape of the network, as well as its arborescent nature, we further incorporate local directional information about the structures in the image. Namely, an appropriate anisotropic metric is designed by using both the characteristic features of the target network and the eigen-decomposition of the gradient structure tensor of the image. The geodesic propagation from a given seed with this metric is then combined with hydrological operators for overland flow simulation to extract the line network. The algorithm is demonstrated for the extraction of blood vessels in a retina image and of a river network in a satellite image.

  5. The automatic extraction of pitch perturbation using microcomputers: some methodological considerations.

    PubMed

    Deem, J F; Manning, W H; Knack, J V; Matesich, J S

    1989-09-01

    A program for the automatic extraction of jitter (PAEJ) was developed for the clinical measurement of pitch perturbations using a microcomputer. The program currently includes 12 implementations of an algorithm for marking the boundary criteria for a fundamental period of vocal fold vibration. The relative sensitivity of these extraction procedures for identifying the pitch period was compared using sine waves. Data obtained to date provide information for each procedure concerning the effects of waveform peakedness and slope, sample duration in cycles, noise level of the analysis system with both direct and tape recorded input, and the influence of interpolation. Zero crossing extraction procedures provided lower jitter values regardless of sine wave frequency or sample duration. The procedures making use of positive- or negative-going zero crossings with interpolation provided the lowest measures of jitter with the sine wave stimuli. Pilot data obtained with normal-speaking adults indicated that jitter measures varied as a function of the speaker, vowel, and sample duration.
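
    A bare-bones version of zero-crossing period extraction and a common jitter measure (mean absolute period-to-period difference relative to the mean period) is sketched below; it is a generic illustration, not the PAEJ implementation, and the test signal is synthetic.

        import numpy as np

        def periods_from_zero_crossings(x, fs):
            """Fundamental periods (s) from successive positive-going zero crossings."""
            signs = np.signbit(x)
            crossings = np.where(~signs[1:] & signs[:-1])[0] + 1   # negative -> positive
            return np.diff(crossings) / fs

        def jitter_percent(periods):
            """Mean absolute difference between consecutive periods, relative to the mean period."""
            return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

        # Synthetic 100 Hz tone with a little additive noise as a stand-in for a sustained vowel.
        fs = 40000
        t = np.arange(fs) / fs
        x = np.sin(2 * np.pi * 100 * t) + 0.01 * np.random.randn(fs)
        print(round(jitter_percent(periods_from_zero_crossings(x, fs)), 3), "%")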

  6. LexValueSets: An Approach for Context-Driven Value Sets Extraction

    PubMed Central

    Pathak, Jyotishman; Jiang, Guoqian; Dwarkanath, Sridhar O.; Buntrock, James D.; Chute, Christopher G.

    2008-01-01

    The ability to model, share and re-use value sets across multiple medical information systems is an important requirement. However, generating value sets semi-automatically from a terminology service is still an unresolved issue, in part due to the lack of linkage to clinical context patterns that provide the constraints in defining a concept domain and invocation of value sets extraction. Towards this goal, we develop and evaluate an approach for context-driven automatic value sets extraction based on a formal terminology model. The crux of the technique is to identify and define the context patterns from various domains of discourse and leverage them for value set extraction using two complementary ideas based on (i) local terms provided by the Subject Matter Experts (extensional) and (ii) semantic definition of the concepts in coding schemes (intensional). A prototype was implemented based on SNOMED CT rendered in the LexGrid terminology model and a preliminary evaluation is presented. PMID:18998955

  7. The LifeWatch approach to the exploration of distributed species information

    PubMed Central

    Fuentes, Daniel; Fiore, Nicola

    2014-01-01

    Abstract This paper introduces a new method of automatically extracting, integrating and presenting information regarding species from the most relevant online taxonomic resources. First, the information is extracted and joined using data wrappers and integration solutions. Then, an analytical tool is used to provide a visual representation of the data. The information is then integrated into a user friendly content management system. The proposal has been implemented using data from the Global Biodiversity Information Facility (GBIF), the Catalogue of Life (CoL), the World Register of Marine Species (WoRMS), the Integrated Taxonomic Information System (ITIS) and the Global Names Index (GNI). The approach improves data quality, avoiding taxonomic and nomenclature errors whilst increasing the availability and accessibility of the information. PMID:25589865

  8. Assessment of Automatically Exported Clinical Data from a Hospital Information System for Clinical Research in Multiple Myeloma.

    PubMed

    Torres, Viviana; Cerda, Mauricio; Knaup, Petra; Löpprich, Martin

    2016-01-01

    An important part of the electronic information available in a Hospital Information System (HIS) has the potential to be automatically exported to Electronic Data Capture (EDC) platforms to improve clinical research. This automation has the advantage of reducing manual data transcription, a time-consuming and error-prone process. However, quantitative evaluations of the process of exporting data from a HIS to an EDC system, in particular in comparison with manual transcription, have not been reported extensively. In this work, an assessment of the quality of an automatic export process, focused on laboratory data from a HIS, is presented. The quality of the laboratory data was assessed for two types of processes: (1) a manual process of data transcription, and (2) an automatic process of data transfer. The automatic transfer was implemented as an Extract, Transform and Load (ETL) process. A comparison was then carried out between the manual and automatic data collection methods. The criteria used to measure data quality were correctness and completeness. The manual process had a general error rate of 2.6% to 7.1%, with the lowest error rate obtained when data fields without a clear definition were removed from the analysis (p < 10E-3). For the automatic process, the general error rate was 1.9% to 12.1%, where the lowest error rate is obtained when excluding information missing in the HIS but transcribed to the EDC from other physical sources. The automatic ETL process can be used to collect laboratory data for clinical research if data in the HIS, as well as physical documentation not included in the HIS, are identified beforehand and follow a standardized data collection protocol.
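
    An ETL step of this kind can be sketched in a few lines of Python; everything below (database name, table and column names, output format) is hypothetical and only illustrates the extract-transform-load pattern, not the hospital's actual system.

        import csv
        import sqlite3

        def export_lab_results(his_db_path, out_csv="edc_labs.csv"):
            """Extract lab results from a (hypothetical) HIS table and load them into a CSV
            formatted for an EDC import, mapping local unit strings to a standard form."""
            unit_map = {"mg/dl": "mg/dL", "g/l": "g/L"}          # transform step (toy example)
            con = sqlite3.connect(his_db_path)
            rows = con.execute(
                "SELECT patient_id, test_code, value, unit, collected_at FROM lab_results"
            ).fetchall()
            con.close()

            with open(out_csv, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["patient_id", "test_code", "value", "unit", "collected_at"])
                for pid, code, value, unit, ts in rows:
                    writer.writerow([pid, code, value, unit_map.get(unit.lower(), unit), ts])

        # export_lab_results("his_snapshot.sqlite")    # file name is a placeholder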

  9. Zone analysis in biology articles as a basis for information extraction.

    PubMed

    Mizuta, Yoko; Korhonen, Anna; Mullen, Tony; Collier, Nigel

    2006-06-01

    In the field of biomedicine, an overwhelming amount of experimental data has become available as a result of the high throughput of research in this domain. The amount of results reported has now grown beyond the limits of what can be managed by manual means. This makes it increasingly difficult for the researchers in this area to keep up with the latest developments. Information extraction (IE) in the biological domain aims to provide an effective automatic means to dynamically manage the information contained in archived journal articles and abstract collections and thus help researchers in their work. However, while considerable advances have been made in certain areas of IE, pinpointing and organizing factual information (such as experimental results) remains a challenge. In this paper we propose tackling this task by incorporating into IE information about rhetorical zones, i.e. classification of spans of text in terms of argumentation and intellectual attribution. As the first step towards this goal, we introduce a scheme for annotating biological texts for rhetorical zones and provide a qualitative and quantitative analysis of the data annotated according to this scheme. We also discuss our preliminary research on automatic zone analysis, and its incorporation into our IE framework.

  10. Foreign Language Analysis and Recognition (FLARE) Progress

    DTIC Science & Technology

    2015-02-01

    Report AFRL-RH-WP-TR-2015-0007. Subject terms: automatic speech recognition (ASR), information retrieval (IR). The report describes progress on the Haystack Multilingual Multimedia Information Extraction and Retrieval (MMIER) system that was initially developed under a prior work unit. Copies may be obtained from the Defense Technical Information Center (DTIC) (http://www.dtic.mil).

  11. Text-mining and information-retrieval services for molecular biology

    PubMed Central

    Krallinger, Martin; Valencia, Alfonso

    2005-01-01

    Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455

  12. Study on the extraction method of tidal flat area in northern Jiangsu Province based on remote sensing waterlines

    NASA Astrophysics Data System (ADS)

    Zhang, Yuanyuan; Gao, Zhiqiang; Liu, Xiangyang; Xu, Ning; Liu, Chaoshun; Gao, Wei

    2016-09-01

    Reclamation has caused significant dynamic change in the coastal zone. The tidal flat is an unstable reserve land resource, and studying it is therefore of considerable importance. In order to extract tidal flat area information efficiently, this paper takes Rudong County in Jiangsu Province as the research area and uses HJ1A/1B images as the data source. On the basis of previous research experience and a literature review, object-oriented classification is chosen as a semi-automatic extraction method to generate waterlines. The waterlines are then analyzed with the DSAS software to obtain tide points, and the outer boundary points are extracted automatically with Python to determine the extent of the tidal flats of Rudong County in 2014. The extracted area was 55,182 hm2, and a confusion matrix was used to verify the accuracy; the result shows a kappa coefficient of 0.945. The method addresses deficiencies of previous studies, and the free availability of the data and tools on the Internet makes it easy to generalize.

  13. Independent component analysis for automatic note extraction from musical trills

    NASA Astrophysics Data System (ADS)

    Brown, Judith C.; Smaragdis, Paris

    2004-05-01

    The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
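
    A toy reconstruction of the idea (separating two simultaneous tones from two mixtures with ICA) can be written with scikit-learn's FastICA, as sketched below; the synthetic mixture stands in for recorded trill audio and is not the authors' data.

        import numpy as np
        from sklearn.decomposition import FastICA

        # Two synthetic "notes" (sinusoids) played together, observed as two mixtures.
        fs = 8000
        t = np.arange(2 * fs) / fs
        notes = np.vstack([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 494 * t)])
        mixing = np.array([[0.7, 0.3], [0.4, 0.6]])
        observations = mixing @ notes                      # shape (2 channels, n samples)

        # ICA recovers statistically independent sources from the mixtures.
        ica = FastICA(n_components=2, random_state=0)
        sources = ica.fit_transform(observations.T).T      # shape (2 sources, n samples)

        # Each recovered source should be dominated by one of the two trill notes;
        # its pitch can then be read off from the peak of its Fourier spectrum.
        for s in sources:
            peak_hz = np.argmax(np.abs(np.fft.rfft(s))) * fs / len(s)
            print(round(peak_hz, 1), "Hz")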

  14. Convolution neural-network-based detection of lung structures

    NASA Astrophysics Data System (ADS)

    Hasegawa, Akira; Lo, Shih-Chung B.; Freedman, Matthew T.; Mun, Seong K.

    1994-05-01

    Chest radiography is one of the most fundamental and widely used techniques in diagnostic imaging. With the advent of digital radiology, digital medical image processing techniques for chest radiographs have attracted considerable attention, and several studies on computer-aided diagnosis (CADx) as well as on conventional image processing techniques for chest radiographs have been reported. In the automatic diagnostic process for chest radiographs, it is important to outline the areas of the lungs, the heart, and the diaphragm. This is because the original chest radiograph is composed of important anatomic structures and, without knowing the exact positions of the organs, the automatic diagnosis may result in unexpected detections. The automatic extraction of an anatomical structure from digital chest radiographs can be a useful tool for (1) the evaluation of heart size, (2) automatic detection of interstitial lung diseases, (3) automatic detection of lung nodules, and (4) data compression, etc. Based on the clearly defined boundaries of the heart area, rib spaces, rib positions, and rib cage extracted, this information can be used to facilitate CADx tasks on chest radiographs. In this paper, we present an automatic scheme for the detection of the lung field from chest radiographs by using a shift-invariant convolution neural network. A novel algorithm for smoothing the boundaries of the lungs is also presented.

  15. Conception of Self-Construction Production Scheduling System

    NASA Astrophysics Data System (ADS)

    Xue, Hai; Zhang, Xuerui; Shimizu, Yasuhiro; Fujimura, Shigeru

    With the rapid innovation of information technology, many production scheduling systems have been developed. However, they require a great deal of customization for each individual production environment, so a large investment in development and maintenance is indispensable. Therefore, the way scheduling systems are constructed should change. The final objective of this research is to develop a system that builds itself by automatically extracting the scheduling technique from daily production scheduling work, so that the required investment is reduced. For interoperability, this extraction mechanism should be applicable to various production processes. Using the master information extracted by the system, production scheduling operators can be supported in carrying out their scheduling work easily and accurately, without any restriction on their scheduling operations. With this extraction mechanism in place, a scheduling system can be introduced without a large expense for customization. In this paper, a model for expressing a scheduling problem is first proposed. A guideline for extracting the scheduling information and using the extracted information is then presented, and some applied functions based on it are also proposed.

  16. Target recognition based on convolutional neural network

    NASA Astrophysics Data System (ADS)

    Wang, Liqiang; Wang, Xin; Xi, Fubiao; Dong, Jian

    2017-11-01

    An important part of object target recognition is feature extraction, which can be divided into manual (hand-crafted) feature extraction and automatic feature extraction. The traditional neural network is one of the automatic feature extraction methods, but its globally connected structure leads to a high possibility of over-fitting. The deep learning algorithm used in this paper is a hierarchical automatic feature extraction method, trained with a layer-by-layer convolutional neural network (CNN), which can extract features from lower layers to higher layers. These features are more discriminative, which is beneficial to object target recognition.

  17. An Expertise Recommender using Web Mining

    NASA Technical Reports Server (NTRS)

    Joshi, Anupam; Chandrasekaran, Purnima; ShuYang, Michelle; Ramakrishnan, Ramya

    2001-01-01

    This report explored techniques to mine the web pages of scientists to extract information regarding their expertise, build expertise chains and referral webs, and semi-automatically combine this information with directory information services to create a recommender system that permits query by expertise. The approach included experimenting with existing techniques that have been reported in the research literature in the recent past, adapting them as needed. In addition, software tools were developed to capture and use this information.

  18. Support patient search on pathology reports with interactive online learning based data extraction.

    PubMed

    Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng

    2015-01-01

    Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
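
    The online-learning loop can be illustrated generically with scikit-learn's partial_fit interface; this is only a schematic stand-in for IDEAL-X (toy features, made-up report snippets), showing how user corrections incrementally update the extraction model.

        from sklearn.feature_extraction.text import HashingVectorizer
        from sklearn.linear_model import SGDClassifier

        # Toy online learner that decides whether a text span is a "diagnosis" value.
        vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
        model = SGDClassifier()
        classes = [0, 1]                                   # 0 = not diagnosis, 1 = diagnosis

        def propose_and_learn(candidate_spans, user_labels):
            """Predict labels for candidate spans, then update the model from user corrections."""
            X = vectorizer.transform(candidate_spans)
            try:
                predictions = model.predict(X)             # fails before the first update
            except Exception:
                predictions = [None] * len(candidate_spans)
            model.partial_fit(X, user_labels, classes=classes)
            return predictions

        # One simulated review round with made-up report snippets and corrected labels.
        spans = ["glioblastoma, WHO grade IV", "specimen received in formalin"]
        print(propose_and_learn(spans, user_labels=[1, 0]))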

  19. Big Data: The Future of Biocuration

    USDA-ARS?s Scientific Manuscript database

    This report is a white paper that describes the roles of biocurators in linking biologists to their data. Roles include: to extract knowledge from published papers; to connect information from different sources in a coherent and comprehensible way; to inspect and correct automatically predicted gene...

  20. Detection of exudates in fundus images using a Markovian segmentation model.

    PubMed

    Harangi, Balazs; Hajdu, Andras

    2014-01-01

    Diabetic retinopathy (DR) is one of the most common causes of vision loss in developed countries. In the early stages of DR, signs such as exudates appear in the retinal images. An automatic screening system must be capable of detecting these signs properly so that treatment of the patients may begin in time. Exudates show a rich variety of shapes and sizes, which makes automatic detection more challenging. We propose an approach for the automatic segmentation of exudates consisting of a candidate extraction step followed by exact contour detection and region-wise classification. More specifically, we extract possible exudate candidates using grayscale morphology, and their proper shape is determined by a Markovian segmentation model that considers edge information. Finally, we label the candidates as true or false by an optimally adjusted SVM classifier. For testing purposes, we considered the publicly available database DiaretDB1, where the proposed method outperformed several state-of-the-art exudate detectors.
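
    The candidate-extraction step (bright lesion candidates via grayscale morphology) can be sketched with scikit-image as below; this is a generic top-hat illustration with arbitrary parameters, not the authors' tuned pipeline.

        import numpy as np
        from skimage.morphology import disk, white_tophat

        def exudate_candidates(green_channel, radius=12, rel_threshold=0.15):
            """Bright-candidate mask from the green channel of a fundus image.

            A white top-hat with a disk larger than typical exudates keeps small bright
            structures while suppressing the slowly varying background.
            """
            img = green_channel.astype(float)
            img = (img - img.min()) / (np.ptp(img) + 1e-9)          # normalize to [0, 1]
            tophat = white_tophat(img, footprint=disk(radius))
            return tophat > rel_threshold * tophat.max()

        # mask = exudate_candidates(fundus_rgb[..., 1])             # hypothetical input image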

  1. Automatic Recognition of Fetal Facial Standard Plane in Ultrasound Image via Fisher Vector.

    PubMed

    Lei, Baiying; Tan, Ee-Leng; Chen, Siping; Zhuo, Liu; Li, Shengli; Ni, Dong; Wang, Tianfu

    2015-01-01

    Acquisition of the standard plane is a prerequisite for biometric measurement and diagnosis during an ultrasound (US) examination. In this paper, a new algorithm is developed for the automatic recognition of the fetal facial standard planes (FFSPs), such as the axial, coronal, and sagittal planes. Specifically, densely sampled root scale invariant feature transform (RootSIFT) features are extracted and then encoded by a Fisher vector (FV). A Fisher network with a multi-layer design is also developed to extract spatial information and boost the classification performance. Finally, automatic recognition of the FFSPs is implemented by a support vector machine (SVM) classifier based on the stochastic dual coordinate ascent (SDCA) algorithm. Experimental results using our dataset demonstrate that the proposed method achieves an accuracy of 93.27% and a mean average precision (mAP) of 99.19% in recognizing different FFSPs. Furthermore, the comparative analyses reveal the superiority of the proposed method based on FV over the traditional methods.
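
    RootSIFT itself is a small transformation of ordinary SIFT descriptors (L1 normalization followed by an element-wise square root); a minimal sketch is given below, with randomly generated descriptors standing in for those computed from ultrasound frames.

        import numpy as np

        def root_sift(descriptors, eps=1e-12):
            """Convert SIFT descriptors (rows) to RootSIFT: L1-normalize, then square-root."""
            descriptors = np.asarray(descriptors, dtype=float)
            l1 = np.abs(descriptors).sum(axis=1, keepdims=True) + eps
            return np.sqrt(descriptors / l1)

        # Stand-in for densely sampled 128-D SIFT descriptors from one US frame.
        sift = np.random.rand(500, 128)
        rs = root_sift(sift)
        print(rs.shape, np.allclose((rs ** 2).sum(axis=1), 1.0))   # unit L2 norm after transform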

  2. A novel approach to automatic threat detection in MMW imagery of people scanned in portals

    NASA Astrophysics Data System (ADS)

    Vaidya, Nitin M.; Williams, Thomas

    2008-04-01

    We have developed a novel approach to performing automatic detection of concealed threat objects in passive MMW imagery of people scanned in a portal setting. It is applicable to the significant class of imaging scanners that use the protocol of having the subject rotate in front of the camera in order to image them from several closely spaced directions. Customary methods of dealing with MMW sequences rely on the analysis of the spatial images in a frame-by-frame manner, with information extracted from separate frames combined by some subsequent technique of data association and tracking over time. We contend that the pooling of information over time in traditional methods is not as direct as it could be and is potentially less efficient in distinguishing threats from clutter. We have formulated a more direct approach to extracting information about the scene as it evolves over time. We propose an atypical spatio-temporal arrangement of the MMW image data, to which we give the descriptive name Row Evolution Image (REI) sequence. This representation exploits the singular aspect of having the subject rotate in front of the camera. We point out which features in REIs are most relevant to detecting threats, and describe the algorithms we have developed to extract them. We demonstrate results of successful automatic detection of threats, including ones whose faint image contrast renders their disambiguation from clutter very challenging. We highlight the ease afforded by the REI approach in permitting specialization of the detection algorithms to different parts of the subject body. Finally, we describe the execution efficiency advantages of our approach, given its natural fit to parallel processing.

  3. Deep Learning from EEG Reports for Inferring Underspecified Information

    PubMed Central

    Goodwin, Travis R.; Harabagiu, Sanda M.

    2017-01-01

    Secondary use of electronic health records (EHRs) often relies on the ability to automatically identify and extract information from EHRs. Unfortunately, EHRs are known to suffer from a variety of idiosyncrasies; most prevalently, they have been shown to often omit or underspecify information. Adapting traditional machine learning methods for inferring underspecified information relies on manually specifying features characterizing the specific information to recover (e.g., particular findings, test results, or physician's impressions). By contrast, in this paper, we present a method for jointly (1) automatically extracting word- and report-level features and (2) inferring underspecified information from EHRs. Our approach accomplishes these two tasks jointly by combining recent advances in deep neural learning with access to textual data in electroencephalogram (EEG) reports. We evaluate the performance of our model on the problem of inferring the neurologist's overall impression (normal or abnormal) from EEG reports, and report an accuracy of 91.4%, precision of 94.4%, recall of 91.2%, and F1 measure of 92.8% (a 40% improvement over the performance obtained using Doc2Vec). These promising results demonstrate the power of our approach, while error analysis reveals remaining obstacles as well as areas for future improvement. PMID:28815118

  4. A Supporting Platform for Semi-Automatic Hyoid Bone Tracking and Parameter Extraction from Videofluoroscopic Images for the Diagnosis of Dysphagia Patients.

    PubMed

    Lee, Jun Chang; Nam, Kyoung Won; Jang, Dong Pyo; Paik, Nam Jong; Ryu, Ju Seok; Kim, In Young

    2017-04-01

    Conventional kinematic analysis of videofluoroscopic (VF) swallowing image, most popular for dysphagia diagnosis, requires time-consuming and repetitive manual extraction of diagnostic information from multiple images representing one swallowing period, which results in a heavy work load for clinicians and excessive hospital visits for patients to receive counseling and prescriptions. In this study, a software platform was developed that can assist in the VF diagnosis of dysphagia by automatically extracting a two-dimensional moving trajectory of the hyoid bone as well as 11 temporal and kinematic parameters. Fifty VF swallowing videos containing both non-mandible-overlapped and mandible-overlapped cases from eight patients with dysphagia of various etiologies and 19 videos from ten healthy controls were utilized for performance verification. Percent errors of hyoid bone tracking were 1.7 ± 2.1% for non-overlapped images and 4.2 ± 4.8% for overlapped images. Correlation coefficients between manually extracted and automatically extracted moving trajectories of the hyoid bone were 0.986 ± 0.017 (X-axis) and 0.992 ± 0.006 (Y-axis) for non-overlapped images, and 0.988 ± 0.009 (X-axis) and 0.991 ± 0.006 (Y-axis) for overlapped images. Based on the experimental results, we believe that the proposed platform has the potential to improve the satisfaction of both clinicians and patients with dysphagia.
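
    Semi-automatic tracking of a small anatomical landmark across video frames is often done with normalized cross-correlation template matching; the OpenCV sketch below illustrates that generic idea (it is not the authors' algorithm), assuming the user has marked the hyoid bone once in the first frame.

        import cv2
        import numpy as np

        def track_template(frames, init_box):
            """Track a user-marked box (x, y, w, h) through grayscale frames by
            normalized cross-correlation template matching."""
            x, y, w, h = init_box
            template = frames[0][y:y + h, x:x + w]
            trajectory = [(x, y)]
            for frame in frames[1:]:
                scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
                _, _, _, top_left = cv2.minMaxLoc(scores)       # location of best match
                trajectory.append(top_left)
                x, y = top_left
                template = frame[y:y + h, x:x + w]              # refresh template each frame
            return trajectory

        # frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in sorted_frame_paths]  # placeholder
        # print(track_template(frames, init_box=(120, 200, 24, 24)))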

  5. Application of MPEG-7 descriptors for content-based indexing of sports videos

    NASA Astrophysics Data System (ADS)

    Hoeynck, Michael; Auweiler, Thorsten; Ohm, Jens-Rainer

    2003-06-01

    The amount of multimedia data available worldwide is increasing every day. There is a vital need to annotate multimedia data in order to allow universal content access and to provide content-based search-and-retrieval functionalities. Since supervised video annotation can be time-consuming, an automatic solution is appreciated. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports, and present our application for the automatic annotation of equestrian sports videos. In doing so, we concentrate especially on MPEG-7 based feature extraction and content description. We apply different visual descriptors for cut detection. Furthermore, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information and taking specific domain knowledge into account. Having determined single shot positions as well as the visual highlights, this information is stored jointly with additional textual information in an MPEG-7 description scheme. Using this information, we generate content summaries which can be utilized in a user front-end in order to provide content-based access to the video stream, as well as further content-based queries and navigation on a video-on-demand streaming server.

  6. An automatic system to detect and extract texts in medical images for de-identification

    NASA Astrophysics Data System (ADS)

    Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael

    2010-03-01

    Recently, there is an increasing need to share medical images for research purposes. In order to respect and preserve patient privacy, most medical images are stripped of protected health information (PHI) before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for doctors to remove text from medical images. Many papers have been written about text detection and extraction algorithms; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end-users, it should be effective, accurate and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering that the text has a high contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future work on this system includes algorithm improvement, performance evaluation, and computation optimization.
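
    The following sketch illustrates the general idea of region-variance-based text detection: local intensity variance is high where burned-in text contrasts with a smooth background. The window size and threshold are illustrative assumptions; the paper additionally applies geometric constraints and a region-based level set to extract the text.

        # Rough text-region candidate detection via local intensity variance.
        import numpy as np
        from scipy.ndimage import uniform_filter

        def variance_map(image, size=15):
            img = image.astype(np.float64)
            mean = uniform_filter(img, size)
            mean_sq = uniform_filter(img * img, size)
            return mean_sq - mean * mean          # local variance per pixel

        def text_candidates(image, size=15, k=3.0):
            var = variance_map(image, size)
            # Boolean mask of high-variance (candidate text) regions.
            return var > (var.mean() + k * var.std())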

  7. Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

    PubMed

    Elayavilli, Ravikumar Komandur; Liu, Hongfang

    2016-01-01

    Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing an ontology for ion channel events using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), and the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers the potential to facilitate the integration of text mining into an ontological workflow, a novel aspect of this study. This work is a case study in which we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from biomedical text and formalizing them in an ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
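
    A minimal sketch of the style of rule used for extracting quantitative assertions is shown below; the regular expression and property list are illustrative only and are not the ICEPO rule set.

        # Illustrative rule-based extraction of quantitative assertions such as
        # "half-activation voltage of -45 mV"; not the actual ICEPO rules.
        import re

        PATTERN = re.compile(
            r"(?P<property>half-activation voltage|time constant|conductance)"
            r"\s+of\s+(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>mV|ms|pS)",
            re.IGNORECASE)

        def extract_assertions(sentence):
            return [m.groupdict() for m in PATTERN.finditer(sentence)]

        print(extract_assertions(
            "The channel showed a half-activation voltage of -45 mV and a time constant of 3.2 ms."))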

  8. Recent progress in automatically extracting information from the pharmacogenomic literature

    PubMed Central

    Garten, Yael; Coulet, Adrien; Altman, Russ B

    2011-01-01

    The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses, we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206

  9. Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports

    PubMed Central

    Yim, Wen-wai; Kwan, Sharon W; Johnson, Guy; Yetisgen, Meliha

    2017-01-01

    Cancer stage information is important for clinical research. However, it is not always explicitly noted in electronic medical records. In this paper, we present our work on automatic classification of hepatocellular carcinoma (HCC) stages from free-text clinical and radiology notes. To accomplish this, we defined 11 stage parameters used in the three HCC staging systems: American Joint Committee on Cancer (AJCC), Barcelona Clinic Liver Cancer (BCLC), and Cancer of the Liver Italian Program (CLIP). After aggregating stage parameters to the patient level, the final stage classifications were achieved using an expert-created decision logic. Each stage parameter relevant for staging was extracted using several classification methods, e.g. sentence classification and automatic information structuring, to identify and normalize text as cancer stage parameter values. Stage parameter extraction for the test set performed at 0.81 F1. Cancer stage predictions for the AJCC, BCLC, and CLIP stage classifications achieved 0.55, 0.50, and 0.43 F1, respectively.
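
    To make the expert-created decision logic concrete, the toy sketch below maps patient-level stage parameters to a stage label; the parameter names and thresholds are simplified illustrations, not the actual AJCC, BCLC, or CLIP logic.

        # Toy decision logic mapping patient-level stage parameters to a stage
        # label. Parameter names and rules are simplified illustrations only.
        def toy_stage(params):
            if params["metastasis"]:
                return "advanced"
            if params["tumor_count"] == 1 and params["largest_tumor_cm"] <= 2.0:
                return "very early"
            if params["tumor_count"] <= 3 and params["largest_tumor_cm"] <= 3.0:
                return "early"
            return "intermediate"

        patient = {"tumor_count": 2, "largest_tumor_cm": 2.5, "metastasis": False}
        print(toy_stage(patient))   # -> "early"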

  10. MULTISCALE TENSOR ANISOTROPIC FILTERING OF FLUORESCENCE MICROSCOPY FOR DENOISING MICROVASCULATURE.

    PubMed

    Prasath, V B S; Pelapur, R; Glinskii, O V; Glinsky, V V; Huxley, V H; Palaniappan, K

    2015-04-01

    Fluorescence microscopy images are contaminated by noise, and improving image quality without blurring vascular structures by filtering is an important step in automatic image analysis. The application of interest here is to automatically extract the structural components of the microvascular system with accuracy from images acquired by fluorescence microscopy. A robust denoising process is necessary in order to extract accurate vascular morphology information. For this purpose, we propose a multiscale tensor anisotropic diffusion model which progressively and adaptively updates the amount of smoothing while preserving vessel boundaries accurately. Based on a coherency enhancing flow with a planar confidence measure and fused 3D structure information, our method integrates multiple scales for preserving microvasculature and removing noise from membrane structures. Experimental results on simulated synthetic images and epifluorescence images show the advantage of our improvement over other related diffusion filters. We further show that the proposed multiscale integration approach improves the denoising accuracy of different tensor diffusion methods to obtain better microvasculature segmentation.
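
    As a much simpler relative of the multiscale tensor model described above, the sketch below implements single-scale Perona-Malik anisotropic diffusion in 2D, which already shows the edge-preserving smoothing idea; parameter values are illustrative.

        # Single-scale Perona-Malik diffusion (2D), a simple relative of the
        # multiscale tensor anisotropic diffusion model described in the abstract.
        import numpy as np

        def perona_malik(img, n_iter=20, kappa=30.0, dt=0.2):
            u = img.astype(np.float64).copy()
            for _ in range(n_iter):
                # Finite-difference gradients toward the four neighbours.
                dn = np.roll(u, -1, axis=0) - u
                ds = np.roll(u,  1, axis=0) - u
                de = np.roll(u, -1, axis=1) - u
                dw = np.roll(u,  1, axis=1) - u
                # Edge-stopping conductance: small where gradients are large.
                c = lambda d: np.exp(-(d / kappa) ** 2)
                u += dt * (c(dn) * dn + c(ds) * ds + c(de) * de + c(dw) * dw)
            return u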

  11. Semi-Automatic Terminology Generation for Information Extraction from German Chest X-Ray Reports.

    PubMed

    Krebs, Jonathan; Corovic, Hamo; Dietrich, Georg; Ertl, Max; Fette, Georg; Kaspar, Mathias; Krug, Markus; Stoerk, Stefan; Puppe, Frank

    2017-01-01

    Extraction of structured data from textual reports is an important subtask for building medical data warehouses for research and care. Many medical and most radiology reports are written in a telegraphic style with a concatenation of noun phrases describing the presence or absence of findings. Therefore a lexico-syntactical approach is promising, where key terms and their relations are recognized and mapped onto a predefined standard terminology (ontology). We propose a two-phase algorithm for terminology matching: In the first pass, a local terminology for recognition is derived as close as possible to the terms used in the radiology reports. In the second pass, the local terminology is mapped to a standard terminology. In this paper, we report on an algorithm for the first step of semi-automatic generation of the local terminology and evaluate the algorithm with radiology reports of chest X-ray examinations from Würzburg university hospital. With an effort of about 20 hours of work by a radiologist as domain expert and 10 hours of meetings, a local terminology with about 250 attributes and various value patterns was built. In an evaluation with 100 randomly chosen reports, it achieved an F1-score of about 95% for information extraction.
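
    A minimal sketch of the first-pass idea, matching report phrases against a small local terminology with crude negation handling, is shown below; the terminology entries and negation cues are illustrative examples, not the Würzburg terminology.

        # First-pass matching of report phrases against a small local terminology.
        # Terminology entries and the negation cue ("kein"/"keine") are examples.
        LOCAL_TERMINOLOGY = {
            "pleuraerguss": "pleural_effusion",
            "infiltrat": "infiltrate",
            "kardiomegalie": "cardiomegaly",
        }

        def match_report(report_text):
            text = report_text.lower()
            findings = {}
            for term, attribute in LOCAL_TERMINOLOGY.items():
                if term in text:
                    # Very crude negation handling for illustration only.
                    negated = f"kein {term}" in text or f"keine {term}" in text
                    findings[attribute] = not negated
            return findings

        print(match_report("Kein Pleuraerguss. Infiltrat im rechten Unterfeld."))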

  12. Automatic Segmentation of High-Throughput RNAi Fluorescent Cellular Images

    PubMed Central

    Yan, Pingkum; Zhou, Xiaobo; Shah, Mubarak; Wong, Stephen T. C.

    2010-01-01

    High-throughput genome-wide RNA interference (RNAi) screening is emerging as an essential tool to assist biologists in understanding complex cellular processes. The large number of images produced in each study makes manual analysis intractable; hence, automatic cellular image analysis becomes an urgent need, where segmentation is the first and one of the most important steps. In this paper, a fully automatic method for segmentation of cells from genome-wide RNAi screening images is proposed. Nuclei are first extracted from the DNA channel by using a modified watershed algorithm. Cells are then extracted by modeling the interaction between them as well as combining both gradient and region information in the Actin and Rac channels. A new energy functional is formulated based on a novel interaction model for segmenting tightly clustered cells with significant intensity variance and specific phenotypes. The energy functional is minimized by using a multiphase level set method, which leads to a highly effective cell segmentation method. Promising experimental results demonstrate that automatic segmentation of high-throughput genome-wide multichannel screening can be achieved by using the proposed method, which may also be extended to other multichannel image segmentation problems. PMID:18270043
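
    The sketch below shows a standard marker-based watershed for nuclei extraction from a DNA channel, a simplified stand-in for the modified watershed described above; the distance-transform markers and parameter values are assumptions.

        # Nuclei extraction from a DNA channel with a marker-based watershed.
        import numpy as np
        from scipy import ndimage as ndi
        from skimage.filters import threshold_otsu
        from skimage.feature import peak_local_max
        from skimage.segmentation import watershed

        def segment_nuclei(dna_channel):
            binary = dna_channel > threshold_otsu(dna_channel)
            distance = ndi.distance_transform_edt(binary)
            # One marker per local maximum of the distance map.
            coords = peak_local_max(distance, min_distance=5, labels=binary)
            markers = np.zeros(distance.shape, dtype=int)
            markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
            return watershed(-distance, markers, mask=binary)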

  13. Validity of association rules extracted by healthcare-data-mining.

    PubMed

    Takeuchi, Hiroshi; Kodama, Naoki

    2014-01-01

    A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study verified that the rules extracted from the daily time-series data stored over a half-year period by volunteer users of this system are valid.
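
    For illustration, the sketch below computes support and confidence for a single candidate lifestyle rule over daily records; the records and the rule itself are hypothetical.

        # Support and confidence for one candidate rule over daily records.
        days = [
            {"slept_lt_6h", "high_stress"},
            {"slept_lt_6h", "high_stress", "exercised"},
            {"exercised"},
            {"slept_lt_6h", "high_stress"},
        ]

        antecedent, consequent = {"slept_lt_6h"}, {"high_stress"}
        n_total = len(days)
        n_ante = sum(antecedent <= d for d in days)                 # days matching the antecedent
        n_both = sum((antecedent | consequent) <= d for d in days)  # days matching the whole rule

        support = n_both / n_total
        confidence = n_both / n_ante
        print(f"support={support:.2f} confidence={confidence:.2f}")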

  14. Using UMLS to construct a generalized hierarchical concept-based dictionary of brain functions for information extraction from the fMRI literature.

    PubMed

    Hsiao, Mei-Yu; Chen, Chien-Chung; Chen, Jyh-Horng

    2009-10-01

    With rapid progress in the field, a great many fMRI studies are published every year, to the extent that it is now becoming difficult for researchers to keep up with the literature, since reading papers is extremely time-consuming and labor-intensive. Thus, automatic information extraction has become an important issue. In this study, we used the Unified Medical Language System (UMLS) to construct a hierarchical concept-based dictionary of brain functions. To the best of our knowledge, this is the first generalized dictionary of this kind. We also developed an information extraction system for recognizing, mapping and classifying terms relevant to human brain study. The precision and recall of our system were on a par with those of human experts in term recognition, term mapping and term classification. The approach presented in this paper offers an alternative to the more laborious, manual-entry approach to information extraction.

  15. Automatic Keyword Extraction from Individual Documents

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rose, Stuart J.; Engel, David W.; Cramer, Nicholas O.

    2010-05-03

    This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.
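
    A minimal sketch of the degree-over-frequency word scoring at the heart of this method is shown below; the stop-word list is tiny and illustrative.

        # Minimal RAKE-style keyword scoring: split text into candidate phrases at
        # stop words, then score each phrase by summing word degree / word frequency.
        import re
        from collections import defaultdict

        STOPWORDS = {"of", "a", "the", "and", "for", "from", "in", "on", "is", "are"}

        def rake_keywords(text):
            words = re.findall(r"[a-zA-Z]+", text.lower())
            phrases, current = [], []
            for w in words:
                if w in STOPWORDS:
                    if current:
                        phrases.append(current)
                    current = []
                else:
                    current.append(w)
            if current:
                phrases.append(current)

            freq, degree = defaultdict(int), defaultdict(int)
            for phrase in phrases:
                for w in phrase:
                    freq[w] += 1
                    degree[w] += len(phrase)      # co-occurrence degree within the phrase
            scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
            return sorted(scores.items(), key=lambda kv: -kv[1])

        print(rake_keywords("Automatic keyword extraction from individual documents"))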

  16. NASA automatic subject analysis technique for extracting retrievable multi-terms (NASA TERM) system

    NASA Technical Reports Server (NTRS)

    Kirschbaum, J.; Williamson, R. E.

    1978-01-01

    Current methods for information processing and retrieval used at the NASA Scientific and Technical Information Facility are reviewed. A more cost effective computer aided indexing system is proposed which automatically generates print terms (phrases) from the natural text. Satisfactory print terms can be generated in a primarily automatic manner to produce a thesaurus (NASA TERMS) which extends all the mappings presently applied by indexers, specifies the worth of each posting term in the thesaurus, and indicates the areas of use of the thesaurus entry phrase. These print terms enable the computer to determine which of several terms in a hierarchy is desirable and to differentiate ambiguous terms. Steps in the NASA TERMS algorithm are discussed and the processing of surrogate entry phrases is demonstrated using four previously manually indexed STAR abstracts for comparison. The simulation shows phrase isolation, text phrase reduction, NASA terms selection, and RECON display.

  17. Automatic definition of the oncologic EHR data elements from NCIT in OWL.

    PubMed

    Cuggia, Marc; Bourdé, Annabel; Turlin, Bruno; Vincendeau, Sebastien; Bertaud, Valerie; Bohec, Catherine; Duvauferrier, Régis

    2011-01-01

    Semantic interoperability based on ontologies allows systems to combine their information and process it automatically. The ability to extract meaningful fragments from an ontology is key for ontology re-use, and the construction of a subset will help to structure clinical data entries. The aim of this work is to provide a method for extracting a set of concepts for a specific domain, in order to help define the data elements of an oncologic EHR. A generic extraction algorithm was developed to extract, from the NCIT and for a specific disease (i.e. prostate neoplasm), all the concepts of interest into a sub-ontology. We compared all the extracted concepts with the manually encoded concepts contained in the multi-disciplinary meeting report form (MDMRF). We extracted two sub-ontologies: sub-ontology 1 by using a single key concept and sub-ontology 2 by using 5 additional keywords. Sub-ontology 2 covered 51% of the MDMRF concepts. The low coverage rate is due to missing definitions or misclassification of NCIT concepts. By providing a subset of concepts focused on a particular domain, this extraction method helps optimize the binding of data elements and helps maintain and enrich a domain ontology.
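
    A minimal sketch of extracting a sub-ontology as the descendants of a key concept in a parent-to-children hierarchy is shown below; the toy hierarchy is illustrative and is not the NCIT.

        # Extract a sub-ontology as all descendants of a key concept in a toy
        # parent -> children hierarchy (not the NCIT).
        from collections import deque

        CHILDREN = {
            "Neoplasm": ["Prostate Neoplasm", "Liver Neoplasm"],
            "Prostate Neoplasm": ["Prostate Carcinoma"],
            "Prostate Carcinoma": [],
            "Liver Neoplasm": [],
        }

        def sub_ontology(key_concept):
            seen, queue = set(), deque([key_concept])
            while queue:
                concept = queue.popleft()
                if concept in seen:
                    continue
                seen.add(concept)
                queue.extend(CHILDREN.get(concept, []))
            return seen

        print(sub_ontology("Prostate Neoplasm"))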

  18. Introducing meta-services for biomedical information extraction

    PubMed Central

    Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso

    2008-01-01

    We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; ). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497

  19. Passive Polarimetric Information Processing for Target Classification

    NASA Astrophysics Data System (ADS)

    Sadjadi, Firooz; Sadjadi, Farzad

    Polarimetric sensing is an area of active research in a variety of applications. In particular, the use of polarization diversity has been shown to improve performance in automatic target detection and recognition. Within the diverse scope of polarimetric sensing, the field of passive polarimetric sensing is of particular interest. This chapter presents several new methods for gathering information using such passive techniques. One method extracts three-dimensional (3D) information and surface properties using one or more sensors. Another method extracts scene-specific algebraic expressions that remain unchanged under polarization transformations (such as along the transmission path to the sensor).

  20. Building Extraction Based on Openstreetmap Tags and Very High Spatial Resolution Image in Urban Area

    NASA Astrophysics Data System (ADS)

    Kang, L.; Wang, Q.; Yan, H. W.

    2018-04-01

    How to derive building contours from VHR images is the essential problem for automatic building extraction in urban areas. To solve this problem, OSM data are introduced to provide vector contour information for buildings, which is hard to obtain from VHR images. First, we import the OSM data into a database; the OSM line string data with tags such as building, amenity, and office are selected and combined into complete contours. Second, the accuracy of the building contours is confirmed by comparison with the real buildings in Google Earth. Third, maximum likelihood classification is conducted with the confirmed building contours, and the result demonstrates that the proposed approach is effective and accurate. The approach offers a new way for automatic interpretation of VHR images.

  1. FEX: A Knowledge-Based System For Planimetric Feature Extraction

    NASA Astrophysics Data System (ADS)

    Zelek, John S.

    1988-10-01

    Topographical planimetric features include natural surfaces (rivers, lakes) and man-made surfaces (roads, railways, bridges). In conventional planimetric feature extraction, a photointerpreter manually interprets and extracts features from imagery on a stereoplotter. Visual planimetric feature extraction is a very labour intensive operation. The advantages of automating feature extraction include: time and labour savings; accuracy improvements; and planimetric data consistency. FEX (Feature EXtraction) combines techniques from image processing, remote sensing and artificial intelligence for automatic feature extraction. The feature extraction process co-ordinates the information and knowledge in a hierarchical data structure. The system simulates the reasoning of a photointerpreter in determining the planimetric features. Present efforts have concentrated on the extraction of road-like features in SPOT imagery. Keywords: Remote Sensing, Artificial Intelligence (AI), SPOT, image understanding, knowledge base, apars.

  2. Abbreviation definition identification based on automatic precision estimates.

    PubMed

    Sohn, Sunghwan; Comeau, Donald C; Kim, Won; Wilbur, W John

    2008-09-25

    The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity. Due to the size of databases such as MEDLINE only a small fraction of abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation. On the Medstract corpus our algorithm produced 97% precision and 85% recall which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well known Schwartz and Hearst algorithm. We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic.
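
    For orientation, the sketch below implements a simplified matching heuristic in the spirit of the Schwartz and Hearst approach mentioned above: find a parenthesized short form and scan the preceding words for a plausible definition. It is not the multi-strategy algorithm or the pseudo-precision estimation of the paper.

        # Simplified abbreviation-definition matching in the spirit of the
        # Schwartz-Hearst heuristic; not the paper's multi-strategy algorithm.
        import re

        def find_definition(sentence):
            m = re.search(r"\(([A-Z][A-Za-z]{1,9})\)", sentence)
            if not m:
                return None
            abbrev = m.group(1)
            words = sentence[:m.start()].rstrip().split()
            # Candidate definition: at most len(abbrev)+1 words before the parenthesis,
            # starting with the abbreviation's first letter.
            for n in range(len(abbrev) + 1, 0, -1):
                candidate = words[-n:]
                if candidate and candidate[0][0].lower() == abbrev[0].lower():
                    return abbrev, " ".join(candidate)
            return None

        print(find_definition("Automatic speech recognition (ASR) was used to extract spoken text."))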

  3. A method for automatic feature points extraction of human vertebrae three-dimensional model

    NASA Astrophysics Data System (ADS)

    Wu, Zhen; Wu, Junsheng

    2017-05-01

    A method for automatic extraction of the feature points of the human vertebrae three-dimensional model is presented. Firstly, the statistical model of vertebrae feature points is established based on the results of manual vertebrae feature points extraction. Then anatomical axial analysis of the vertebrae model is performed according to the physiological and morphological characteristics of the vertebrae. Using the axial information obtained from the analysis, a projection relationship between the statistical model and the vertebrae model to be extracted is established. According to the projection relationship, the statistical model is matched with the vertebrae model to get the estimated position of the feature point. Finally, by analyzing the curvature in the spherical neighborhood with the estimated position of feature points, the final position of the feature points is obtained. According to the benchmark result on multiple test models, the mean relative errors of feature point positions are less than 5.98%. At more than half of the positions, the error rate is less than 3% and the minimum mean relative error is 0.19%, which verifies the effectiveness of the method.

  4. Automatic facial animation parameters extraction in MPEG-4 visual communication

    NASA Astrophysics Data System (ADS)

    Yang, Chenggen; Gong, Wanwei; Yu, Lu

    2002-01-01

    Facial Animation Parameters (FAPs) are defined in MPEG-4 to animate a facial object. The algorithm proposed in this paper to extract these FAPs is applied to very low bit-rate video communication, in which the scene is composed of a head-and-shoulder object with a complex background. The paper describes an algorithm that automatically extracts all FAPs needed to animate a generic facial model and estimates the 3D motion of the head from corresponding points. The proposed algorithm extracts the human facial region by color segmentation and intra-frame and inter-frame edge detection. Facial structure and the edge distribution of facial features, such as vertical and horizontal gradient histograms, are used to locate the facial feature regions. Parabola and circle deformable templates are employed to fit facial features and extract a subset of the FAPs. A special data structure is proposed to describe the deformable templates and reduce the time needed to compute the energy functions. The remaining FAPs, the 3D rigid head motion vectors, are estimated by a corresponding-points method. A 3D head wire-frame model provides facial semantic information for the selection of proper corresponding points, which helps to increase the accuracy of 3D rigid object motion estimation.

  5. Flow Quantification from 2D Phase Contrast MRI in Renal Arteries Using Clustering

    NASA Astrophysics Data System (ADS)

    Zöllner, Frank G.; Monnsen, Jan Ankar; Lundervold, Arvid; Rørvik, Jarle

    We present an approach based on clustering to segment renal arteries from 2D PC Cine MR images to measure blood velocity and flow. Such information is important for grading renal artery stenosis and supports decisions on surgical interventions such as percutaneous transluminal angioplasty. Results show that the renal arteries could be extracted automatically and the corresponding velocity profiles could be calculated. Furthermore, the clustering could detect possible phase wrap effects automatically as well as differences in the blood flow patterns within the vessel.

  6. Rapid Extraction of Lexical Tone Phonology in Chinese Characters: A Visual Mismatch Negativity Study

    PubMed Central

    Wang, Xiao-Dong; Liu, A-Ping; Wu, Yin-Yuan; Wang, Peng

    2013-01-01

    Background In alphabetic languages, emerging evidence from behavioral and neuroimaging studies shows the rapid and automatic activation of phonological information in visual word recognition. In the mapping from orthography to phonology, unlike most alphabetic languages in which there is a natural correspondence between the visual and phonological forms, in logographic Chinese, the mapping between visual and phonological forms is rather arbitrary and depends on learning and experience. The issue of whether the phonological information is rapidly and automatically extracted in Chinese characters by the brain has not yet been thoroughly addressed. Methodology/Principal Findings We continuously presented Chinese characters differing in orthography and meaning to adult native Mandarin Chinese speakers to construct a constant varying visual stream. In the stream, most stimuli were homophones of Chinese characters: The phonological features embedded in these visual characters were the same, including consonants, vowels and the lexical tone. Occasionally, the rule of phonology was randomly violated by characters whose phonological features differed in the lexical tone. Conclusions/Significance We showed that the violation of the lexical tone phonology evoked an early, robust visual response, as revealed by whole-head electrical recordings of the visual mismatch negativity (vMMN), indicating the rapid extraction of phonological information embedded in Chinese characters. Source analysis revealed that the vMMN was involved in neural activations of the visual cortex, suggesting that the visual sensory memory is sensitive to phonological information embedded in visual words at an early processing stage. PMID:23437235

  7. Enhancing interpretability of automatically extracted machine learning features: application to a RBM-Random Forest system on brain lesion segmentation.

    PubMed

    Pereira, Sérgio; Meier, Raphael; McKinley, Richard; Wiest, Roland; Alves, Victor; Silva, Carlos A; Reyes, Mauricio

    2018-02-01

    Machine learning systems are achieving better performances at the cost of becoming increasingly complex. However, because of that, they become less interpretable, which may cause some distrust by the end-user of the system. This is especially important as these systems are pervasively being introduced to critical domains, such as the medical field. Representation Learning techniques are general methods for automatic feature computation. Nevertheless, these techniques are regarded as uninterpretable "black boxes". In this paper, we propose a methodology to enhance the interpretability of automatically extracted machine learning features. The proposed system is composed of a Restricted Boltzmann Machine for unsupervised feature learning, and a Random Forest classifier, which are combined to jointly consider existing correlations between imaging data, features, and target variables. We define two levels of interpretation: global and local. The former is devoted to understanding if the system learned the relevant relations in the data correctly, while the latter is focused on predictions performed at the voxel and patient level. In addition, we propose a novel feature importance strategy that considers both imaging data and target variables, and we demonstrate the ability of the approach to leverage the interpretability of the obtained representation for the task at hand. We evaluated the proposed methodology in brain tumor segmentation and penumbra estimation in ischemic stroke lesions. We show the ability of the proposed methodology to unveil information regarding relationships between imaging modalities and extracted features and their usefulness for the task at hand. In both clinical scenarios, we demonstrate that the proposed methodology enhances the interpretability of automatically learned features, highlighting specific learning patterns that resemble how an expert extracts relevant data from medical images. Copyright © 2017 Elsevier B.V. All rights reserved.
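
    The pairing of unsupervised feature learning with a Random Forest can be sketched with standard scikit-learn components as below; the digits data is only a stand-in for imaging features, and this is not the authors' implementation.

        # Unsupervised feature learning with a Bernoulli RBM followed by a Random
        # Forest classifier, echoing the RBM-Random Forest pairing described above.
        from sklearn.datasets import load_digits
        from sklearn.neural_network import BernoulliRBM
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import MinMaxScaler

        X, y = load_digits(return_X_y=True)

        model = Pipeline([
            ("scale", MinMaxScaler()),            # RBM expects values in [0, 1]
            ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20,
                                 random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ])
        model.fit(X, y)

        # Random Forest importances over the learned RBM features give a first,
        # global view of which latent features drive the predictions.
        print(model.named_steps["rf"].feature_importances_[:5])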

  8. Populating the Semantic Web by Macro-reading Internet Text

    NASA Astrophysics Data System (ADS)

    Mitchell, Tom M.; Betteridge, Justin; Carlson, Andrew; Hruschka, Estevam; Wang, Richard

    A key question regarding the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text present on the web. We also describe preliminary results demonstrating that machine learning algorithms can learn to extract tens of thousands of facts to populate a diverse ontology, with imperfect but reasonably good accuracy.

  9. Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis

    DTIC Science & Technology

    1989-08-01

    Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis. Final Technical Report. Keywords: pattern recognition, blackboard oriented symbolic processing, knowledge based image analysis, image understanding, aerial imagery, urban area.

  10. ANNUAL REPORT-AUTOMATIC INDEXING AND ABSTRACTING.

    ERIC Educational Resources Information Center

    Lockheed Missiles and Space Co., Palo Alto, CA. Electronic Sciences Lab.

    The investigation is concerned with the development of automatic indexing, abstracting, and extracting systems. Basic investigations in English morphology, phonetics, and syntax are pursued as necessary means to this end. In the first section the theory and design of the "Sentence Dictionary" experiment in automatic extraction is outlined. Some of…

  11. Automatic morphological classification of galaxy images

    PubMed Central

    Shamir, Lior

    2009-01-01

    We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
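
    The Fisher-score-weighted nearest-neighbour idea can be sketched as follows; the data is synthetic and the score formula is a common textbook variant rather than the paper's exact implementation.

        # Fisher scores as feature weights for a simple weighted nearest-neighbour
        # classifier; the data below is synthetic.
        import numpy as np

        def fisher_scores(X, y):
            classes = np.unique(y)
            overall_mean = X.mean(axis=0)
            num = np.zeros(X.shape[1])
            den = np.zeros(X.shape[1])
            for c in classes:
                Xc = X[y == c]
                num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
                den += len(Xc) * Xc.var(axis=0)
            return num / (den + 1e-12)

        def weighted_nn_predict(X_train, y_train, x, weights):
            # Euclidean distance with per-feature weights (the Fisher scores).
            d = np.sqrt(((X_train - x) ** 2 * weights).sum(axis=1))
            return y_train[np.argmin(d)]

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 10))
        y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)
        w = fisher_scores(X, y)
        print(weighted_nn_predict(X, y, X[0], w), y[0])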

  12. A Robust Gradient Based Method for Building Extraction from LiDAR and Photogrammetric Imagery.

    PubMed

    Siddiqui, Fasahat Ullah; Teng, Shyh Wei; Awrangjeb, Mohammad; Lu, Guojun

    2016-07-19

    Existing automatic building extraction methods are not effective in extracting buildings which are small in size and have transparent roofs. The application of a large area threshold prohibits detection of small buildings and the use of ground points in generating the building mask prevents detection of transparent buildings. In addition, the existing methods use numerous parameters to extract buildings in complex environments, e.g., hilly areas and high vegetation. However, the empirical tuning of a large number of parameters reduces the robustness of building extraction methods. This paper proposes a novel Gradient-based Building Extraction (GBE) method to address these limitations. The proposed method transforms the Light Detection And Ranging (LiDAR) height information into an intensity image without interpolation of point heights and then analyses the gradient information in the image. Generally, building roof planes have a constant height change along the slope of a roof plane whereas trees have a random height change. With such an analysis, buildings of a greater range of sizes with a transparent or opaque roof can be extracted. In addition, a local colour matching approach is introduced as a post-processing stage to eliminate trees. This stage of our proposed method does not require any manual setting and all parameters are set automatically from the data. The other post processing stages including variance, point density and shadow elimination are also applied to verify the extracted buildings, where comparatively fewer empirically set parameters are used. The performance of the proposed GBE method is evaluated on two benchmark data sets by using the object and pixel based metrics (completeness, correctness and quality). Our experimental results show the effectiveness of the proposed method in eliminating trees, extracting buildings of all sizes, and extracting buildings with and without transparent roofs. When compared with current state-of-the-art building extraction methods, the proposed method outperforms the existing methods in various evaluation metrics.

  13. A Robust Gradient Based Method for Building Extraction from LiDAR and Photogrammetric Imagery

    PubMed Central

    Siddiqui, Fasahat Ullah; Teng, Shyh Wei; Awrangjeb, Mohammad; Lu, Guojun

    2016-01-01

    Existing automatic building extraction methods are not effective in extracting buildings which are small in size and have transparent roofs. The application of a large area threshold prohibits detection of small buildings and the use of ground points in generating the building mask prevents detection of transparent buildings. In addition, the existing methods use numerous parameters to extract buildings in complex environments, e.g., hilly areas and high vegetation. However, the empirical tuning of a large number of parameters reduces the robustness of building extraction methods. This paper proposes a novel Gradient-based Building Extraction (GBE) method to address these limitations. The proposed method transforms the Light Detection And Ranging (LiDAR) height information into an intensity image without interpolation of point heights and then analyses the gradient information in the image. Generally, building roof planes have a constant height change along the slope of a roof plane whereas trees have a random height change. With such an analysis, buildings of a greater range of sizes with a transparent or opaque roof can be extracted. In addition, a local colour matching approach is introduced as a post-processing stage to eliminate trees. This stage of our proposed method does not require any manual setting and all parameters are set automatically from the data. The other post processing stages including variance, point density and shadow elimination are also applied to verify the extracted buildings, where comparatively fewer empirically set parameters are used. The performance of the proposed GBE method is evaluated on two benchmark data sets by using the object and pixel based metrics (completeness, correctness and quality). Our experimental results show the effectiveness of the proposed method in eliminating trees, extracting buildings of all sizes, and extracting buildings with and without transparent roofs. When compared with current state-of-the-art building extraction methods, the proposed method outperforms the existing methods in various evaluation metrics. PMID:27447631

  14. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature

    PubMed Central

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as the linear kernel, tree kernel, graph kernel, and combinations of multiple kernels, have achieved promising results in the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction, known as the Distributed Smoothed Tree Kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees carrying syntactic information along with distributional semantic vectors representing the semantic information of the sentences or phrases. To generate a robust machine learning model, a feature-based kernel and the DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on the five corpora compared to other state-of-the-art systems. PMID:29099838

  15. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.

    PubMed

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as the linear kernel, tree kernel, graph kernel, and combinations of multiple kernels, have achieved promising results in the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction, known as the Distributed Smoothed Tree Kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees carrying syntactic information along with distributional semantic vectors representing the semantic information of the sentences or phrases. To generate a robust machine learning model, a feature-based kernel and the DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on the five corpora compared to other state-of-the-art systems.
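
    A generic way to combine two kernels, echoing the ensemble of a feature-based kernel with the DSTK, is to sum their Gram matrices and train an SVM on the precomputed kernel; in the sketch below the RBF and linear kernels are simple stand-ins, not the DSTK.

        # Combining two kernels by summation and training an SVM on the resulting
        # precomputed Gram matrix; the RBF and linear kernels are stand-ins.
        from sklearn.datasets import make_classification
        from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=20, random_state=0)

        K = rbf_kernel(X, X, gamma=0.1) + linear_kernel(X, X)   # composite kernel
        clf = SVC(kernel="precomputed").fit(K, y)

        # At prediction time the Gram matrix between test and training points is needed.
        K_test = rbf_kernel(X[:5], X, gamma=0.1) + linear_kernel(X[:5], X)
        print(clf.predict(K_test))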

  16. Automatic retinal blood vessel parameter calculation in spectral domain optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Wehbe, Hassan; Ruggeri, Marco; Jiao, Shuliang; Gregori, Giovanni; Puliafito, Carmen A.

    2007-02-01

    Measurement of retinal blood vessel parameters such as the blood flow in the vessels may have a significant impact on the study and diagnosis of glaucoma, a leading blinding disease worldwide. Optical coherence tomography (OCT) is a noninvasive imaging technique that can provide not only microscopic structural imaging of the retina but also functional information like the blood flow velocity in the retina. The aim of this study is to automatically extract the parameters of retinal blood vessels, such as the 3D orientation and the vessel diameter, as well as the corresponding absolute blood flow velocity in the vessel. The parameters were extracted from circular OCT scans around the optic disc. By removing the surface reflection through simple segmentation of the circular OCT scans, a blood vessel shadowgram can be generated. The lateral coordinates and the diameter of each blood vessel are extracted from the shadowgram through a series of signal processing steps. Upon determination of the lateral position and the vessel diameter, the coordinate in the depth direction of each blood vessel is calculated in combination with the Doppler information for the vessel. The extraction of the vessel coordinates and diameter makes it possible to calculate the orientation of the vessel in reference to the direction of the incident sample light, which in turn can be used to calculate the absolute blood flow velocity and the flow rate.

  17. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association

    PubMed Central

    Ma, Jian; Casey, Cameron P.; Zheng, Xueyun; Ibrahim, Yehia M.; Wilkins, Christopher S.; Renslow, Ryan S.; Thomas, Dennis G.; Payne, Samuel H.; Monroe, Matthew E.; Smith, Richard D.; Teeguarden, Justin G.; Baker, Erin S.; Metz, Thomas O.

    2017-01-01

    Motivation: Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. Availability and implementation: PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library. Contact: erin.baker@pnnl.gov or thomas.metz@pnnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28505286

  18. Automated encoding of clinical documents based on natural language processing.

    PubMed

    Friedman, Carol; Shagina, Lyudmila; Lussier, Yves; Hripcsak, George

    2004-01-01

    The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI.72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.

  19. Automatic optical detection and classification of marine animals around MHK converters using machine vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunton, Steven

    Optical systems provide valuable information for evaluating interactions and associations between organisms and MHK energy converters and for capturing potentially rare encounters between marine organisms and MHK devices. The deluge of optical data from cabled monitoring packages makes expert review time-consuming and expensive. We propose algorithms and a processing framework to automatically extract events of interest from underwater video. The open-source software framework consists of background subtraction, filtering, feature extraction and hierarchical classification algorithms. This classification pipeline was validated on real-world data collected with an experimental underwater monitoring package. An event detection rate of 100% was achieved using robust principal components analysis (RPCA), Fourier feature extraction and a support vector machine (SVM) binary classifier. The detected events were then further classified into more complex classes – algae | invertebrate | vertebrate, one species | multiple species of fish, and interest rank. Greater than 80% accuracy was achieved using a combination of machine learning techniques.

  20. Automated software system for checking the structure and format of ACM SIG documents

    NASA Astrophysics Data System (ADS)

    Mirza, Arsalan Rahman; Sah, Melike

    2017-04-01

    Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents, representing the structure and format of these documents using OWL (Web Ontology Language). Then, the metadata is extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, the system evaluation and user study evaluations.

  1. Validation of the ICU-DaMa tool for automatically extracting variables for minimum dataset and quality indicators: The importance of data quality assessment.

    PubMed

    Sirgo, Gonzalo; Esteban, Federico; Gómez, Josep; Moreno, Gerard; Rodríguez, Alejandro; Blanch, Lluis; Guardiola, Juan José; Gracia, Rafael; De Haro, Lluis; Bodí, María

    2018-04-01

    Big data analytics promise insights into healthcare processes and management, improving outcomes while reducing costs. However, data quality is a major challenge for reliable results. Business process discovery techniques and an associated data model were used to develop a data management tool, ICU-DaMa, for extracting variables essential for overseeing the quality of care in the intensive care unit (ICU). To determine the feasibility of using ICU-DaMa to automatically extract variables for the minimum dataset and ICU quality indicators from the clinical information system (CIS). The Wilcoxon signed-rank test and Fisher's exact test were used to compare the values extracted from the CIS with ICU-DaMa for 25 variables from all patients attended in a polyvalent ICU during a two-month period against the gold standard of values manually extracted by two trained physicians. Discrepancies with the gold standard were classified into plausibility, conformance, and completeness errors. Data from 149 patients were included. Although there were no significant differences between the automatic method and the manual method, we detected differences in values for five variables, including one plausibility error and two conformance and completeness errors. Plausibility: 1) Sex, ICU-DaMa incorrectly classified one male patient as female (error generated by the Hospital's Admissions Department). Conformance: 2) Reason for isolation, ICU-DaMa failed to detect a human error in which a professional misclassified a patient's isolation. 3) Brain death, ICU-DaMa failed to detect another human error in which a professional likely entered two mutually exclusive values related to the death of the patient (brain death and controlled donation after circulatory death). Completeness: 4) Destination at ICU discharge, ICU-DaMa incorrectly classified two patients due to a professional failing to fill out the patient discharge form when the patients died. 5) Length of continuous renal replacement therapy, data were missing for one patient because the CRRT device was not connected to the CIS. Automatic generation of the minimum dataset and ICU quality indicators using ICU-DaMa is feasible. The discrepancies were identified and can be corrected by improving CIS ergonomics, training healthcare professionals in the culture of the quality of information, and using tools for detecting and correcting data errors. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. Automatic digital surface model (DSM) generation from aerial imagery data

    NASA Astrophysics Data System (ADS)

    Zhou, Nan; Cao, Shixiang; He, Hongyan; Xing, Kun; Yue, Chunyu

    2018-04-01

    Aerial sensors are widely used to acquire imagery for photogrammetric and remote sensing applications. In general, the images have large overlapping regions, which provide a lot of redundant geometric and radiometric information for matching. This paper presents a POS supported dense matching procedure for automatic DSM generation from aerial imagery data. The method uses a coarse-to-fine hierarchical strategy with an effective combination of several image matching algorithms: image radiation pre-processing, image pyramid generation, feature point extraction and grid point generation, multi-image geometrically constrained cross-correlation (MIG3C), global relaxation optimization, multi-image geometrically constrained least squares matching (MIGCLSM), TIN generation and point cloud filtering. The image radiation pre-processing is used in order to reduce the effects of the inherent radiometric problems and optimize the images. The presented approach essentially consists of three components: a feature point extraction and matching procedure, a grid point matching procedure and a relational matching procedure. The MIGCLSM method is used to achieve potentially sub-pixel accuracy matches and identify some inaccurate and possibly false matches. The feasibility of the method has been tested on aerial images of different scales with different land cover types. The accuracy evaluation is based on the comparison between the automatically extracted DSMs derived from the precise exterior orientation parameters (EOPs) and those derived from the POS.

  3. An Efficient Method for Automatic Road Extraction Based on Multiple Features from LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Y.; Hu, X.; Guan, H.; Liu, P.

    2016-06-01

    Road extraction in urban areas is a difficult task due to complicated patterns and many contextual objects. LiDAR data directly provides three-dimensional (3D) points with fewer occlusions and smaller shadows. The elevation information and surface roughness are distinguishing features to separate roads. However, LiDAR data also has some disadvantages for object extraction, such as the irregular distribution of point clouds and the lack of clear road edges. To address these problems, this paper proposes an automatic road centerline extraction method which has three major steps: (1) road center point detection based on multiple feature spatial clustering for separating road points from ground points, (2) local principal component analysis with least squares fitting for extracting the primitives of road centerlines, and (3) hierarchical grouping for connecting primitives into a complete road network. Compared with MTH (consisting of the Mean shift algorithm, Tensor voting, and the Hough transform) proposed in our previous article, this method greatly reduces the computational cost. To evaluate the proposed method, the Vaihingen data set, a benchmark data set provided by ISPRS for the "Urban Classification and 3D Building Reconstruction" project, was selected. The experimental results show that our method achieves the same performance in less time for road extraction from LiDAR data.

  4. Fast and automatic algorithm for optic disc extraction in retinal images using principle-component-analysis-based preprocessing and curvelet transform.

    PubMed

    Shahbeig, Saleh; Pourghassem, Hossein

    2013-01-01

    Optic disc or optic nerve (ON) head extraction in retinal images has widespread applications in retinal disease diagnosis and human identification in biometric systems. This paper introduces a fast and automatic algorithm for detecting and extracting the ON region accurately from retinal images without the use of blood-vessel information. In this algorithm, to compensate for destructive changes in illumination and to enhance the contrast of the retinal images, we estimate the background illumination and apply an adaptive correction function to the curvelet transform coefficients of the retinal images. In other words, we eliminate the confounding factors and pave the way to extract the ON region exactly. Then, we detect the ON region from the retinal images using morphology operators based on geodesic transformations, by applying a proper adaptive correction function to the reconstructed image's curvelet transform coefficients and a novel, powerful criterion. Finally, using local thresholding on the detected area of the retinal images, we extract the ON region. The proposed algorithm is evaluated on available images from the DRIVE and STARE databases. The experimental results indicate that the proposed algorithm obtains accuracy rates of 100% and 97.53% for ON extraction on the DRIVE and STARE databases, respectively.

  5. Application of MLS Data to the Assessment of Safety-Related Features in the Surrounding Area of Automatically Detected Pedestrian Crossings

    NASA Astrophysics Data System (ADS)

    Soilán, M.; Riveiro, B.; Sánchez-Rodríguez, A.; González-deSantos, L. M.

    2018-05-01

    During the last few years, there has been a huge methodological development regarding the automatic processing of 3D point cloud data acquired by both terrestrial and aerial mobile mapping systems, motivated by the improvement of surveying technologies and hardware performance. This paper presents a methodology that first extracts geometric and semantic information regarding the road markings within the surveyed area from Mobile Laser Scanning (MLS) data, and then employs it to isolate street areas where pedestrian crossings are found and, therefore, pedestrians are more likely to cross the road. Different safety-related features can then be extracted in order to offer information about the adequacy of the pedestrian crossing regarding its safety, which can be displayed in a Geographical Information System (GIS) layer. These features are defined in four different processing modules: accessibility analysis, traffic light classification, traffic sign classification, and visibility analysis. The validation of the proposed methodology has been carried out in two different cities in the northwest of Spain, obtaining both quantitative and qualitative results for pedestrian crossing classification and for each processing module of the safety assessment of pedestrian crossing environments.

  6. A research of road centerline extraction algorithm from high resolution remote sensing images

    NASA Astrophysics Data System (ADS)

    Zhang, Yushan; Xu, Tingfa

    2017-09-01

    Satellite remote sensing technology has become one of the most effective methods for land surface monitoring in recent years, due to its advantages such as short revisit period, large coverage and rich information. Road extraction is an important application of high resolution remote sensing images, and an intelligent, automatic road extraction algorithm with high precision is of great significance for transportation, road network updating and urban planning. Fuzzy c-means (FCM) clustering segmentation algorithms have been used for road extraction, but the traditional algorithms do not consider spatial information. An improved fuzzy c-means clustering algorithm combined with spatial information (SFCM) is proposed in this paper, which proves effective for noisy image segmentation. Firstly, the image is segmented using the SFCM. Secondly, the segmentation result is processed by mathematical morphology to remove the joint regions. Thirdly, the road centerlines are extracted by morphological thinning and burr trimming. The average integrity of the centerline extraction algorithm is 97.98%, the average accuracy is 95.36% and the average quality is 93.59%. Experimental results show that the proposed method is effective for road centerline extraction.
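
    For the thinning step, a scikit-image sketch conveys the idea; the burr trimming in the paper is more elaborate than the small-object removal used here, so treat this as a rough illustration rather than the authors' procedure.

      from skimage.morphology import skeletonize, remove_small_objects

      def centerlines_from_road_mask(road_mask, min_size=64):
          """Thin a binary road mask to one-pixel-wide centerlines after dropping
          small spurious components (a crude stand-in for burr trimming)."""
          cleaned = remove_small_objects(road_mask.astype(bool), min_size=min_size)
          return skeletonize(cleaned)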

  7. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    PubMed

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function for each step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback, in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions and generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π for the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score.
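
    The Q-learning component can be summarized with a standard tabular update; the environment object, its reset/step interface and the discrete state encoding below are assumptions made for illustration, since the paper's states come from LSTM and Tree-LSTM representations.

      import random
      from collections import defaultdict

      def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
          """Tabular Q-learning with an epsilon-greedy policy over a hypothetical
          environment exposing reset() -> state and step(action) -> (state, reward, done)."""
          q = defaultdict(float)  # (state, action) -> estimated return
          for _ in range(episodes):
              state, done = env.reset(), False
              while not done:
                  if random.random() < eps:
                      action = random.choice(actions)
                  else:
                      action = max(actions, key=lambda a: q[(state, a)])
                  next_state, reward, done = env.step(action)
                  best_next = max(q[(next_state, a)] for a in actions)
                  q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
                  state = next_state
          return q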

  8. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    PubMed Central

    Zhang, Hongjun; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function for each step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback, in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions and generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π for the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score. PMID:28894463

  9. Image analysis for skeletal evaluation of carpal bones

    NASA Astrophysics Data System (ADS)

    Ko, Chien-Chuan; Mao, Chi-Wu; Lin, Chi-Jen; Sun, Yung-Nien

    1995-04-01

    The assessment of bone age is an important topic in pediatric radiology. It provides very important information for treatment and for predicting skeletal growth in a developing child. So far, various computerized algorithms for automatically assessing skeletal growth have been reported. Most of these methods attempt to analyze phalangeal growth. The most fundamental step in these automatic measurement methods is image segmentation, which extracts bones from soft tissue and background. Automatic segmentation methods for hand radiographs can roughly be categorized into two main approaches: edge-based and region-based methods. This paper presents a region-based carpal-bone segmentation approach. It is organized into four stages: contrast enhancement, moment-preserving thresholding, morphological processing, and region-growing labeling.

  10. Automatic identification of bullet signatures based on consecutive matching striae (CMS) criteria.

    PubMed

    Chu, Wei; Thompson, Robert M; Song, John; Vorburger, Theodore V

    2013-09-10

    The consecutive matching striae (CMS) numeric criteria for firearm and toolmark identifications have been widely accepted by forensic examiners, although there have been questions concerning its observer subjectivity and limited statistical support. In this paper, based on signal processing and extraction, a model for the automatic and objective counting of CMS is proposed. The position and shape information of the striae on the bullet land is represented by a feature profile, which is used for determining the CMS number automatically. Rapid counting of CMS number provides a basis for ballistics correlations with large databases and further statistical and probability analysis. Experimental results in this report using bullets fired from ten consecutively manufactured barrels support this developed model. Published by Elsevier Ireland Ltd.
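
    As a hedged illustration of what counting CMS automatically amounts to, the sketch below takes striae positions already extracted from two land profiles and returns the longest run of consecutive matches; the tolerance parameter and the representation of striae as position arrays are assumptions, not the paper's model.

      def consecutive_matching_striae(striae_a, striae_b, tolerance=1):
          """Longest run of consecutive striae in profile A that have a counterpart
          in profile B within `tolerance` positions."""
          best = current = 0
          for pos in striae_a:
              matched = any(abs(pos - q) <= tolerance for q in striae_b)
              current = current + 1 if matched else 0
              best = max(best, current)
          return best

      # Example: positions 10, 12, 14 all match; 30 does not.
      print(consecutive_matching_striae([10, 12, 14, 30], [9, 12, 15, 50]))  # 3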

  11. A Complete System for Automatic Extraction of Left Ventricular Myocardium From CT Images Using Shape Segmentation and Contour Evolution

    PubMed Central

    Zhu, Liangjia; Gao, Yi; Appia, Vikram; Yezzi, Anthony; Arepalli, Chesnal; Faber, Tracy; Stillman, Arthur; Tannenbaum, Allen

    2014-01-01

    The left ventricular myocardium plays a key role in the entire circulation system and an automatic delineation of the myocardium is a prerequisite for most of the subsequent functional analysis. In this paper, we present a complete system for an automatic segmentation of the left ventricular myocardium from cardiac computed tomography (CT) images using the shape information from images to be segmented. The system follows a coarse-to-fine strategy by first localizing the left ventricle and then deforming the myocardial surfaces of the left ventricle to refine the segmentation. In particular, the blood pool of a CT image is extracted and represented as a triangulated surface. Then, the left ventricle is localized as a salient component on this surface using geometric and anatomical characteristics. After that, the myocardial surfaces are initialized from the localization result and evolved by applying forces from the image intensities with a constraint based on the initial myocardial surface locations. The proposed framework has been validated on 34-human and 12-pig CT images, and the robustness and accuracy are demonstrated. PMID:24723531

  12. Linking attentional processes and conceptual problem solving: visual cues facilitate the automaticity of extracting relevant information from diagrams

    PubMed Central

    Rouinfar, Amy; Agra, Elise; Larson, Adam M.; Rebello, N. Sanjay; Loschky, Lester C.

    2014-01-01

    This study investigated links between visual attention processes and conceptual problem solving. This was done by overlaying visual cues on conceptual physics problem diagrams to direct participants’ attention to relevant areas to facilitate problem solving. Participants (N = 80) individually worked through four problem sets, each containing a diagram, while their eye movements were recorded. Each diagram contained regions that were relevant to solving the problem correctly and separate regions related to common incorrect responses. Problem sets contained an initial problem, six isomorphic training problems, and a transfer problem. The cued condition saw visual cues overlaid on the training problems. Participants’ verbal responses were used to determine their accuracy. This study produced two major findings. First, short-duration visual cues which draw attention to solution-relevant information, and aid in organizing and integrating it, facilitate both immediate problem solving and generalization of that ability to new problems. Thus, visual cues can facilitate re-representing a problem and overcoming impasse, enabling a correct solution. Importantly, these cueing effects on problem solving did not involve the solvers’ attention necessarily embodying the solution to the problem, but were instead caused by solvers attending to and integrating relevant information in the problems into a solution path. Second, this study demonstrates that when such cues are used across multiple problems, solvers can automatize the extraction of problem-relevant information. These results suggest that low-level attentional selection processes provide a necessary gateway for relevant information to be used in problem solving, but are generally not sufficient for correct problem solving. Instead, factors that lead a solver to an impasse and to organize and integrate problem information also greatly facilitate arriving at correct solutions. PMID:25324804

  13. Linking attentional processes and conceptual problem solving: visual cues facilitate the automaticity of extracting relevant information from diagrams.

    PubMed

    Rouinfar, Amy; Agra, Elise; Larson, Adam M; Rebello, N Sanjay; Loschky, Lester C

    2014-01-01

    This study investigated links between visual attention processes and conceptual problem solving. This was done by overlaying visual cues on conceptual physics problem diagrams to direct participants' attention to relevant areas to facilitate problem solving. Participants (N = 80) individually worked through four problem sets, each containing a diagram, while their eye movements were recorded. Each diagram contained regions that were relevant to solving the problem correctly and separate regions related to common incorrect responses. Problem sets contained an initial problem, six isomorphic training problems, and a transfer problem. The cued condition saw visual cues overlaid on the training problems. Participants' verbal responses were used to determine their accuracy. This study produced two major findings. First, short-duration visual cues which draw attention to solution-relevant information, and aid in organizing and integrating it, facilitate both immediate problem solving and generalization of that ability to new problems. Thus, visual cues can facilitate re-representing a problem and overcoming impasse, enabling a correct solution. Importantly, these cueing effects on problem solving did not involve the solvers' attention necessarily embodying the solution to the problem, but were instead caused by solvers attending to and integrating relevant information in the problems into a solution path. Second, this study demonstrates that when such cues are used across multiple problems, solvers can automatize the extraction of problem-relevant information. These results suggest that low-level attentional selection processes provide a necessary gateway for relevant information to be used in problem solving, but are generally not sufficient for correct problem solving. Instead, factors that lead a solver to an impasse and to organize and integrate problem information also greatly facilitate arriving at correct solutions.

  14. EDGE COMPUTING AND CONTEXTUAL INFORMATION FOR THE INTERNET OF THINGS SENSORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klein, Levente

    Interpreting sensor data requires knowledge about sensor placement and the surrounding environment. For a single sensor measurement it is easy to document the context by visual observation; however, for millions of sensors reporting data back to a server, the contextual information needs to be automatically extracted, either from data analysis or by leveraging complementary data sources. Data layers that overlap spatially or temporally with sensor locations can be used to extract the context and to validate the measurements. To minimize the amount of data transmitted through the internet while preserving signal information content, two methods are explored: computation at the edge and compressed sensing. We validate these methods on wind and chemical sensor data by (1) eliminating redundant measurements from wind sensors and (2) extracting the peak value of a chemical sensor measuring a methane plume. We present a general cloud-based framework for validating sensor data based on statistical and physical modeling and on contextual data extracted from geospatial data.
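
    The abstract does not spell out the edge computation, so the following is only a generic sketch of eliminating redundant measurements at the edge with a deadband filter: a reading is forwarded only when it differs from the last transmitted value by more than a threshold.

      def deadband_filter(readings, threshold=0.5):
          """Forward a reading only when it moves more than `threshold` away from
          the last transmitted value; nearly constant streams collapse to few samples."""
          transmitted, last = [], None
          for value in readings:
              if last is None or abs(value - last) > threshold:
                  transmitted.append(value)
                  last = value
          return transmitted

      print(deadband_filter([3.1, 3.2, 3.1, 4.0, 4.1, 4.0, 2.0]))  # [3.1, 4.0, 2.0]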

  15. Spatial-spectral preprocessing for endmember extraction on GPU's

    NASA Astrophysics Data System (ADS)

    Jimenez, Luis I.; Plaza, Javier; Plaza, Antonio; Li, Jun

    2016-10-01

    Spectral unmixing is focused on the identification of spectrally pure signatures, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. Although mainly focused on the spectral information contained in hyperspectral images, endmember extraction techniques have recently included spatial information to achieve more accurate results. Several algorithms have been developed for automatic or semi-automatic identification of endmembers using spatial and spectral information, including spectral-spatial endmember extraction (SSEE), in which, within a preprocessing step, both sources of information are extracted from the hyperspectral image and used equally for this purpose. Previous works have implemented the SSEE technique in four main steps: 1) local eigenvector calculation in each sub-region into which the original hyperspectral image is divided; 2) computation of the maxima and minima projections of all eigenvectors over the entire hyperspectral image in order to obtain a set of candidate pixels; 3) expansion and averaging of the signatures of the candidate set; 4) ranking based on the spectral angle distance (SAD). The result of this method is a list of candidate signatures from which the endmembers can be extracted using various spectral-based techniques, such as orthogonal subspace projection (OSP), vertex component analysis (VCA) or N-FINDR. Considering the large volume of data and the complexity of the calculations, there is a need for efficient implementations. Latest-generation hardware accelerators such as commodity graphics processing units (GPUs) offer a good chance of improving the computational performance in this context. In this paper, we develop two different implementations of the SSEE algorithm using GPUs. Both are based on the eigenvector computation within each sub-region of the first step, one using singular value decomposition (SVD) and the other using principal component analysis (PCA). Based on our experiments with hyperspectral data sets, high computational performance is observed in both cases.
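
    A CPU-side NumPy sketch of step 1 (local eigenvector calculation per sub-region) is shown below; the tile size, component count and cube layout are assumptions, and the GPU implementations discussed in the paper are not reproduced here.

      import numpy as np

      def local_eigenvectors(hsi_cube, tile=32, n_components=3):
          """Principal spectral eigenvectors of each spatial tile of a hyperspectral
          cube shaped (rows, cols, bands), via SVD of the mean-centered pixels."""
          rows, cols, bands = hsi_cube.shape
          eigenvectors = []
          for r in range(0, rows, tile):
              for c in range(0, cols, tile):
                  block = hsi_cube[r:r + tile, c:c + tile, :].reshape(-1, bands)
                  centered = block - block.mean(axis=0)
                  _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt span the spectra
                  eigenvectors.append(vt[:n_components])
          return np.vstack(eigenvectors)  # (n_tiles * n_components, bands)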

  16. A computationally efficient method for incorporating spike waveform information into decoding algorithms.

    PubMed

    Ventura, Valérie; Todorova, Sonia

    2015-05-01

    Spike-based brain-computer interfaces (BCIs) have the potential to restore motor ability to people with paralysis and amputation, and have shown impressive performance in the lab. To transition BCI devices from the lab to the clinic, decoding must proceed automatically and in real time, which prohibits the use of algorithms that are computationally intensive or require manual tweaking. A common choice is to avoid spike sorting and treat the signal on each electrode as if it came from a single neuron, which is fast, easy, and therefore desirable for clinical use. But this approach ignores the kinematic information provided by individual neurons recorded on the same electrode. The contribution of this letter is a linear decoding model that extracts kinematic information from individual neurons without spike-sorting the electrode signals. The method relies on modeling sample averages of waveform features as functions of kinematics, which is automatic and requires minimal data storage and computation. In offline reconstruction of arm trajectories of a nonhuman primate performing reaching tasks, the proposed method performs as well as decoders based on expert manual and automatic spike sorting.
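
    A minimal ridge-regression sketch of a linear decoder that maps per-bin waveform-feature averages to kinematics; the matrix shapes and the ridge penalty are assumptions, not the letter's exact formulation.

      import numpy as np

      def fit_linear_decoder(feature_means, kinematics, ridge=1e-3):
          """Fit W so that [feature_means, 1] @ W approximates the kinematic variables;
          feature_means is (n_bins, n_features), kinematics is (n_bins, n_outputs)."""
          X = np.hstack([feature_means, np.ones((len(feature_means), 1))])  # append bias column
          XtX = X.T @ X + ridge * np.eye(X.shape[1])
          return np.linalg.solve(XtX, X.T @ kinematics)

      def decode(feature_means, W):
          X = np.hstack([feature_means, np.ones((len(feature_means), 1))])
          return X @ W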

  17. A new patent-based approach for technology mapping in the pharmaceutical domain.

    PubMed

    Russo, Davide; Montecchi, Tiziano; Carrara, Paolo

    2013-09-01

    The key factor in decision-making is the quality of the information collected and processed in the problem analysis. In most cases, patents represent a very important source of information. The main problem is how to extract such information from a huge corpus of documents with high recall and precision, and in a short time. This article demonstrates a patent search and classification method, called the Knowledge Organizing Module, which creates, almost automatically, a pool of patents based on polysemy expansion and homonymy disambiguation. Once the pool is built, an automatic patent technology landscape is provided for establishing the state of the art of a product and for exploring competing alternative treatments and/or possible technological opportunities. An exemplary case study is provided; it deals with patent analysis in the field of verruca treatments.

  18. Impact of JPEG2000 compression on spatial-spectral endmember extraction from hyperspectral data

    NASA Astrophysics Data System (ADS)

    Martín, Gabriel; Ruiz, V. G.; Plaza, Antonio; Ortiz, Juan P.; García, Inmaculada

    2009-08-01

    Hyperspectral image compression has received considerable interest in recent years. However, an important issue that has not been investigated in the past is the impact of lossy compression on spectral mixture analysis applications, which characterize mixed pixels in terms of a suitable combination of spectrally pure substances (called endmembers) weighted by their estimated fractional abundances. In this paper, we specifically investigate the impact of JPEG2000 compression of hyperspectral images on the quality of the endmembers extracted by algorithms that incorporate both spectral and spatial information (useful for incorporating contextual information in the spectral endmember search). The two considered algorithms are the automatic morphological endmember extraction (AMEE) and the spatial-spectral endmember extraction (SSEE) techniques. Experiments are conducted using a well-known data set collected by AVIRIS over the Cuprite mining district in Nevada, with detailed ground-truth information available from the U.S. Geological Survey. Our experiments reveal some interesting findings that may be useful to specialists applying spatial-spectral endmember extraction algorithms to compressed hyperspectral imagery.

  19. Sugarcane Crop Extraction Using Object-Oriented Method from ZY-3 High Resolution Satellite TLC Image

    NASA Astrophysics Data System (ADS)

    Luo, H.; Ling, Z. Y.; Shao, G. Z.; Huang, Y.; He, Y. Q.; Ning, W. Y.; Zhong, Z.

    2018-04-01

    Sugarcane is one of the most important crops in Guangxi, China. With the development of satellite remote sensing technology, more remotely sensed images can be used for monitoring the sugarcane crop. With its Three Line Camera (TLC) images, wide coverage and stereoscopic mapping ability, the Chinese ZY-3 high resolution stereoscopic mapping satellite is useful for obtaining additional information for sugarcane crop monitoring, such as spectral, shape and texture differences between the forward, nadir and backward images. The digital surface model (DSM) derived from ZY-3 TLC images also provides height information for the sugarcane crop. In this study, we attempt to extract the sugarcane crop from ZY-3 images acquired during the harvest period. Ortho-rectified TLC images, the fused image and the DSM are processed for the extraction. An object-oriented method is then used for image segmentation, example collection and feature extraction. The results of our study show that, with the help of ZY-3 TLC images, sugarcane crop information at harvest time can be automatically extracted, with an overall accuracy of about 85.3%.

  20. Towards an intelligent framework for multimodal affective data analysis.

    PubMed

    Poria, Soujanya; Cambria, Erik; Hussain, Amir; Huang, Guang-Bin

    2015-03-01

    An increasingly large amount of multimodal content is posted on social media websites such as YouTube and Facebook every day. In order to cope with the growth of so much multimodal data, there is an urgent need to develop an intelligent multimodal analysis framework that can effectively extract information from multiple modalities. In this paper, we propose a novel multimodal information extraction agent, which infers and aggregates the semantic and affective information associated with user-generated multimodal data in contexts such as e-learning, e-health, automatic video content tagging and human-computer interaction. In particular, the developed intelligent agent adopts an ensemble feature extraction approach by exploiting the joint use of tri-modal (text, audio and video) features to enhance the multimodal information extraction process. In preliminary experiments using the eNTERFACE dataset, our proposed multimodal system achieves an accuracy of 87.95%, outperforming the best state-of-the-art system by more than 10%, or in relative terms, a 56% reduction in error rate. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Automatic seizure detection based on the combination of newborn multi-channel EEG and HRV information

    NASA Astrophysics Data System (ADS)

    Mesbah, Mostefa; Balakrishnan, Malarvili; Colditz, Paul B.; Boashash, Boualem

    2012-12-01

    This article proposes a new method for newborn seizure detection that uses information extracted from both multi-channel electroencephalogram (EEG) and a single channel electrocardiogram (ECG). The aim of the study is to assess whether additional information extracted from ECG can improve the performance of seizure detectors based solely on EEG. Two different approaches were used to combine this extracted information. The first approach, known as feature fusion, involves combining features extracted from EEG and heart rate variability (HRV) into a single feature vector prior to feeding it to a classifier. The second approach, called classifier or decision fusion, is achieved by combining the independent decisions of the EEG and the HRV-based classifiers. Tested on recordings obtained from eight newborns with identified EEG seizures, the proposed neonatal seizure detection algorithms achieved 95.20% sensitivity and 88.60% specificity for the feature fusion case and 95.20% sensitivity and 94.30% specificity for the classifier fusion case. These results are considerably better than those involving classifiers using EEG only (80.90%, 86.50%) or HRV only (85.70%, 84.60%).
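
    The two fusion strategies can be sketched generically with scikit-learn; the classifier choice and the precomputed EEG/HRV feature matrices are assumptions, since the article's classifiers and features are not reproduced here.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def feature_fusion(eeg_features, hrv_features, labels):
          """Concatenate the two feature sets and train a single classifier."""
          fused = np.hstack([eeg_features, hrv_features])
          return LogisticRegression(max_iter=1000).fit(fused, labels)

      def decision_fusion(eeg_features, hrv_features, labels):
          """Train one classifier per modality and combine their class probabilities."""
          clf_eeg = LogisticRegression(max_iter=1000).fit(eeg_features, labels)
          clf_hrv = LogisticRegression(max_iter=1000).fit(hrv_features, labels)
          def predict(eeg_x, hrv_x):
              summed = clf_eeg.predict_proba(eeg_x) + clf_hrv.predict_proba(hrv_x)
              return summed.argmax(axis=1)
          return predict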

  2. Rectification of elemental image set and extraction of lens lattice by projective image transformation in integral imaging.

    PubMed

    Hong, Keehoon; Hong, Jisoo; Jung, Jae-Hyun; Park, Jae-Hyeung; Lee, Byoungho

    2010-05-24

    We propose a new method for rectifying geometrical distortion in an elemental image set and extracting accurate lens lattice lines by projective image transformation. The information about the distortion in the acquired elemental image set is found by a Hough transform algorithm. With this initial information on the distortion, the acquired elemental image set is rectified automatically, without prior knowledge of the characteristics of the pickup system, by a stratified image transformation procedure. Computer-generated elemental image sets with deliberate distortion are used to verify the proposed rectification method. Experimentally captured elemental image sets are optically reconstructed before and after rectification by the proposed method. The experimental results support the validity of the proposed method, with high accuracy of image rectification and lattice extraction.
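
    An OpenCV sketch of the two ingredients, Hough-based lattice line detection and a projective warp onto an ideal lattice, is given below; the correspondence selection and the stratified procedure of the paper are not reproduced, and the parameter values are assumptions.

      import cv2
      import numpy as np

      def detect_lattice_lines(gray_image, hough_thresh=200):
          """Candidate lattice lines as (rho, theta) pairs from a standard Hough transform."""
          edges = cv2.Canny(gray_image, 50, 150)
          lines = cv2.HoughLines(edges, 1, np.pi / 180, hough_thresh)
          return [] if lines is None else [tuple(l[0]) for l in lines]

      def rectify_to_ideal_lattice(image, detected_corners, ideal_corners):
          """Estimate a homography from detected lattice corners (4 or more) to their
          ideal positions and warp the elemental image set accordingly."""
          H, _ = cv2.findHomography(np.float32(detected_corners), np.float32(ideal_corners))
          h, w = image.shape[:2]
          return cv2.warpPerspective(image, H, (w, h))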

  3. Integration of forward-looking infrared (FLIR) and traffic information for moving obstacle detection with integrity

    NASA Astrophysics Data System (ADS)

    Zhu, Zhen; Vana, Sudha; Bhattacharya, Sumit; Uijt de Haag, Maarten

    2009-05-01

    This paper discusses the integration of Forward-looking Infrared (FLIR) and traffic information from, for example, the Automatic Dependent Surveillance - Broadcast (ADS-B) or the Traffic Information Service-Broadcast (TIS-B). The goal of this integration method is to obtain an improved state estimate of a moving obstacle within the Field-of-View of the FLIR with added integrity. The focus of the paper will be on the approach phase of the flight. The paper will address methods to extract moving objects from the FLIR imagery and geo-reference these objects using outputs of both the onboard Global Positioning System (GPS) and the Inertial Navigation System (INS). The proposed extraction method uses a priori airport information and terrain databases. Furthermore, state information from the traffic information sources will be extracted and integrated with the state estimates from the FLIR. Finally, a method will be addressed that performs a consistency check between both sources of traffic information. The methods discussed in this paper will be evaluated using flight test data collected with a Gulfstream V in Reno, NV (GVSITE) and simulated ADS-B.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klein, Levente

    Interpreting sensor data requires knowledge about sensor placement and the surrounding environment. For a single sensor measurement it is easy to document the context by visual observation; however, for millions of sensors reporting data back to a server, the contextual information needs to be automatically extracted, either from data analysis or by leveraging complementary data sources. Data layers that overlap spatially or temporally with sensor locations can be used to extract the context and to validate the measurements. To minimize the amount of data transmitted through the internet while preserving signal information content, two methods are explored: computation at the edge and compressed sensing. We validate these methods on wind and chemical sensor data by (1) eliminating redundant measurements from wind sensors and (2) extracting the peak value of a chemical sensor measuring a methane plume. We present a general cloud-based framework for validating sensor data based on statistical and physical modeling and on contextual data extracted from geospatial data.

  5. ECO: A Framework for Entity Co-Occurrence Exploration with Faceted Navigation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halliday, K. D.

    2010-08-20

    Even as highly structured databases and semantic knowledge bases become more prevalent, a substantial amount of human knowledge is reported as written prose. Typical textual reports, such as news articles, contain information about entities (people, organizations, and locations) and their relationships. Automatically extracting such relationships from large text corpora is a key component of corporate and government knowledge bases. The primary goal of the ECO project is to develop a scalable framework for extracting and presenting these relationships for exploration using an easily navigable faceted user interface. ECO uses entity co-occurrence relationships to identify related entities. The system aggregates and indexes information on each entity pair, allowing the user to rapidly discover and mine relational information.
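
    The core co-occurrence aggregation can be illustrated in a few lines; the per-document entity sets are assumed to come from an upstream named-entity extractor, which is not shown, and the example entities are purely illustrative.

      from collections import Counter
      from itertools import combinations

      def cooccurrence_counts(documents_entities):
          """Count how often each unordered entity pair appears in the same document;
          `documents_entities` is a list of entity sets, one per document."""
          counts = Counter()
          for entities in documents_entities:
              for pair in combinations(sorted(set(entities)), 2):
                  counts[pair] += 1
          return counts

      docs = [{"Acme Corp", "J. Doe"}, {"Acme Corp", "J. Doe", "Springfield"}]
      print(cooccurrence_counts(docs)[("Acme Corp", "J. Doe")])  # 2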

  6. Integrated Functional and Executional Modelling of Software Using Web-Based Databases

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Marietta, Roberta

    1998-01-01

    NASA's software subsystems undergo extensive modification and updates over their operational lifetimes. It is imperative that the modified software satisfy safety goals. This report discusses the difficulties encountered in doing so and presents a solution based on integrated modelling of software, the use of automatic information extraction tools, web technology and databases.

  7. Using a User-Interactive QA System for Personalized E-Learning

    ERIC Educational Resources Information Center

    Hu, Dawei; Chen, Wei; Zeng, Qingtian; Hao, Tianyong; Min, Feng; Wenyin, Liu

    2008-01-01

    A personalized e-learning framework based on a user-interactive question-answering (QA) system is proposed, in which a user-modeling approach is used to capture personal information of students and a personalized answer extraction algorithm is proposed for personalized automatic answering. In our approach, a topic ontology (or concept hierarchy)…

  8. Automatic Detection of Learning Styles for an E-Learning System

    ERIC Educational Resources Information Center

    Ozpolat, Ebru; Akar, Gozde B.

    2009-01-01

    A desirable characteristic for an e-learning system is to provide the learner the most appropriate information based on his requirements and preferences. This can be achieved by capturing and utilizing the learner model. Learner models can be extracted based on personality factors like learning styles, behavioral factors like user's browsing…

  9. What Are Data? Museum Data Bank Research Report Number 1.

    ERIC Educational Resources Information Center

    Vance, David

    This paper describes the process of automatic extraction of implicit--global--data from explicit information by file inversion and threading. Each datum is the symbolic representation of a proposition, and as such has a number of movable parts corresponding to the ideal elements of the proposition represented; e.g., subject, predicate. A third…

  10. On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.

    PubMed

    Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza

    2015-08-01

    The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Delineation and geometric modeling of road networks

    NASA Astrophysics Data System (ADS)

    Poullis, Charalambos; You, Suya

    In this work we present a novel vision-based system for automatic detection and extraction of complex road networks from various sensor resources such as aerial photographs, satellite images, and LiDAR. Uniquely, the proposed system is an integrated solution that merges the power of perceptual grouping theory (Gabor filtering, tensor voting) and optimized segmentation techniques (global optimization using graph cuts) into a unified framework to address the challenging problems of geospatial feature detection and classification. Firstly, the local precision of the Gabor filters is combined with the global context of the tensor voting to produce accurate classification of the geospatial features. In addition, the tensorial representation used for encoding the data eliminates the need for any thresholds, thereby removing any data dependencies. Secondly, a novel orientation-based segmentation is presented which incorporates the classification of the perceptual grouping and results in segmentations with better defined boundaries and continuous linear segments. Finally, a set of Gaussian-based filters are applied to automatically extract centerline information (magnitude, width and orientation). This information is then used for creating road segments and transforming them into their polygonal representations.
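
    A scikit-image sketch of the Gabor filtering stage is shown below; the frequencies, orientation count and per-pixel maximum-response pooling are assumptions standing in for the paper's perceptual grouping pipeline, which additionally involves tensor voting and graph cuts.

      import numpy as np
      from skimage.filters import gabor

      def gabor_orientation_response(image, frequencies=(0.1, 0.2), n_orientations=8):
          """Per-pixel strongest Gabor magnitude and the orientation that produced it."""
          best_mag = np.zeros(image.shape, dtype=float)
          best_theta = np.zeros(image.shape, dtype=float)
          for theta in np.linspace(0, np.pi, n_orientations, endpoint=False):
              for freq in frequencies:
                  real, imag = gabor(image, frequency=freq, theta=theta)
                  mag = np.hypot(real, imag)
                  update = mag > best_mag
                  best_mag[update] = mag[update]
                  best_theta[update] = theta
          return best_mag, best_theta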

  12. Automatic drawing for traffic marking with MMS LIDAR intensity

    NASA Astrophysics Data System (ADS)

    Takahashi, G.; Takeda, H.; Shimano, Y.

    2014-05-01

    Upgrading the database of CYBER JAPAN has been strategically promoted since the "Basic Act on Promotion of Utilization of Geographical Information" was enacted in May 2007. In particular, there is high demand for the road information that forms a framework of this database. Therefore, road inventory mapping has to be accurate and must eliminate variation caused by individual human operators. Further, the large number of traffic markings that are periodically maintained and possibly changed requires an efficient method for updating spatial data. Currently, we apply manual photogrammetric drawing for mapping traffic markings. However, this method is not sufficiently efficient in terms of the required productivity, and data variation can arise from individual operators. In contrast, Mobile Mapping Systems (MMS) and high-density Laser Imaging Detection and Ranging (LIDAR) scanners are rapidly gaining popularity. The aim of this study is to build an efficient method for automatically drawing traffic markings using MMS LIDAR data. The key idea of the method is extracting lines using a Hough transform strategically focused on changes in local reflection intensity along scan lines; note also that the method processes every traffic marking. In this paper, we discuss a highly accurate method that does not depend on individual human operators and that applies the following steps: (1) binarizing the LIDAR points by intensity and extracting the higher-intensity points; (2) generating a Triangulated Irregular Network (TIN) from the higher-intensity points; (3) deleting arcs by length and generating outline polygons on the TIN; (4) generating buffers from the outline polygons; (5) extracting points within the buffers from the original LIDAR points; (6) extracting local-intensity-changing points along scan lines from the extracted points; (7) extracting lines from the intensity-changing points through a Hough transform; and (8) connecting lines to generate automated traffic marking mapping data.

  13. Automatic Recognition of Indoor Navigation Elements from Kinect Point Clouds

    NASA Astrophysics Data System (ADS)

    Zeng, L.; Kang, Z.

    2017-09-01

    This paper automatically recognizes the navigation elements defined by the IndoorGML data standard: door, stairway and wall. The data used are indoor 3D point clouds collected by a Kinect v2, launched in 2011, by means of ORB-SLAM. Compared with lidar, this sensor is cheaper and more convenient, but the point clouds also suffer from noise, registration error and large data volume. Hence, we adopt a shape descriptor - the histogram of distances between two randomly chosen points, proposed by Osada - merge it with other descriptors, and use it in conjunction with a random forest classifier to recognize the navigation elements (door, stairway and wall) from the Kinect point clouds. This research acquires the navigation elements and their 3D location information from each single data frame through segmentation of point clouds, boundary extraction, feature calculation and classification. Finally, this paper utilizes the acquired navigation elements and their information to generate the state data of the indoor navigation module automatically. The experimental results demonstrate a high recognition accuracy of the proposed method.
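
    A sketch of the Osada-style D2 descriptor feeding a random forest is given below; the pair count, bin count and classifier settings are assumptions, and the additional descriptors merged in the paper are omitted.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      def d2_shape_descriptor(points, n_pairs=2000, n_bins=32, seed=0):
          """Histogram of distances between randomly chosen point pairs of one
          segmented point cloud (an (N, 3) array), normalized to sum to 1."""
          pts = np.asarray(points, dtype=float)
          rng = np.random.default_rng(seed)
          a = pts[rng.integers(0, len(pts), n_pairs)]
          b = pts[rng.integers(0, len(pts), n_pairs)]
          dists = np.linalg.norm(a - b, axis=1)
          hist, _ = np.histogram(dists, bins=n_bins, range=(0, dists.max() + 1e-9))
          return hist / hist.sum()

      def train_element_classifier(segments, labels):
          """Random forest over D2 descriptors of labelled segments (door/stairway/wall)."""
          features = np.stack([d2_shape_descriptor(s) for s in segments])
          return RandomForestClassifier(n_estimators=200, random_state=0).fit(features, labels)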

  14. ExaCT: automatic extraction of clinical trial characteristics from journal publications

    PubMed Central

    2010-01-01

    Background Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). Methods ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. Results We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. Conclusions Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols). PMID:20920176
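
    The two-stage design (statistical sentence ranking followed by weak extraction rules) can be sketched with scikit-learn and a regular expression; the sample-size rule, features and training data below are illustrative assumptions, not ExaCT's actual classifier or rules.

      import re
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression

      def train_sentence_ranker(sentences, labels):
          """Stage 1: rank sentences by how likely they describe a trial characteristic."""
          vec = TfidfVectorizer(ngram_range=(1, 2))
          clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(sentences), labels)
          return vec, clf

      def top_sentences(vec, clf, sentences, k=5):
          scores = clf.predict_proba(vec.transform(sentences))[:, 1]
          return [sentences[i] for i in scores.argsort()[::-1][:k]]

      def extract_sample_size(sentence):
          """Stage 2: a weak rule pulling a sample size out of a candidate sentence."""
          match = re.search(r"\b(\d{2,6})\s+(?:patients|participants|subjects)\b", sentence)
          return int(match.group(1)) if match else None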

  15. Automatic 3D Extraction of Buildings, Vegetation and Roads from LIDAR Data

    NASA Astrophysics Data System (ADS)

    Bellakaout, A.; Cherkaoui, M.; Ettarid, M.; Touzani, A.

    2016-06-01

    Aerial topographic surveys using Light Detection and Ranging (LiDAR) technology collect dense and accurate information from the surface or terrain; LiDAR is becoming one of the important tools in the geosciences for studying objects and the earth surface. Classification of LiDAR data to extract ground, vegetation and buildings is a very important step needed in numerous applications such as 3D city modelling, extraction of different derived data for geographical information systems (GIS), mapping, navigation, etc. Regardless of what the scan data will be used for, an automatic process is greatly needed to handle the large amount of data collected, because manual processing is time consuming and very expensive. This paper presents an approach for automatic classification of aerial LiDAR data into five groups of items: buildings, trees, roads, linear objects and soil, using single-return LiDAR and processing the point cloud without generating a DEM. Topological relationships and height variation analysis are adopted to preliminarily segment the entire point cloud into upper and lower contours, uniform surfaces, non-uniform surfaces, linear objects, and others. This primary classification is used, on the one hand, to identify the upper and lower parts of each building in an urban scene, which are needed to model building façades, and, on the other hand, to extract the point cloud of uniform surfaces, which contains roofs, roads and ground, used in the second phase of classification. A second algorithm is developed to segment the uniform surfaces into building roofs, roads and ground; this second phase of classification is also based on topological relationships and height variation analysis. The proposed approach has been tested on two areas: the first is a housing complex and the second is a primary school. The proposed approach led to successful classification of the building, vegetation and road classes.

  16. Chemical name extraction based on automatic training data generation and rich feature set.

    PubMed

    Yan, Su; Spangler, W Scott; Chen, Ying

    2013-01-01

    The automation of extracting chemical names from text has significant value to biomedical and life science research. A major barrier in this task is the difficulty of obtaining a sizable, good-quality dataset to train a reliable entity extraction model. Another difficulty is the selection of informative features of chemical names, since comprehensive domain knowledge of chemistry nomenclature is required. Leveraging random text generation techniques, we explore the idea of automatically creating training sets for the task of chemical name extraction. Assuming the availability of an incomplete list of chemical names, called a dictionary, we are able to generate well-controlled, random, yet realistic chemical-like training documents. We statistically analyze the construction of chemical names based on the incomplete dictionary, and propose a series of new features, without relying on any domain knowledge. Compared to state-of-the-art models learned from manually labeled data and domain knowledge, our solution shows better or comparable results in annotating real-world data with less human effort. Moreover, we report an interesting observation about the language of chemical names: both the structural and semantic components of chemical names follow a Zipfian distribution, which resembles many natural languages.

  17. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

    PubMed Central

    Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John

    2008-01-01

    Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948

  18. Extraction, integration and analysis of alternative splicing and protein structure distributed information

    PubMed Central

    D'Antonio, Matteo; Masseroli, Marco

    2009-01-01

    Background Alternative splicing has been demonstrated to affect most human genes; different isoforms from the same gene encode proteins which differ by a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results of their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; and a database has been created to manage user access to the PASS Web application and store users' data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available, well-known bioinformatics tools in order to generate structural information on isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075

  19. Knowledge representation and management: transforming textual information into useful knowledge.

    PubMed

    Rassinoux, A-M

    2010-01-01

    To summarize current outstanding research in the field of knowledge representation and management. Synopsis of the articles selected for the IMIA Yearbook 2010. Four interesting papers, dealing with structured knowledge, have been selected for the section on knowledge representation and management. Combining the newest techniques in computational linguistics and natural language processing with the latest methods in statistical data analysis, machine learning and text mining has proved to be efficient for turning unstructured textual information into meaningful knowledge. Three of the four papers selected for the section corroborate this approach and depict various experiments conducted to extract meaningful knowledge from unstructured free texts, such as extracting cancer disease characteristics from pathology reports, extracting protein-protein interactions from biomedical papers, and extracting knowledge for the support of hypothesis generation in molecular biology from the Medline literature. Finally, the last paper addresses the level of formally representing and structuring information within clinical terminologies in order to render such information easily available and shareable among the health informatics community. Delivering common, powerful tools able to automatically extract meaningful information from the huge amount of electronically available unstructured free text is an essential step towards promoting sharing and reusability across applications, domains, and institutions, thus contributing to building capacities worldwide.

  20. Beyond Information Retrieval—Medical Question Answering

    PubMed Central

    Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong

    2006-01-01

    Physicians have many questions when caring for patients, and frequently need to seek answers to these questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large, which makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short text answers in response to clinical questions posed by physicians. The authors address physicians’ information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, currently MedQA integrates information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385

  1. Automatic extraction of relations between medical concepts in clinical texts

    PubMed Central

    Harabagiu, Sanda; Roberts, Kirk

    2011-01-01

    Objective A supervised machine learning approach to discover relations between medical problems, treatments, and tests mentioned in electronic medical records. Materials and methods A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier. Results The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7. Discussion Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently. Joint classification of relations may further improve the quality of results. Also, joint learning of the discovery of concepts, assertions, and relations may also improve the results of automatic relation extraction. Conclusion Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. In addition, features based on similarity contribute to a decrease of 1.1% when they are not available. PMID:21846787
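
    A quick check of the stated definition: with the reported precision 72.0 and recall 75.3 the formula gives roughly 73.6, and the 0.1 gap to the reported 73.7 is consistent with the precision and recall values themselves being rounded.

      def f1(precision, recall):
          """F1 = 2 * Precision * Recall / (Precision + Recall), as defined in the abstract."""
          return 2 * precision * recall / (precision + recall)

      print(round(f1(72.0, 75.3), 1))  # 73.6, close to the reported 73.7 (gold-standard setting)
      print(round(f1(57.6, 41.7), 1))  # 48.4, matching the fully automatic setting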

  2. Extracting Loop Bounds for WCET Analysis Using the Instrumentation Point Graph

    NASA Astrophysics Data System (ADS)

    Betts, A.; Bernat, G.

    2009-05-01

    Every calculation engine proposed in the literature of Worst-Case Execution Time (WCET) analysis requires upper bounds on loop iterations. Existing mechanisms to procure this information are either error prone, because they are gathered from the end-user, or limited in scope, because automatic analyses target very specific loop structures. In this paper, we present a technique that obtains bounds completely automatically for arbitrary loop structures. In particular, we show how to employ the Instrumentation Point Graph (IPG) to parse traces of execution (generated by an instrumented program) in order to extract bounds relative to any loop-nesting level. With this technique, therefore, non-rectangular dependencies between loops can be captured, allowing more accurate WCET estimates to be calculated. We demonstrate the improvement in accuracy by comparing WCET estimates computed through our HMB framework against those computed with state-of-the-art techniques.

  3. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal.

    PubMed

    Baker, Ed

    2013-01-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions, such as Adobe Bridge, and online tools, such as Flickr, also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, and it is therefore possible to upload hundreds of images easily, with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
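
    A client-side Python sketch of the kind of EXIF read-out the module maps into Scratchpads fields, using Pillow; the module itself is implemented in Drupal, so this is only an illustration, and the file name below is hypothetical.

      from PIL import Image, ExifTags

      def read_exif(path):
          """Return a dict of human-readable EXIF tag names to values for one image."""
          with Image.open(path) as img:
              raw = img.getexif()
          return {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in raw.items()}

      # Example (hypothetical file name):
      # print(read_exif("specimen_0001.jpg").get("Artist"))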

  4. Enhancing acronym/abbreviation knowledge bases with semantic information.

    PubMed

    Torii, Manabu; Liu, Hongfang

    2007-10-11

    In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with their definitions (denoted as LFs) is highly needed. For the construction of such a terminology knowledge base, we investigate the feasibility of building a system that automatically assigns semantic categories to LFs extracted from text. Given a collection of pairs (SF, LF) derived from text, we i) assess the coverage of LFs and pairs (SF, LF) in the UMLS and justify the need for a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic categories and construct a system using machine learning. Utilizing ADAM, an existing collection of (SF, LF) pairs extracted from MEDLINE, our system achieved an f-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface which integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus.

  5. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

    PubMed Central

    2013-01-01

    Abstract Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated field in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping. PMID:24723768

  6. Relation extraction for biological pathway construction using node2vec.

    PubMed

    Kim, Munui; Baek, Seung Han; Song, Min

    2018-06-13

    Systems biology is an important field for understanding whole biological mechanisms composed of interactions between biological components. One approach for understanding complex and diverse mechanisms is to analyze biological pathways. However, because these pathways consist of important interactions and information on these interactions is disseminated in a large number of biomedical reports, text-mining techniques are essential for extracting these relationships automatically. In this study, we applied node2vec, an algorithmic framework for feature learning in networks, for relationship extraction. To this end, we extracted genes from paper abstracts using pkde4j, a text-mining tool for detecting entities and relationships. Using the extracted genes, a co-occurrence network was constructed and node2vec was used with the network to generate a latent representation. To demonstrate the efficacy of node2vec in extracting relationships between genes, performance was evaluated for gene-gene interactions involved in a type 2 diabetes pathway. Moreover, we compared the results of node2vec to those of baseline methods such as co-occurrence and DeepWalk. Node2vec outperformed existing methods in detecting relationships in the type 2 diabetes pathway, demonstrating that this method is appropriate for capturing the relatedness between pairs of biological entities involved in biological pathways. The results demonstrated that node2vec is useful for automatic pathway construction.
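
    The pipeline described above (gene co-occurrence network, node embeddings, relatedness ranking) can be approximated with standard libraries. The sketch below uses uniform random walks plus gensim's Word2Vec, i.e. the DeepWalk baseline mentioned in the abstract (node2vec additionally biases the walks via its p and q parameters); the gene names and parameters are illustrative placeholders.

```python
import random

import networkx as nx
from gensim.models import Word2Vec

# Toy co-occurrence network; in the paper the nodes are genes extracted with pkde4j.
G = nx.Graph([("INS", "INSR"), ("INSR", "IRS1"), ("IRS1", "PIK3CA"), ("PIK3CA", "AKT1")])

def random_walks(graph, num_walks=50, walk_length=10):
    """Uniform random walks (DeepWalk-style; node2vec additionally biases walks via p and q)."""
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes():
            walk = [node]
            while len(walk) < walk_length:
                walk.append(random.choice(list(graph[walk[-1]])))
            walks.append(walk)
    return walks

model = Word2Vec(random_walks(G), vector_size=32, window=3, min_count=1, sg=1, seed=1)
# Rank candidate interaction partners of a gene by embedding similarity.
print(model.wv.most_similar("IRS1", topn=3))
```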

  7. Automatic Detection and Recognition of Craters Based on the Spectral Features of Lunar Rocks and Minerals

    NASA Astrophysics Data System (ADS)

    Ye, L.; Xu, X.; Luan, D.; Jiang, W.; Kang, Z.

    2017-07-01

    Crater-detection approaches can be divided into four categories: manual recognition, shape-profile fitting algorithms, machine-learning methods, and geological information-based analysis using terrain and spectral data. The mainstream approach is shape-profile fitting. Many scholars throughout the world use illumination gradient information to fit standard circles by the least-squares method. Although this method has achieved good results, it is difficult to identify craters with poor "visibility" or complex structure and composition. Moreover, the accuracy of recognition is difficult to improve due to multiple solutions and noise interference. To address this problem, we propose a method for the automatic extraction of impact craters based on the spectral characteristics of lunar rocks and minerals: 1) Under sunlight conditions, impact craters are extracted from MI by condition matching, and the positions as well as the diameters of the craters are obtained. 2) Regolith is ejected when the lunar surface is impacted, and one of the elements of lunar regolith is iron; therefore, incorrectly extracted impact craters can be removed by checking whether a crater contains no iron. 3) Correctly extracted craters are divided into two types, simple and complex, according to their diameters. 4) The titanium information is obtained and the titanium distribution of the complex craters is matched against a normal distribution curve; the goodness of fit is calculated and a threshold is set, dividing the complex craters into two types: those whose titanium distribution follows a normal curve and those whose distribution does not. We validated our proposed method with MI acquired by SELENE. Experimental results demonstrate that the proposed method has good performance in the test area.

  8. Automatical and accurate segmentation of cerebral tissues in fMRI dataset with combination of image processing and deep learning

    NASA Astrophysics Data System (ADS)

    Kong, Zhenglun; Luo, Junyi; Xu, Shengpu; Li, Ting

    2018-02-01

    Image segmentation plays an important role in medical science. One application is multimodality imaging, especially the fusion of structural imaging with functional imaging, which includes CT, MRI and new types of imaging technology such as optical imaging to obtain functional images. The fusion process requires precisely extracted structural information in order to register the image to it. Here we used image enhancement and morphometry methods to extract the accurate contours of different tissues such as skull, cerebrospinal fluid (CSF), grey matter (GM) and white matter (WM) on 5 fMRI head image datasets. Then we utilized a convolutional neural network to perform automatic segmentation of the images in a deep learning manner. This approach greatly reduced the processing time compared to manual and semi-automatic segmentation and is of great importance in improving speed and accuracy as more and more samples are learned. The contours of the borders of different tissues on all images were accurately extracted and 3D visualized. This can be used in low-level light therapy and optical simulation software such as MCVM. We obtained a precise three-dimensional distribution of the brain, which offers doctors and researchers quantitative volume data and detailed morphological characterization for personalized precision medicine of cerebral atrophy/expansion. We hope this technique can bring convenience to medical visualization and personalized medicine.

  9. Extracting knowledge from the World Wide Web

    PubMed Central

    Henzinger, Monika; Lawrence, Steve

    2004-01-01

    The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, such as the distribution of web pages over domains, the distribution of interest in different areas, communities related to different topics, the nature of competition in different categories of sites, and the degree of communication between different communities or countries. PMID:14745041

  10. Mutual information-based facial expression recognition

    NASA Astrophysics Data System (ADS)

    Hazar, Mliki; Hammami, Mohamed; Hanêne, Ben-Abdallah

    2013-12-01

    This paper introduces a novel low-computation discriminative-region representation for the expression analysis task. The proposed approach relies on studies in psychology which show that most of the descriptive and responsible regions for facial expression are located around certain face parts. The contribution of this work lies in the proposition of a new approach that supports automatic facial expression recognition based on automatic region selection. The region selection step aims to select the descriptive regions responsible for facial expression and was performed using the Mutual Information (MI) technique. For facial feature extraction, we applied Local Binary Patterns (LBP) on the gradient image to encode salient micro-patterns of facial expressions. Experimental studies have shown that using discriminative regions provides better results than using the whole face region whilst reducing the feature vector dimension.
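
    As an illustration of the feature-extraction step (LBP computed on a gradient image), the following scikit-image sketch uses a stock test image as a stand-in for a cropped face region; the MI-based region selection of the paper is not reproduced here.

```python
import numpy as np
from skimage import data, filters
from skimage.feature import local_binary_pattern

face_region = data.camera().astype(float)     # stock image standing in for a cropped face region
gradient = filters.sobel(face_region)         # gradient image, as used in the paper
lbp = local_binary_pattern(gradient, P=8, R=1, method="uniform")

# The histogram of uniform LBP codes forms the per-region feature vector.
hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)
print(hist)
```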

  11. Advanced Information Assurance Handbook

    DTIC Science & Technology

    2004-03-01

    ... and extraction of stored or recorded digital/electronic evidence (from the Latin root "forensis," relating to the forum/legal business). ... up over 100 MB of memory) and also open the door to potential security incidents. Services can be set to start automatically upon boot or they can ... digital certificates, thereby increasing the reliability of the authentication considerably.

  12. Automatic Extraction of High-Resolution Rainfall Series from Rainfall Strip Charts

    NASA Astrophysics Data System (ADS)

    Saa-Requejo, Antonio; Valencia, Jose Luis; Garrido, Alberto; Tarquis, Ana M.

    2015-04-01

    Soil erosion is a complex phenomenon involving the detachment and transport of soil particles, storage and runoff of rainwater, and infiltration. The relative magnitude and importance of these processes depend on a host of factors, including climate, soil, topography, and cropping and land management practices, among others. Most models for soil erosion or hydrological processes need an accurate storm characterization. However, these data are not always available, and in some cases indirect models are generated to fill this gap. In Spain, rain intensity data for time periods shorter than 24 hours date back to 1924, and many studies are limited by their availability. In many cases these data are stored on rainfall strip charts at the meteorological stations but have not been transferred into numerical form. To overcome this deficiency in the raw data, a process of information extraction from large numbers of rainfall strip charts is implemented by means of computer software. A method based on van Piggelen et al. (2011) has been developed that largely automates the labour-intensive extraction work. The method consists of the following five basic steps: 1) scanning the charts to high-resolution digital images, 2) manually and visually registering relevant meta information from the charts and pre-processing, 3) applying automatic curve extraction software in a batch process to determine the coordinates of cumulative rainfall lines on the images (main step), 4) post-processing the curves that were not correctly determined in step 3, and 5) aggregating the cumulative rainfall in pixel coordinates to the desired time resolution. A colour detection procedure is introduced that automatically separates the background of the charts and rolls from the grid and subsequently the rainfall curve. The rainfall curve is detected by minimization of a cost function. Some utilities have been added to improve on the previous work and automate auxiliary processes: readjusting the bands properly, merging bands that have been scanned in two parts, and detecting and cropping the borders of unused bands (required by the software). Variations of the colour system used, based on hue or RGB colour, have also been included. By applying this digitization to rainfall strip charts, 209 station-years from three locations in the centre of Spain have been transformed into long-term rainfall time series with 5-min resolution. References van Piggelen, H.E., T. Brandsma, H. Manders, and J. F. Lichtenauer, 2011: Automatic Curve Extraction for Digitizing Rainfall Strip Charts. J. Atmos. Oceanic Technol., 28, 891-906. Acknowledgements Financial support for this research by the DURERO Project (Env.C1.3913442) is greatly appreciated.
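
    The colour-based separation of chart background, grid and rainfall curve, followed by a column-wise reading of the curve, can be sketched with OpenCV as below; the file name and HSV thresholds are placeholders that would need tuning to the actual scans, and the cost-function minimisation of the real method is not reproduced.

```python
import cv2
import numpy as np

img = cv2.imread("strip_chart_scan.png")                     # placeholder scan (step 1)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Placeholder HSV thresholds: a blue-ish ink curve against a lighter background/grid.
curve_mask = cv2.inRange(hsv, np.array([90, 60, 40]), np.array([140, 255, 255]))

# For each pixel column (time step), take the topmost curve pixel as the cumulative
# rainfall level; the real method refines this with a cost-function minimisation.
ys, xs = np.nonzero(curve_mask)
levels = {x: ys[xs == x].min() for x in np.unique(xs)}
print(list(levels.items())[:5])
```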

  13. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    PubMed Central

    2011-01-01

    Background Bioinformatics data analysis often uses a linear mixture model representing samples as an additive mixture of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue. Results The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness constrained factorization on a sample-by-sample basis. In contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary, yet features will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification; that is not the case with standard factorization methods. Moreover, the component selected by the proposed method as disease specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. In contrast to standard matrix factorization methods, this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from disease and control specific components on a sample-by-sample basis. This yields selected components with reduced complexity and generally increases prediction accuracy. PMID:22208882

  14. Sieve-based relation extraction of gene regulatory networks from biological literature

    PubMed Central

    2015-01-01

    Background Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. Results We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Conclusions Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains. PMID:26551454
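
    Linear-chain CRFs over (skip-)mention sequences, as described above, can be prototyped with the sklearn-crfsuite package; the mention features and labels below are illustrative placeholders rather than the authors' actual feature set.

```python
import sklearn_crfsuite

# Each training instance is a sequence of mentions; features include the mention word
# plus its prefix/suffix, which the abstract reports as the most important features.
def mention_features(word):
    return {"word": word.lower(), "prefix3": word[:3].lower(), "suffix3": word[-3:].lower()}

X_train = [[mention_features(w) for w in ("sigA", "activates", "spoIIAA")],
           [mention_features(w) for w in ("kinA", "phosphorylates", "spo0F")]]
y_train = [["AGENT", "NONE", "TARGET"], ["AGENT", "NONE", "TARGET"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([[mention_features(w) for w in ("sigF", "represses", "spoIIGA")]]))
```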

  15. Sieve-based relation extraction of gene regulatory networks from biological literature.

    PubMed

    Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko

    2015-01-01

    Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains.

  16. Extracting 5m Shorelines From Multi-Temporal Images

    NASA Astrophysics Data System (ADS)

    Kapadia, A.; Jordahl, K. A.; Kington, J. D., IV

    2016-12-01

    Planet operates the largest Earth observing constellation of satellites, collecting imagery at an unprecedented temporal resolution. While daily cadence is expected in early 2017, Planet has already imaged the majority of landmass several dozen times over the past year. The current dataset provides enough value to build and test algorithms to automatically extract information. Here we demonstrate the extraction of shorelines across California using image stacks. The method implemented uses as input an uncalibrated RGB data product and limited NIR combined with the National Land Cover Database 2011 (NLCD2011) and Shuttle Radar Topography Mission (SRTM) to extract shorelines at 5 meter resolution. In the near future these methods along with daily cadence of imagery will allow for temporal monitoring of shorelines on a global scale.
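
    The abstract does not state the exact index used; as one common way to separate water from land using green and NIR bands, the sketch below computes a normalized difference water index (NDWI) with NumPy and traces the water-mask contour with scikit-image. The band arrays are random placeholders standing in for co-registered image bands.

```python
import numpy as np
from skimage import measure

def extract_shoreline(green: np.ndarray, nir: np.ndarray, threshold: float = 0.0):
    """Water mask via NDWI = (G - NIR) / (G + NIR); the shoreline is the water-mask contour."""
    ndwi = (green - nir) / np.clip(green + nir, 1e-6, None)
    water = ndwi > threshold
    # Each contour is a sequence of (row, col) points along the land/water boundary.
    return measure.find_contours(water.astype(float), 0.5)

green = np.random.rand(100, 100).astype(np.float32)   # placeholder band data
nir = np.random.rand(100, 100).astype(np.float32)
print(len(extract_shoreline(green, nir)))
```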

  17. Manual editing of automatically recorded data in an anesthesia information management system.

    PubMed

    Wax, David B; Beilin, Yaakov; Hossain, Sabera; Lin, Hung-Mo; Reich, David L

    2008-11-01

    Anesthesia information management systems allow automatic recording of physiologic and anesthetic data. The authors investigated the prevalence of such data modification in an academic medical center. The authors queried their anesthesia information management system database of anesthetics performed in 2006 and tabulated the counts of data points for automatically recorded physiologic and anesthetic parameters as well as the subset of those data that were manually invalidated by clinicians (both with and without alternate values manually appended). Patient, practitioner, data source, and timing characteristics of recorded values were also extracted to determine their associations with editing of various parameters in the anesthesia information management system record. A total of 29,491 cases were analyzed, 19% of which had one or more data points manually invalidated. Among 58 attending anesthesiologists, each invalidated data in a median of 7% of their cases when working as a sole practitioner. A minority of invalidated values were manually appended with alternate values. Pulse rate, blood pressure, and pulse oximetry were the most commonly invalidated parameters. Data invalidation usually resulted in a decrease in parameter variance. Factors independently associated with invalidation included extreme physiologic values, American Society of Anesthesiologists physical status classification, emergency status, timing (phase of the procedure/anesthetic), presence of an intraarterial catheter, resident or certified registered nurse anesthetist involvement, and procedure duration. Editing of physiologic data automatically recorded in an anesthesia information management system is a common practice and results in decreased variability of intraoperative data. Further investigation may clarify the reasons for and consequences of this behavior.

  18. Automatic vasculature identification in coronary angiograms by adaptive geometrical tracking.

    PubMed

    Xiao, Ruoxiu; Yang, Jian; Goyal, Mahima; Liu, Yue; Wang, Yongtian

    2013-01-01

    Owing to the uneven distribution of contrast agents and the perspective projection of X-ray imaging, the vasculature in angiographic images has low contrast and is generally superimposed on other organic tissue; it is therefore very difficult to identify the vasculature and quantitatively estimate blood flow directly from angiographic images. In this paper, we propose a fully automatic algorithm named adaptive geometrical vessel tracking (AGVT) for coronary artery identification in X-ray angiograms. Initially, the ridge enhancement (RE) image is obtained utilizing multiscale Hessian information. Then, automatic initialization procedures, including seed point detection and initial direction determination, are performed on the RE image. The extracted ridge points can be adjusted to the geometrical centerline points adaptively through diameter estimation. Bifurcations are identified by discriminating the connecting relationships of the tracked ridge points. Finally, all the tracked centerlines are merged and smoothed by classifying the connecting components on the vascular structures. Synthetic angiographic images and clinical angiograms are used to evaluate the performance of the proposed algorithm. The proposed algorithm is compared with two other vascular tracking techniques in terms of efficiency and accuracy, demonstrating successful application of the proposed segmentation and extraction scheme to vasculature identification.
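
    The ridge-enhancement step driven by multiscale Hessian information can be illustrated with scikit-image's Frangi vesselness filter (one common multiscale-Hessian formulation; the paper's exact RE computation may differ), with a random array standing in for an angiographic frame.

```python
import numpy as np
from skimage.filters import frangi

# Placeholder angiogram; in practice this is an X-ray frame as a 2D float array.
angio = np.random.rand(256, 256).astype(np.float32)

# Multiscale Hessian ridge/vesselness response; black_ridges selects dark vessels
# on a bright background, as in contrast-filled arteries.
ridge = frangi(angio, sigmas=range(1, 6), black_ridges=True)

# Seed points for the tracking stage could be taken from the strongest responses.
seeds = np.argwhere(ridge > np.percentile(ridge, 99))
print(ridge.shape, len(seeds))
```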

  19. Automated retinal vessel type classification in color fundus images

    NASA Astrophysics Data System (ADS)

    Yu, H.; Barriga, S.; Agurto, C.; Nemeth, S.; Bauman, W.; Soliz, P.

    2013-02-01

    Automated retinal vessel type classification is an essential first step toward machine-based quantitative measurement of various vessel topological parameters and identifying vessel abnormalities and alterations in cardiovascular disease risk analysis. This paper presents a new and accurate automatic artery and vein classification method developed for arteriolar-to-venular width ratio (AVR) and artery and vein tortuosity measurements in regions of interest (ROI) of 1.5 and 2.5 optic disc diameters from the disc center, respectively. This method includes illumination normalization, automatic optic disc detection and retinal vessel segmentation, feature extraction, and a partial least squares (PLS) classification. Normalized multi-color information, color variation, and multi-scale morphological features are extracted on each vessel segment. We trained the algorithm on a set of 51 color fundus images using manually marked arteries and veins. We tested the proposed method on a previously unseen test data set consisting of 42 images. We obtained an area under the ROC curve (AUC) of 93.7% in the ROI of the AVR measurement and 91.5% AUC in the ROI of the tortuosity measurement. The proposed AV classification method has the potential to assist automatic cardiovascular disease early detection and risk analysis.

  20. Point Cloud Classification of Tesserae from Terrestrial Laser Data Combined with Dense Image Matching for Archaeological Information Extraction

    NASA Astrophysics Data System (ADS)

    Poux, F.; Neuville, R.; Billen, R.

    2017-08-01

    Reasoning from information extraction given by point cloud data mining allows contextual adaptation and fast decision making. However, to achieve this perceptive level, a point cloud must be semantically rich, retaining relevant information for the end user. This paper presents an automatic knowledge-based method for pre-processing multi-sensory data and classifying a hybrid point cloud from both terrestrial laser scanning and dense image matching. Using 18 features, including the sensors' biased data, each tessera in the high-density point cloud from the 3D-captured complex mosaics of Germigny-des-prés (France) is segmented via a colour-based multi-scale abstraction that extracts connectivity. A 2D surface and outline polygon of each tessera is generated by RANSAC plane extraction and convex hull fitting. Knowledge is then used to classify every tessera based on its size, surface, shape, material properties and its neighbours' classes. The detection and semantic enrichment method shows promising results of 94% correct semantization, a first step toward the creation of an archaeological smart point cloud.
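
    The per-tessera surface and outline step (RANSAC plane extraction followed by convex hull fitting) can be sketched with Open3D and SciPy; the synthetic point patch below stands in for one segmented tessera, not the Germigny-des-prés data.

```python
import numpy as np
import open3d as o3d
from scipy.spatial import ConvexHull

# Synthetic near-planar patch standing in for one segmented tessera.
pts = np.random.rand(500, 3)
pts[:, 2] = 0.02 * pts[:, 0] + 0.01 * np.random.randn(500)   # thin, slightly tilted slab

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(pts)
plane, inliers = pcd.segment_plane(distance_threshold=0.01, ransac_n=3, num_iterations=1000)

# Project the inliers to 2D (dropping the near-normal z direction) and fit the outline polygon.
inlier_xy = pts[inliers][:, :2]
hull = ConvexHull(inlier_xy)
print("plane a,b,c,d:", np.round(plane, 3), "outline vertices:", len(hull.vertices))
```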

  1. Automatic Extraction of Road Markings from Mobile Laser Scanning Data

    NASA Astrophysics Data System (ADS)

    Ma, H.; Pei, Z.; Wei, Z.; Zhong, R.

    2017-09-01

    Road markings are a critical feature of the high-definition maps required by Advanced Driver Assistance Systems (ADAS) and self-driving technology, providing guidance and information to moving cars. Mobile laser scanning (MLS) systems are an effective way to obtain the 3D information of the road surface, including road markings, at highway speeds and at less than traditional survey costs. This paper presents a novel method to automatically extract road markings from MLS point clouds. Ground points are first filtered from the raw input point clouds using a neighborhood elevation consistency method. The basic assumption of the method is that the road surface is smooth; points with small elevation differences from their neighborhood are considered to be ground points. Then the ground points are partitioned into a set of profiles according to the trajectory data. The intensity histogram of the points in each profile is generated to find intensity jumps above a threshold that varies inversely with laser range. The separated points are used as seed points for intensity-based region growing so as to obtain complete road markings. We use a point cloud template-matching method to refine the road marking candidates by removing noise clusters with low correlation coefficients. In an experiment with an MLS point set covering about 2 kilometres of a city center, our method provides a promising solution to road marking extraction from MLS data.
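
    The per-profile intensity separation can be illustrated with a simple histogram-based threshold; the sketch below uses Otsu's method on a synthetic profile as a stand-in for the paper's range-dependent intensity-jump threshold.

```python
import numpy as np
from skimage.filters import threshold_otsu

def marking_seeds(profile_intensity: np.ndarray) -> np.ndarray:
    """Indices of high-intensity (retro-reflective) returns in one road profile.

    Otsu's threshold is a stand-in for the paper's range-dependent intensity-jump threshold.
    """
    t = threshold_otsu(profile_intensity)
    return np.nonzero(profile_intensity > t)[0]

# Placeholder profile: mostly asphalt returns plus a few bright marking returns.
profile = np.concatenate([np.random.normal(20, 3, 950), np.random.normal(120, 10, 50)])
print(len(marking_seeds(profile)))
```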

  2. Towards the automated analysis and database development of defibrillator data from cardiac arrest.

    PubMed

    Eftestøl, Trygve; Sherman, Lawrence D

    2014-01-01

    During resuscitation of cardiac arrest victims, a variety of information in electronic format is recorded as part of the documentation of the patient care contact and in order to be provided for case review for quality improvement. Such review requires considerable effort and resources. There is also the problem of interobserver effects. We show that it is possible to efficiently analyze resuscitation episodes automatically using a minimal set of the available information. A minimal set of variables is defined which describes therapeutic events (compression sequences and defibrillations) and corresponding patient response events (annotated rhythm transitions). From this a state sequence representation of the resuscitation episode is constructed, and an algorithm is developed for reasoning with this representation and extracting review variables automatically. As a case study, the method is applied to the data abstraction process used in the King County EMS. The automatically generated variables are compared to the original ones, with accuracies ≥ 90% for 18 variables and ≥ 85% for the remaining four variables. It is possible to use the information present in the CPR process data recorded by the AED, along with rhythm and chest compression annotations, to automate the episode review.

  3. Automatic Extraction of Drug Adverse Effects from Product Characteristics (SPCs): A Text Versus Table Comparison.

    PubMed

    Lamy, Jean-Baptiste; Ugon, Adrien; Berthelot, Hélène

    2016-01-01

    Potential adverse effects (AEs) of drugs are described in their summary of product characteristics (SPCs), a textual document. Automatic extraction of AEs from SPCs is useful for detecting AEs and for building drug databases. However, this task is difficult because each AE is associated with a frequency that must be extracted and the presentation of AEs in SPCs is heterogeneous, consisting of plain text and tables in many different formats. We propose a taxonomy for the presentation of AEs in SPCs. We set up natural language processing (NLP) and table parsing methods for extracting AEs from texts and tables of any format, and evaluate them on 10 SPCs. Automatic extraction performed better on tables than on texts. Tables should be recommended for the presentation of the AEs section of the SPCs.

  4. Automatic detection of pelvic lymph nodes using multiple MR sequences

    NASA Astrophysics Data System (ADS)

    Yan, Michelle; Lu, Yue; Lu, Renzhi; Requardt, Martin; Moeller, Thomas; Takahashi, Satoru; Barentsz, Jelle

    2007-03-01

    A system for automatic detection of pelvic lymph nodes is developed by incorporating complementary information extracted from multiple MR sequences. A single MR sequence lacks sufficient diagnostic information for lymph node localization and staging. Correct diagnosis often requires input from multiple complementary sequences, which makes manual detection of lymph nodes very labor intensive. Small lymph nodes are often missed even by highly-trained radiologists. The proposed system is aimed at assisting radiologists in finding lymph nodes faster and more accurately. To the best of our knowledge, this is the first such system reported in the literature. A 3-dimensional (3D) MR angiography (MRA) image is employed for extracting blood vessels that serve as a guide in searching for pelvic lymph nodes. Segmentation, shape and location analysis of potential lymph nodes are then performed using a high resolution 3D T1-weighted VIBE (T1-vibe) MR sequence acquired by a Siemens 3T scanner. An optional contrast-agent enhanced MR image, such as a post ferumoxtran-10 T2*-weighted MEDIC sequence, can also be incorporated to further improve detection accuracy of malignant nodes. The system outputs a list of potential lymph node locations that are overlaid onto the corresponding MR sequences and presents them to users with associated confidence levels as well as their sizes and lengths in each axis. Preliminary studies demonstrate the feasibility of automatic lymph node detection and scenarios in which this system may be used to assist radiologists in diagnosis and reporting.

  5. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories.

    PubMed

    Yang, Wei; Ai, Tinghua; Lu, Wei

    2018-04-19

    Crowdsourcing trajectory data is an important approach for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourcing vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and interpolate the optimized segments adaptively to ensure there are enough tracking points. Second, the DT and the Voronoi diagram are constructed within the interpolated tracking lines to calculate road boundary descriptors using the areas of the Voronoi cells and the lengths of the triangle edges. Then, the road boundary detection model is established by integrating the boundary descriptors and trajectory movement features (e.g., direction) within the DT. Third, the boundary detection model is used to detect the road boundary from the DT constructed from the trajectory lines, and a region-growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting the road boundary from low-frequency GPS traces, multi-type road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information was proved to be of higher quality.
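
    The DT and Voronoi boundary descriptors named above (triangle edge length and Voronoi cell area) can be computed directly with scipy.spatial; the tracking points below are synthetic placeholders rather than real GPS traces.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay, Voronoi

points = np.random.rand(200, 2) * 100                 # placeholder tracking points (metres)
tri = Delaunay(points)
vor = Voronoi(points)

# Descriptor 1: edge lengths of each Delaunay triangle (long edges hint at road borders).
edge_vectors = points[tri.simplices] - points[np.roll(tri.simplices, 1, axis=1)]
edge_lengths = np.linalg.norm(edge_vectors, axis=2)

# Descriptor 2: area of each bounded Voronoi cell (large cells occur near the boundary).
cell_areas = {}
for p, r in enumerate(vor.point_region):
    region = vor.regions[r]
    if region and -1 not in region:                    # skip unbounded cells
        cell_areas[p] = ConvexHull(vor.vertices[region]).volume   # 2D "volume" is the area
print(edge_lengths.max(), max(cell_areas.values()))
```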

  6. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories

    PubMed Central

    Yang, Wei

    2018-01-01

    Crowdsourcing trajectory data is an important approach for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourcing vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and interpolate the optimized segments adaptively to ensure there are enough tracking points. Second, the DT and the Voronoi diagram are constructed within the interpolated tracking lines to calculate road boundary descriptors using the areas of the Voronoi cells and the lengths of the triangle edges. Then, the road boundary detection model is established by integrating the boundary descriptors and trajectory movement features (e.g., direction) within the DT. Third, the boundary detection model is used to detect the road boundary from the DT constructed from the trajectory lines, and a region-growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting the road boundary from low-frequency GPS traces, multi-type road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information was proved to be of higher quality. PMID:29671792

  7. Two-dimensional thermal video analysis of offshore bird and bat flight

    DOE PAGES

    Matzner, Shari; Cullinan, Valerie I.; Duberstein, Corey A.

    2015-09-11

    Thermal infrared video can provide essential information about bird and bat presence and activity for risk assessment studies, but the analysis of recorded video can be time-consuming and may not extract all of the available information. Automated processing makes continuous monitoring over extended periods of time feasible, and maximizes the information provided by video. This is especially important for collecting data in remote locations that are difficult for human observers to access, such as proposed offshore wind turbine sites. We present guidelines for selecting an appropriate thermal camera based on environmental conditions and the physical characteristics of the target animals. We developed new video image processing algorithms that automate the extraction of bird and bat flight tracks from thermal video, and that characterize the extracted tracks to support animal identification and behavior inference. The algorithms use a video peak store process followed by background masking and perceptual grouping to extract flight tracks. The extracted tracks are automatically quantified in terms that could then be used to infer animal type and possibly behavior. The developed automated processing generates results that are reproducible and verifiable, and reduces the total amount of video data that must be retained and reviewed by human experts. Finally, we suggest models for interpreting thermal imaging information.
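
    The video peak store stage (keeping, per pixel, the warmest value seen across a clip so that a moving warm target leaves a visible track) followed by background masking can be sketched with OpenCV and NumPy; the file name and threshold offset are placeholders.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("thermal_clip.avi")            # placeholder thermal recording
peak = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    peak = gray if peak is None else np.maximum(peak, gray)   # per-pixel peak store
cap.release()
if peak is None:
    raise SystemExit("no frames read from placeholder clip")

# Background masking: keep pixels that rose well above the background level, then
# group the remaining pixels into candidate flight-track components.
mask = (peak > np.median(peak) + 25).astype(np.uint8)
n_components, labels = cv2.connectedComponents(mask)
print("candidate track components:", n_components - 1)
```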

  8. Two-dimensional thermal video analysis of offshore bird and bat flight

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzner, Shari; Cullinan, Valerie I.; Duberstein, Corey A.

    Thermal infrared video can provide essential information about bird and bat presence and activity for risk assessment studies, but the analysis of recorded video can be time-consuming and may not extract all of the available information. Automated processing makes continuous monitoring over extended periods of time feasible, and maximizes the information provided by video. This is especially important for collecting data in remote locations that are difficult for human observers to access, such as proposed offshore wind turbine sites. We present guidelines for selecting an appropriate thermal camera based on environmental conditions and the physical characteristics of the target animals. We developed new video image processing algorithms that automate the extraction of bird and bat flight tracks from thermal video, and that characterize the extracted tracks to support animal identification and behavior inference. The algorithms use a video peak store process followed by background masking and perceptual grouping to extract flight tracks. The extracted tracks are automatically quantified in terms that could then be used to infer animal type and possibly behavior. The developed automated processing generates results that are reproducible and verifiable, and reduces the total amount of video data that must be retained and reviewed by human experts. Finally, we suggest models for interpreting thermal imaging information.

  9. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    PubMed

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of effort to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based systems from accessing it. Therefore, biomedical event extraction, automatically acquiring knowledge of molecular events in research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate the models' parameters. However, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data based on semi-supervised learning for biomedical event extraction is a feasible solution and attracts more interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned event annotations based on their distances to the sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold standard corpus. Experimental results show that more than a 2.2% improvement in F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and that the similarity between sentences might be precisely described by the hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems.

    PubMed

    Peng, Yifan; Torii, Manabu; Wu, Cathy H; Vijay-Shanker, K

    2014-08-23

    Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from large amounts of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in literature. A relation extraction system achieving high performance is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging syntactic variations possible in a language and automatically generating extraction patterns in a systematic manner, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task. A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively). We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations. In this paper, we present a novel framework for fast development of relation extraction systems. The framework requires only a list of triggers as input, and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help with the design of hand-crafted patterns. We demonstrate how our framework is used to develop a system which achieves state-of-the-art performance on a public benchmark corpus.

  11. Information extraction for enhanced access to disease outbreak reports.

    PubMed

    Grishman, Ralph; Huttunen, Silja; Yangarber, Roman

    2002-08-01

    Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.

  12. Multiple kernel learning in protein-protein interaction extraction from biomedical literature.

    PubMed

    Yang, Zhihao; Tang, Nan; Zhang, Xiao; Lin, Hongfei; Li, Yanpeng; Yang, Zhiwei

    2011-03-01

    Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions is expanding rapidly, making it increasingly difficult for interaction database administrators, responsible for content input and maintenance, to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to automatic extraction of PPI information from biomedical literature. We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and dependency path tree to capture richer contextual information. Our experimental results show that the combination of SPT and dependency path tree extensions contributes to an improvement in performance of almost 0.7 percentage units in F-score and 2 percentage units in area under the receiver operating characteristics curve (AUC). Combining two or more appropriately weighted individual kernels further improves the performance. In both the individual-corpus and cross-corpus evaluations, our combined kernel can achieve state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the AImed corpus. As different kernels calculate the similarity between two sentences from different aspects, our combined kernel can reduce the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, thus allowing each kernel to contribute incrementally to the performance improvement. In addition, the SPT and dependency path tree extensions can improve the performance by including richer context information. Copyright © 2010 Elsevier B.V. All rights reserved.
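
    The weighted linear combination of kernels described above can be sketched with NumPy and scikit-learn's precomputed-kernel SVM; the kernel matrices and weights here are toy placeholders, not the paper's feature-based, tree, graph and POS-path kernels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 40
y = rng.integers(0, 2, n)

# Toy stand-ins for the individual kernels (feature-based, tree, graph, POS path):
# any symmetric positive semi-definite n x n matrices would do here.
kernels = [np.dot(X, X.T) for X in (rng.normal(size=(n, 5)) + y[:, None] * w
                                    for w in (1.0, 0.5, 0.25, 0.1))]
weights = [0.4, 0.3, 0.2, 0.1]                        # illustrative kernel weights

K = sum(w * k for w, k in zip(weights, kernels))      # weighted linear combination
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```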

  13. Terminology extraction from medical texts in Polish

    PubMed Central

    2014-01-01

    Background Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results Using a combination of linguistic and statistical methods for processing over 1200 children hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts. PMID:24976943

  14. Terminology extraction from medical texts in Polish.

    PubMed

    Marciniak, Małgorzata; Mykowiecka, Agnieszka

    2014-01-01

    Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1200 children hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.

  15. Smart Extraction and Analysis System for Clinical Research.

    PubMed

    Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung

    2017-05-01

    With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issue of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for a descriptive and predictive analysis to compile the survival statistics and predict the future chance of survivability. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS is evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system for semistructured text is recorded at 99%, while that for unstructured text is 97%. Furthermore, the automated, unstructured information extraction has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients are found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one of the lifestyle risk factors was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic extraction of information and survival prediction methods. SEAS has reduced the time and energy of human resources spent unnecessarily on manual tasks.

  16. Automatic Requirements Specification Extraction from Natural Language (ARSENAL)

    DTIC Science & Technology

    2014-10-01

    designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, imprecise ... communication of technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems ... the accuracy of the natural language processing stage, the degree of automation, and robustness to noise.

  17. Evaluating the Impact of Depth Cue Salience in Working Three-Dimensional Mental Rotation Tasks by Means of Psychometric Experiments

    ERIC Educational Resources Information Center

    Arendasy, Martin; Sommer, Markus; Hergovich, Andreas; Feldhammer, Martina

    2011-01-01

    The gender difference in three-dimensional mental rotation is well documented in the literature. In this article we combined automatic item generation, (quasi-)experimental research designs and item response theory models of change measurement to evaluate the effect of the ability to extract the depth information conveyed in the two-dimensional…

  18. Automatic 3D power line reconstruction of multi-angular imaging power line inspection system

    NASA Astrophysics Data System (ADS)

    Zhang, Wuming; Yan, Guangjian; Wang, Ning; Li, Qiaozhi; Zhao, Wei

    2007-06-01

    We develop a multi-angular imaging power line inspection system. Its main objective is to monitor the relative distance between high-voltage power lines and surrounding objects, and to raise an alert if the warning threshold is exceeded. The system generates a DSM of the power line corridor, which comprises the ground surface and ground objects such as trees and houses. For the purpose of revealing dangerous regions, where ground objects are too close to the power line, 3D power line information must be extracted at the same time. In order to improve the automation level of the extraction and to reduce labour costs and human errors, an automatic 3D power line reconstruction method is proposed and implemented. It is achieved by using the epipolar constraint and prior knowledge of the pole tower height. The 3D power line information is then obtained by space intersection using the matched homologous projections. A flight experiment shows that the proposed method can successfully reconstruct the 3D power line, and that the measurement accuracy of the relative distance satisfies the user requirement of 0.5 m.
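
    The space-intersection step can be illustrated with a standard linear (DLT) triangulation of one matched point from two projection matrices. The matrices and image coordinates below are hypothetical, and the sketch omits the epipolar matching that finds the homologous projections.

        import numpy as np

        def triangulate(P1, P2, u1, u2):
            """Linear (DLT) space intersection of one homologous point pair from two 3x4 projection matrices."""
            A = np.vstack([
                u1[0] * P1[2] - P1[0], u1[1] * P1[2] - P1[1],
                u2[0] * P2[2] - P2[0], u2[1] * P2[2] - P2[1],
            ])
            _, _, vt = np.linalg.svd(A)
            X = vt[-1]
            return X[:3] / X[3]            # inhomogeneous 3-D point

        # Hypothetical projection matrices and matched image coordinates of a power-line point.
        P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
        print(np.round(triangulate(P1, P2, (0.2, 0.1), (0.0, 0.1)), 3))   # expected ~ [1.0, 0.5, 5.0]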

  19. Context-based automated defect classification system using multiple morphological masks

    DOEpatents

    Gleason, Shaun S.; Hunt, Martin A.; Sari-Sarraf, Hamed

    2002-01-01

    Automatic detection of defects during the fabrication of semiconductor wafers is largely automated, but the classification of those defects is still performed manually by technicians. This invention includes novel digital image analysis techniques that generate unique feature vector descriptions of semiconductor defects as well as classifiers that use these descriptions to automatically categorize the defects into one of a set of pre-defined classes. Feature extraction techniques based on multiple-focus images, multiple-defect mask images, and segmented semiconductor wafer images are used to create unique feature-based descriptions of the semiconductor defects. These feature-based defect descriptions are subsequently classified by a defect classifier into categories that depend on defect characteristics and defect contextual information, that is, the semiconductor process layer(s) with which the defect comes in contact. At the heart of the system is a knowledge database that stores and distributes historical semiconductor wafer and defect data to guide the feature extraction and classification processes. In summary, this invention takes as its input a set of images containing semiconductor defect information, and generates as its output a classification for the defect that describes not only the defect itself, but also the location of that defect with respect to the semiconductor process layers.

  20. Derivation of groundwater flow-paths based on semi-automatic extraction of lineaments from remote sensing data

    NASA Astrophysics Data System (ADS)

    Mallast, U.; Gloaguen, R.; Geyer, S.; Rödiger, T.; Siebert, C.

    2011-08-01

    In this paper we present a semi-automatic method to infer groundwater flow-paths based on the extraction of lineaments from digital elevation models. This method is especially adequate in remote and inaccessible areas where in-situ data are scarce. The combined method of linear filtering and object-based classification provides a lineament map with a high degree of accuracy. Subsequently, lineaments are differentiated into geological and morphological lineaments using auxiliary information and finally evaluated in terms of hydro-geological significance. Using the example of the western catchment of the Dead Sea (Israel/Palestine), the orientation and location of the differentiated lineaments are compared to characteristics of known structural features. We demonstrate that a strong correlation between lineaments and structural features exists. Using Euclidean distances between lineaments and wells provides an assessment criterion to evaluate the hydraulic significance of detected lineaments. Based on this analysis, we suggest that the statistical analysis of lineaments allows a delineation of flow-paths and thus significant information on groundwater movements. To validate the flow-paths we compare them to existing results of groundwater models that are based on well data.

  1. Information extraction from Italian medical reports: An ontology-driven approach.

    PubMed

    Viani, Natalia; Larizza, Cristiana; Tibollo, Valentina; Napolitano, Carlo; Priori, Silvia G; Bellazzi, Riccardo; Sacchi, Lucia

    2018-03-01

    In this work, we propose an ontology-driven approach to identify events and their attributes from episodes of care included in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible. The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports. The proposed approach performs well on the considered Italian medical corpus, with a percentage of correct annotations above 90% for most considered clinical events. We also assessed the possibility to adapt the system to the analysis of another language (i.e., English), with promising results. Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the considered task. In the future, it could be successfully used to automatically populate the TRIAD database. Copyright © 2017 Elsevier B.V. All rights reserved.
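
    A minimal sketch of ontology-driven, regular-expression-based event and attribute extraction is shown below. The event types and patterns are hypothetical, written in English rather than Italian, and far simpler than the system described above.

        import re

        # A tiny ontology fragment: each event type carries a trigger pattern and the attributes to extract.
        ONTOLOGY = {
            "echocardiogram": {
                "trigger": re.compile(r"\becho(cardiogram)?\b", re.I),
                "attributes": {"ejection_fraction": re.compile(r"ejection fraction of (\d{1,2})\s?%", re.I)},
            },
            "ablation": {
                "trigger": re.compile(r"\bablation\b", re.I),
                "attributes": {"date": re.compile(r"on (\d{4}-\d{2}-\d{2})")},
            },
        }

        def extract_events(text):
            events = []
            for event, spec in ONTOLOGY.items():
                if not spec["trigger"].search(text):
                    continue
                record = {"event": event}
                for name, rx in spec["attributes"].items():
                    m = rx.search(text)
                    record[name] = m.group(1) if m else None   # attribute value linked to its event
                events.append(record)
            return events

        report = "Catheter ablation performed on 2017-03-02. Echocardiogram: ejection fraction of 55%."
        print(extract_events(report))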

  2. Automatic extraction of property norm-like data from large text corpora.

    PubMed

    Kelly, Colin; Devereux, Barry; Korhonen, Anna

    2014-01-01

    Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.

  3. Research of infrared laser based pavement imaging and crack detection

    NASA Astrophysics Data System (ADS)

    Hong, Hanyu; Wang, Shu; Zhang, Xiuhua; Jing, Genqiang

    2013-08-01

    Road crack detection is seriously affected by many factors in actual applications, such as some shadows, road signs, oil stains, high frequency noise and so on. Due to these factors, the current crack detection methods can not distinguish the cracks in complex scenes. In order to solve this problem, a novel method based on infrared laser pavement imaging is proposed. Firstly, single sensor laser pavement imaging system is adopted to obtain pavement images, high power laser line projector is well used to resist various shadows. Secondly, the crack extraction algorithm which has merged multiple features intelligently is proposed to extract crack information. In this step, the non-negative feature and contrast feature are used to extract the basic crack information, and circular projection based on linearity feature is applied to enhance the crack area and eliminate noise. A series of experiments have been performed to test the proposed method, which shows that the proposed automatic extraction method is effective and advanced.

  4. Identifying key hospital service quality factors in online health communities.

    PubMed

    Jung, Yuchul; Hur, Cinyoung; Jung, Dain; Kim, Minki

    2015-04-07

    The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and overtime trends automatically from online health communities, especially hospital-related questions and answers. We defined social media-based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea's two biggest online portals were used to test the effectiveness of detection of social media-based key quality factors for hospitals. To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media-based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies.

  5. GDRMS: a system for automatic extraction of the disease-centre relation

    NASA Astrophysics Data System (ADS)

    Yang, Ronggen; Zhang, Yue; Gong, Lejun

    2012-01-01

    With the rapid increase in biomedical literature, the deluge of new articles is leading to information overload. Extracting the available knowledge from the huge amount of biomedical literature has become a major challenge. GDRMS is developed as a tool that extracts disease-gene and gene-gene relationships from the biomedical literature using text mining technology. It is a rule-based system that also provides disease-centred network visualization, constructs a disease-gene database, and offers a gene engine for understanding gene function. The main focus of GDRMS is to provide a valuable opportunity to explore the relationship between disease and gene for the research community studying the etiology of disease.
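
    A toy rule-based relation extractor in the spirit of such systems might look like the sketch below. The gene and disease lexicons and the cue phrases are invented for illustration; they are not the GDRMS rules.

        import itertools
        import re

        GENES = {"BRCA1", "TP53", "EGFR"}
        DISEASES = {"breast cancer", "lung cancer"}
        CUE = re.compile(r"\b(associated with|mutations? in|caused by)\b", re.I)

        def extract_relations(sentence):
            """Report a gene-disease pair when both occur in a sentence together with a cue phrase."""
            genes = [g for g in GENES if re.search(rf"\b{g}\b", sentence)]
            diseases = [d for d in DISEASES if d.lower() in sentence.lower()]
            if genes and diseases and CUE.search(sentence):
                return list(itertools.product(genes, diseases))
            return []

        print(extract_relations("Mutations in BRCA1 are associated with hereditary breast cancer."))
        # [('BRCA1', 'breast cancer')]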

  6. Automatic three-dimensional rib centerline extraction from CT scans for enhanced visualization and anatomical context

    NASA Astrophysics Data System (ADS)

    Ramakrishnan, Sowmya; Alvino, Christopher; Grady, Leo; Kiraly, Atilla

    2011-03-01

    We present a complete automatic system to extract 3D centerlines of ribs from thoracic CT scans. Our rib centerline system determines the positional information for the rib cage consisting of extracted rib centerlines, spinal canal centerline, pairing and labeling of ribs. We show an application of this output to produce an enhanced visualization of the rib cage by the method of Kiraly et al., in which the ribs are digitally unfolded along their centerlines. The centerline extraction consists of three stages: (a) pre-trace processing for rib localization, (b) rib centerline tracing, and (c) post-trace processing to merge the rib traces. Then we classify ribs from non-ribs and determine anatomical rib labeling. Our novel centerline tracing technique uses the Random Walker algorithm to segment the structural boundary of the rib in successive 2D cross sections orthogonal to the longitudinal direction of the ribs. Then the rib centerline is progressively traced along the rib using a 3D Kalman filter. The rib centerline extraction framework was evaluated on 149 CT datasets with varying slice spacing, dose, and under a variety of reconstruction kernels. The results of the evaluation are presented. The extraction takes approximately 20 seconds on a modern radiology workstation and performs robustly even in the presence of partial volume effects or rib pathologies such as bone metastases or fractures, making the system suitable for assisting clinicians in expediting routine rib reading for oncology and trauma applications.
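
    The progressive tracing step can be sketched as a constant-velocity Kalman filter driven by cross-section centroids. The centroids below are synthetic and the noise settings are assumptions; the Random Walker segmentation that produces the real centroids is omitted.

        import numpy as np

        # State = [x, y, z, vx, vy, vz]; measurements are rib cross-section centroids (here synthetic).
        dt = 1.0
        F = np.eye(6); F[:3, 3:] = dt * np.eye(3)        # constant-velocity state transition
        H = np.hstack([np.eye(3), np.zeros((3, 3))])     # we observe position only
        Q, R = 0.01 * np.eye(6), 0.5 * np.eye(3)         # process / measurement noise (assumed)

        x, P = np.zeros(6), np.eye(6)
        rng = np.random.default_rng(0)
        centroids = np.cumsum(np.ones((20, 3)), axis=0) + rng.normal(0, 0.3, (20, 3))

        trace = []
        for z in centroids:
            x, P = F @ x, F @ P @ F.T + Q                # predict the next centerline point
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
            x = x + K @ (z - H @ x)                      # update with the measured centroid
            P = (np.eye(6) - K @ H) @ P
            trace.append(x[:3].copy())

        print(np.round(trace[-1], 2))                    # smoothed centerline endpoint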

  7. A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature.

    PubMed

    Tchoua, Roselyne B; Chard, Kyle; Audus, Debra; Qin, Jian; de Pablo, Juan; Foster, Ian

    2016-01-01

    A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.

  8. Automatic Hidden-Web Table Interpretation by Sibling Page Comparison

    NASA Astrophysics Data System (ADS)

    Tao, Cui; Embley, David W.

    The longstanding problem of automatic table interpretation still eludes us. Its solution would not only aid table processing applications such as large-volume table conversion, but would also aid in solving related problems such as information extraction and semi-structured data management. In this paper, we offer a conceptual modeling solution for the common special case in which so-called sibling pages are available. The sibling pages we consider are pages on the hidden web, commonly generated from underlying databases. We compare them to identify and connect nonvarying components (category labels) and varying components (data values). We tested our solution using more than 2,000 tables in source pages from three different domains—car advertisements, molecular biology, and geopolitical information. Experimental results show that the system can successfully identify sibling tables, generate structure patterns, interpret tables using the generated patterns, and automatically adjust the structure patterns, if necessary, as it processes a sequence of hidden-web pages. For these activities, the system was able to achieve an overall F-measure of 94.5%.

  9. On-line object feature extraction for multispectral scene representation

    NASA Technical Reports Server (NTRS)

    Ghassemian, Hassan; Landgrebe, David

    1988-01-01

    A new on-line unsupervised object-feature extraction method is presented that reduces the complexity and costs associated with the analysis of multispectral image data and with data transmission, storage, archival and distribution. The ambiguity in the object detection process can be reduced if the spatial dependencies that exist among adjacent pixels are intelligently incorporated into the decision making process. A unity relation that must exist among the pixels of an object is defined. The Automatic Multispectral Image Compaction Algorithm (AMICA) uses the within-object pixel-feature gradient vector as valuable contextual information to construct the object's features, which preserve the class separability information within the data. For on-line object extraction, the path-hypothesis and the basic mathematical tools for its realization are introduced in terms of a specific similarity measure and adjacency relation. AMICA is applied to several sets of real image data, and the performance and reliability of the features are evaluated.

  10. 2D Automatic body-fitted structured mesh generation using advancing extraction method

    USDA-ARS's Scientific Manuscript database

    This paper presents an automatic mesh generation algorithm for body-fitted structured meshes in Computational Fluid Dynamics (CFD) analysis using the Advancing Extraction Method (AEM). The method is applicable to two-dimensional domains with complex geometries, which have the hierarchical tree-like...

  12. Vessel extraction in retinal images using automatic thresholding and Gabor Wavelet.

    PubMed

    Ali, Aziah; Hussain, Aini; Wan Zaki, Wan Mimi Diyana

    2017-07-01

    Retinal image analysis has been widely used for early detection and diagnosis of multiple systemic diseases. Accurate vessel extraction in retinal images is a crucial step towards a fully automated diagnosis system. This work presents an efficient unsupervised method for extracting blood vessels from retinal images by combining the existing Gabor Wavelet (GW) method with automatic thresholding. The green channel image is extracted from the color retinal image and used to produce a Gabor feature image using GW. Both the green channel image and the Gabor feature image undergo a vessel-enhancement step in order to highlight blood vessels. Next, the two vessel-enhanced images are transformed to binary images using automatic thresholding before being combined to produce the final vessel output. Combining the images results in a significant improvement of blood vessel extraction performance compared to using either image individually. The effectiveness of the proposed method was demonstrated via comparative analysis with existing methods on the publicly available DRIVE database.
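
    A rough sketch of the combination described above, assuming scikit-image, is shown below. The bundled retina sample stands in for a DRIVE image (recent scikit-image versions fetch it on first use), and the Gabor frequency and orientations are arbitrary choices, not the authors' parameters.

        import numpy as np
        from skimage import data, filters

        # Green-channel stand-in: the scikit-image retina sample (use the DRIVE images in practice).
        retina = data.retina()
        green = retina[..., 1].astype(float) / 255.0

        # Gabor feature image: keep the strongest response magnitude over several orientations.
        mags = [np.hypot(*filters.gabor(green, frequency=0.15, theta=t))
                for t in np.linspace(0, np.pi, 6, endpoint=False)]
        gabor_feat = np.max(np.stack(mags), axis=0)

        # Automatic (Otsu) thresholding of both enhanced images, then combine the binary maps.
        vessels = (gabor_feat > filters.threshold_otsu(gabor_feat)) | \
                  ((1 - green) > filters.threshold_otsu(1 - green))   # vessels are dark in the green channel
        print(vessels.mean())                                         # fraction of pixels labelled as vessel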

  13. Template-based automatic extraction of the joint space of foot bones from CT scan

    NASA Astrophysics Data System (ADS)

    Park, Eunbi; Kim, Taeho; Park, Jinah

    2016-03-01

    Clean bone segmentation is critical in studying the joint anatomy for measuring the spacing between the bones. However, separation of the coupled bones in CT images is sometimes difficult due to ambiguous gray values coming from the noise and the heterogeneity of bone materials, as well as narrowing of the joint space. For fine reconstruction of the individual local boundaries, manual operation is common practice, and segmentation therefore remains a bottleneck. In this paper, we present an automatic method for extracting the joint space by applying graph cut on a Markov random field model to a region of interest (ROI) identified by a template of 3D bone structures. The template includes an encoded articular surface which identifies the tight region of the high-intensity bone boundaries together with the fuzzy joint area of interest. The localized shape information from the template model within the ROI effectively separates the nearby bones. By narrowing the ROI down to a region including two types of tissue, the object extraction problem was reduced to binary segmentation and solved via graph cut. Based on the shape of the joint space marked by the template, the hard constraint was set by initial seeds which were automatically generated from thresholding and morphological operations. The performance and the robustness of the proposed method are evaluated on 12 volumes of ankle CT data, where each volume includes a set of 4 tarsal bones (calcaneus, talus, navicular and cuboid).

  14. Recognition techniques for extracting information from semistructured documents

    NASA Astrophysics Data System (ADS)

    Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna

    2000-12-01

    Archives of optical documents are being employed more and more widely, with demand driven also by new norms sanctioning the legal value of digital documents, provided they are stored on physically unalterable supports. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those of magnetic memories. The remaining bottleneck in these systems is indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine supported to a large degree, with evident advantages both in the organization of the work and in extracting information, providing data that are much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields of sender, addressee, subject, date, and body of the document.

  15. Video indexing based on image and sound

    NASA Astrophysics Data System (ADS)

    Faudemay, Pascal; Montacie, Claude; Caraty, Marie-Jose

    1997-10-01

    Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel and multi-modal query interface to query sound, image and script through 'pull' and 'push' queries. We then summarize the segmentation phase, which needs information from the image channel. Detection of critical segments is proposed; it should speed up both automatic and manual indexing. We then present an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and script alignment with speech. We present experimental results for these various techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was tested on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, 'Ivanhoe'. For good-quality sound segments, error rates are low enough for use in indexing applications. Major issues are the processing of sound segments with noise or music, and performance improvement through the use of appropriate, low-cost architectures or networks of workstations.

  16. Enhancing biomedical text summarization using semantic relation extraction.

    PubMed

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  17. Automatic generation of Web mining environments

    NASA Astrophysics Data System (ADS)

    Cibelli, Maurizio; Costagliola, Gennaro

    1999-02-01

    The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.

  18. Automatic sentence extraction for the detection of scientific paper relations

    NASA Astrophysics Data System (ADS)

    Sibaroni, Y.; Prasetiyowati, S. S.; Miftachudin, M.

    2018-03-01

    The relations between scientific papers are very useful for researchers to see the interconnections between scientific papers quickly. By observing inter-article relationships, researchers can identify, among other things, the weaknesses of existing research, the performance improvements achieved to date, and the tools or data typically used in research in specific fields. So far, the methods developed to detect paper relations include machine learning and rule-based methods. However, a problem still arises in the process of sentence extraction from scientific paper documents, which is still done manually. This manual process makes the detection of scientific paper relations slow and inefficient. To overcome this problem, this study performs automatic sentence extraction, while the paper relations are identified based on the citation sentences. The performance of the built system is then compared with that of manual extraction. The analysis results suggest that automatic sentence extraction achieves a very high level of performance in the detection of paper relations, close to that of manual sentence extraction.

  19. Highway extraction from high resolution aerial photography using a geometric active contour model

    NASA Astrophysics Data System (ADS)

    Niu, Xutong

    Highway extraction and vehicle detection are two of the most important steps in traffic-flow analysis from multi-frame aerial photographs. The traditional method of deriving traffic flow trajectories relies on manual vehicle counting from a sequence of aerial photographs, which is tedious and time-consuming. This research presents a new framework for semi-automatic highway extraction. The basis of the new framework is an improved geometric active contour (GAC) model. This novel model seeks to minimize an objective function that transforms a problem of propagation of regular curves into an optimization problem. The implementation of curve propagation is based on level set theory. By using an implicit representation of a two-dimensional curve, a level set approach can be used to deal with topological changes naturally, and the output is unaffected by different initial positions of the curve. However, the original GAC model, on which the new model is based, only incorporates boundary information into the curve propagation process. An error-producing phenomenon called leakage is inevitable wherever there is an uncertain weak edge. In this research, region-based information is added as a constraint into the original GAC model, thereby, giving this proposed method the ability of integrating both boundary and region-based information during the curve propagation. Adding the region-based constraint eliminates the leakage problem. This dissertation applies the proposed augmented GAC model to the problem of highway extraction from high-resolution aerial photography. First, an optimized stopping criterion is designed and used in the implementation of the GAC model. It effectively saves processing time and computations. Second, a seed point propagation framework is designed and implemented. This framework incorporates highway extraction, tracking, and linking into one procedure. A seed point is usually placed at an end node of highway segments close to the boundary of the image or at a position where possible blocking may occur, such as at an overpass bridge or near vehicle crowds. These seed points can be automatically propagated throughout the entire highway network. During the process, road center points are also extracted, which introduces a search direction for solving possible blocking problems. This new framework has been successfully applied to highway network extraction from a large orthophoto mosaic. In the process, vehicles on the highway extracted from mosaic were detected with an 83% success rate.
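
    A boundary-driven morphological geodesic active contour, as implemented in scikit-image, gives the flavour of the baseline GAC model; the region-based constraint that the dissertation adds is not included here, and the file name and seed-square coordinates below are hypothetical.

        import numpy as np
        from skimage import io, img_as_float
        from skimage.segmentation import inverse_gaussian_gradient, morphological_geodesic_active_contour

        # "highway_tile.png" is a hypothetical aerial image chip containing a road segment.
        image = img_as_float(io.imread("highway_tile.png", as_gray=True))
        gimage = inverse_gaussian_gradient(image)        # edge-stopping function

        # Seed the contour with a small square around a user-supplied seed point on the road.
        init = np.zeros(image.shape, dtype=np.int8)
        init[100:120, 100:120] = 1

        road = morphological_geodesic_active_contour(gimage, 300, init_level_set=init,
                                                     smoothing=2, balloon=1)
        print(road.sum(), "pixels labelled as highway surface")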

  20. Support Vector Machine-Based Endmember Extraction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Filippi, Anthony M; Archibald, Richard K

    Introduced in this paper is the utilization of Support Vector Machines (SVMs) to automatically perform endmember extraction from hyperspectral data. The strengths of SVM are exploited to provide a fast and accurate calculated representation of high-dimensional data sets that may consist of multiple distributions. Once this representation is computed, the number of distributions can be determined without prior knowledge. For each distribution, an optimal transform can be determined that preserves informational content while reducing the data dimensionality, and hence, the computational cost. Finally, endmember extraction for the whole data set is accomplished. Results indicate that this Support Vector Machine-Based Endmember Extraction (SVM-BEE) algorithm has the capability of autonomously determining endmembers from multiple clusters with computational speed and accuracy, while maintaining a robust tolerance to noise.

  1. Application of Machine Learning in Urban Greenery Land Cover Extraction

    NASA Astrophysics Data System (ADS)

    Qiao, X.; Li, L. L.; Li, D.; Gan, Y. L.; Hou, A. Y.

    2018-04-01

    Urban greenery is a critical part of the modern city, and greenery coverage information is essential for land resource management, environmental monitoring and urban planning. It is challenging to extract urban greenery information from remote sensing images because trees and grassland are mixed with city built-up areas. In this paper, we propose a new automatic pixel-based greenery extraction method using multispectral remote sensing images. The method includes three main steps. First, a small part of the images is manually interpreted to provide prior knowledge. Second, a five-layer neural network is trained and optimised with the manual extraction results, which are divided into training, verification and testing samples. Lastly, the well-trained neural network is applied to the unlabelled data to perform the greenery extraction. GF-2 and GJ-1 high-resolution multispectral remote sensing images were used to extract greenery coverage information in the built-up areas of city X. The method shows favourable performance over the 619 square kilometre study area. Also, compared with the traditional NDVI method, the proposed method gives a more accurate delineation of the greenery region. Owing to its low computational load and high accuracy, it has great potential for large-area automatic greenery extraction, saving considerable manpower and resources.
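
    A pixel-based sketch of this workflow is shown below, with synthetic bands, made-up labels, and scikit-learn's multilayer perceptron standing in for the five-layer network, plus the NDVI baseline for comparison; none of the values reflect the GF-2/GJ-1 data.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        # Hypothetical 4-band (B, G, R, NIR) multispectral chip and a small set of "labelled" pixels.
        rng = np.random.default_rng(0)
        chip = rng.random((200, 200, 4)).astype(np.float32)
        pixels = chip.reshape(-1, 4)

        labelled_idx = rng.choice(pixels.shape[0], size=2000, replace=False)
        labels = (pixels[labelled_idx, 3] > pixels[labelled_idx, 2]).astype(int)   # stand-in greenery labels

        # Feed-forward network (input + 3 hidden + output layers) trained on the interpreted samples.
        net = MLPClassifier(hidden_layer_sizes=(32, 16, 8), max_iter=500, random_state=0)
        net.fit(pixels[labelled_idx], labels)
        greenery_mask = net.predict(pixels).reshape(200, 200)

        # Baseline for comparison: NDVI = (NIR - Red) / (NIR + Red) with a fixed threshold.
        ndvi = (chip[..., 3] - chip[..., 2]) / (chip[..., 3] + chip[..., 2] + 1e-6)
        print(greenery_mask.mean(), (ndvi > 0.3).mean())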

  2. Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration’s Adverse Event Reporting System Narratives

    PubMed Central

    Polepalli Ramesh, Balaji; Belknap, Steven M; Li, Zuofeng; Frid, Nadya; West, Dennis P

    2014-01-01

    Background The Food and Drug Administration’s (FDA) Adverse Event Reporting System (FAERS) is a repository of spontaneously-reported adverse drug events (ADEs) for FDA-approved prescription drugs. FAERS reports include both structured reports and unstructured narratives. The narratives often include essential information for evaluation of the severity, causality, and description of ADEs that is not present in the structured data. The timely identification of unknown toxicities of prescription drugs is an important, unsolved problem. Objective The objective of this study was to develop an annotated corpus of FAERS narratives and a biomedical named entity tagger to automatically identify ADE-related information in the FAERS narratives. Methods We developed an annotation guideline and annotated medication information and adverse event related entities in 122 FAERS narratives comprising approximately 23,000 word tokens. A named entity tagger using supervised machine learning approaches was built for detecting medication information and adverse event entities using various categories of features. Results The annotated corpus had an inter-annotator agreement of over 0.9 Cohen’s kappa for medication and adverse event entities. The best performing tagger achieves an overall F1 score of 0.73 for detection of medication, adverse event and other named entities. Conclusions In this study, we developed an annotated corpus of FAERS narratives and machine learning based models for automatically extracting medication and adverse event information from the FAERS narratives. Our study is an important step towards enriching the FAERS data for postmarketing pharmacovigilance. PMID:25600332

  3. Fine-grained information extraction from German transthoracic echocardiography reports.

    PubMed

    Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank

    2015-11-12

    Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that enables the recognition of almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated on documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout. The developed terminology and the proposed information extraction system allow the extraction of fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.

  4. Knowledge Representation Of CT Scans Of The Head

    NASA Astrophysics Data System (ADS)

    Ackerman, Laurens V.; Burke, M. W.; Rada, Roy

    1984-06-01

    We have been investigating diagnostic knowledge models which assist in the automatic classification of medical images by combining information extracted from each image with knowledge specific to that class of images. In a more general sense we are trying to integrate verbal and pictorial descriptions of disease via representations of knowledge, study automatic hypothesis generation as related to clinical medicine, evolve new mathematical image measures while integrating them into the total diagnostic process, and investigate ways to augment the knowledge of the physician. Specifically, we have constructed an artificial intelligence knowledge model using the technique of a production system blending pictorial and verbal knowledge about the respective CT scan and patient history. It is an attempt to tie together different sources of knowledge representation, picture feature extraction and hypothesis generation. Our knowledge reasoning and representation system (KRRS) works with data at the conscious reasoning level of the practicing physician while at the visual perceptional level we are building another production system, the picture parameter extractor (PPE). This paper describes KRRS and its relationship to PPE.

  5. Feasibility of Automatic Extraction of Electronic Health Data to Evaluate a Status Epilepticus Clinical Protocol.

    PubMed

    Hafeez, Baria; Paolicchi, Juliann; Pon, Steven; Howell, Joy D; Grinspan, Zachary M

    2016-05-01

    Status epilepticus is a common neurologic emergency in children. Pediatric medical centers often develop protocols to standardize care. Widespread adoption of electronic health records by hospitals affords the opportunity for clinicians to rapidly and electronically evaluate protocol adherence. We reviewed the clinical data of a small sample of 7 children with status epilepticus in order to (1) qualitatively determine the feasibility of automated data extraction and (2) demonstrate a timeline-style visualization of each patient's first 24 hours of care. Qualitatively, our observations indicate that most clinical data are well labeled in structured fields within the electronic health record, though some important information, particularly electroencephalography (EEG) data, may require manual abstraction. We conclude that a visualization that clarifies a patient's clinical course can be automatically created using the patient's electronic clinical data, supplemented with some manually abstracted data. Future work could use this timeline to evaluate adherence to status epilepticus clinical protocols. © The Author(s) 2015.

  6. Human image tracking technique applied to remote collaborative environments

    NASA Astrophysics Data System (ADS)

    Nagashima, Yoshio; Suzuki, Gen

    1993-10-01

    To support various kinds of collaboration over long distances using visual telecommunication, it is necessary to transmit visual information related to the participants and topical materials. When people collaborate in the same workspace, they use visual cues such as facial expressions and eye movement. The realization of coexistence in a collaborative workspace requires the support of these visual cues. Therefore, it is important that the facial images be large enough to be useful. During collaborations, especially dynamic collaborative activities such as equipment operation or lectures, the participants often move within the workspace. When people move frequently or over a wide area, the need for automatic human tracking increases. Depending on the movement area of the person and the resolution of the extracted area, we have developed a memory tracking method and a camera tracking method for automatic human tracking. Experimental results using a real-time tracking system show that the extracted area follows the movement of the human head fairly well.

  7. Analysis and automatic identification of sleep stages using higher order spectra.

    PubMed

    Acharya, U Rajendra; Chua, Eric Chern-Pin; Chua, Kuang Chua; Min, Lim Choo; Tamura, Toshiyo

    2010-12-01

    Electroencephalogram (EEG) signals are widely used to study the activity of the brain, such as to determine sleep stages. These EEG signals are nonlinear and non-stationary in nature. It is difficult to perform sleep staging by visual interpretation and linear techniques. Thus, we use a nonlinear technique, higher order spectra (HOS), to extract hidden information in the sleep EEG signal. In this study, unique bispectrum and bicoherence plots for various sleep stages are proposed. These can be used as visual aids for various diagnostic applications. A number of HOS-based features were extracted from these plots during the various sleep stages (Wakefulness, Rapid Eye Movement (REM), and Non-REM Stages 1-4), and they were found to be statistically significant, with p-values lower than 0.001 in an ANOVA test. These features were fed to a Gaussian mixture model (GMM) classifier for automatic identification. Our results indicate that the proposed system is able to identify sleep stages with an accuracy of 88.7%.
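
    One common way to use a GMM classifier on such per-epoch feature vectors is to fit one mixture per stage and assign a new epoch to the stage with the highest likelihood. The sketch below uses synthetic feature vectors, not actual HOS values, and only three stages.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        stages = ["Wake", "REM", "NREM"]
        # Synthetic stand-ins for HOS-derived feature vectors, one training matrix per stage.
        train = {s: rng.normal(loc=i, scale=0.5, size=(100, 4)) for i, s in enumerate(stages)}

        models = {s: GaussianMixture(n_components=2, random_state=0).fit(X) for s, X in train.items()}

        def classify(epoch_features):
            # Pick the stage whose mixture assigns the highest log-likelihood to the epoch.
            scores = {s: m.score_samples(epoch_features[None, :])[0] for s, m in models.items()}
            return max(scores, key=scores.get)

        print(classify(rng.normal(loc=1, scale=0.5, size=4)))   # expected: "REM"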

  8. BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature.

    PubMed

    de la Calle, Guillermo; García-Remesal, Miguel; Chiesa, Stefano; de la Iglesia, Diana; Maojo, Victor

    2009-10-07

    The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.

  9. DeepSleepNet: A Model for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG.

    PubMed

    Supratak, Akara; Dong, Hao; Wu, Chao; Guo, Yike

    2017-11-01

    This paper proposes a deep learning model, named DeepSleepNet, for automatic sleep stage scoring based on raw single-channel EEG. Most of the existing methods rely on hand-engineered features, which require prior knowledge of sleep analysis. Only a few of them encode the temporal information, such as transition rules, which is important for identifying the next sleep stages, into the extracted features. In the proposed model, we utilize convolutional neural networks to extract time-invariant features, and bidirectional-long short-term memory to learn transition rules among sleep stages automatically from EEG epochs. We implement a two-step training algorithm to train our model efficiently. We evaluated our model using different single-channel EEGs (F4-EOG (left), Fpz-Cz, and Pz-Oz) from two public sleep data sets, that have different properties (e.g., sampling rate) and scoring standards (AASM and R&K). The results showed that our model achieved similar overall accuracy and macro F1-score (MASS: 86.2%-81.7, Sleep-EDF: 82.0%-76.9) compared with the state-of-the-art methods (MASS: 85.9%-80.5, Sleep-EDF: 78.9%-73.7) on both data sets. This demonstrated that, without changing the model architecture and the training algorithm, our model could automatically learn features for sleep stage scoring from different raw single-channel EEGs from different data sets without utilizing any hand-engineered features.
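
    A heavily reduced CNN plus bidirectional-LSTM sketch in PyTorch illustrates the two-part idea (representation learning over raw epochs, then sequence learning over epoch features); the layer sizes, strides, and pooling below are assumptions, not DeepSleepNet's actual configuration.

        import torch
        import torch.nn as nn

        class TinySleepStager(nn.Module):
            """Much-reduced CNN + BiLSTM sleep stager; not the authors' exact architecture."""
            def __init__(self, n_classes=5, fs=100):
                super().__init__()
                # Representation learning: 1-D convolutions over each 30-s raw EEG epoch.
                self.cnn = nn.Sequential(
                    nn.Conv1d(1, 64, kernel_size=fs // 2, stride=fs // 16),
                    nn.ReLU(),
                    nn.MaxPool1d(8),
                    nn.Conv1d(64, 128, kernel_size=8),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool1d(16),            # -> (batch, 128, 16)
                )
                # Sequence learning: BiLSTM over the sequence of per-epoch feature vectors.
                self.rnn = nn.LSTM(input_size=128 * 16, hidden_size=128,
                                   batch_first=True, bidirectional=True)
                self.fc = nn.Linear(2 * 128, n_classes)

            def forward(self, x):
                # x: (batch, seq_len, n_samples) -- consecutive 30-s epochs of raw single-channel EEG
                b, s, n = x.shape
                feats = self.cnn(x.reshape(b * s, 1, n)).reshape(b, s, -1)
                out, _ = self.rnn(feats)
                return self.fc(out)                      # per-epoch class logits

        epochs = torch.randn(4, 10, 3000)                # 4 sequences of 10 epochs at 100 Hz * 30 s
        print(TinySleepStager()(epochs).shape)           # torch.Size([4, 10, 5])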

  10. Integrated Functional and Executional Modelling of Software Using Web-Based Databases

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Marietta, Roberta

    1998-01-01

    NASA's software subsystems undergo extensive modification and updates over the operational lifetimes. It is imperative that modified software should satisfy safety goals. This report discusses the difficulties encountered in doing so and discusses a solution based on integrated modelling of software, use of automatic information extraction tools, web technology and databases. To appear in an article of Journal of Database Management.

  11. Efficient content-based low-altitude images correlated network and strips reconstruction

    NASA Astrophysics Data System (ADS)

    He, Haiqing; You, Qi; Chen, Xiaoyong

    2017-01-01

    The manual intervention method is widely used to reconstruct strips for further aerial triangulation in low-altitude photogrammetry. Clearly, such a manual method is not suited to fully automatic photogrammetric data processing. In this paper, we explore a content-based approach to strip reconstruction that requires neither manual intervention nor external information. Feature descriptors in local spatial patterns are extracted by SIFT to construct a vocabulary tree, in which these features are encoded with the TF-IDF numerical statistical algorithm to generate a new representation for each low-altitude image. The image correlation network is then reconstructed by similarity measurement, image matching and geometric graph theory. Finally, strips are reconstructed automatically by tracing straight lines and gradually growing adjacent images. Experimental results show that the proposed approach is highly effective in automatically rearranging strips of low-altitude images and can provide rough relative orientation for further aerial triangulation.
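
    The descriptor encoding step can be sketched with OpenCV SIFT and a flat k-means visual vocabulary standing in for the vocabulary tree; the image paths below are hypothetical and the vocabulary size is an arbitrary choice.

        import cv2
        import numpy as np
        from sklearn.cluster import MiniBatchKMeans
        from sklearn.feature_extraction.text import TfidfTransformer
        from sklearn.metrics.pairwise import cosine_similarity

        def sift_descriptors(path):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            _, desc = cv2.SIFT_create().detectAndCompute(img, None)
            return desc

        paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]   # hypothetical low-altitude images
        descs = [sift_descriptors(p) for p in paths]

        # Flat visual vocabulary learned with k-means (a vocabulary tree would cluster hierarchically).
        kmeans = MiniBatchKMeans(n_clusters=256, random_state=0).fit(np.vstack(descs))

        # Encode each image as a TF-IDF weighted histogram of visual words.
        counts = np.array([np.bincount(kmeans.predict(d), minlength=256) for d in descs])
        tfidf = TfidfTransformer().fit_transform(counts)

        # Pairwise similarities drive the image-correlation network and strip grouping.
        print(cosine_similarity(tfidf))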

  12. Automatic diet monitoring: a review of computer vision and wearable sensor-based methods.

    PubMed

    Hassannejad, Hamid; Matrella, Guido; Ciampolini, Paolo; De Munari, Ilaria; Mordonini, Monica; Cagnoni, Stefano

    2017-09-01

    Food intake and eating habits have a significant impact on people's health. Widespread diseases, such as diabetes and obesity, are directly related to eating habits. Therefore, monitoring diet can be a substantial basis for developing methods and services to promote a healthy lifestyle and improve personal and national health economy. Studies have demonstrated that manual reporting of food intake is inaccurate and often impractical. Thus, several methods have been proposed to automate the process. This article reviews the most relevant and recent research on automatic diet monitoring, discussing strengths and weaknesses. In particular, the article reviews two approaches to this problem, which account for most of the work in the area. The first approach is based on image analysis and aims at automatically extracting information about food content from food images. The second relies on wearable sensors and has the detection of eating behaviours as its main goal.

  13. Natural language processing of spoken diet records (SDRs).

    PubMed

    Lacson, Ronilda; Long, William

    2006-01-01

    Dietary assessment is a fundamental aspect of nutritional evaluation that is essential for management of obesity as well as for assessing dietary impact on chronic diseases. Various methods have been used for dietary assessment including written records, 24-hour recalls, and food frequency questionnaires. The use of mobile phones to provide real-time dietary records provides potential advantages for accessibility, ease of use and automated documentation. However, understanding even a perfect transcript of spoken dietary records (SDRs) is challenging for people. This work presents a first step towards automatic analysis of SDRs. Our approach consists of four steps - identification of food items, identification of food quantifiers, classification of food quantifiers and temporal annotation. Our method enables automatic extraction of dietary information from SDRs, which in turn allows automated mapping to a Diet History Questionnaire dietary database. Our model has an accuracy of 90%. This work demonstrates the feasibility of automatically processing SDRs.

  14. Textural-Contextual Labeling and Metadata Generation for Remote Sensing Applications

    NASA Technical Reports Server (NTRS)

    Kiang, Richard K.

    1999-01-01

    Despite the extensive research and the advent of several new information technologies in the last three decades, machine labeling of ground categories using remotely sensed data has not become a routine process. A considerable amount of human intervention is needed to achieve an acceptable level of labeling accuracy. A number of fundamental reasons may explain why machine labeling has not become automatic. In addition, there may be shortcomings in the methodology for labeling ground categories. The spatial information of a pixel, whether textural or contextual, relates a pixel to its surroundings. This information should be utilized to improve the performance of machine labeling of ground categories. Landsat-4 Thematic Mapper (TM) data taken in July 1982 over an area in the vicinity of Washington, D.C. are used in this study. On-line texture extraction by neural networks may not be the most efficient way to incorporate textural information into the labeling process. Texture features are instead pre-computed from co-occurrence matrices and then combined with a pixel's spectral and contextual information as the input to a neural network. The improvement in labeling accuracy with spatial information included is significant. The prospect of automatic generation of metadata consisting of ground categories, textural and contextual information is discussed.
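
    Pre-computing co-occurrence texture features is straightforward with scikit-image; the window below is random stand-in data and the chosen properties are only examples of features that could be concatenated with spectral and contextual inputs.

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops   # "greycomatrix" in scikit-image < 0.19

        def texture_features(window):
            """Co-occurrence texture features for one 8-bit, single-band image window."""
            glcm = graycomatrix(window, distances=[1], angles=[0, np.pi / 2],
                                levels=256, symmetric=True, normed=True)
            return np.hstack([graycoprops(glcm, prop).ravel()
                              for prop in ("contrast", "homogeneity", "energy", "correlation")])

        # Random stand-in for a 32x32 window of one TM band.
        band = np.random.default_rng(0).integers(0, 256, size=(32, 32), dtype=np.uint8)
        feats = texture_features(band)
        print(feats.shape)    # 8 texture features (4 properties x 2 angles)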

  15. Application of Magnetic Nanoparticles in Pretreatment Device for POPs Analysis in Water

    NASA Astrophysics Data System (ADS)

    Chu, Dongzhi; Kong, Xiangfeng; Wu, Bingwei; Fan, Pingping; Cao, Xuan; Zhang, Ting

    2018-01-01

    In order to reduce the processing time and labour required for POPs pretreatment, and to solve the problem of the extraction column clogging easily, this paper proposes a new extraction and enrichment technology based on magnetic nanoparticles. The automatic pretreatment system consists of an automatic sampling unit, an extraction-enrichment unit and an elution-enrichment unit. The paper briefly introduces the preparation of the magnetic nanoparticles and describes in detail the structure and control system of the automatic pretreatment system. Mass recovery experiments showed that the system is capable of POPs analysis preprocessing and that the recovery rate of the magnetic nanoparticles was over 70%. In conclusion, the authors propose three optimization recommendations.

  16. Search Analytics: Automated Learning, Analysis, and Search with Open Source

    NASA Astrophysics Data System (ADS)

    Hundman, K.; Mattmann, C. A.; Hyon, J.; Ramirez, P.

    2016-12-01

    The sheer volume of unstructured scientific data makes comprehensive human analysis impossible, resulting in missed opportunities to identify relationships, trends, gaps, and outliers. As the open source community continues to grow, tools like Apache Tika, Apache Solr, Stanford's DeepDive, and Data-Driven Documents (D3) can help address this challenge. With a focus on journal publications and conference abstracts often in the form of PDF and Microsoft Office documents, we've initiated an exploratory NASA Advanced Concepts project aiming to use the aforementioned open source text analytics tools to build a data-driven justification for the HyspIRI Decadal Survey mission. We call this capability Search Analytics, and it fuses and augments these open source tools to enable the automatic discovery and extraction of salient information. In the case of HyspIRI, a hyperspectral infrared imager mission, key findings resulted from the extractions and visualizations of relationships from thousands of unstructured scientific documents. The relationships include links between satellites (e.g. Landsat 8), domain-specific measurements (e.g. spectral coverage) and subjects (e.g. invasive species). Using the above open source tools, Search Analytics mined and characterized a corpus of information that would be infeasible for a human to process. More broadly, Search Analytics offers insights into various scientific and commercial applications enabled through missions and instrumentation with specific technical capabilities. For example, the following phrases were extracted in close proximity within a publication: "In this study, hyperspectral images…with high spatial resolution (1 m) were analyzed to detect cutleaf teasel in two areas. …Classification of cutleaf teasel reached a users accuracy of 82 to 84%." Without reading a single paper we can use Search Analytics to automatically identify that a 1 m spatial resolution provides a cutleaf teasel detection users accuracy of 82-84%, which could have tangible, direct downstream implications for crop protection. Automatically assimilating this information expedites and supplements human analysis, and, ultimately, Search Analytics and its foundation of open source tools will result in more efficient scientific investment and research.

  17. Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

    PubMed Central

    Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng

    2011-01-01

    A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
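
    The flavor of hypothesis generation by breadth-first search with pruning can be sketched as follows; the toy graph, edge probabilities and pruning threshold are invented for illustration, and this is not the published BFSP implementation.

      # Illustrative sketch: breadth-first search with probability-based pruning over a
      # small bio-entity relationship graph (edge weights and threshold are made up).
      from collections import deque

      graph = {  # entity -> {neighbor: relationship probability}
          "geneA": {"proteinB": 0.9, "pathwayC": 0.4},
          "proteinB": {"diseaseD": 0.8},
          "pathwayC": {"diseaseD": 0.5},
          "diseaseD": {},
      }

      def bfs_with_pruning(start, min_prob=0.3, max_depth=3):
          """Return indirect relationships (start -> target) whose path probability
          stays above min_prob, exploring at most max_depth hops."""
          hypotheses = {}
          queue = deque([(start, 1.0, 0)])
          visited = {start}
          while queue:
              node, prob, depth = queue.popleft()
              if depth >= max_depth:
                  continue
              for nbr, p in graph[node].items():
                  new_prob = prob * p
                  if new_prob < min_prob or nbr in visited:  # prune weak or repeated paths
                      continue
                  visited.add(nbr)
                  if depth + 1 > 1:                          # keep indirect (>= 2 hops) links only
                      hypotheses[nbr] = new_prob
                  queue.append((nbr, new_prob, depth + 1))
          return hypotheses

      print(bfs_with_pruning("geneA"))   # e.g. {'diseaseD': 0.72}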

  18. Automatic sleep staging using empirical mode decomposition, discrete wavelet transform, time-domain, and nonlinear dynamics features of heart rate variability signals.

    PubMed

    Ebrahimi, Farideh; Setarehdan, Seyed-Kamaledin; Ayala-Moyeda, Jose; Nazeran, Homer

    2013-10-01

    The conventional method for sleep staging is to analyze polysomnograms (PSGs) recorded in a sleep lab. The electroencephalogram (EEG) is one of the most important signals in PSGs but recording and analysis of this signal presents a number of technical challenges, especially at home. Instead, electrocardiograms (ECGs) are much easier to record and may offer an attractive alternative for home sleep monitoring. The heart rate variability (HRV) signal proves suitable for automatic sleep staging. Thirty PSGs from the Sleep Heart Health Study (SHHS) database were used. Three feature sets were extracted from 5- and 0.5-min HRV segments: time-domain features, nonlinear-dynamics features and time-frequency features. The latter were obtained by using empirical mode decomposition (EMD) and discrete wavelet transform (DWT) methods. Normalized energies in important frequency bands of HRV signals were computed using time-frequency methods. ANOVA and t-tests were used for statistical evaluations. Automatic sleep staging was based on HRV signal features. An ANOVA followed by a post hoc Bonferroni test was used for individual feature assessment. Most features were beneficial for sleep staging. A t-test was used to compare the means of extracted features in 5- and 0.5-min HRV segments. The results showed that the means of the extracted features were statistically similar for only a small number of features. A separability measure showed that time-frequency features, especially EMD features, had larger separation than others. There was not a sizable difference in separability of linear features between 5- and 0.5-min HRV segments but separability of nonlinear features, especially EMD features, decreased in 0.5-min HRV segments. HRV signal features were classified by linear discriminant (LD) and quadratic discriminant (QD) methods. Classification results based on features from 5-min segments surpassed those obtained from 0.5-min segments. The best result was obtained from features using 5-min HRV segments classified by the LD classifier. A combination of linear/nonlinear features from HRV signals is effective in automatic sleep staging. Moreover, time-frequency features are more informative than others. In addition, a separability measure and classification results showed that HRV signal features, especially nonlinear features, extracted from 5-min segments are more discriminative than those from 0.5-min segments in automatic sleep staging. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
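
    As a small illustration of the feature-plus-linear-discriminant idea, the sketch below computes two standard time-domain HRV features (SDNN and RMSSD) from RR-interval segments and fits a linear discriminant classifier with scikit-learn; the segments and sleep-stage labels are synthetic placeholders rather than SHHS data, and the EMD/DWT time-frequency features are omitted.

      # Hedged sketch: SDNN and RMSSD from RR-interval segments, classified with a
      # linear discriminant. Segments and labels are synthetic placeholders.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      def time_domain_features(rr_ms):
          """SDNN and RMSSD of one RR-interval segment (milliseconds)."""
          sdnn = np.std(rr_ms, ddof=1)
          rmssd = np.sqrt(np.mean(np.diff(rr_ms) ** 2))
          return [sdnn, rmssd]

      rng = np.random.default_rng(1)
      segments = [rng.normal(900, 40 + 20 * (i % 2), size=300) for i in range(60)]  # fake RR series
      stages = np.array([i % 2 for i in range(60)])                                 # toy stage labels

      X = np.array([time_domain_features(s) for s in segments])
      clf = LinearDiscriminantAnalysis().fit(X, stages)
      print(clf.score(X, stages))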

  19. Method for automatic detection of wheezing in lung sounds.

    PubMed

    Riella, R J; Nohama, P; Maia, J M

    2009-07-01

    The present report describes the development of a technique for automatic wheezing recognition in digitally recorded lung sounds. This method is based on the extraction and processing of spectral information from the respiratory cycle and the use of these data for user feedback and automatic recognition. The respiratory cycle is first pre-processed, in order to normalize its spectral information, and its spectrogram is then computed. After this procedure, the spectrogram image is processed by a two-dimensional convolution filter and a half-threshold in order to increase the contrast and isolate its highest amplitude components, respectively. Then, in order to generate more compact data for automatic recognition, the spectral projection of the processed spectrogram is computed and stored as an array. The highest magnitude values of the array and their respective spectral values are then located and used as inputs to a multi-layer perceptron artificial neural network, which yields an automatic indication of the presence of wheezes. For validation of the methodology, lung sounds recorded from three different repositories were used. The results show that the proposed technique achieves 84.82% accuracy in the detection of wheezing for an isolated respiratory cycle and 92.86% accuracy for the detection of wheezes when detection is carried out using groups of respiratory cycles obtained from the same person. In addition, the system presents the original recorded sound and the post-processed spectrogram image so that users can draw their own conclusions from the data.
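
    A rough sketch of the described processing chain is given below: spectrogram computation, 2-D convolution for contrast enhancement, half-thresholding, spectral projection, and a multi-layer perceptron fed with the strongest components. All parameters and the synthetic respiratory-cycle data are illustrative assumptions, not the values used in the paper.

      # Rough sketch (illustrative parameters, synthetic cycles): spectrogram, 2-D
      # convolution, half-threshold, spectral projection, then an MLP on the peaks.
      import numpy as np
      from scipy.signal import spectrogram
      from scipy.ndimage import convolve
      from sklearn.neural_network import MLPClassifier

      def wheeze_features(audio, fs, n_peaks=5):
          f, t, sxx = spectrogram(audio, fs=fs, nperseg=1024)
          kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])   # contrast enhancement
          enhanced = convolve(sxx, kernel, mode="nearest")
          enhanced[enhanced < 0.5 * enhanced.max()] = 0                  # half-threshold
          projection = enhanced.sum(axis=1)                              # spectral projection
          idx = np.argsort(projection)[-n_peaks:]                        # strongest components
          return np.concatenate([projection[idx], f[idx]])

      fs = 8000
      rng = np.random.default_rng(2)
      cycles = [np.sin(2 * np.pi * 400 * np.arange(fs) / fs) + rng.normal(0, 0.5, fs)
                for _ in range(20)]                         # synthetic "respiratory cycles"
      labels = rng.integers(0, 2, size=20)                  # toy wheeze / no-wheeze labels
      X = np.array([wheeze_features(c, fs) for c in cycles])
      clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000).fit(X, labels)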

  20. Information extraction from multi-institutional radiology reports.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Intelligence Surveillance And Reconnaissance Full Motion Video Automatic Anomaly Detection Of Crowd Movements: System Requirements For Airborne Application

    DTIC Science & Technology

    The collection of Intelligence, Surveillance, and Reconnaissance (ISR) Full Motion Video (FMV) is growing at an exponential rate, and the manual... intelligence for the warfighter. This paper will address the question of how automatic pattern extraction, based on computer vision, can extract anomalies in

  2. A dual growing method for the automatic extraction of individual trees from mobile laser scanning data

    NASA Astrophysics Data System (ADS)

    Li, Lin; Li, Dalin; Zhu, Haihong; Li, You

    2016-10-01

    Street trees interlaced with other objects in cluttered point clouds of urban scenes inhibit the automatic extraction of individual trees. This paper proposes a method for the automatic extraction of individual trees from mobile laser scanning data, according to the general constitution of trees. Two components of each individual tree - a trunk and a crown - can be extracted by the dual growing method. This method consists of coarse classification, through which most of the artifacts are removed; the automatic selection of appropriate seeds for individual trees, by which the common manual initial setting is avoided; a dual growing process that separates one tree from others by circumscribing a trunk in an adaptive growing radius and segmenting a crown in constrained growing regions; and a refining process that extracts a singular trunk from other interlaced objects. The method is verified on two datasets with over 98% completeness and over 96% correctness. The low mean absolute percentage errors in capturing the morphological parameters of individual trees indicate that this method can output individual trees with high precision.

  3. Automatic Human Movement Assessment With Switching Linear Dynamic System: Motion Segmentation and Motor Performance.

    PubMed

    de Souza Baptista, Roberto; Bo, Antonio P L; Hayashibe, Mitsuhiro

    2017-06-01

    Performance assessment of human movement is critical in diagnosis and motor-control rehabilitation. Recent developments in portable sensor technology enable clinicians to measure spatiotemporal aspects to aid in the neurological assessment. However, the extraction of quantitative information from such measurements is usually done manually through visual inspection. This paper presents a novel framework for automatic human movement assessment that executes segmentation and motor performance parameter extraction in time-series of measurements from a sequence of human movements. We use the elements of a Switching Linear Dynamic System model as building blocks to translate formal definitions and procedures from human movement analysis. Our approach provides a method for users with no expertise in signal processing to create models for movements using a labeled dataset and later use them for automatic assessment. We validated our framework in preliminary tests involving six healthy adult subjects who executed common movements from functional tests and rehabilitation exercise sessions, such as sit-to-stand and lateral elevation of the arms, and five elderly subjects, two of whom had limited mobility, who executed the sit-to-stand movement. The proposed method worked on random motion sequences for the dual purpose of movement segmentation (accuracy of 72%-100%) and motor performance assessment (mean error of 0%-12%).

  4. Accelerometry-based classification of human activities using Markov modeling.

    PubMed

    Mannini, Andrea; Sabatini, Angelo Maria

    2011-01-01

    Accelerometers are a popular choice as body-motion sensors: the reason lies partly in their capability of extracting information that is useful for automatically inferring the physical activity in which the human subject is involved, besides their role in feeding biomechanical parameter estimators. Automatic classification of human physical activities is highly attractive for pervasive computing systems, where contextual awareness may ease human-machine interaction, and in biomedicine, where wearable sensor systems are proposed for long-term monitoring. This paper is concerned with the machine learning algorithms needed to perform the classification task. Hidden Markov Model (HMM) classifiers are studied by contrasting them with Gaussian Mixture Model (GMM) classifiers. HMMs incorporate the statistical information available on movement dynamics into the classification process, without discarding the time history of previous outcomes as GMMs do. An example of the benefits of the obtained statistical leverage is illustrated and discussed by analyzing two datasets of accelerometer time series.
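
    The HMM-versus-GMM comparison can be illustrated with per-class generative models scored by log-likelihood, as in the hedged sketch below (synthetic tri-axial features; hmmlearn and scikit-learn are assumed to be installed, and the number of hidden states is arbitrary).

      # Hedged sketch: one generative model per activity class, classification by
      # comparing log-likelihoods on an unseen segment (synthetic data).
      import numpy as np
      from hmmlearn.hmm import GaussianHMM
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(3)
      walk = rng.normal(0.0, 1.0, size=(200, 3))      # fake tri-axial features, activity A
      rest = rng.normal(2.0, 0.3, size=(200, 3))      # fake tri-axial features, activity B

      hmm_walk = GaussianHMM(n_components=2).fit(walk)
      hmm_rest = GaussianHMM(n_components=2).fit(rest)
      gmm_walk = GaussianMixture(n_components=2).fit(walk)
      gmm_rest = GaussianMixture(n_components=2).fit(rest)

      test = rng.normal(0.0, 1.0, size=(50, 3))       # unseen "walking" segment
      print("HMM says walk:", hmm_walk.score(test) > hmm_rest.score(test))
      print("GMM says walk:", gmm_walk.score_samples(test).sum() >
                              gmm_rest.score_samples(test).sum())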

  5. Impact of JPEG2000 compression on endmember extraction and unmixing of remotely sensed hyperspectral data

    NASA Astrophysics Data System (ADS)

    Martin, Gabriel; Gonzalez-Ruiz, Vicente; Plaza, Antonio; Ortiz, Juan P.; Garcia, Inmaculada

    2010-07-01

    Lossy hyperspectral image compression has received considerable interest in recent years due to the extremely high dimensionality of the data. However, the impact of lossy compression on spectral unmixing techniques has not been widely studied. These techniques characterize mixed pixels (resulting from insufficient spatial resolution) in terms of a suitable combination of spectrally pure substances (called endmembers) weighted by their estimated fractional abundances. This paper focuses on the impact of JPEG2000-based lossy compression of hyperspectral images on the quality of the endmembers extracted by different algorithms. The three considered algorithms are the orthogonal subspace projection (OSP), which uses only spectral information, and the automatic morphological endmember extraction (AMEE) and spatial spectral endmember extraction (SSEE), which integrate both spatial and spectral information in the search for endmembers. The impact of compression on the resulting abundance estimation based on the endmembers derived by different methods is also assessed. Experiments are conducted using a hyperspectral data set collected by the NASA Jet Propulsion Laboratory over the Cuprite mining district in Nevada. The experimental results are quantitatively analyzed using reference information available from the U.S. Geological Survey, resulting in recommendations to specialists interested in applying endmember extraction and unmixing algorithms to compressed hyperspectral data.

  6. BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA.

    PubMed

    Schomburg, Ida; Chang, Antje; Placzek, Sandra; Söhngen, Carola; Rother, Michael; Lang, Maren; Munaretto, Cornelia; Ulas, Susanne; Stelzer, Michael; Grote, Andreas; Scheer, Maurice; Schomburg, Dietmar

    2013-01-01

    The BRENDA (BRaunschweig ENzyme DAtabase) enzyme portal (http://www.brenda-enzymes.org) is the main information system of functional biochemical and molecular enzyme data and provides access to seven interconnected databases. BRENDA contains 2.7 million manually annotated data on enzyme occurrence, function, kinetics and molecular properties. Each entry is connected to a reference and the source organism. Enzyme ligands are stored with their structures and can be accessed via their names, synonyms or via a structure search. FRENDA (Full Reference ENzyme DAta) and AMENDA (Automatic Mining of ENzyme DAta) are based on text mining methods and represent a complete survey of PubMed abstracts with information on enzymes in different organisms, tissues or organelles. The supplemental database DRENDA provides more than 910 000 new EC number-disease relations in more than 510 000 references from automatic search and a classification of enzyme-disease-related information. KENDA (Kinetic ENzyme DAta), a new amendment, extracts and displays kinetic values from PubMed abstracts. The integration of the EnzymeDetector offers an automatic comparison, evaluation and prediction of enzyme function annotations for prokaryotic genomes. The biochemical reaction database BKM-react contains non-redundant enzyme-catalysed and spontaneous reactions and was developed to facilitate and accelerate the construction of biochemical models.

  7. Extraction of small boat harmonic signatures from passive sonar.

    PubMed

    Ogden, George L; Zurk, Lisa M; Jones, Mark E; Peterson, Mary E

    2011-06-01

    This paper investigates the extraction of acoustic signatures from small boats using a passive sonar system. Noise radiated from small boats consists of broadband noise and harmonically related tones that correspond to engine and propeller specifications. A signal processing method to automatically extract the harmonic structure of noise radiated from small boats is developed. The Harmonic Extraction and Analysis Tool (HEAT) estimates the instantaneous fundamental frequency of the harmonic tones, refines the fundamental frequency estimate using a Kalman filter, and automatically extracts the amplitudes of the harmonic tonals to generate a harmonic signature for the boat. Results are presented that show the HEAT algorithm's ability to extract these signatures. © 2011 Acoustical Society of America
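
    A simplified version of harmonic-signature extraction is sketched below: the fundamental is estimated from the FFT peak and the amplitudes of its first few harmonics are read off. This illustration omits the Kalman-filter refinement and is not the HEAT implementation; the synthetic tonal signal is an assumption.

      # Simplified sketch: estimate the fundamental from the FFT peak and read off
      # the amplitudes of its first few harmonics (synthetic boat-like tonal).
      import numpy as np

      def harmonic_signature(x, fs, n_harmonics=5):
          spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
          freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
          f0 = freqs[np.argmax(spec[1:]) + 1]              # crude fundamental estimate
          bin_width = freqs[1]
          amps = [spec[int(round(k * f0 / bin_width))] for k in range(1, n_harmonics + 1)]
          return f0, np.array(amps)

      fs = 4000
      t = np.arange(fs) / fs
      # 60 Hz fundamental plus decaying harmonics and noise
      x = sum((1.0 / k) * np.sin(2 * np.pi * 60 * k * t) for k in range(1, 6))
      x += np.random.default_rng(4).normal(0, 0.2, fs)
      print(harmonic_signature(x, fs))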

  8. The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

    PubMed Central

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users—learning BEL, working with a completely new interface, and performing complex curation—a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/ PMID:27694210

  9. The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track.

    PubMed

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users-learning BEL, working with a completely new interface, and performing complex curation-a score so close to the overall SUS average highlights the usability of BELIEF.Database URL: BELIEF is available at http://www.scaiview.com/belief/. © The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. NEMO: Extraction and normalization of organization names from PubMed affiliations.

    PubMed

    Jonnalagadda, Siddhartha Reddy; Topham, Philip

    2010-10-04

    Today, there are more than 18 million articles related to biomedical research indexed in MEDLINE, and information derived from them could be used effectively to save the great amount of time and resources spent by government agencies in understanding the scientific landscape, including key opinion leaders and centers of excellence. Associating biomedical articles with organization names could significantly benefit the pharmaceutical marketing industry, health care funding agencies and public health officials and be useful for other scientists in normalizing author names, automatically creating citations, indexing articles and identifying potential resources or collaborators. The large amount of extracted information helps in disambiguating organization names using machine-learning algorithms. We propose NEMO, a system for extracting organization names in the affiliation field and normalizing them to a canonical organization name. Our parsing process involves multi-layered rule matching with multiple dictionaries. The system achieves more than 98% f-score in extracting organization names. Our normalization process involves clustering based on local sequence alignment metrics and local learning based on finding connected components. High precision was also observed in normalization. NEMO is the missing link in associating each biomedical paper and its authors to an organization name in its canonical form and the geopolitical location of the organization. This research could potentially help in analyzing large social networks of organizations for landscaping a particular topic, improving the performance of author disambiguation, adding weak links in the co-author network of authors, augmenting NLM's MARS system for correcting errors in OCR output of the affiliation field, and automatically indexing PubMed citations with the normalized organization name and country. Our system is available as a graphical user interface available for download along with this paper.

  11. Vision-Based Finger Detection, Tracking, and Event Identification Techniques for Multi-Touch Sensing and Display Systems

    PubMed Central

    Chen, Yen-Lin; Liang, Wen-Yew; Chiang, Chuan-Yen; Hsieh, Tung-Ju; Lee, Da-Cheng; Yuan, Shyan-Ming; Chang, Yang-Lang

    2011-01-01

    This study presents efficient vision-based finger detection, tracking, and event identification techniques and a low-cost hardware framework for multi-touch sensing and display applications. The proposed approach uses a fast bright-blob segmentation process based on automatic multilevel histogram thresholding to extract the pixels of touch blobs obtained from scattered infrared lights captured by a video camera. The advantage of this automatic multilevel thresholding approach is its robustness and adaptability when dealing with various ambient lighting conditions and spurious infrared noises. To extract the connected components of these touch blobs, a connected-component analysis procedure is applied to the bright pixels acquired by the previous stage. After extracting the touch blobs from each of the captured image frames, a blob tracking and event recognition process analyzes the spatial and temporal information of these touch blobs from consecutive frames to determine the possible touch events and actions performed by users. This process also refines the detection results and corrects for errors and occlusions caused by noise and errors during the blob extraction process. The proposed blob tracking and touch event recognition process includes two phases. First, the phase of blob tracking associates the motion correspondence of blobs in succeeding frames by analyzing their spatial and temporal features. The touch event recognition process can identify meaningful touch events based on the motion information of touch blobs, such as finger moving, rotating, pressing, hovering, and clicking actions. Experimental results demonstrate that the proposed vision-based finger detection, tracking, and event identification system is feasible and effective for multi-touch sensing applications in various operational environments and conditions. PMID:22163990
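
    The blob-extraction stage alone can be sketched with OpenCV as below; a fixed global threshold stands in for the paper's automatic multilevel histogram thresholding, and the synthetic infrared frame and area filter are illustrative.

      # Hedged sketch: threshold a (synthetic) infrared frame and run connected-component
      # analysis to obtain candidate touch blobs with their centroids and areas.
      import cv2
      import numpy as np

      frame = np.zeros((240, 320), dtype=np.uint8)          # synthetic IR frame
      cv2.circle(frame, (100, 120), 6, 255, -1)             # two bright "touch" blobs
      cv2.circle(frame, (200, 80), 5, 255, -1)

      _, binary = cv2.threshold(frame, 128, 255, cv2.THRESH_BINARY)
      n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

      for i in range(1, n):                                 # label 0 is the background
          x, y, w, h, area = stats[i]
          if area < 3:                                      # discard spurious IR noise
              continue
          print("blob at", centroids[i], "area", area)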

  12. Multi-sensor image registration based on algebraic projective invariants.

    PubMed

    Li, Bin; Wang, Wei; Ye, Hao

    2013-04-22

    A new automatic feature-based registration algorithm is presented for multi-sensor images with projective deformation. Contours are first extracted from both the reference and sensed images as basic features in the proposed method. Since it is difficult to design a projective-invariant descriptor from the contour information directly, a new feature named Five Sequential Corners (FSC) is constructed based on the corners detected from the extracted contours. By introducing algebraic projective invariants, we design a descriptor for each FSC that is ensured to be robust against projective deformation. Furthermore, no gray-scale-related information is required in calculating the descriptor; thus, it is also robust against the gray-scale discrepancy between multi-sensor image pairs. Experimental results utilizing real image pairs are presented to show the merits of the proposed registration method.

  13. Predicate Argument Structure Frames for Modeling Information in Operative Notes

    PubMed Central

    Wang, Yan; Pakhomov, Serguei; Melton, Genevieve B.

    2015-01-01

    The rich information about surgical procedures contained in operative notes is a valuable data source for improving the clinical evidence base and clinical research. In this study, we propose a set of Predicate Argument Structure (PAS) frames for surgical action verbs to assist in the creation of an information extraction (IE) system to automatically extract details about the techniques, equipment, and operative steps from operative notes. We created PropBank-style PAS frames for the 30 top surgical action verbs based on examination of randomly selected sample sentences from 3,000 Laparoscopic Cholecystectomy notes. To assess the completeness of the PAS frames in representing usage of the same action verbs, we evaluated the PAS frames created on sample sentences from operative notes of 6 other gastrointestinal surgical procedures. Our results showed that the PAS frames created with one type of surgery can successfully denote the usage of the same verbs in operative notes of broader surgical categories. PMID:23920664

  14. Extended morphological processing: a practical method for automatic spot detection of biological markers from microscopic images.

    PubMed

    Kimori, Yoshitaka; Baba, Norio; Morone, Nobuhiro

    2010-07-08

    A reliable extraction technique for resolving multiple spots in light or electron microscopic images is essential in investigations of the spatial distribution and dynamics of specific proteins inside cells and tissues. Currently, automatic spot extraction and characterization in complex microscopic images poses many challenges to conventional image processing methods. A new method to extract closely located, small target spots from biological images is proposed. This method starts with a simple but practical operation based on the extended morphological top-hat transformation to subtract an uneven background. The core of our novel approach is the following: first, the original image is rotated in an arbitrary direction and each rotated image is opened with a single straight line-segment structuring element. Second, the opened images are unified and then subtracted from the original image. To evaluate these procedures, model images of simulated spots with closely located targets were created and the efficacy of our method was compared to that of conventional morphological filtering methods. The results showed the better performance of our method. Spots in real microscope images can also be quantified, confirming that the method is applicable in practice. Our method achieved effective spot extraction under various image conditions, including aggregated target spots, poor signal-to-noise ratio, and large variations in the background intensity. Furthermore, it has no restrictions with respect to the shape of the extracted spots. The features of our method allow its broad application in biological and biomedical image information analysis.
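
    The core rotate-open-unify-subtract idea can be sketched with SciPy as follows; the line length, rotation angles and synthetic image are assumptions for illustration rather than the authors' exact procedure.

      # Hedged sketch: open the image with a line structuring element at several
      # rotations, take the union, and subtract it so that small spots remain.
      import numpy as np
      from scipy.ndimage import rotate, grey_opening

      def rotated_line_tophat(image, line_length=15, angles=(0, 45, 90, 135)):
          union = np.zeros_like(image, dtype=float)
          for angle in angles:
              rotated = rotate(image, angle, reshape=False, order=1)
              opened = grey_opening(rotated, size=(1, line_length))   # horizontal line SE
              back = rotate(opened, -angle, reshape=False, order=1)
              union = np.maximum(union, back)
          return np.clip(image - union, 0, None)                      # spots survive

      rng = np.random.default_rng(5)
      img = rng.normal(10, 1, size=(128, 128))
      img[60:63, 60:63] += 30                                         # a small bright spot
      spots = rotated_line_tophat(img)
      print(spots.max(), spots[61, 61] > 10)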

  15. Three-dimensional tracking for efficient fire fighting in complex situations

    NASA Astrophysics Data System (ADS)

    Akhloufi, Moulay; Rossi, Lucile

    2009-05-01

    Each year, hundreds of millions of hectares of forest burn, causing human and economic losses. For efficient fire fighting, personnel on the ground need tools for predicting fire front propagation. In this work, we present a new technique for automatically tracking fire spread in three-dimensional space. The proposed approach uses a stereo system to extract a 3D shape from fire images. A new segmentation technique is proposed that permits the extraction of fire regions in complex unstructured scenes. It works in the visible spectrum and combines information extracted from the YUV and RGB color spaces. Unlike other techniques, our algorithm does not require prior knowledge about the scene. The resulting fire regions are classified into different homogenous zones using clustering techniques. Contours are then extracted and a feature detection algorithm is used to detect interest points such as local maxima and corners. Extracted points from the stereo images are then used to compute the 3D shape of the fire front. The resulting data permit the fire volume to be built. The final model is used to compute important spatial and temporal fire characteristics such as spread dynamics, local orientation, and heading direction. Tests conducted on the ground show the efficiency of the proposed scheme. This scheme is being integrated with a mathematical fire spread model in order to predict and anticipate fire behaviour during fire fighting. Also of interest to fire-fighters is the proposed automatic segmentation technique, which can be used for early detection of fire in complex scenes.

  16. Methods for automatically analyzing humpback song units.

    PubMed

    Rickwood, Peter; Taylor, Andrew

    2008-03-01

    This paper presents mathematical techniques for automatically extracting and analyzing bioacoustic signals. Automatic techniques are described for isolation of target signals from background noise, extraction of features from target signals and unsupervised classification (clustering) of the target signals based on these features. The only user-provided input, other than the raw sound, is an initial set of signal processing and control parameters. Of particular note is that the number of signal categories is determined automatically. The techniques, applied to hydrophone recordings of humpback whales (Megaptera novaeangliae), produce promising initial results, suggesting that they may be of use in automated analysis of not only humpbacks, but possibly also of other bioacoustic settings where automated analysis is desirable.

  17. An algorithm for automatic parameter adjustment for brain extraction in BrainSuite

    NASA Astrophysics Data System (ADS)

    Rajagopal, Gautham; Joshi, Anand A.; Leahy, Richard M.

    2017-02-01

    Brain Extraction (classification of brain and non-brain tissue) of MRI brain images is a crucial pre-processing step necessary for imaging-based anatomical studies of the human brain. Several automated methods and software tools are available for performing this task, but differences in MR image parameters (pulse sequence, resolution) and instrument- and subject-dependent noise and artefacts affect the performance of these automated methods. We describe and evaluate a method that automatically adapts the default parameters of the Brain Surface Extraction (BSE) algorithm to optimize a cost function chosen to reflect accurate brain extraction. BSE uses a combination of anisotropic filtering, Marr-Hildreth edge detection, and binary morphology for brain extraction. Our algorithm automatically adapts four parameters associated with these steps to maximize the brain surface area to volume ratio. We evaluate the method on a total of 109 brain volumes with ground truth brain masks generated by an expert user. A quantitative evaluation of the performance of the proposed algorithm showed an improvement in the mean (s.d.) Dice coefficient from 0.8969 (0.0376) for default parameters to 0.9509 (0.0504) for the optimized case. These results indicate that automatic parameter optimization can result in significant improvements in definition of the brain mask.
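
    For illustration, the sketch below shows the Dice coefficient used in the evaluation together with a toy one-parameter search; the real method tunes four BSE parameters against a surface-area-to-volume cost, which is not reproduced here, and the volume and "extraction" function are synthetic stand-ins.

      # Illustrative sketch: Dice coefficient plus a toy grid search over a single
      # extraction parameter (synthetic volume and mask, not BSE).
      import numpy as np

      def dice(mask_a, mask_b):
          inter = np.logical_and(mask_a, mask_b).sum()
          return 2.0 * inter / (mask_a.sum() + mask_b.sum())

      def fake_brain_extraction(volume, threshold):
          return volume > threshold                      # stand-in for BSE

      rng = np.random.default_rng(6)
      volume = rng.random((32, 32, 32))
      ground_truth = volume > 0.5                        # pretend expert mask

      best = max(np.linspace(0.3, 0.7, 9),
                 key=lambda th: dice(fake_brain_extraction(volume, th), ground_truth))
      print("best threshold:", best,
            "dice:", dice(fake_brain_extraction(volume, best), ground_truth))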

  18. THE APPLICATION OF ENGLISH-WORD MORPHOLOGY TO AUTOMATIC INDEXING AND EXTRACTING. ANNUAL SUMMARY REPORT.

    ERIC Educational Resources Information Center

    DOLBY, J.L.; AND OTHERS

    THE STUDY IS CONCERNED WITH THE LINGUISTIC PROBLEM INVOLVED IN TEXT COMPRESSION--EXTRACTING, INDEXING, AND THE AUTOMATIC CREATION OF SPECIAL-PURPOSE CITATION DICTIONARIES. IN SPITE OF EARLY SUCCESS IN USING LARGE-SCALE COMPUTERS TO AUTOMATE CERTAIN HUMAN TASKS, THESE PROBLEMS REMAIN AMONG THE MOST DIFFICULT TO SOLVE. ESSENTIALLY, THE PROBLEM IS TO…

  19. A new artefacts resistant method for automatic lineament extraction using Multi-Hillshade Hierarchic Clustering (MHHC)

    NASA Astrophysics Data System (ADS)

    Šilhavý, Jakub; Minár, Jozef; Mentlík, Pavel; Sládek, Ján

    2016-07-01

    This paper presents a new method of automatic lineament extraction which includes the removal of the 'artefacts effect' associated with raster-based analysis. The core of the proposed Multi-Hillshade Hierarchic Clustering (MHHC) method incorporates a set of variously illuminated and rotated hillshades in combination with hierarchic clustering of derived 'protolineaments'. The algorithm also includes classification into positive and negative lineaments. MHHC was tested in two different territories in the Bohemian Forest and the Central Western Carpathians. An original vector-based algorithm was developed for comparing the proximity of individual lineaments. Its use confirms the compatibility of manual and automatic extraction and their similar relationships to structural data in the study areas.

  20. 2D automatic body-fitted structured mesh generation using advancing extraction method

    NASA Astrophysics Data System (ADS)

    Zhang, Yaoxin; Jia, Yafei

    2018-01-01

    This paper presents an automatic mesh generation algorithm for body-fitted structured meshes in Computational Fluid Dynamics (CFD) analysis using the Advancing Extraction Method (AEM). The method is applicable to two-dimensional domains with complex geometries, which have a hierarchical tree-like topography with extrusion-like structures (i.e., branches or tributaries) and intrusion-like structures (i.e., peninsulas or dikes). With the AEM, the hierarchical levels of sub-domains can be identified, and the block boundary of each sub-domain, in convex polygon shape, at each level can be extracted in an advancing scheme. In this paper, several examples are used to illustrate the effectiveness and applicability of the proposed algorithm for automatic structured mesh generation, and the implementation of the method.

  1. Automatic extraction of discontinuity orientation from rock mass surface 3D point cloud

    NASA Astrophysics Data System (ADS)

    Chen, Jianqin; Zhu, Hehua; Li, Xiaojun

    2016-10-01

    This paper presents a new method for extracting discontinuity orientation automatically from rock mass surface 3D point cloud. The proposed method consists of four steps: (1) automatic grouping of discontinuity sets using an improved K-means clustering method, (2) discontinuity segmentation and optimization, (3) discontinuity plane fitting using Random Sample Consensus (RANSAC) method, and (4) coordinate transformation of discontinuity plane. The method is first validated by the point cloud of a small piece of a rock slope acquired by photogrammetry. The extracted discontinuity orientations are compared with measured ones in the field. Then it is applied to a publicly available LiDAR data of a road cut rock slope at Rockbench repository. The extracted discontinuity orientations are compared with the method proposed by Riquelme et al. (2014). The results show that the presented method is reliable and of high accuracy, and can meet the engineering needs.
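
    Step (3), plane fitting with RANSAC, can be sketched in plain NumPy as below; the distance threshold, iteration count and synthetic point cloud are illustrative, and the clustering and coordinate-transformation steps are not shown.

      # Hedged sketch: RANSAC plane fitting on a point cloud (plain NumPy, toy data).
      import numpy as np

      def ransac_plane(points, n_iter=200, dist_thresh=0.02, rng=None):
          if rng is None:
              rng = np.random.default_rng()
          best_inliers, best_normal, best_d = None, None, None
          for _ in range(n_iter):
              p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
              normal = np.cross(p2 - p1, p3 - p1)
              if np.linalg.norm(normal) < 1e-9:
                  continue                                  # degenerate (collinear) sample
              normal = normal / np.linalg.norm(normal)
              d = -normal.dot(p1)
              dist = np.abs(points.dot(normal) + d)
              inliers = dist < dist_thresh
              if best_inliers is None or inliers.sum() > best_inliers.sum():
                  best_inliers, best_normal, best_d = inliers, normal, d
          return best_normal, best_d, best_inliers

      rng = np.random.default_rng(7)
      plane_pts = np.column_stack([rng.random(500), rng.random(500), 0.01 * rng.random(500)])
      noise_pts = rng.random((100, 3))
      normal, d, inliers = ransac_plane(np.vstack([plane_pts, noise_pts]), rng=rng)
      print("fitted plane normal:", np.round(normal, 2), "inliers:", inliers.sum())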

  2. Automatic extraction of blocks from 3D point clouds of fractured rock

    NASA Astrophysics Data System (ADS)

    Chen, Na; Kemeny, John; Jiang, Qinghui; Pan, Zhiwen

    2017-12-01

    This paper presents a new method for extracting blocks and calculating block size automatically from rock surface 3D point clouds. Block size is an important rock mass characteristic and forms the basis for several rock mass classification schemes. The proposed method consists of four steps: 1) the automatic extraction of discontinuities using an improved Ransac Shape Detection method, 2) the calculation of discontinuity intersections based on plane geometry, 3) the extraction of block candidates based on three discontinuities intersecting one another to form corners, and 4) the identification of "true" blocks using an improved Floodfill algorithm. The calculated block sizes were compared with manual measurements in two case studies, one with fabricated cardboard blocks and the other from an actual rock mass outcrop. The results demonstrate that the proposed method is accurate and overcomes the inaccuracies, safety hazards, and biases of traditional techniques.

  3. Building Extraction from Remote Sensing Data Using Fully Convolutional Networks

    NASA Astrophysics Data System (ADS)

    Bittner, K.; Cui, S.; Reinartz, P.

    2017-05-01

    Building detection and footprint extraction are in high demand for many remote sensing applications. Though most previous works have shown promising results, the automatic extraction of building footprints still remains a nontrivial topic, especially in complex urban areas. Recently developed extensions of the CNN framework have made it possible to perform dense pixel-wise classification of input images. Based on these abilities we propose a methodology which automatically generates a full resolution binary building mask out of a Digital Surface Model (DSM) using a Fully Convolutional Network (FCN) architecture. The advantage of using the depth information is that it provides geometrical silhouettes and allows a better separation of buildings from the background, as well as invariance to illumination and color variations. The proposed framework has mainly two steps. Firstly, the FCN is trained on a large set of patches consisting of normalized DSM (nDSM) as inputs and available ground truth building masks as target outputs. Secondly, the generated predictions from the FCN are viewed as unary terms for a Fully connected Conditional Random Field (FCRF), which enables us to create a final binary building mask. A series of experiments demonstrate that our methodology is able to extract accurate building footprints which closely match the buildings' original shapes. The quantitative and qualitative analyses show significant improvements of the results in contrast to the multi-layer fully connected network from our previous work.
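
    A minimal sketch of mapping an nDSM patch to a per-pixel building logit with a fully convolutional network is given below (PyTorch; the tiny architecture, random patches and fake masks are assumptions for illustration, and the FCRF refinement is omitted).

      # Minimal sketch (not the paper's architecture): a tiny FCN that maps a
      # single-channel nDSM patch to a per-pixel building logit.
      import torch
      import torch.nn as nn

      class TinyFCN(nn.Module):
          def __init__(self):
              super().__init__()
              self.encoder = nn.Sequential(
                  nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.MaxPool2d(2),
                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
              )
              self.decoder = nn.Sequential(
                  nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                  nn.Conv2d(32, 1, 3, padding=1),            # one logit per pixel
              )

          def forward(self, x):
              return self.decoder(self.encoder(x))

      model = TinyFCN()
      ndsm = torch.rand(4, 1, 64, 64)                        # batch of nDSM patches
      target = (torch.rand(4, 1, 64, 64) > 0.5).float()      # fake building masks
      loss = nn.BCEWithLogitsLoss()(model(ndsm), target)
      loss.backward()
      print(float(loss))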

  4. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association.

    PubMed

    Ma, Jian; Casey, Cameron P; Zheng, Xueyun; Ibrahim, Yehia M; Wilkins, Christopher S; Renslow, Ryan S; Thomas, Dennis G; Payne, Samuel H; Monroe, Matthew E; Smith, Richard D; Teeguarden, Justin G; Baker, Erin S; Metz, Thomas O

    2017-09-01

    Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library . erin.baker@pnnl.gov or thomas.metz@pnnl.gov. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  5. Remotely-interrogated high data rate free space laser communications link

    DOEpatents

    Ruggiero, Anthony J [Livermore, CA

    2007-05-29

    A system and method of remotely extracting information from a communications station by interrogation with a low power beam. Nonlinear phase conjugation of the low power beam results in a high power encoded return beam that automatically tracks the input beam and is corrected for atmospheric distortion. Intracavity nondegenerate four wave mixing is used in a broad area semiconductor laser in the communications station to produce the return beam.

  6. Array data extractor (ADE): a LabVIEW program to extract and merge gene array data.

    PubMed

    Kurtenbach, Stefan; Kurtenbach, Sarah; Zoidl, Georg

    2013-12-01

    Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Although existing software allows for complex data analyses, the LabVIEW based program presented here, "Array Data Extractor (ADE)", provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need of advanced programming knowledge.

  7. Enhancing Comparative Effectiveness Research With Automated Pediatric Pneumonia Detection in a Multi-Institutional Clinical Repository: A PHIS+ Pilot Study.

    PubMed

    Meystre, Stephane; Gouripeddi, Ramkiran; Tieder, Joel; Simmons, Jeffrey; Srivastava, Rajendu; Shah, Samir

    2017-05-15

    Community-acquired pneumonia is a leading cause of pediatric morbidity. Administrative data are often used to conduct comparative effectiveness research (CER) with sufficient sample sizes to enhance detection of important outcomes. However, such studies are prone to misclassification errors because of the variable accuracy of discharge diagnosis codes. The aim of this study was to develop an automated, scalable, and accurate method to determine the presence or absence of pneumonia in children using chest imaging reports. The multi-institutional PHIS+ clinical repository was developed to support pediatric CER by expanding an administrative database of children's hospitals with detailed clinical data. To develop a scalable approach to find patients with bacterial pneumonia more accurately, we developed a Natural Language Processing (NLP) application to extract relevant information from chest diagnostic imaging reports. Domain experts established a reference standard by manually annotating 282 reports to train and then test the NLP application. Findings of pleural effusion, pulmonary infiltrate, and pneumonia were automatically extracted from the reports and then used to automatically classify whether a report was consistent with bacterial pneumonia. Compared with the annotated diagnostic imaging reports reference standard, the most accurate implementation of machine learning algorithms in our NLP application allowed extracting relevant findings with a sensitivity of .939 and a positive predictive value of .925. It allowed classifying reports with a sensitivity of .71, a positive predictive value of .86, and a specificity of .962. When compared with each of the domain experts manually annotating these reports, the NLP application allowed for significantly higher sensitivity (.71 vs .527) and similar positive predictive value and specificity. NLP-based pneumonia information extraction of pediatric diagnostic imaging reports performed better than domain experts in this pilot study. NLP is an efficient method to extract information from a large collection of imaging reports to facilitate CER. ©Stephane Meystre, Ramkiran Gouripeddi, Joel Tieder, Jeffrey Simmons, Rajendu Srivastava, Samir Shah. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.05.2017.

  8. Automatic Extraction of Road Markings from Mobile Laser-Point Cloud Using Intensity Data

    NASA Astrophysics Data System (ADS)

    Yao, L.; Chen, Q.; Qin, C.; Wu, H.; Zhang, S.

    2018-04-01

    With the development of intelligent transportation, high-precision road information has been widely applied in many fields. This paper proposes a concise and practical way to extract road marking information from point cloud data collected by a mobile mapping system (MMS). The method consists of three steps. Firstly, the road surface is segmented through edge detection from scan lines. Then the intensity image is generated by inverse distance weighted (IDW) interpolation and the road markings are extracted by using adaptive threshold segmentation based on an integral image, without intensity calibration. Moreover, noise is reduced by removing small pixel plaques from the binary image. Finally, the point cloud mapped from the binary image is clustered into marking objects according to Euclidean distance, and a series of algorithms, including template matching and feature attribute filtering, is used for the classification of linear markings, arrow markings and guidelines. Through processing the point cloud data collected by a RIEGL VUX-1 in the case area, the results show that the F-score of marking extraction is 0.83, and the average classification rate is 0.9.
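
    Two of the steps above, mean-based adaptive thresholding of the intensity image (OpenCV's integral-image-backed variant) and Euclidean clustering of the resulting marking pixels, can be sketched as follows; the synthetic intensity image, block size, offset and DBSCAN parameters are illustrative assumptions.

      # Hedged sketch: adaptive (local-mean) thresholding of a synthetic intensity image,
      # small-plaque noise removal, and Euclidean clustering of marking pixels.
      import cv2
      import numpy as np
      from sklearn.cluster import DBSCAN

      rng = np.random.default_rng(8)
      intensity = (20 * rng.random((200, 200))).astype(np.uint8)   # fake IDW intensity image
      cv2.rectangle(intensity, (50, 95), (150, 105), 200, -1)      # a bright lane marking

      binary = cv2.adaptiveThreshold(intensity, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, 31, -40)
      binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

      points = np.column_stack(np.nonzero(binary))                 # marking pixel coordinates
      if len(points):
          labels = DBSCAN(eps=3, min_samples=10).fit_predict(points)
          print("marking objects found:", len(set(labels) - {-1}))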

  9. Deep SOMs for automated feature extraction and classification from big data streaming

    NASA Astrophysics Data System (ADS)

    Sakkari, Mohamed; Ejbali, Ridha; Zaied, Mourad

    2017-03-01

    In this paper, we propose a deep self-organizing map model (Deep-SOMs) for automated feature extraction and learning from streaming big data, benefiting from the Spark framework for real-time stream handling and highly parallel data processing. The deep SOM architecture is based on the notion of abstraction (patterns are automatically extracted from the raw data, from less to more abstract). The proposed model consists of three hidden self-organizing layers, an input and an output layer. Each layer is made up of a multitude of SOMs, with each map focusing only on a local sub-region of the input image. Each layer then trains on the local information to generate more global information in the higher layer. The proposed Deep-SOMs model is unique in terms of its layer architecture, SOM sampling method and learning. During the learning stage we use a set of unsupervised SOMs for feature extraction. We validate the effectiveness of our approach on large data sets such as the Leukemia and SRBCT datasets. Comparative results show that the Deep-SOMs model performs better than many existing algorithms for image classification.

  10. Automatic Detection of Optic Disc in Retinal Image by Using Keypoint Detection, Texture Analysis, and Visual Dictionary Techniques

    PubMed Central

    Bayır, Şafak

    2016-01-01

    With the advances in the computer field, methods and techniques in automatic image processing and analysis provide the opportunity to detect automatically the change and degeneration in retinal images. Localization of the optic disc is extremely important for determining hard exudate lesions or neovascularization, which appear in the later phases of diabetic retinopathy, in computer-aided eye disease diagnosis systems. Whereas optic disc detection is a fairly easy process in normal retinal images, detecting this region in retinal images affected by diabetic retinopathy may be difficult. Sometimes, optic disc and hard exudate information may appear the same in terms of machine learning. We present a novel approach for efficient and accurate localization of the optic disc in retinal images containing noise and other lesions. This approach comprises five main steps: image processing, keypoint extraction, texture analysis, visual dictionary, and classifier techniques. We tested our proposed technique on 3 public datasets and obtained quantitative results. Experimental results show that an average optic disc detection accuracy of 94.38%, 95.00%, and 90.00% is achieved, respectively, on the following public datasets: DIARETDB1, DRIVE, and ROC. PMID:27110272

  11. Co-evolutionary data mining for fuzzy rules: automatic fitness function creation, phase space, and experiments

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Blank, Joseph A.

    2003-03-01

    An approach is being explored that involves embedding a fuzzy logic based resource manager in an electronic game environment. Game agents can function under their own autonomous logic or human control. This approach automates the data mining problem. The game automatically creates a cleansed database reflecting the domain expert's knowledge; it calls a data mining function, a genetic algorithm, for data mining of the database as required, and allows easy evaluation of the information extracted. The co-evolutionary fitness functions, chromosomes and stopping criteria for ending the game are discussed. Genetic algorithm and genetic program based data mining procedures are discussed that automatically discover new fuzzy rules and strategies. The strategy tree concept and its relationship to co-evolutionary data mining are examined, as well as the associated phase space representation of fuzzy concepts. The overlap of fuzzy concepts in phase space reduces the effective strategies available to adversaries. Co-evolutionary data mining alters the geometric properties of the overlap region, known as the admissible region of phase space, significantly enhancing the performance of the resource manager. Procedures for validation of the data-mined information are discussed and significant experimental results provided.

  12. Natural Language Processing in Radiology: A Systematic Review.

    PubMed

    Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A

    2016-05-01

    Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. (©) RSNA, 2016 Online supplemental material is available for this article.

  13. Event-based text mining for biology and functional genomics

    PubMed Central

    Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

    2015-01-01

    The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365

  14. Drug drug interaction extraction from the literature using a recursive neural network

    PubMed Central

    Lim, Sangrak; Lee, Kyubum

    2018-01-01

    Detecting drug-drug interactions (DDI) is important because information on DDIs can help prevent adverse effects from drug combinations. Since there are many new DDI-related papers published in the biomedical domain, manually extracting DDI information from the literature is a laborious task. However, text mining can be used to find DDIs in the biomedical literature. Among the recently developed neural networks, we use a Recursive Neural Network to improve the performance of DDI extraction. Our recursive neural network model uses a position feature, a subtree containment feature, and an ensemble method to improve the performance of DDI extraction. Compared with the state-of-the-art models, the DDI detection and type classifiers of our model performed 4.4% and 2.8% better, respectively, on the DDIExtraction Challenge’13 test data. We also validated our model on the PK DDI corpus that consists of two types of DDIs data: in vivo DDI and in vitro DDI. Compared with the existing model, our detection classifier performed 2.3% and 6.7% better on in vivo and in vitro data respectively. The results of our validation demonstrate that our model can automatically extract DDIs better than existing models. PMID:29373599

  15. Enhancing Biomedical Text Summarization Using Semantic Relation Extraction

    PubMed Central

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) we extract semantic relations in each sentence using the semantic knowledge representation tool SemRep; 2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation; 3) for relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate the text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance, and our results are better than those of the MEAD system, a well-known tool for text summarization. PMID:21887336
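
    Stage 3 above selects sentences that can interpret the retrieved relations using an information retrieval based method. Below is a minimal sketch of that idea, assuming a TF-IDF vector space and cosine similarity (scikit-learn) rather than the authors' exact retrieval model.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def select_sentences(relations, sentences, top_k=5):
            # Rank candidate sentences against the concatenated relation strings and keep
            # the top_k, returned in their original document order.
            vec = TfidfVectorizer(stop_words="english")
            sentence_matrix = vec.fit_transform(sentences)
            query = vec.transform([" ".join(relations)])
            scores = cosine_similarity(query, sentence_matrix).ravel()
            top = sorted(scores.argsort()[::-1][:top_k])
            return [sentences[i] for i in top]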

  16. Automatic detection of adverse events to predict drug label changes using text and data mining techniques.

    PubMed

    Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki

    2013-11-01

    The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from the FAERS, Yellow Card and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. A statistical approach was applied to the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well placed to support pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.

  17. Image-based automatic recognition of larvae

    NASA Astrophysics Data System (ADS)

    Sang, Ru; Yu, Guiying; Fan, Weijun; Guo, Tiantai

    2010-08-01

    To date, quarantine pest recognition research has focused mainly on imagoes (adult insects). However, pests in their larval stage are latent, and larvae spread easily with the circulation of agricultural and forest products. This paper presents larvae as new research objects, recognized by means of machine vision, image processing and pattern recognition. More visual information is preserved and the recognition rate is improved when color image segmentation is applied to larval images. Owing to its affine, perspective and brightness invariance, the scale-invariant feature transform (SIFT) is adopted for feature extraction. A neural network algorithm is utilized for pattern recognition, and the automatic identification of larval images is successfully achieved with satisfactory results.

  18. The IEEE GRSS Standardized Remote Sensing Data Website: A Step Towards "Science 2.0" in Remote Sensing

    NASA Astrophysics Data System (ADS)

    Dell'Acqua, Fabio; Iannelli, Gianni Cristian; Kerekes, John; Lisini, Gianni; Moser, Gabriele; Ricardi, Niccolo; Pierce, Leland

    2016-08-01

    The issue of homogeneity in the performance assessment of proposed information extraction algorithms is widely perceived in the Earth Observation (EO) domain as well. Different authors test their algorithms on different datasets, and readers often find it difficult to assess which algorithm is better for a specific application: the wide variability in test sets makes direct comparison of, e.g., accuracy values less meaningful than one would desire. With our work, we make a modest contribution to easing the problem by making it possible to automatically distribute a limited set of "standard" open datasets, together with ground truth information, and to automatically assess the processing results provided by users.

  19. Retrieval Algorithms for Road Surface Modelling Using Laser-Based Mobile Mapping.

    PubMed

    Jaakkola, Anttoni; Hyyppä, Juha; Hyyppä, Hannu; Kukko, Antero

    2008-09-01

    Automated processing of the data provided by a laser-based mobile mapping system will be a necessity due to the huge amount of data produced. In the future, vehicle-based laser scanning, here called mobile mapping, should see considerable use for road environment modelling. Since the scanning geometry and point density differ from those of airborne laser scanning, new algorithms are needed for information extraction. In this paper, we propose automatic methods for classifying the road marking and kerbstone points and for modelling the road surface as a triangulated irregular network. On the basis of experimental tests, the mean classification accuracies obtained using the automatic method for lines, zebra crossings and kerbstones were 80.6%, 92.3% and 79.7%, respectively.

  20. Shape based segmentation of MRIs of the bones in the knee using phase and intensity information

    NASA Astrophysics Data System (ADS)

    Fripp, Jurgen; Bourgeat, Pierrick; Crozier, Stuart; Ourselin, Sébastien

    2007-03-01

    The segmentation of the bones from MR images is useful for performing subsequent segmentation and quantitative measurements of cartilage tissue. In this paper, we present a shape based segmentation scheme for the bones that uses texture features derived from the phase and intensity information in the complex MR image. The phase can provide additional information about the tissue interfaces, but due to the phase unwrapping problem, this information is usually discarded. By using a Gabor filter bank on the complex MR image, texture features (including phase) can be extracted without requiring phase unwrapping. These texture features are then analyzed using a support vector machine classifier to obtain probability tissue matches. The segmentation of the bone is fully automatic and performed using a 3D active shape model based approach driven using gradient and texture information. The 3D active shape model is automatically initialized using a robust affine registration. The approach is validated using a database of 18 FLASH MR images that are manually segmented, with an average segmentation overlap (Dice similarity coefficient) of 0.92 compared to 0.9 obtained using the classifier only.
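
    The texture features in this work come from a Gabor filter bank applied to the complex MR data. The sketch below computes a small bank of Gabor magnitude responses per pixel with scikit-image, assuming a real-valued 2-D slice as input (an assumption; the paper operates on the complex image to retain phase information without unwrapping).

        import numpy as np
        from skimage.filters import gabor

        def gabor_features(slice_2d, frequencies=(0.1, 0.2, 0.4), n_angles=4):
            # Per-pixel texture features from a small Gabor filter bank; returns (H*W, n_filters).
            responses = []
            for f in frequencies:
                for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
                    real, imag = gabor(slice_2d, frequency=f, theta=theta)
                    responses.append(np.hypot(real, imag))  # magnitude of the complex response
            return np.stack([r.ravel() for r in responses], axis=1)

    The resulting feature matrix, together with the corresponding tissue labels, could then train a support vector machine (e.g. sklearn.svm.SVC) to produce the tissue probability maps described in the abstract.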

  1. Literature mining of protein-residue associations with graph rules learned through distant supervision.

    PubMed

    Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin

    2012-10-05

    We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on the automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
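
    The linguistic patterns for residue mentions can be illustrated with a simple regular expression that captures forms such as 'Ser235' or 'T389'. The pattern below is a hypothetical simplification of the paper's pattern set; in particular, the single-letter form can over-generate on its own, which is one reason such patterns are combined with syntactic context and distant supervision.

        import re

        AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
        RESIDUE = re.compile(
            r"\b(?:(?P<aa3>" + AA3 + r")|(?P<aa1>[ACDEFGHIKLMNPQRSTVWY]))"
            r"[-\s]?(?P<pos>\d{1,4})\b")

        def residue_mentions(text):
            # Return (amino_acid, position) pairs such as ('Ser', 235) found in the text.
            return [(m.group("aa3") or m.group("aa1"), int(m.group("pos")))
                    for m in RESIDUE.finditer(text)]

        print(residue_mentions("Phosphorylation of Ser235 and T389 activates S6K1."))
        # [('Ser', 235), ('T', 389)]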

  2. Quantifying Human Visible Color Variation from High Definition Digital Images of Orb Web Spiders.

    PubMed

    Tapia-McClung, Horacio; Ajuria Ibarra, Helena; Rao, Dinesh

    2016-01-01

    Digital processing and analysis of high resolution images of 30 individuals of the orb web spider Verrucosa arenata were performed to extract and quantify human visible colors present on the dorsal abdomen of this species. Color extraction was performed with minimal user intervention using an unsupervised algorithm to determine groups of colors on each individual spider, which was then analyzed in order to quantify and classify the colors obtained, both spatially and using energy and entropy measures of the digital images. Analysis shows that the colors cover a small region of the visible spectrum, are not spatially homogeneously distributed over the patterns and from an entropic point of view, colors that cover a smaller region on the whole pattern carry more information than colors covering a larger region. This study demonstrates the use of processing tools to create automatic systems to extract valuable information from digital images that are precise, efficient and helpful for the understanding of the underlying biology.

  3. Quantifying Human Visible Color Variation from High Definition Digital Images of Orb Web Spiders

    PubMed Central

    Ajuria Ibarra, Helena; Rao, Dinesh

    2016-01-01

    Digital processing and analysis of high resolution images of 30 individuals of the orb web spider Verrucosa arenata were performed to extract and quantify human visible colors present on the dorsal abdomen of this species. Color extraction was performed with minimal user intervention using an unsupervised algorithm to determine groups of colors on each individual spider, which was then analyzed in order to quantify and classify the colors obtained, both spatially and using energy and entropy measures of the digital images. Analysis shows that the colors cover a small region of the visible spectrum, are not spatially homogeneously distributed over the patterns and from an entropic point of view, colors that cover a smaller region on the whole pattern carry more information than colors covering a larger region. This study demonstrates the use of processing tools to create automatic systems to extract valuable information from digital images that are precise, efficient and helpful for the understanding of the underlying biology. PMID:27902724

  4. Identifying Key Hospital Service Quality Factors in Online Health Communities

    PubMed Central

    Jung, Yuchul; Hur, Cinyoung; Jung, Dain

    2015-01-01

    Background The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. Objective As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and overtime trends automatically from online health communities, especially hospital-related questions and answers. Methods We defined social media–based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea’s two biggest online portals were used to test the effectiveness of detection of social media–based key quality factors for hospitals. Results To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media–based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). Conclusions These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies. PMID:25855612

  5. An approach to 3D model fusion in GIS systems and its application in a future ECDIS

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Zhao, Depeng; Pan, Mingyang

    2016-04-01

    Three-dimensional (3D) computer graphics technology is widely used in various areas and has caused profound changes. As an information carrier, 3D models are becoming increasingly important. The use of 3D models greatly helps to improve cartographic expression and design. 3D models are more visually efficient, quicker and easier to understand, and they can express more detailed geographical information. However, it is hard to efficiently and precisely fuse 3D models in local systems. The purpose of this study is to propose an automatic and precise approach to fuse 3D models in geographic information systems (GIS). Such fusion is the basic premise for subsequent uses of 3D models in local systems, such as attribute searching and spatial analysis. The basic steps of our research are: (1) pose adjustment by principal component analysis (PCA); (2) silhouette extraction by simple mesh silhouette extraction and silhouette merger; (3) size adjustment; (4) position matching. Finally, we implement the above methods in our system Automotive Intelligent Chart (AIC) 3D Electronic Chart Display and Information Systems (ECDIS). The fusion approach we propose is a general method and each calculation step is carefully designed. This approach solves the problem of cross-platform model fusion. 3D models can come from any source: they may be stored in the local cache or retrieved from the Internet, and may be manually created by different tools or automatically generated by different programs. The system can be any kind of 3D GIS system.
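
    Step (1), pose adjustment by PCA, can be sketched as aligning the principal axes of the model's vertex cloud with the coordinate axes. The function below is a minimal NumPy illustration under that assumption, not the AIC ECDIS implementation.

        import numpy as np

        def pca_pose(vertices):
            # Rotate a model's (N, 3) vertex array so its principal axes align with x, y, z.
            centred = vertices - vertices.mean(axis=0)
            eigval, eigvec = np.linalg.eigh(np.cov(centred, rowvar=False))  # ascending eigenvalues
            rotation = eigvec[:, ::-1]                  # columns: major, middle, minor axis
            if np.linalg.det(rotation) < 0:             # keep a proper rotation (no reflection)
                rotation[:, -1] *= -1
            return centred @ rotation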

  6. The CoFactor database: organic cofactors in enzyme catalysis.

    PubMed

    Fischer, Julia D; Holliday, Gemma L; Thornton, Janet M

    2010-10-01

    Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. CoFactor provides a web interface to access hand-curated data extracted from the literature on organic enzyme cofactors in biocatalysis, as well as automatically collected information. CoFactor includes information on the conformational and solvent accessibility variation of the enzyme-bound cofactors, as well as mechanistic and structural information about the hosting enzymes. The database is publicly available and can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/CoFactor.

  7. Indirect tissue electrophoresis: a new method for analyzing solid tissue protein.

    PubMed

    Smith, A C

    1988-01-01

    1. The eye lens core (nucleus) has been a valuable source of molecular biologic information. 2. In these studies, lens nuclei are usually homogenized so that any protein information related to anatomical subdivisions, or layers, of the nucleus is lost. 3. The present report is of a new method, indirect tissue electrophoresis (ITE), which, when applied to fish lens nuclei, permitted (a) automatic correlation of protein information with anatomic layer, (b) production of large, clear electrophoretic patterns even from small tissue samples and (c) detection of more proteins than in liquid extracts of homogenized tissues. 4. ITE seems potentially applicable to a variety of solid tissues.

  8. Automated concept-level information extraction to reduce the need for custom software and rules development.

    PubMed

    D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D

    2011-01-01

    Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments; precision was near or above 0.90 in all tasks. With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach to more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.

  9. Pattern Mining for Extraction of mentions of Adverse Drug Reactions from User Comments

    PubMed Central

    Nikfarjam, Azadeh; Gonzalez, Graciela H.

    2011-01-01

    Rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs, their effectiveness and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs on social network websites by mining a set of language patterns. The system applies association rule mining to a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We achieved a precision of 70.01%, a recall of 66.32% and an F-measure of 67.96%. PMID:22195162
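
    A minimal sketch of the association rule mining step, assuming the mlxtend library (an assumption; the abstract does not name an implementation). Each annotated comment is reduced to a hypothetical set of tokens and annotation tags; rules whose consequent is an ADR tag can then be filtered from the returned rule table.

        import pandas as pd
        from mlxtend.preprocessing import TransactionEncoder
        from mlxtend.frequent_patterns import apriori, association_rules

        def mine_adr_patterns(token_sets, min_support=0.05, min_conf=0.6):
            # token_sets: one set of tokens/tags per comment, e.g. {'DRUG', 'made', 'me', 'ADR'}.
            te = TransactionEncoder()
            onehot = pd.DataFrame(te.fit(token_sets).transform(token_sets), columns=te.columns_)
            frequent = apriori(onehot, min_support=min_support, use_colnames=True)
            return association_rules(frequent, metric="confidence", min_threshold=min_conf)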

  10. Virtual Surveyor based Object Extraction from Airborne LiDAR data

    NASA Astrophysics Data System (ADS)

    Habib, Md. Ahsan

    Topographic feature detection of land cover from LiDAR data is important in various fields - city planning, disaster response and prevention, soil conservation, infrastructure and forestry. In recent years, feature classification compliant with Object-Based Image Analysis (OBIA) methodology has been gaining traction in remote sensing and geographic information science (GIS). In OBIA, the LiDAR image is first divided into meaningful segments called object candidates. This yields, in addition to spectral values, a plethora of new information such as aggregated spectral pixel values, morphology, texture, context and topology. Traditional nonparametric segmentation methods rely on segmentations at different scales to produce a hierarchy of semantically significant objects; properly tuned scale parameters are therefore imperative in these methods for successful subsequent classification. Recently, some progress has been made in the development of methods for tuning the parameters for automatic segmentation. However, researchers have found it very difficult to automatically refine the tuning with respect to each object class present in the scene. Moreover, due to the relative complexity of real-world objects, the intra-class heterogeneity is very high, which leads to over-segmentation, so such methods fail to deliver many of the new segment features correctly. In this dissertation, a new hierarchical 3D object segmentation algorithm called Automatic Virtual Surveyor based Object Extraction (AVSOE) is presented. AVSOE segments objects based on their distinct geometric concavity/convexity. This is achieved by strategically mapping the sloping surface which connects each object to its background. Further analysis produces a hierarchical decomposition of objects into sub-objects at a single scale level. Extensive qualitative and quantitative results are presented to demonstrate the efficacy of this hierarchical segmentation approach.

  11. The CHEMDNER corpus of chemicals and drugs and its annotation principles.

    PubMed

    Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso

    2015-01-01

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.

  12. The CHEMDNER corpus of chemicals and drugs and its annotation principles

    PubMed Central

    2015-01-01

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ PMID:25810773

  13. 46 CFR 161.002-2 - Types of fire-protective systems.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ..., but not be limited to, automatic fire and smoke detecting systems, manual fire alarm systems, sample extraction smoke detection systems, watchman's supervisory systems, and combinations of these systems. (b) Automatic fire detecting systems. For the purpose of this subpart, automatic fire and smoke detecting...

  14. 46 CFR 161.002-2 - Types of fire-protective systems.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ..., but not be limited to, automatic fire and smoke detecting systems, manual fire alarm systems, sample extraction smoke detection systems, watchman's supervisory systems, and combinations of these systems. (b) Automatic fire detecting systems. For the purpose of this subpart, automatic fire and smoke detecting...

  15. LiDAR DTMs and anthropogenic feature extraction: testing the feasibility of geomorphometric parameters in floodplains

    NASA Astrophysics Data System (ADS)

    Sofia, G.; Tarolli, P.; Dalla Fontana, G.

    2012-04-01

    In floodplains, massive investments in land reclamation have always played an important role in the past for flood protection. In these contexts, human alteration is reflected by artificial ('anthropogenic') features, such as banks, levees or road scarps, that constantly increase and change in response to the rapid growth of human populations. For these areas, various existing and emerging applications require up-to-date, accurate and sufficiently attributed digital data, but such information is usually lacking, especially when dealing with large-scale applications. More recently, national and local mapping agencies in Europe have been moving towards the generation of digital topographic information that conforms to reality and is highly reliable and up to date. LiDAR Digital Terrain Models (DTMs) covering large areas are readily available to public authorities, and there is a greater and more widespread interest, among agencies responsible for land management, in the development of automated methods aimed at solving geomorphological and hydrological problems. Automatic feature recognition based upon DTMs can offer, for large-scale applications, a quick and accurate method that can help in improving topographic databases, and that can overcome some of the problems associated with traditional, field-based geomorphological mapping, such as restrictions on access and constraints of time or costs. Although anthropogenic features such as levees and road scarps are artificial structures that do not belong to what is usually defined as the bare ground surface, they are implicitly embedded in digital terrain models (DTMs). Automatic feature recognition based upon DTMs, therefore, can offer a quick and accurate method that does not require additional data, and that can help in improving flood defense asset information, flood modeling and other applications. In natural contexts, morphological indicators derived from high resolution topography have proven reliable for practical applications, and the use of statistical operators as thresholds for these geomorphic parameters has shown high reliability for feature extraction in mountainous environments. The goal of this research is to test whether these morphological indicators and objective thresholds are also feasible in floodplains, where features assume different characteristics and other artificial disturbances may be present. In this work, three different geomorphic parameters are tested and applied at different scales on a LiDAR DTM of a typical alluvial plain area in north-east Italy. The box-plot is applied to identify the threshold for feature extraction, and a filtering procedure is proposed to improve the quality of the final results. The effectiveness of the different geomorphic parameters is analyzed by comparing automatically derived features with surveyed ones. The results highlight the capability of high resolution topography, geomorphic indicators and statistical thresholds for anthropogenic feature extraction and characterization in a floodplain context.
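
    The box-plot threshold mentioned above can be made concrete with a short sketch: the upper fence Q3 + 1.5·IQR of a geomorphic parameter grid (e.g. a curvature raster derived from the LiDAR DTM) is used as an objective threshold for candidate anthropogenic-feature cells. This is a generic illustration of the statistical operator, not the authors' exact workflow.

        import numpy as np

        def boxplot_fence(values, k=1.5):
            # Upper box-plot fence Q3 + k*IQR used as an objective threshold on a geomorphic parameter.
            q1, q3 = np.nanpercentile(values, [25, 75])
            return q3 + k * (q3 - q1)

        def candidate_features(param_grid, k=1.5):
            # Boolean grid marking cells whose parameter value exceeds the box-plot fence.
            return param_grid > boxplot_fence(param_grid, k)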

  16. Data Warehouse Design from HL7 Clinical Document Architecture Schema.

    PubMed

    Pecoraro, Fabrizio; Luzi, Daniela; Ricci, Fabrizio L

    2015-01-01

    This paper proposes a semi-automatic approach to extract clinical information structured in an HL7 Clinical Document Architecture (CDA) document and transform it into a data warehouse dimensional model schema. It is based on a conceptual framework published in a previous work that maps the dimensional model primitives to CDA elements. Its feasibility is demonstrated through a case study based on the analysis of vital signs gathered during laboratory tests.

  17. Using NLP to identify cancer cases in imaging reports drawn from radiology information systems.

    PubMed

    Patrick, Jon; Asgari, Pooyan; Li, Min; Nguyen, Dung

    2013-01-01

    A Natural Language processing (NLP) classifier has been developed for the Victorian and NSW Cancer Registries with the purpose of automatically identifying cancer reports from imaging services, transmitting them to the Registries and then extracting pertinent cancer information. Large scale trials conducted on over 40,000 reports show the sensitivity for identifying reportable cancer reports is above 98% with a specificity above 96%. Detection of tumour stream, report purpose, and a variety of extracted content is generally above 90% specificity. The differences between report layout and authoring strategies across imaging services appear to require different classifiers to retain this high level of accuracy. Linkage of the imaging data with existing registry records (hospital and pathology reports) to derive stage and recurrence of cancer has commenced and shown very promising results.

  18. Computerized image analysis for acetic acid induced intraepithelial lesions

    NASA Astrophysics Data System (ADS)

    Li, Wenjing; Ferris, Daron G.; Lieberman, Rich W.

    2008-03-01

    Cervical Intraepithelial Neoplasia (CIN) exhibits certain morphologic features that can be identified during a visual inspection exam. Immature and dysplastic cervical squamous epithelium turns white after application of acetic acid during the exam. The whitening process occurs visually over several minutes and subjectively discriminates between dysplastic and normal tissue. Digital imaging technologies allow us to assist the physician in analyzing the acetic acid-induced lesions (acetowhite regions) in a fully automatic way. This paper reports a study designed to measure multiple parameters of the acetowhitening process from two images captured with a digital colposcope: one image captured before the acetic acid application and the other captured after it. The spatial change of the acetowhitening is extracted using color and texture information in the post-acetic-acid image; the temporal change is extracted from the intensity and color changes between the post-acetic-acid and pre-acetic-acid images after automatic alignment. The imaging and data analysis system has been evaluated with a total of 99 human subjects and demonstrates its potential for screening underserved women where access to skilled colposcopists is limited.

  19. Study of Burn Scar Extraction Automatically Based on Level Set Method using Remote Sensing Data

    PubMed Central

    Liu, Yang; Dai, Qin; Liu, JianBo; Liu, ShiBin; Yang, Jin

    2014-01-01

    Burn scar extraction using remote sensing data is an efficient way to precisely evaluate burned area and measure vegetation recovery. Traditional burn scar extraction methodologies perform poorly on burn scar images with blurred and irregular edges. To address these issues, this paper proposes an automatic method to extract burn scars based on the Level Set Method (LSM). This method exploits the advantages of different features in remote sensing images and addresses the practical need to extract burn scars rapidly and automatically. The approach integrates Change Vector Analysis (CVA), the Normalized Difference Vegetation Index (NDVI) and the Normalized Burn Ratio (NBR) to obtain a difference image, and modifies the conventional Chan-Vese (C-V) level set model with a new initial curve, derived from a binary image obtained by applying the K-means method to the fitting errors of two near-infrared band images. Landsat 5 TM and Landsat 8 OLI data sets are used to validate the proposed method. Comparisons with the conventional C-V model, the Otsu algorithm and the fuzzy C-means (FCM) algorithm show that the proposed approach can extract the outline curve of a fire burn scar effectively and accurately. The method has higher extraction accuracy and lower algorithmic complexity than the conventional C-V model. PMID:24503563
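
    The difference image combining CVA with NDVI and NBR can be illustrated as follows. The sketch assumes pre- and post-fire reflectance bands (near-infrared, red, shortwave-infrared) as floating point arrays and combines the index differences into a CVA-style change magnitude; the exact combination used in the paper may differ.

        import numpy as np

        def ndvi_nbr(nir, red, swir2, eps=1e-6):
            # NDVI and NBR from reflectance bands (for Landsat 8 OLI: bands 5, 4 and 7).
            ndvi = (nir - red) / (nir + red + eps)
            nbr = (nir - swir2) / (nir + swir2 + eps)
            return ndvi, nbr

        def change_magnitude(pre, post):
            # pre/post: dicts holding 'nir', 'red', 'swir2' float arrays for the two dates.
            ndvi0, nbr0 = ndvi_nbr(pre["nir"], pre["red"], pre["swir2"])
            ndvi1, nbr1 = ndvi_nbr(post["nir"], post["red"], post["swir2"])
            return np.hypot(ndvi0 - ndvi1, nbr0 - nbr1)   # CVA-style magnitude of the index changes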

  20. Semi-automatic segmentation of brain tumors using population and individual information.

    PubMed

    Wu, Yao; Yang, Wei; Jiang, Jun; Li, Shuanqian; Feng, Qianjin; Chen, Wufan

    2013-08-01

    Efficient segmentation of tumors in medical images is of great practical importance in early diagnosis and radiation planning. This paper proposes a novel semi-automatic segmentation method based on population and individual statistical information to segment brain tumors in magnetic resonance (MR) images. First, high-dimensional image features are extracted. Neighborhood components analysis is proposed to learn two optimal distance metrics, which contain population and patient-specific information, respectively. The probability of each pixel belonging to the foreground (tumor) or the background is estimated by the k-nearest neighbor classifier under the learned optimal distance metrics. A cost function for segmentation is constructed from these probabilities and is optimized using graph cuts. Finally, some morphological operations are performed to improve the achieved segmentation results. Our dataset consists of 137 brain MR images, including 68 for training and 69 for testing. The proposed method overcomes segmentation difficulties caused by the uneven gray level distribution of the tumors and can obtain satisfactory results even if the tumors have fuzzy edges. Experimental results demonstrate that the proposed method is robust for brain tumor segmentation.
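
    The metric learning and probability estimation steps can be sketched with scikit-learn's NeighborhoodComponentsAnalysis and a k-NN classifier. This is a simplified, single-metric illustration (the paper learns two metrics, for population and patient-specific information) with hypothetical pixel feature arrays as input; the graph cut optimization is not shown.

        from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
        from sklearn.pipeline import make_pipeline

        def tumour_probability(train_features, train_labels, test_features, k=15):
            # Learn an NCA metric on pixel features, then estimate per-pixel tumour probability
            # with a k-NN classifier in the transformed space (labels: 1 = tumour, 0 = background).
            model = make_pipeline(NeighborhoodComponentsAnalysis(random_state=0),
                                  KNeighborsClassifier(n_neighbors=k))
            model.fit(train_features, train_labels)
            return model.predict_proba(test_features)[:, 1]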

  1. From gaze cueing to perspective taking: Revisiting the claim that we automatically compute where or what other people are looking at

    PubMed Central

    Bukowski, Henryk; Hietanen, Jari K.; Samson, Dana

    2015-01-01

    ABSTRACT Two paradigms have shown that people automatically compute what or where another person is looking at. In the visual perspective-taking paradigm, participants judge how many objects they see; whereas, in the gaze cueing paradigm, participants identify a target. Unlike in the former task, in the latter task, the influence of what or where the other person is looking at is only observed when the other person is presented alone before the task-relevant objects. We show that this discrepancy across the two paradigms is not due to differences in visual settings (Experiment 1) or available time to extract the directional information (Experiment 2), but that it is caused by how attention is deployed in response to task instructions (Experiment 3). Thus, the mere presence of another person in the field of view is not sufficient to compute where/what that person is looking at, which qualifies the claimed automaticity of such computations. PMID:26924936

  2. From gaze cueing to perspective taking: Revisiting the claim that we automatically compute where or what other people are looking at.

    PubMed

    Bukowski, Henryk; Hietanen, Jari K; Samson, Dana

    2015-09-14

    Two paradigms have shown that people automatically compute what or where another person is looking at. In the visual perspective-taking paradigm, participants judge how many objects they see; whereas, in the gaze cueing paradigm, participants identify a target. Unlike in the former task, in the latter task, the influence of what or where the other person is looking at is only observed when the other person is presented alone before the task-relevant objects. We show that this discrepancy across the two paradigms is not due to differences in visual settings (Experiment 1) or available time to extract the directional information (Experiment 2), but that it is caused by how attention is deployed in response to task instructions (Experiment 3). Thus, the mere presence of another person in the field of view is not sufficient to compute where/what that person is looking at, which qualifies the claimed automaticity of such computations.

  3. Computed Tomography-Based Biomarker for Longitudinal Assessment of Disease Burden in Pulmonary Tuberculosis.

    PubMed

    Gordaliza, P M; Muñoz-Barrutia, A; Via, L E; Sharpe, S; Desco, M; Vaquero, J J

    2018-05-29

    Computed tomography (CT) images enable capturing specific manifestations of tuberculosis (TB) that are undetectable using common diagnostic tests, which suffer from limited specificity. In this study, we aimed to automatically quantify the burden of Mycobacterium tuberculosis (Mtb) using biomarkers extracted from X-ray CT images. Nine macaques were aerosol-infected with Mtb and treated with various antibiotic cocktails. Chest CT scans were acquired in all animals at specific times independently of disease progression. First, a fully automatic segmentation of the healthy lungs from the acquired chest CT volumes was performed and air-like structures were extracted. Next, unsegmented pulmonary regions corresponding to damaged parenchymal tissue and TB lesions were included. CT biomarkers were extracted by classification of the probability distribution of the intensity of the segmented images into three tissue types: (1) healthy tissue, parenchyma free from infection; (2) soft diseased tissue; and (3) hard diseased tissue. The probability distribution of tissue intensities was assumed to follow a Gaussian mixture model. The thresholds identifying each region were automatically computed using an expectation-maximization algorithm. The estimated longitudinal course of TB infection shows that subjects that have followed the same antibiotic treatment present a similar response (relative change in the diseased volume) with respect to baseline. More interestingly, the correlation between the diseased volume (soft tissue + hard tissue), which was manually delineated by an expert, and the volume automatically extracted with the proposed method was very strong (R² ≈ 0.8). We present a methodology that is suitable for automatic extraction of a radiological biomarker of TB disease burden from CT images. The method could be used to describe the longitudinal evolution of Mtb infection in a clinical trial devoted to the design of new drugs.
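
    A minimal sketch of the threshold estimation: fit a three-component Gaussian mixture to the intensities inside the segmented lung with an EM-based implementation (scikit-learn is assumed here, not named in the paper) and read off the intensity values where the most probable component changes.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def tissue_thresholds(hu_values, n_classes=3):
            # Fit a 3-component Gaussian mixture (EM) to the intensities inside the segmented lung
            # and return the intensity values where the most probable component changes
            # (usually two thresholds separating healthy / soft diseased / hard diseased tissue).
            gmm = GaussianMixture(n_components=n_classes, random_state=0).fit(hu_values.reshape(-1, 1))
            rank = np.argsort(np.argsort(gmm.means_.ravel()))        # component index -> rank by mean
            grid = np.linspace(hu_values.min(), hu_values.max(), 2000).reshape(-1, 1)
            labels = rank[gmm.predict(grid)]
            return grid.ravel()[np.flatnonzero(np.diff(labels) != 0)]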

  4. A new approach for automatic matching of ground control points in urban areas from heterogeneous images

    NASA Astrophysics Data System (ADS)

    Cong, Chao; Liu, Dingsheng; Zhao, Lingjun

    2008-12-01

    This paper discusses a new method for the automatic matching of ground control points (GCPs) between satellite remote sensing images and digital raster graphics (DRGs) in urban areas. The key to this method is to automatically extract tie point pairs from such heterogeneous images according to geographic features. Since there are big differences between such heterogeneous images with respect to texture and corner features, a more detailed analysis is performed to find similarities and differences between high-resolution remote sensing images and DRGs. Furthermore, a new algorithm based on the fuzzy c-means (FCM) method is proposed to extract linear features in remote sensing images. Crossings and corners extracted from these linear features are chosen as GCPs. A similar method is used to find the same features in DRGs. Finally, the Hausdorff distance is adopted to pick matching GCPs from the above two GCP groups. Experiments show that the method can extract GCPs from such images with a reasonable RMS error.
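
    The final matching step can be illustrated with SciPy's directed Hausdorff distance, used here as a global agreement check between the two GCP candidate sets, plus a simple nearest-neighbour pairing within a tolerance. This is a simplified stand-in for the matching strategy in the paper.

        import numpy as np
        from scipy.spatial.distance import directed_hausdorff

        def symmetric_hausdorff(points_a, points_b):
            # Symmetric Hausdorff distance between two 2-D point sets of shape (N, 2) and (M, 2).
            return max(directed_hausdorff(points_a, points_b)[0],
                       directed_hausdorff(points_b, points_a)[0])

        def match_gcps(image_pts, drg_pts, tol=5.0):
            # Pair each image corner with its nearest DRG corner if they agree within tol pixels.
            matches = []
            for p in image_pts:
                dists = np.linalg.norm(drg_pts - p, axis=1)
                j = int(dists.argmin())
                if dists[j] <= tol:
                    matches.append((tuple(p), tuple(drg_pts[j])))
            return matches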

  5. Automatic Feature Extraction from Planetary Images

    NASA Technical Reports Server (NTRS)

    Troglio, Giulia; Le Moigne, Jacqueline; Benediktsson, Jon A.; Moser, Gabriele; Serpico, Sebastiano B.

    2010-01-01

    With the launch of several planetary missions in the last decade, a large number of planetary images have already been acquired and many more will become available for analysis in the coming years. The image data need to be analyzed, preferably by automatic processing techniques, because of the huge amount of data. Although many automatic feature extraction methods have been proposed and utilized for Earth remote sensing images, these methods are not always applicable to planetary data, which often present low contrast and uneven illumination characteristics. Different methods have already been presented for crater extraction from planetary images, but the detection of other types of planetary features has not been addressed yet. Here, we propose a new unsupervised method for the extraction of different features from the surface of the analyzed planet, based on the combination of several image processing techniques, including a watershed segmentation and the generalized Hough Transform. The method has many applications, among which is image registration, and can be applied to arbitrary planetary images.
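
    The watershed stage can be sketched with scikit-image: markers are placed at local minima of a smoothed copy of the image and the watershed is run on the gradient magnitude. This is a generic marker-controlled watershed, not the authors' full pipeline (which also includes the generalized Hough Transform).

        import numpy as np
        from scipy import ndimage as ndi
        from skimage.feature import peak_local_max
        from skimage.filters import sobel
        from skimage.segmentation import watershed

        def watershed_regions(image, min_distance=15, sigma=3):
            # Marker-controlled watershed: markers at local minima of a smoothed copy,
            # flooding performed on the gradient magnitude of the original image.
            smooth = ndi.gaussian_filter(image.astype(float), sigma)
            coords = peak_local_max(-smooth, min_distance=min_distance)
            markers = np.zeros(image.shape, dtype=int)
            for label, (r, c) in enumerate(coords, start=1):
                markers[r, c] = label
            return watershed(sobel(image.astype(float)), markers)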

  6. Automatic Contour Extraction of Facial Organs for Frontal Facial Images with Various Facial Expressions

    NASA Astrophysics Data System (ADS)

    Kobayashi, Hiroshi; Suzuki, Seiji; Takahashi, Hisanori; Tange, Akira; Kikuchi, Kohki

    This study deals with a method to realize automatic contour extraction of facial features such as eyebrows, eyes and mouth from time sequences of frontal face images with various facial expressions. Because Snakes, one of the best-known contour extraction methods, has several disadvantages, we propose a new method to overcome these issues. We define an elastic contour model in order to preserve the contour shape and then determine the elastic energy from the amount of deformation of the elastic contour model. We also utilize the image energy obtained from brightness differences at the control points on the elastic contour model. Applying the dynamic programming method, we determine the contour position where the total value of the elastic energy and the image energy becomes minimum. Employing 1/30 s time-series frontal face images changing from neutral to one of six typical facial expressions, obtained from 20 subjects, we have evaluated our method and found that it enables highly accurate automatic contour extraction of facial features.
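
    The dynamic programming step can be made concrete as a Viterbi-style minimization over candidate positions for the contour control points. The sketch below treats the contour as an open chain (an approximation; facial contours are typically closed) and assumes precomputed image-energy and elastic-penalty tables.

        import numpy as np

        def viterbi_contour(image_energy, elastic_penalty):
            # image_energy: (n_points, n_candidates) cost of placing each control point at each
            # candidate offset along its search line; elastic_penalty: (n_candidates, n_candidates)
            # cost of moving between candidate offsets at consecutive control points.
            n_pts, n_cand = image_energy.shape
            cost = image_energy[0].copy()
            back = np.zeros((n_pts, n_cand), dtype=int)
            for p in range(1, n_pts):
                total = cost[:, None] + elastic_penalty + image_energy[p][None, :]
                back[p] = total.argmin(axis=0)          # best predecessor for each candidate
                cost = total.min(axis=0)
            path = [int(cost.argmin())]                 # backtrack from the cheapest final candidate
            for p in range(n_pts - 1, 0, -1):
                path.append(int(back[p, path[-1]]))
            return path[::-1]                           # chosen candidate index per control point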

  7. Automatic extraction and identification of users' responses in Facebook medical quizzes.

    PubMed

    Rodríguez-González, Alejandro; Menasalvas Ruiz, Ernestina; Mayer Pujadas, Miguel A

    2016-04-01

    In the last few years the use of social media in medicine has grown exponentially, providing a new area of research based on the analysis and use of Web 2.0 capabilities. In addition, the use of social media in medical education is a subject of particular interest which has been addressed in several studies. One example of this application is the medical quizzes of The New England Journal of Medicine (NEJM), which regularly publishes a set of questions through its Facebook timeline. We present an approach for the automatic extraction of medical quizzes and their associated answers on a Facebook platform by means of a set of computer-based methods and algorithms. We have developed a tool for the extraction and analysis of medical quizzes stored on the Facebook timeline of the NEJM page, based on a set of computer-based methods and algorithms implemented in Java. The system is divided into two main modules: Crawler and Data retrieval. The system was launched on December 31, 2014 and crawled through a total of 3004 valid posts and 200,081 valid comments. The first post was dated July 23, 2009 and the last one December 30, 2014. 285 quizzes were analyzed, with 32,780 different users providing answers to the aforementioned quizzes. Of the 285 quizzes, patterns were found in 261 (91.58%). From these 261 quizzes where trends were found, we saw that users follow trends of incorrect answers in 13 quizzes and trends of correct answers in 248. This tool is capable of automatically identifying the correct and wrong answers to quizzes provided in text format on Facebook posts, with a small rate of false negatives, and the approach could be applied to the extraction and analysis of other information sources on the Internet after some adaptations. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  8. 3D Central Line Extraction of Fossil Oyster Shells

    NASA Astrophysics Data System (ADS)

    Djuricic, A.; Puttonen, E.; Harzhauser, M.; Mandic, O.; Székely, B.; Pfeifer, N.

    2016-06-01

    Photogrammetry provides a powerful tool to digitally document protected, inaccessible, and rare fossils. This saves manpower in relation to current documentation practice and makes the fragile specimens more available for paleontological analysis and public education. In this study, a high resolution orthophoto (0.5 mm) and digital surface models (1 mm) are used to define fossil boundaries that are then used as input to automatically extract fossil length information via central lines. In general, central lines are widely used in geosciences as they ease observation, monitoring and evaluation of object dimensions. Here, the 3D central lines are used in a novel paleontological context to study fossilized oyster shells with photogrammetric and LiDAR-obtained 3D point cloud data. 3D central lines of 1121 Crassostrea gryphoides oysters of various shapes and sizes were computed in the study. Central line calculation included: i) Delaunay triangulation between the fossil shell boundary points and formation of the Voronoi diagram; ii) extraction of Voronoi vertices and construction of a connected graph tree from them; iii) reduction of the graph to the longest possible central line via Dijkstra's algorithm; iv) extension of the longest central line to the shell boundary and smoothing by fitting a cubic spline curve; and v) integration of the central line into the corresponding 3D point cloud. The resulting longest path estimate for the 3D central line is a size parameter that can be applied in oyster shell age determination in both paleontological and biological applications. Our investigation evaluates the ability of the central line method to measure shell sizes accurately by comparing automatically extracted central lines with manually collected reference data used in paleontological analysis. Our results show that the automatically obtained central line length overestimated the manually collected reference by 1.5% in the test set, which is deemed sufficient for the selected paleontological application, namely shell age determination.
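
    Steps i)-iii) can be sketched in 2-D for a single shell outline: Voronoi vertices inside the outline form a graph whose longest shortest path approximates the central line. The sketch assumes SciPy, Shapely and NetworkX, and omits the spline smoothing and 3-D integration steps; it is an illustration of the idea, not the authors' code.

        import numpy as np
        import networkx as nx
        from scipy.spatial import Voronoi
        from shapely.geometry import Point, Polygon

        def central_line(boundary_xy):
            # boundary_xy: (N, 2) vertices of a closed outline. Voronoi vertices inside the outline
            # form a graph; its longest shortest path (two Dijkstra sweeps) approximates the skeleton.
            outline = Polygon(boundary_xy)
            vor = Voronoi(boundary_xy)
            inside = {i for i, v in enumerate(vor.vertices)
                      if outline.contains(Point(float(v[0]), float(v[1])))}
            graph = nx.Graph()
            for a, b in vor.ridge_vertices:             # -1 marks vertices at infinity
                if a in inside and b in inside:
                    weight = float(np.linalg.norm(vor.vertices[a] - vor.vertices[b]))
                    graph.add_edge(a, b, weight=weight)
            start = next(iter(graph.nodes))             # assumes one connected component
            far1 = max(nx.single_source_dijkstra_path_length(graph, start).items(),
                       key=lambda kv: kv[1])[0]
            lengths, paths = nx.single_source_dijkstra(graph, far1)
            far2 = max(lengths.items(), key=lambda kv: kv[1])[0]
            return vor.vertices[paths[far2]]            # ordered (x, y) points along the central line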

  9. A Framework of Change Detection Based on Combined Morphological Features and Multi-Index Classification

    NASA Astrophysics Data System (ADS)

    Li, S.; Zhang, S.; Yang, D.

    2017-09-01

    Remote sensing images are particularly well suited for analysis of land cover change. In this paper, we present a new framework for detection of changing land cover using satellite imagery. Morphological features and a multi-index are used to extract typical objects from the imagery, including vegetation, water, bare land, buildings, and roads. Our method, based on connected domains, is different from traditional methods: it uses image segmentation to extract morphological features, while the enhanced vegetation index (EVI) and the normalized difference water index (NDWI) are used to extract vegetation and water, and a fragmentation index is used to correct the water extraction results. An HSV transformation and threshold segmentation are used to detect and remove the effects of shadows on the extraction results. Change detection is then performed on these results. One of the advantages of the proposed framework is that semantic information is extracted automatically using low-level morphological features and indexes. Another advantage is that the proposed method detects specific types of change without any training samples. A test on ZY-3 images demonstrates that our framework has a promising capability to detect change.

  10. Mapping from Space - Ontology Based Map Production Using Satellite Imageries

    NASA Astrophysics Data System (ADS)

    Asefpour Vakilian, A.; Momeni, M.

    2013-09-01

    Determination of the maximum ability for feature extraction from satellite imageries based on an ontology procedure using cartographic feature determination is the main objective of this research. Therefore, a special ontology has been developed to extract the maximum volume of information available in different high-resolution satellite imageries and compare it to the map information layers required at each specific scale according to the unified specifications for surveying and mapping. Ontology seeks to provide an explicit and comprehensive classification of entities in all spheres of being. This study proposes a new method for automatic maximum map feature extraction and reconstruction from high-resolution satellite images. For example, in order to extract building blocks to produce maps at 1 : 5000 scale and smaller, the road networks located around the building blocks should be determined. Thus, a new building index has been developed based on concepts obtained from the ontology. Building blocks have been extracted with a completeness of about 83%. Then, road networks have been extracted and reconstructed to create a uniform network with less discontinuity. In this case, building blocks have been extracted with proper performance and the false positive value from the confusion matrix was reduced by about 7%. Results showed that vegetation cover and water features have been extracted completely (100%) and about 71% of limits have been extracted. Also, the proposed method in this article had the ability to produce a map at the largest scale possible, equal to or smaller than 1 : 5000, from any multispectral high-resolution satellite imagery.

  12. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach

    PubMed Central

    2012-01-01

    Background: Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. Methods: We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. Results: We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. Conclusions: We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data. PMID:22759462

  13. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach.

    PubMed

    Ratkovic, Zorana; Golik, Wiktoria; Warnier, Pierre

    2012-06-26

    Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data.
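
    A toy sketch of the pattern-based relation step is given below; the lexicons and trigger phrases are invented stand-ins for the domain resources used by Alvis, so it only illustrates the shape of the approach.

```python
# Pattern-based bacterium-location relation extraction, illustrative lexicons only.
import re

BACTERIA = ["Escherichia coli", "Bacillus subtilis", "Lactobacillus plantarum"]
LOCATIONS = ["soil", "human gut", "raw milk", "dairy products"]
TRIGGERS = r"(?:isolated from|found in|lives in|colonizes)"

PATTERN = re.compile(
    r"(?P<bacterium>{b})\b.{{0,80}}?\b{t}\b.{{0,40}}?(?P<location>{l})".format(
        b="|".join(map(re.escape, BACTERIA)),
        t=TRIGGERS,
        l="|".join(map(re.escape, LOCATIONS)),
    ),
    re.IGNORECASE,
)

def extract_relations(sentence):
    """Return (bacterium, location) pairs matched in one sentence."""
    return [(m.group("bacterium"), m.group("location")) for m in PATTERN.finditer(sentence)]

print(extract_relations("Escherichia coli strains were isolated from the human gut of infants."))
# -> [('Escherichia coli', 'human gut')]
```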

  14. Extracting rate changes in transcriptional regulation from MEDLINE abstracts.

    PubMed

    Liu, Wenting; Miao, Kui; Li, Guangxia; Chang, Kuiyu; Zheng, Jie; Rajapakse, Jagath C

    2014-01-01

    Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. Validating time delays from knowledge bases is a challenge since the vast majority of biological databases do not record temporal information of gene regulations. Biological knowledge and facts on gene regulations are typically extracted from bio-literature with specialized methods that depend on the regulation task. In this paper, we mine evidence for time delays related to the transcriptional regulation of yeast from the PubMed abstracts. Since the vast majority of abstracts lack quantitative time information, we can only collect qualitative evidence of time delays. Specifically, the speed-up or delay in transcriptional regulation rate can provide evidence for time delays (shorter or longer) in GRN. Thus, we focus on deriving events related to rate changes in transcriptional regulation. A corpus of yeast regulation-related abstracts was manually labeled with such events. In order to capture these events automatically, we create an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also propose effective feature extraction methods based on the created ontology to identify the direct evidence with specific details of these events. Our ontologies outperform existing state-of-the-art gene regulation ontologies in the automatic rule learning method applied to our corpus. The proposed deterministic ontology rule-based method can achieve comparable performance to the automatic rule learning method based on decision trees. This demonstrates the effectiveness of our ontology in identifying rate-changing events. We also tested the effectiveness of the proposed feature mining methods for detecting direct evidence of events. Experimental results show that the machine learning method on these features achieves an F1-score of 71.43%. The manually labeled corpus of events relating to rate changes in transcriptional regulation for yeast is available at https://sites.google.com/site/wentingntu/data. The created ontologies summarized both biological causes of rate changes in transcriptional regulation and corresponding positive and negative textual patterns from the corpus. They are demonstrated to be effective in identifying rate-changing events, which shows the benefits of combining textual patterns and biological knowledge for extracting complex biological events.

  15. Automatic Rail Extraction and Clearance Check with a Point Cloud Captured by MLS in a Railway

    NASA Astrophysics Data System (ADS)

    Niina, Y.; Honma, R.; Honma, Y.; Kondo, K.; Tsuji, K.; Hiramatsu, T.; Oketani, E.

    2018-05-01

    Recently, MLS (Mobile Laser Scanning) has been successfully used in road maintenance. In this paper, we present the application of MLS to the inspection of clearance along railway tracks of West Japan Railway Company. Point clouds around the track are captured by MLS mounted on a bogie, and the rail position is determined by matching the shape of the ideal rail head to the point cloud with the ICP algorithm. A clearance check is then executed automatically with a virtual clearance model laid along the extracted rail. As a result of the evaluation, the error in the extracted rail positions is less than 3 mm. With respect to the automatic clearance check, objects inside the clearance and those related to the contact line were successfully detected, as verified by visual confirmation.
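
    The rail-fitting idea, matching an ideal rail-head template to the point cloud with ICP, can be sketched as follows; this NumPy/SciPy fragment is a generic point-to-point ICP, not the authors' implementation.

```python
# Compact ICP sketch: align an "ideal rail head" template to a slice of the MLS cloud.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src + t ~= dst (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(template, cloud, max_iter=50, tol=1e-6):
    """Iteratively match template points to nearest cloud points and re-fit the pose."""
    tree = cKDTree(cloud)
    moved = template.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(moved)
        R, t = best_rigid_transform(moved, cloud[idx])
        moved = moved @ R.T + t
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return moved, err                    # aligned template and mean residual
```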

  16. Robust extraction of the aorta and pulmonary artery from 3D MDCT image data

    NASA Astrophysics Data System (ADS)

    Taeprasartsit, Pinyo; Higgins, William E.

    2010-03-01

    Accurate definition of the aorta and pulmonary artery from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. This work presents robust methods for defining the aorta and pulmonary artery in the central chest. The methods work on both contrast-enhanced and non-contrast 3D MDCT image data. The automatic methods use a common approach employing model fitting, model selection, and adaptive refinement. In the occasional event that more precise vascular extraction is desired or the automatic method fails, we also provide an alternate semi-automatic fail-safe method. The semi-automatic method extracts the vasculature by extending the medial axes in a user-guided direction. A ground-truth study over a series of 40 human 3D MDCT images demonstrates the efficacy, accuracy, robustness, and efficiency of the methods.

  17. Hypotheses generation as supervised link discovery with automated class labeling on large-scale biomedical concept networks

    PubMed Central

    2012-01-01

    Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the Map-Reduce framework. We extract a set of heterogeneous features such as random-walk-based features, neighborhood features and common-author features. The potential number of links to consider for link discovery is large in our concept network, and to address this scalability problem, the features from the concept network are extracted on a cluster with the Map-Reduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time durations. A set of heterogeneous features, covering both topological and semantic features derived from the concept network, has been studied with respect to their impact on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method is presented in the paper. PMID:22759614
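
    A minimal sketch of casting link discovery as supervised classification is shown below; NetworkX's built-in link-prediction scores stand in for the richer random-walk, neighborhood and co-author features described above.

```python
# Simple topological features for candidate concept pairs; a stand-in feature set only.
import networkx as nx
import numpy as np

def pair_features(g, pairs):
    """Common neighbours, Jaccard and resource-allocation scores for candidate pairs."""
    jac = {(u, v): p for u, v, p in nx.jaccard_coefficient(g, pairs)}
    ra = {(u, v): p for u, v, p in nx.resource_allocation_index(g, pairs)}
    feats = []
    for u, v in pairs:
        common = len(list(nx.common_neighbors(g, u, v)))
        feats.append([common, jac[(u, v)], ra[(u, v)]])
    return np.array(feats)

# Labels would come from a later snapshot of the concept network: 1 if the pair became
# linked between the two snapshots, 0 otherwise; any off-the-shelf classifier can be used.
```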

  18. Automated Classification of Heritage Buildings for As-Built BIM Using Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Bassier, M.; Vergauwen, M.; Van Genechten, B.

    2017-08-01

    Semantically rich three-dimensional models such as Building Information Models (BIMs) are increasingly used in digital heritage. They provide the required information to the various stakeholders during the different stages of a historic building's life cycle, which is crucial in the conservation process. The creation of as-built BIM models is based on point cloud data. However, manually interpreting this data is labour intensive and often leads to misinterpretations. By automatically classifying the point cloud, the information can be processed more efficiently. A key aspect of this automated scan-to-BIM process is the classification of building objects. In this research we aim to automatically recognise elements in existing buildings to create compact semantic information models. Our algorithm efficiently extracts the main structural components such as floors, ceilings, roofs, walls and beams despite the presence of significant clutter and occlusions. More specifically, Support Vector Machines (SVMs) are proposed for the classification. The algorithm is evaluated using real data from a variety of existing buildings. The results show that the classifier recognizes the objects with both high precision and recall. As a result, entire data sets are reliably labelled at once. The approach enables experts to better document and process heritage assets.
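
    The classification stage might look roughly like the following sketch: simple per-segment geometric features (covariance eigenvalue shape descriptors, verticality, height) fed to an SVM; the descriptor set is an assumption and may differ from the one used in the paper.

```python
# Hand-crafted per-segment geometric features plus an RBF SVM; illustrative only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def segment_features(points):
    """points: (N, 3) array for one candidate segment (wall, floor, beam, ...)."""
    cov = np.cov(points.T)
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]          # l1 >= l2 >= l3
    l1, l2, l3 = evals / (evals.sum() + 1e-12)
    normal_z = np.linalg.eigh(cov)[1][:, 0][2]               # z of smallest-variance axis
    height = points[:, 2].max() - points[:, 2].min()
    return [l1 - l2, l2 - l3, l3,                            # linearity, planarity, scatter
            abs(normal_z), height]

def train_classifier(segments, labels):
    """segments: list of point arrays; labels: e.g. 'floor', 'wall', 'beam', ..."""
    X = np.array([segment_features(s) for s in segments])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    return clf.fit(X, labels)
```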

  19. JANIS: NEA JAva-based Nuclear Data Information System

    NASA Astrophysics Data System (ADS)

    Soppera, Nicolas; Bossant, Manuel; Cabellos, Oscar; Dupont, Emmeric; Díez, Carlos J.

    2017-09-01

    The JANIS (JAva-based Nuclear Data Information System) software is developed by the OECD Nuclear Energy Agency (NEA) Data Bank to facilitate the visualization and manipulation of nuclear data, giving access to evaluated nuclear data libraries, such as ENDF, JEFF, JENDL and TENDL, and also to experimental nuclear data (EXFOR) and bibliographical references (CINDA). It is available as a standalone Java program, downloadable and distributed on DVD, and also as a web application on the NEA website. One of the main new features in JANIS is the command-line scripting capability, which notably automates plot generation and permits automatically extracting data from the JANIS database. Recent NEA software developments rely on these JANIS features to access nuclear data; for example, the Nuclear Data Sensitivity Tool (NDaST) makes use of covariance data in BOXER and COVERX formats, which are retrieved from the JANIS database. New features added in this version of the JANIS software are described in this paper with some examples.

  20. Automatic emotional expression analysis from eye area

    NASA Astrophysics Data System (ADS)

    Akkoç, Betül; Arslan, Ahmet

    2015-02-01

    Eyes play an important role in expressing emotions in nonverbal communication. In the present study, emotional expression classification was performed based on the features that were automatically extracted from the eye area. First, the face area and the eye area were automatically extracted from the captured image. Afterwards, the parameters to be used for the analysis through discrete wavelet transformation were obtained from the eye area. Using these parameters, emotional expression analysis was performed through artificial intelligence techniques. As a result of the experimental studies, six universal emotions consisting of expressions of happiness, sadness, surprise, disgust, anger and fear were classified at a success rate of 84% using artificial neural networks.
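
    The feature/classifier idea can be sketched with PyWavelets and scikit-learn as below; the wavelet choice, decomposition level and network size are assumptions, not the authors' settings.

```python
# Wavelet sub-band statistics from a cropped eye region, fed to a small neural network.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def eye_wavelet_features(eye_gray, wavelet="db4", level=2):
    """Mean/std of each 2D wavelet sub-band of a grayscale eye-region image."""
    coeffs = pywt.wavedec2(eye_gray, wavelet, level=level)
    feats = []
    for c in coeffs:
        bands = c if isinstance(c, tuple) else (c,)          # approximation vs. detail bands
        for band in bands:
            feats += [float(np.mean(np.abs(band))), float(np.std(band))]
    return np.array(feats)

def train_emotion_classifier(eye_images, labels):
    """labels: happiness, sadness, surprise, disgust, anger, fear."""
    X = np.vstack([eye_wavelet_features(img) for img in eye_images])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    return clf.fit(X, labels)
```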

  1. Towards automatic music transcription: note extraction based on independent subspace analysis

    NASA Astrophysics Data System (ADS)

    Wellhausen, Jens; Hoynck, Michael

    2005-01-01

    Due to the increasing amount of music available electronically, the need for automatic search, retrieval and classification systems for music becomes more and more important. In this paper, an algorithm for automatic transcription of polyphonic piano music into MIDI data is presented, which is a very interesting basis for database applications, music analysis and music classification. The first part of the algorithm performs a note-accurate temporal audio segmentation. In the second part, the resulting segments are examined using Independent Subspace Analysis to extract sounding notes. Finally, the results are used to build a MIDI file as a new representation of the piece of music being examined.

  2. Towards automatic music transcription: note extraction based on independent subspace analysis

    NASA Astrophysics Data System (ADS)

    Wellhausen, Jens; Höynck, Michael

    2004-12-01

    Due to the increasing amount of music available electronically, the need for automatic search, retrieval and classification systems for music becomes more and more important. In this paper, an algorithm for automatic transcription of polyphonic piano music into MIDI data is presented, which is a very interesting basis for database applications, music analysis and music classification. The first part of the algorithm performs a note-accurate temporal audio segmentation. In the second part, the resulting segments are examined using Independent Subspace Analysis to extract sounding notes. Finally, the results are used to build a MIDI file as a new representation of the piece of music being examined.

  3. Comparison of Landsat-8, ASTER and Sentinel 1 satellite remote sensing data in automatic lineaments extraction: A case study of Sidi Flah-Bouskour inlier, Moroccan Anti Atlas

    NASA Astrophysics Data System (ADS)

    Adiri, Zakaria; El Harti, Abderrazak; Jellouli, Amine; Lhissou, Rachid; Maacha, Lhou; Azmi, Mohamed; Zouhair, Mohamed; Bachaoui, El Mostafa

    2017-12-01

    Lineament mapping occupies an important place in several fields, including geology, hydrogeology and topography. With the help of remote sensing techniques, lineaments can be better identified thanks to strong advances in the available data and methods, which allow going beyond the usual classical procedures and achieving more precise results. The aim of this work is the comparison of ASTER, Landsat-8 and Sentinel 1 sensor data for automatic lineament extraction. In addition to the image data, the approach includes the use of a pre-existing geological map, a Digital Elevation Model (DEM) and ground truth. Through a fully automatic approach consisting of a combination of an edge detection algorithm and a line-linking algorithm, we have found the optimal parameters for automatic lineament extraction in the study area. Thereafter, the comparison and validation of the obtained results showed that the Sentinel 1 data are more efficient in the restitution of lineaments. This indicates the superior performance of radar data compared to optical data in this kind of study.
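
    As a rough stand-in for the edge-detection plus line-linking chain, the sketch below pairs OpenCV's Canny detector with the probabilistic Hough transform; the thresholds are placeholders to tune per scene, not the optimal parameters reported in the paper.

```python
# Canny + probabilistic Hough as a generic lineament-extraction stand-in.
import cv2
import numpy as np

def extract_lineaments(band, canny_low=50, canny_high=150, min_len=40, max_gap=5):
    """band: single-band image as a 2D uint8 array (e.g. a stretched SAR or PC1 band)."""
    edges = cv2.Canny(band, canny_low, canny_high)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                            minLineLength=min_len, maxLineGap=max_gap)
    return [] if lines is None else [tuple(l[0]) for l in lines]   # (x1, y1, x2, y2)
```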

  4. MEMS product engineering: methodology and tools

    NASA Astrophysics Data System (ADS)

    Ortloff, Dirk; Popp, Jens; Schmidt, Thilo; Hahn, Kai; Mielke, Matthias; Brück, Rainer

    2011-03-01

    The development of MEMS comprises the structural design as well as the definition of an appropriate manufacturing process. Technology constraints have a considerable impact on the device design and vice versa. Product design and technology development are therefore concurrent tasks. Based on a comprehensive methodology, the authors introduce a software environment that links commercial design tools from both areas into a common design flow. In this paper, emphasis is put on automatic, low-threshold data acquisition. The intention is to collect and categorize development data for further developments with minimum overhead and minimum disturbance of established business processes. As a first step, software tools that automatically extract data from spreadsheets or file systems and put them in context with existing information are presented. The developments are currently carried out in a European research project.

  5. SHIELD: FITGALAXY -- A Software Package for Automatic Aperture Photometry of Extended Sources

    NASA Astrophysics Data System (ADS)

    Marshall, Melissa

    2013-01-01

    Determining the parameters of extended sources, such as galaxies, is a common but time-consuming task. Finding a photometric aperture that encompasses the majority of the flux of a source and identifying and excluding contaminating objects is often done by hand - a lengthy and difficult to reproduce process. To make extracting information from large data sets both quick and repeatable, I have developed a program called FITGALAXY, written in IDL. This program uses minimal user input to automatically fit an aperture to, and perform aperture and surface photometry on, an extended source. FITGALAXY also automatically traces the outlines of surface brightness thresholds and creates surface brightness profiles, which can then be used to determine the radial properties of a source. Finally, the program performs automatic masking of contaminating sources. Masks and apertures can be applied to multiple images (regardless of the WCS solution or plate scale) in order to accurately measure the same source at different wavelengths. I present the fluxes, as measured by the program, of a selection of galaxies from the Local Volume Legacy Survey. I then compare these results with the fluxes given by Dale et al. (2009) in order to assess the accuracy of FITGALAXY.

  6. Automatic movie skimming with general tempo analysis

    NASA Astrophysics Data System (ADS)

    Lee, Shih-Hung; Yeh, Chia-Hung; Kuo, C. C. J.

    2003-11-01

    In this research, story units are extracted by general tempo analysis, including the tempos of audio and visual information. Although many schemes have been proposed to successfully segment video data into shots using basic low-level features, how to group shots into meaningful units called story units is still a challenging problem. By focusing on a certain type of video such as sports or news, we can explore models with specific application domain knowledge. For movie content, many heuristic rules based on audiovisual clues have been proposed with limited success. We propose a method to extract story units using general tempo analysis. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.

  7. Automated measurement of birefringence - Development and experimental evaluation of the techniques

    NASA Technical Reports Server (NTRS)

    Voloshin, A. S.; Redner, A. S.

    1989-01-01

    Traditional photoelasticity has started to lose its appeal since it requires a well-trained specialist to acquire and interpret results. A spectral-contents-analysis approach may help to revive this old, but still useful technique. Light intensity of the beam passed through the stressed specimen contains all the information necessary to automatically extract the value of retardation. This is done by using a photodiode array to investigate the spectral contents of the light beam. Three different techniques to extract the value of retardation from the spectral contents of the light are discussed and evaluated. An experimental system was built which demonstrates the ability to evaluate retardation values in real time.

  8. Automatic Extraction of Planetary Image Features

    NASA Technical Reports Server (NTRS)

    Troglio, G.; LeMoigne, J.; Moser, G.; Serpico, S. B.; Benediktsson, J. A.

    2009-01-01

    With the launch of several Lunar missions such as the Lunar Reconnaissance Orbiter (LRO) and Chandrayaan-1, a large number of Lunar images will be acquired and will need to be analyzed. Although many automatic feature extraction methods have been proposed and utilized for Earth remote sensing images, these methods are not always applicable to Lunar data, which often present low contrast and uneven illumination. In this paper, we propose a new method for the extraction of Lunar features (that can be generalized to other planetary images), based on the combination of several image processing techniques, a watershed segmentation and the generalized Hough Transform. This feature extraction has many applications, among which is image registration.
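
    The watershed part of such a pipeline can be sketched with scikit-image as follows; the smoothing and footprint sizes are guesses for illustration only.

```python
# Marker-based watershed over a distance transform; candidate regions from a low-contrast image.
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian, threshold_otsu
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_regions(image):
    smoothed = gaussian(image, sigma=2)
    binary = smoothed > threshold_otsu(smoothed)
    distance = ndi.distance_transform_edt(binary)
    labeled, _ = ndi.label(binary)
    peaks = peak_local_max(distance, labels=labeled, footprint=np.ones((15, 15)))
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-distance, markers, mask=binary)   # labelled candidate regions
```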

  9. Kansas environmental and resource study: A Great Plains model, tasks 1-6

    NASA Technical Reports Server (NTRS)

    Haralick, R. M.; Kanemasu, E. T.; Morain, S. A.; Yarger, H. L. (Principal Investigator); Ulaby, F. T.; Shanmugam, K. S.; Williams, D. L.; Mccauley, J. R.; Mcnaughton, J. L.

    1972-01-01

    There are no author-identified significant results in this report. Environmental and resource investigations in Kansas utilizing ERTS-1 imagery are summarized for the following areas: (1) use of feature extraction techniques for texture context information in ERTS imagery; (2) interpretation and automatic image enhancement; (3) water use, production, and disease detection and predictions for wheat; (4) ERTS-1 agricultural statistics; (5) monitoring fresh water resources; and (6) ground pattern analysis in the Great Plains.

  10. Phase fluctuation spectra: New radio science information to become available in the DSN tracking system Mark III-77

    NASA Technical Reports Server (NTRS)

    Berman, A. L.

    1977-01-01

    An algorithm was developed for the continuous and automatic computation of Doppler noise concurrently at four sample rate intervals, evenly spanning three orders of magnitude. Average temporal Doppler phase fluctuation spectra will be routinely available in the DSN tracking system Mark III-77 and require little additional processing. The basic (noise) data will be extracted from the archival tracking data file (ATDF) of the tracking data management system.

  11. Crystal phase identification

    DOEpatents

    Michael, Joseph R.; Goehner, Raymond P.; Schlienger, Max E.

    2001-01-01

    A method and apparatus for determining the crystalline phase and crystalline characteristics of a sample. This invention provides a method and apparatus for unambiguously identifying and determining the crystalline phase and crystalline characteristics of a sample by using an electron beam generator, such as a scanning electron microscope, to obtain a backscattered electron Kikuchi pattern of a sample, and extracting crystallographic and composition data that is matched to database information to provide a quick and automatic method to identify crystalline phases.

  12. Count Me In! on the Automaticity of Numerosity Processing

    ERIC Educational Resources Information Center

    Naparstek, Sharon; Henik, Avishai

    2010-01-01

    Extraction of numerosity (i.e., enumeration) is an essential component of mathematical abilities. The current study asked how automatic is the processing of numerosity and whether automatic activation is task dependent. Participants were presented with displays containing a variable number of digits and were asked to pay attention to the number of…

  13. Kernel-Based Relevance Analysis with Enhanced Interpretability for Detection of Brain Activity Patterns

    PubMed Central

    Alvarez-Meza, Andres M.; Orozco-Gutierrez, Alvaro; Castellanos-Dominguez, German

    2017-01-01

    We introduce Enhanced Kernel-based Relevance Analysis (EKRA), which aims to support the automatic identification of brain activity patterns using electroencephalographic recordings. EKRA is a data-driven strategy that incorporates two kernel functions to take advantage of the available joint information, associating neural responses with a given stimulus condition. Regarding this, a Centered Kernel Alignment functional is adjusted to learn the linear projection that best discriminates the input feature set, optimizing the required free parameters automatically. Our approach is carried out in two scenarios: (i) feature selection by computing a relevance vector from extracted neural features to facilitate the physiological interpretation of a given brain activity task, and (ii) enhanced feature selection that performs an additional transformation of relevant features aiming to improve the overall identification accuracy. Accordingly, we provide an alternative feature relevance analysis strategy that improves system performance while favoring data interpretability. For validation purposes, EKRA is tested on two well-known brain activity tasks: motor imagery discrimination and epileptic seizure detection. The obtained results show that the EKRA approach estimates a relevant representation space extracted from the provided supervised information, emphasizing the salient input features. As a result, our proposal outperforms the state-of-the-art methods regarding brain activity discrimination accuracy with the benefit of enhanced physiological interpretation of the task at hand. PMID:29056897
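
    The Centered Kernel Alignment functional at the core of the method can be illustrated with a few lines of NumPy; the sketch below scores each feature by the alignment of its (linear) kernel with a label-agreement kernel and is a simplification, not the EKRA implementation.

```python
# Linear-kernel CKA and a naive per-feature relevance ranking; illustrative only.
import numpy as np

def centered_kernel_alignment(K, L):
    """CKA between two kernel (Gram) matrices of the same size."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def feature_relevance(X, y):
    """Rank each feature by how well its linear kernel aligns with the label kernel."""
    Y = np.equal.outer(np.asarray(y), np.asarray(y)).astype(float)   # label-agreement kernel
    scores = np.array([centered_kernel_alignment(np.outer(X[:, j], X[:, j]), Y)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores
```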

  14. Model of experts for decision support in the diagnosis of leukemia patients.

    PubMed

    Corchado, Juan M; De Paz, Juan F; Rodríguez, Sara; Bajo, Javier

    2009-07-01

    Recent advances in the field of biomedicine, specifically in the field of genomics, have led to an increase in the information available for conducting expression analysis. Expression analysis is a technique used in transcriptomics, a branch of genomics that deals with the study of messenger ribonucleic acid (mRNA) and the extraction of information contained in the genes. This increase in information is reflected in the exon arrays, which require the use of new techniques in order to extract the information. The purpose of this study is to provide a tool based on a mixture of experts model that allows the analysis of the information contained in the exon arrays, from which automatic classifications for decision support in diagnoses of leukemia patients can be made. The proposed model integrates several cooperative algorithms characterized by their efficiency in data processing, filtering, classification and knowledge extraction. The Cancer Institute of the University of Salamanca is making an effort to develop tools to automate the evaluation of data and to facilitate the analysis of information. This proposal is a step forward in this direction and the first step toward the development of a mixture of experts tool that integrates different cognitive and statistical approaches to deal with the analysis of exon arrays. The mixture of experts model presented within this work provides great capacities for learning and adaptation to the characteristics of the problem under consideration, using novel algorithms in each of the stages of the analysis process that can be easily configured and combined, and provides results that notably improve those provided by the existing methods for exon array analysis. The material used consists of data from exon arrays provided by the Cancer Institute that contain samples from leukemia patients. The methodology used consists of a system based on a mixture of experts. Each one of the experts incorporates novel artificial intelligence techniques that improve the process of carrying out various tasks such as pre-processing, filtering, classification and extraction of knowledge. This article will detail the manner in which individual experts are combined so that together they generate a system capable of extracting knowledge, thus permitting patients to be classified in an automatic and efficient manner that is also comprehensible for medical personnel. The system has been tested in a real setting and has been used for classifying patients who suffer from different forms of leukemia at various stages. Personnel from the Cancer Institute supervised and participated throughout the testing period. Preliminary results are promising, notably improving the results obtained with previously used tools. The medical staff from the Cancer Institute considers the tools that have been developed to be positive and very useful in a supporting capacity for carrying out their daily tasks. Additionally, the mixture of experts supplies a tool for the extraction of the information necessary to explain the associations that have been made in simple terms. That is, it permits the extraction of knowledge for each classification made, generalized so that it can be used in subsequent classifications. This allows for a large amount of learning and adaptation within the proposed system.

  15. [Technologies for Complex Intelligent Clinical Data Analysis].

    PubMed

    Baranov, A A; Namazova-Baranova, L S; Smirnov, I V; Devyatkin, D A; Shelmanov, A O; Vishneva, E A; Antonova, E V; Smirnov, V I

    2016-01-01

    The paper presents a system for intelligent analysis of clinical information. The authors describe methods implemented in the system for clinical information retrieval, intelligent diagnostics of chronic diseases, assessment of patient feature importance, and detection of hidden dependencies between features. Results of the experimental evaluation of these methods are also presented. Healthcare facilities generate a large flow of both structured and unstructured data which contain important information about patients. Test results are usually retained as structured data but some data is retained in the form of natural language texts (medical history, the results of physical examination, and the results of other examinations, such as ultrasound, ECG or X-ray studies). Many tasks arising in clinical practice can be automated by applying methods for intelligent analysis of the accumulated structured and unstructured data, which leads to improvement of healthcare quality. The aim of this work is the creation of a complex system for intelligent data analysis in a multi-disciplinary pediatric center. The authors propose methods for information extraction from clinical texts in Russian. The methods are carried out on the basis of deep linguistic analysis. They retrieve terms of diseases, symptoms, areas of the body and drugs. The methods can recognize additional attributes such as "negation" (indicates that the disease is absent), "no patient" (indicates that the disease refers to the patient's family member, but not to the patient), "severity of illness", "disease course", and "body region to which the disease refers". The authors use a set of hand-crafted templates and various techniques based on machine learning to retrieve information using a medical thesaurus. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine learning method for classification of patients with similar nosology and a method for determining the most informative patient features are also proposed. The authors have processed anonymized health records from the pediatric center to evaluate the proposed methods. The results show the applicability of the information extracted from the texts for solving practical problems. The records of patients with allergic, glomerular and rheumatic diseases were used for experimental assessment of the automatic diagnosis method. The authors have also determined the most appropriate machine learning methods for classification of patients for each group of diseases, as well as the most informative disease signs. It has been found that using additional information extracted from clinical texts, together with structured data, helps to improve the quality of diagnosis of chronic diseases. The authors have also obtained pattern combinations of disease signs. The proposed methods have been implemented in the intelligent data processing system for a multidisciplinary pediatric center. The experimental results show the ability of the system to improve the quality of pediatric healthcare.

  16. Array data extractor (ADE): a LabVIEW program to extract and merge gene array data

    PubMed Central

    2013-01-01

    Background: Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Findings: Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Conclusions: Although existing software allows for complex data analyses, the LabVIEW based program presented here, “Array Data Extractor (ADE)”, provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need of advanced programming knowledge. PMID:24289243
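
    ADE itself is written in LabVIEW; purely to illustrate the extract-and-merge idea, the following pandas sketch pulls rows for a gene list out of several normalized expression tables (the CSV layout with a 'gene' column is an assumption).

```python
# Extract-and-merge sketch for multiple normalized expression tables; file layout assumed.
import pandas as pd

def extract_gene_expression(csv_paths, genes):
    """Return one table with the requested genes from every study, keyed by study file."""
    merged = []
    for path in csv_paths:
        df = pd.read_csv(path)
        subset = df[df["gene"].isin(genes)].copy()
        subset["study"] = path
        merged.append(subset)
    return pd.concat(merged, ignore_index=True)

# Example: expression of selected GPCR genes across several heart studies.
# table = extract_gene_expression(["study1.csv", "study2.csv"], ["ADRB1", "ADRB2"])
```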

  17. A semantic model for multimodal data mining in healthcare information systems.

    PubMed

    Iakovidis, Dimitris; Smailis, Christos

    2012-01-01

    Electronic health records (EHRs) are representative examples of multimodal/multisource data collections, including measurements, images and free texts. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest level of a data mining process, where information is represented by multiple features, i.e., measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap by enabling direct links between low-level features and higher-level concepts, e.g., those describing body parts, anatomies and pathological findings. The proposed model has been developed in the Web Ontology Language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. Its utility is demonstrated for automatic annotation of medical data.

  18. Application of snakes and dynamic programming optimisation technique in modeling of buildings in informal settlement areas

    NASA Astrophysics Data System (ADS)

    Rüther, Heinz; Martine, Hagai M.; Mtalo, E. G.

    This paper presents a novel approach to semiautomatic building extraction in informal settlement areas from aerial photographs. The proposed approach uses a strategy of delineating buildings by optimising their approximate building contour position. Approximate building contours are derived automatically by locating elevation blobs in digital surface models. Building extraction is then effected by means of the snakes algorithm and the dynamic programming optimisation technique. With dynamic programming, the building contour optimisation problem is realized through a discrete multistage process and solved by the "time-delayed" algorithm, as developed in this work. The proposed building extraction approach is a semiautomatic process, with user-controlled operations linking fully automated subprocesses. Inputs into the proposed building extraction system are ortho-images and digital surface models, the latter being generated through image matching techniques. Buildings are modeled as "lumps" or elevation blobs in digital surface models, which are derived by altimetric thresholding of digital surface models. Initial windows for building extraction are provided by projecting the elevation blobs centre points onto an ortho-image. In the next step, approximate building contours are extracted from the ortho-image by region growing constrained by edges. Approximate building contours thus derived are inputs into the dynamic programming optimisation process in which final building contours are established. The proposed system is tested on two study areas: Marconi Beam in Cape Town, South Africa, and Manzese in Dar es Salaam, Tanzania. Sixty percent of buildings in the study areas have been extracted and verified and it is concluded that the proposed approach contributes meaningfully to the extraction of buildings in moderately complex and crowded informal settlement areas.
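
    The contour-refinement step can be approximated with scikit-image's active contour model, as sketched below; note that this implementation optimizes the snake by gradient descent rather than the dynamic programming ("time-delayed") scheme proposed in the paper, so it is only a stand-in.

```python
# Refine an approximate building contour with an active contour (snake); stand-in only.
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def refine_building_contour(ortho_gray, approx_contour):
    """approx_contour: (N, 2) array of (row, col) points from the DSM elevation blob."""
    smooth = gaussian(ortho_gray, sigma=2, preserve_range=True)
    snake = active_contour(smooth, approx_contour,
                           alpha=0.01, beta=5.0, gamma=0.01)  # elasticity / rigidity / step
    return snake
```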

  19. 3D Orbit Visualization for Earth-Observing Missions

    NASA Technical Reports Server (NTRS)

    Jacob, Joseph C.; Plesea, Lucian; Chafin, Brian G.; Weiss, Barry H.

    2011-01-01

    This software visualizes orbit paths for the Orbiting Carbon Observatory (OCO), but was designed to be general and applicable to any Earth-observing mission. The software uses the Google Earth user interface to provide a visual mechanism to explore spacecraft orbit paths, ground footprint locations, and local cloud cover conditions. In addition, a drill-down capability allows users to point and click on a particular observation frame to pop up ancillary information such as data product filenames and directory paths, latitude, longitude, time stamp, column-average dry air mole fraction of carbon dioxide, and solar zenith angle. This software can be integrated with the ground data system for any Earth-observing mission to automatically generate daily orbit path data products in Google Earth KML format. These KML data products can be directly loaded into the Google Earth application for interactive 3D visualization of the orbit paths for each mission day. Each time the application runs, the daily orbit paths are encapsulated in a KML file for each mission day since the last time the application ran. Alternatively, the daily KML for a specified mission day may be generated. The application automatically extracts the spacecraft position and ground footprint geometry as a function of time from a daily Level 1B data product created and archived by the mission's ground data system software. In addition, ancillary data, such as the column-averaged dry air mole fraction of carbon dioxide and solar zenith angle, are automatically extracted from a Level 2 mission data product. Zoom, pan, and rotate capabilities are provided through the standard Google Earth interface. Cloud cover is indicated with an image layer from the MODIS (Moderate Resolution Imaging Spectroradiometer) aboard the Aqua satellite, which is automatically retrieved from JPL's OnEarth Web service.
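
    The daily KML product can be illustrated with the Python standard library alone; the stripped-down sketch below writes a single orbit path as a LineString and omits the footprints, ancillary pop-ups and cloud layer of the real product.

```python
# Write an orbit path as minimal KML; a simplified illustration, not the mission software.
import xml.etree.ElementTree as ET

def orbit_to_kml(samples, name="daily_orbit"):
    """samples: iterable of (lon, lat, alt_m) spacecraft positions for one mission day."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    ET.SubElement(doc, "name").text = name
    placemark = ET.SubElement(doc, "Placemark")
    ET.SubElement(placemark, "name").text = "orbit path"
    line = ET.SubElement(placemark, "LineString")
    ET.SubElement(line, "altitudeMode").text = "absolute"
    ET.SubElement(line, "coordinates").text = " ".join(
        f"{lon},{lat},{alt}" for lon, lat, alt in samples)
    return ET.tostring(kml, encoding="unicode")

# print(orbit_to_kml([(-122.0, 37.0, 705000.0), (-121.5, 37.5, 705000.0)]))
```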

  20. Non-rigid registration of 3D ultrasound for neurosurgery using automatic feature detection and matching.

    PubMed

    Machado, Inês; Toews, Matthew; Luo, Jie; Unadkat, Prashin; Essayed, Walid; George, Elizabeth; Teodoro, Pedro; Carvalho, Herculano; Martins, Jorge; Golland, Polina; Pieper, Steve; Frisken, Sarah; Golby, Alexandra; Wells, William

    2018-06-04

    The brain undergoes significant structural change over the course of neurosurgery, including highly nonlinear deformation and resection. It can be informative to recover the spatial mapping between structures identified in preoperative surgical planning and the intraoperative state of the brain. We present a novel feature-based method for achieving robust, fully automatic deformable registration of intraoperative neurosurgical ultrasound images. A sparse set of local image feature correspondences is first estimated between ultrasound image pairs, after which rigid, affine and thin-plate spline models are used to estimate dense mappings throughout the image. Correspondences are derived from 3D features, distinctive generic image patterns that are automatically extracted from 3D ultrasound images and characterized in terms of their geometry (i.e., location, scale, and orientation) and a descriptor of local image appearance. Feature correspondences between ultrasound images are achieved based on a nearest-neighbor descriptor matching and probabilistic voting model similar to the Hough transform. Experiments demonstrate our method on intraoperative ultrasound images acquired before and after opening of the dura mater, during resection and after resection in nine clinical cases. A total of 1620 automatically extracted 3D feature correspondences were manually validated by eleven experts and used to guide the registration. Then, using manually labeled corresponding landmarks in the pre- and post-resection ultrasound images, we show that our feature-based registration reduces the mean target registration error from an initial value of 3.3 to 1.5 mm. This result demonstrates that the 3D features promise to offer a robust and accurate solution for 3D ultrasound registration and to correct for brain shift in image-guided neurosurgery.

  1. Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs

    PubMed Central

    Gómez-Adorno, Helena; Sidorov, Grigori; Pinto, David; Vilariño, Darnes; Gelbukh, Alexander

    2016-01-01

    We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms the state of the art approaches and gives consistently high results across different corpora, unlike existing methods. Our results show that our textual patterns are useful for the task of authorship attribution. PMID:27589740

  2. A new license plate extraction framework based on fast mean shift

    NASA Astrophysics Data System (ADS)

    Pan, Luning; Li, Shuguang

    2010-08-01

    License plate extraction is considered to be the most crucial step of an automatic license plate recognition (ALPR) system. In this paper, a region-based hybrid license plate detection method is proposed to solve practical problems under complex backgrounds containing a large quantity of disturbing information. In this method, coarse license plate location is carried out first to get the head part of a vehicle. Then a new Fast Mean Shift method based on random sampling of the Kernel Density Estimate (KDE) is adopted to segment the color vehicle images in order to get candidate license plate regions. The remarkable speed-up it brings makes Mean Shift segmentation more suitable for this application. Feature extraction and classification are used to accurately separate the license plate from other candidate regions. At last, tilt correction of the license plate is applied for subsequent recognition steps.

  3. Detection of reflecting surfaces by a statistical model

    NASA Astrophysics Data System (ADS)

    He, Qiang; Chu, Chee-Hung H.

    2009-02-01

    Remote sensing is widely used to assess the destruction from natural disasters and to plan relief and recovery operations. How to automatically extract useful features and segment interesting objects from digital images, including remote sensing imagery, has become a critical task for image understanding. Unfortunately, current research on automated feature extraction largely ignores contextual information. As a result, the fidelity of the attributes populated for interesting features and objects cannot be guaranteed. In this paper, we present an exploration of meaningful object extraction that integrates reflecting surfaces. Detection of specular reflecting surfaces can be useful in target identification and can then be applied to environmental monitoring, disaster prediction and analysis, military applications, and counter-terrorism. Our method is based on a statistical model that captures the statistical properties of specular reflecting surfaces; the reflecting surfaces are then detected through cluster analysis.

  4. Morphometric information to reduce the semantic gap in the characterization of microscopic images of thyroid nodules.

    PubMed

    Macedo, Alessandra A; Pessotti, Hugo C; Almansa, Luciana F; Felipe, Joaquim C; Kimura, Edna T

    2016-07-01

    Several systems for medical-image processing typically support the extraction of image attributes, but do not include some information that characterizes images. For example, morphometry can be applied to find new information about the visual content of an image. The extension of information may result in knowledge. Subsequently, results of mappings can be applied to recognize exam patterns, thus improving the accuracy of image retrieval and allowing a better interpretation of exam results. Although successfully applied in breast lesion images, the morphometric approach is still poorly explored in thyroid lesions due to the high subjectivity of thyroid examinations. This paper presents a theoretical-practical study, considering Computer Aided Diagnosis (CAD) and Morphometry, to reduce the semantic discontinuity between medical image features and human interpretation of image content. The proposed method aggregates the content of microscopic images characterized by morphometric information and other image attributes extracted by traditional object extraction algorithms. This method carries out segmentation, feature extraction, image labeling and classification. Morphometric analysis was included as an object extraction method in order to verify the improvement of its accuracy for automatic classification of microscopic images. To validate this proposal and verify the utility of morphometric information to characterize thyroid images, a CAD system was created to classify real thyroid image-exams into Papillary Cancer, Goiter and Non-Cancer. Results showed that morphometric information can improve the accuracy and precision of image retrieval and the interpretation of results in computer-aided diagnosis. For example, in the scenario where all the extractors are combined with the morphometric information, the CAD system had its best performance (70% precision in Papillary cases). Results signaled a positive use of morphometric information from images to reduce semantic discontinuity between human interpretation and image characterization. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  5. Automatic segmentation of the liver using multi-planar anatomy and deformable surface model in abdominal contrast-enhanced CT images

    NASA Astrophysics Data System (ADS)

    Jang, Yujin; Hong, Helen; Chung, Jin Wook; Yoon, Young Ho

    2012-02-01

    We propose an effective technique for the extraction of the liver boundary based on multi-planar anatomy and a deformable surface model in abdominal contrast-enhanced CT images. Our method is composed of four main steps. First, to extract an optimal volume circumscribing the liver, the lower and side boundaries are defined by positional information of the pelvis and ribs. The upper boundary is defined by separating the lungs and heart from the CT images. Second, to extract an initial liver volume, the optimal liver volume is smoothed by anisotropic diffusion filtering and is segmented using an adaptively selected threshold value. Third, to remove neighboring organs from the initial liver volume, morphological opening and connected component labeling are applied to multiple planes. Finally, to refine the liver boundaries, a deformable surface model is applied to the posterior liver surface and the left lobe missing from the previous step. Then, a probability summation map is generated by calculating regional information of the segmented liver in the coronal plane, which is used for restoring inaccurate liver boundaries. Experimental results show that our segmentation method can accurately extract liver boundaries without leakage into neighboring organs in spite of various liver shapes and ambiguous boundaries.
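
    Steps two and three (smoothing, thresholding, opening, largest connected component) can be sketched with SciPy as follows; the Gaussian filter stands in for anisotropic diffusion and the intensity window is a placeholder, so the fragment only illustrates the structure of the pipeline.

```python
# Rough initial-liver-mask sketch; the filter and thresholds are illustrative stand-ins.
import numpy as np
from scipy import ndimage as ndi

def initial_liver_mask(volume_hu, low=80, high=180, open_radius=3):
    """volume_hu: 3D CT volume; low/high: illustrative intensity window for enhanced liver."""
    smoothed = ndi.gaussian_filter(volume_hu.astype(np.float32), sigma=1.5)
    mask = (smoothed > low) & (smoothed < high)
    mask = ndi.binary_opening(mask, structure=np.ones((open_radius,) * 3))
    labels, n = ndi.label(mask)
    if n == 0:
        return mask
    sizes = ndi.sum(mask, labels, index=np.arange(1, n + 1))
    return labels == (np.argmax(sizes) + 1)      # keep the largest component (liver candidate)
```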

  6. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras.

    PubMed

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-06-24

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system.

  7. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras

    PubMed Central

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-01-01

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system. PMID:27347961

  8. Bridging the semantic gap in sports

    NASA Astrophysics Data System (ADS)

    Li, Baoxin; Errico, James; Pan, Hao; Sezan, M. Ibrahim

    2003-01-01

    One of the major challenges facing current media management systems and the related applications is the so-called "semantic gap" between the rich meaning that a user desires and the shallowness of the content descriptions that are automatically extracted from the media. In this paper, we address the problem of bridging this gap in the sports domain. We propose a general framework for indexing and summarizing sports broadcast programs. The framework is based on a high-level model of sports broadcast video using the concept of an event, defined according to domain-specific knowledge for different types of sports. Within this general framework, we develop automatic event detection algorithms that are based on automatic analysis of the visual and aural signals in the media. We have successfully applied the event detection algorithms to different types of sports including American football, baseball, Japanese sumo wrestling, and soccer. Event modeling and detection contribute to the reduction of the semantic gap by providing rudimentary semantic information obtained through media analysis. We further propose a novel approach, which makes use of independently generated rich textual metadata, to fill the gap completely through synchronization of the information-laden textual data with the basic event segments. An MPEG-7 compliant prototype browsing system has been implemented to demonstrate semantic retrieval and summarization of sports video.

  9. A preliminary approach to creating an overview of lactoferrin multi-functionality utilizing a text mining method.

    PubMed

    Shimazaki, Kei-ichi; Kushida, Tatsuya

    2010-06-01

    Lactoferrin is a multi-functional metal-binding glycoprotein that exhibits many biological functions of interest to many researchers from the fields of clinical medicine, dentistry, pharmacology, veterinary medicine, nutrition and milk science. To date, a number of academic reports concerning the biological activities of lactoferrin have been published and are easily accessible through public data repositories. However, as the literature is expanding daily, this presents challenges in understanding the larger picture of lactoferrin function and mechanisms. In order to overcome the "analysis paralysis" associated with lactoferrin information, we attempted to apply a text mining method to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo). This information extraction system uses natural language processing and text mining technology. This system analyzes the sentences and titles from abstracts stored in the PubMed database, and can automatically extract binary relations that consist of interactions between genes/proteins, chemicals and diseases/functions. We expect that such information visualization analysis will be useful in determining novel relationships among a multitude of lactoferrin functions and mechanisms. We have demonstrated the utilization of this method to find pathways of lactoferrin participation in neovascularization, Helicobacter pylori attack on gastric mucosa, atopic dermatitis and lipid metabolism.

  10. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system using audio-visual information. The system is intended to control the da Vinci laparoscopic robot. The audio signal is processed using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.

  11. Using Activity-Related Behavioural Features towards More Effective Automatic Stress Detection

    PubMed Central

    Giakoumis, Dimitris; Drosou, Anastasios; Cipresso, Pietro; Tzovaras, Dimitrios; Hassapis, George; Gaggioli, Andrea; Riva, Giuseppe

    2012-01-01

    This paper introduces activity-related behavioural features that can be automatically extracted from a computer system, with the aim to increase the effectiveness of automatic stress detection. The proposed features are based on processing of appropriate video and accelerometer recordings taken from the monitored subjects. For the purposes of the present study, an experiment was conducted that utilized a stress-induction protocol based on the stroop colour word test. Video, accelerometer and biosignal (Electrocardiogram and Galvanic Skin Response) recordings were collected from nineteen participants. Then, an explorative study was conducted by following a methodology mainly based on spatiotemporal descriptors (Motion History Images) that are extracted from video sequences. A large set of activity-related behavioural features, potentially useful for automatic stress detection, were proposed and examined. Experimental evaluation showed that several of these behavioural features significantly correlate to self-reported stress. Moreover, it was found that the use of the proposed features can significantly enhance the performance of typical automatic stress detection systems, commonly based on biosignal processing. PMID:23028461

  12. Rapid, Potentially Automatable, Method to Extract Biomarkers for HPLC/ESI/MS/MS to Detect and Identify BW Agents

    DTIC Science & Technology

    1997-11-01

    ...status can sometimes be reflected in the infectious potential or drug resistance of those pathogens. For example, in Mycobacterium tuberculosis ... Mycobacterium tuberculosis, its antibiotic resistance and prediction of pathogenicity amongst Mycobacterium spp. based on signature lipid biomarkers ...

  13. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs

    PubMed Central

    Chen, Haijian; Han, Dongmei; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOCs environment, knowledge discovery and knowledge sharing are very important and are currently often achieved by ontology techniques. In building ontologies, automatic extraction technology is crucial. Because general text mining algorithms do not perform well on online course material, we designed the automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The Vector Space Model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF algorithm output values; the terms with the highest scores are selected as knowledge points. Course documents of “C programming language” are selected for the experiment in this study. The results show that the proposed approach can achieve satisfactory accuracy and recall rates. PMID:26448738
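
    As a rough illustration of the VSM/TF-IDF scoring step described above, the sketch below ranks pre-segmented terms by their summed TF-IDF weight and keeps the highest-scoring ones as knowledge-point candidates. This is not the AECKP implementation; the toy documents and the top-k cut-off are assumptions.

```python
# Minimal TF-IDF term-scoring sketch (illustrative, not the AECKP code).
# Documents are assumed to be already segmented into space-separated terms,
# e.g. by a Chinese word segmenter.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "pointer array function parameter",      # toy pre-segmented course documents
    "pointer memory address dereference",
    "loop condition break continue",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)         # shape: (n_docs, n_terms)
terms = vectorizer.get_feature_names_out()

# Score each term by its summed TF-IDF weight over the corpus and keep the top-k.
scores = np.asarray(tfidf.sum(axis=0)).ravel()
top_k = 5
knowledge_points = [terms[i] for i in np.argsort(scores)[::-1][:top_k]]
print(knowledge_points)
```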

  14. Road Network Extraction from DSM by Mathematical Morphology and Reasoning

    NASA Astrophysics Data System (ADS)

    Li, Yan; Wu, Jianliang; Zhu, Lin; Tachibana, Kikuo

    2016-06-01

    The objective of this research is the automatic extraction of the road network in an urban-area scene from a high resolution digital surface model (DSM). Automatic road extraction and modeling from remotely sensed data has been studied for more than a decade. The methods vary greatly due to differences in data types, regions, resolutions and other factors. An advanced automatic road network extraction scheme is proposed to address the tedious steps of segmentation, recognition and grouping. It is based on a geometric road model that describes a multiple-level structure. The zero-dimensional element is the intersection. The one-dimensional elements are the central line and the sides. The two-dimensional element is the plane, which is generated from the one-dimensional elements. The key feature of the presented approach is the cross validation of the three road elements, which runs through the entire extraction procedure. The advantage of our model and method is that linear elements of the road can be derived directly, without any complex, non-robust connection hypothesis. An example of a Japanese scene is presented to illustrate the procedure and the performance of the approach.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crain, Steven P.; Yang, Shuang-Hong; Zha, Hongyuan

    Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.
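
    For readers unfamiliar with the retrieval metric quoted above, the following sketch computes nDCG@5 from graded relevance judgements. The relevance values are invented for illustration, and this is not the evaluation code of the study.

```python
# Hedged sketch: nDCG@k from graded relevance of a ranked result list.
# Uses the common rel / log2(rank + 1) form of DCG.
import numpy as np

def dcg_at_k(relevances, k):
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))   # log2(rank + 1), ranks start at 1
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevances, k):
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

ranked_relevance = [3, 2, 0, 1, 2, 0]   # relevance of retrieved documents, in rank order
print(ndcg_at_k(ranked_relevance, k=5))
```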

  16. Automatic Centerline Extraction of Covered Roads by Surrounding Objects from High Resolution Satellite Images

    NASA Astrophysics Data System (ADS)

    Kamangir, H.; Momeni, M.; Satari, M.

    2017-09-01

    This paper presents an automatic method to extract road centerline networks from high and very high resolution satellite images. It addresses the automated extraction of roads covered by multiple natural and artificial objects such as trees, vehicles and the shadows of buildings or trees. To achieve precise road extraction, the method implements three stages: classification of the images based on the maximum likelihood algorithm to categorize them into the classes of interest; a modification process on the classified images using connected components and morphological operators to extract the pixels of desired objects by removing undesirable pixels of each class; and finally line extraction based on the RANSAC algorithm. To evaluate the performance of the proposed method, the generated results are compared with a ground truth road map as a reference. The evaluation on representative test images shows completeness values ranging between 77% and 93%.
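
    The final stage above relies on RANSAC for line extraction. The following generic 2-D RANSAC line fit (not the authors' code) shows the idea: repeatedly sample two candidate road pixels, count inliers within a distance threshold, and keep the best-supported line. The thresholds and points are illustrative.

```python
# Generic RANSAC line fitting on 2-D candidate points (illustrative sketch).
import numpy as np

def ransac_line(points, n_iter=500, dist_thresh=2.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iter):
        i, j = rng.choice(len(points), size=2, replace=False)
        p1, p2 = points[i], points[j]
        d = p2 - p1
        norm = np.linalg.norm(d)
        if norm == 0:
            continue
        # perpendicular distance of every point to the line through p1 and p2
        dist = np.abs(d[0] * (points[:, 1] - p1[1]) - d[1] * (points[:, 0] - p1[0])) / norm
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (p1, d / norm)
    return best_model, best_inliers

# Toy data: mostly collinear "road" pixels plus two outliers.
pts = np.array([[i, 2 * i + 1] for i in range(50)] + [[10, 90], [40, 5]], dtype=float)
model, inliers = ransac_line(pts)
print(inliers.sum(), "inliers of", len(pts))
```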

  17. Focal liver lesions segmentation and classification in nonenhanced T2-weighted MRI.

    PubMed

    Gatos, Ilias; Tsantis, Stavros; Karamesini, Maria; Spiliopoulos, Stavros; Karnabatidis, Dimitris; Hazle, John D; Kagadis, George C

    2017-07-01

    To automatically segment and classify focal liver lesions (FLLs) on nonenhanced T2-weighted magnetic resonance imaging (MRI) scans using a computer-aided diagnosis (CAD) algorithm. 71 FLLs (30 benign lesions, 19 hepatocellular carcinomas, and 22 metastases) on T2-weighted MRI scans were delineated by the proposed CAD scheme. The FLL segmentation procedure involved wavelet multiscale analysis to extract accurate edge information and mean intensity values for consecutive edges computed using horizontal and vertical analysis that were fed into the subsequent fuzzy C-means algorithm for final FLL border extraction. Texture information for each extracted lesion was derived using 42 first- and second-order textural features from grayscale value histogram, co-occurrence, and run-length matrices. Twelve morphological features were also extracted to capture any shape differentiation between classes. Feature selection was performed with stepwise multilinear regression analysis that led to a reduced feature subset. A multiclass Probabilistic Neural Network (PNN) classifier was then designed and used for lesion classification. PNN model evaluation was performed using the leave-one-out (LOO) method and receiver operating characteristic (ROC) curve analysis. The mean overlap between the automatically segmented FLLs and the manual segmentations performed by radiologists was 0.91 ± 0.12. The highest classification accuracies in the PNN model for the benign, hepatocellular carcinoma, and metastatic FLLs were 94.1%, 91.4%, and 94.1%, respectively, with sensitivity/specificity values of 90%/97.3%, 89.5%/92.2%, and 90.9%/95.6% respectively. The overall classification accuracy for the proposed system was 90.1%. Our diagnostic system using sophisticated FLL segmentation and classification algorithms is a powerful tool for routine clinical MRI-based liver evaluation and can be a supplement to contrast-enhanced MRI to prevent unnecessary invasive procedures. © 2017 American Association of Physicists in Medicine.

  18. A framework for feature extraction from hospital medical data with applications in risk prediction.

    PubMed

    Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha

    2014-12-30

    Feature engineering is a time consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features to baselines generated from the Elixhauser comorbidities. Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6 and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities over all 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64, 0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72). The advantages of automatically extracting standard features from complex medical records in a disease- and task-agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.
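
    A minimal sketch of the modelling step described above, assuming a binary matrix of auto-extracted event features and a readmission label; the data are synthetic and the hyperparameters are placeholders, not the study's settings.

```python
# Illustrative sketch: logistic regression with elastic-net regularisation,
# evaluated by AUC as in the abstract. Synthetic data stand in for the
# auto-extracted hospital-record features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 50)).astype(float)                  # binary event features
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 2000) > 3).astype(int)    # synthetic readmission label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```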

  19. Automatic acquisition of motion trajectories: tracking hockey players

    NASA Astrophysics Data System (ADS)

    Okuma, Kenji; Little, James J.; Lowe, David

    2003-12-01

    Computer systems that have the capability of analyzing complex and dynamic scenes play an essential role in video annotation. Scenes can be complex in such a way that there are many cluttered objects with different colors, shapes and sizes, and can be dynamic with multiple interacting moving objects and a constantly changing background. In reality, there are many scenes that are complex, dynamic, and challenging enough for computers to describe. These scenes include games of sports, air traffic, car traffic, street intersections, and cloud transformations. Our research is about the challenge of inventing a descriptive computer system that analyzes scenes of hockey games where multiple moving players interact with each other on a constantly moving background due to camera motions. Ultimately, such a computer system should be able to acquire reliable data by extracting the players' motion as their trajectories, querying them by analyzing the descriptive information of data, and predict the motions of some hockey players based on the result of the query. Among these three major aspects of the system, we primarily focus on visual information of the scenes, that is, how to automatically acquire motion trajectories of hockey players from video. More accurately, we automatically analyze the hockey scenes by estimating parameters (i.e., pan, tilt, and zoom) of the broadcast cameras, tracking hockey players in those scenes, and constructing a visual description of the data by displaying trajectories of those players. Many technical problems in vision such as fast and unpredictable players' motions and rapid camera motions make our challenge worth tackling. To the best of our knowledge, there have not been any automatic video annotation systems for hockey developed in the past. Although there are many obstacles to overcome, our efforts and accomplishments would hopefully establish the infrastructure of the automatic hockey annotation system and become a milestone for research in automatic video annotation in this domain.

  20. A data reduction package for multiple object spectroscopy

    NASA Technical Reports Server (NTRS)

    Hill, J. M.; Eisenhamer, J. D.; Silva, D. R.

    1986-01-01

    Experience with fiber-optic spectrometers has demonstrated improvements in observing efficiency for clusters of 30 or more objects that must in turn be matched by data reduction capability increases. The Medusa Automatic Reduction System reduces data generated by multiobject spectrometers in the form of two-dimensional images containing 44 to 66 individual spectra, using both software and hardware improvements to efficiently extract the one-dimensional spectra. Attention is given to the ridge-finding algorithm for automatic location of the spectra in the CCD frame. A simultaneous extraction of calibration frames allows an automatic wavelength calibration routine to determine dispersion curves, and both line measurements and cross-correlation techniques are used to determine galaxy redshifts.

  1. Infrared Cephalic-Vein to Assist Blood Extraction Tasks: Automatic Projection and Recognition

    NASA Astrophysics Data System (ADS)

    Lagüela, S.; Gesto, M.; Riveiro, B.; González-Aguilera, D.

    2017-05-01

    The thermal infrared band is not commonly used in photogrammetric and computer vision algorithms, mainly due to the low spatial resolution of this type of imagery. However, this band captures sub-superficial information, extending the capabilities of visible bands regarding applications. This fact is especially important in biomedicine and biometrics, allowing the geometric characterization of interior organs and pathologies with photogrammetric principles, as well as automatic identification and labelling using computer vision algorithms. This paper presents advances of close-range photogrammetry and computer vision applied to thermal infrared imagery, with the final application of Augmented Reality in order to widen its use in the biomedical field. In this case, the thermal infrared image of the arm is acquired and simultaneously projected on the arm, together with the identification label of the cephalic vein. In this way, blood analysts are assisted in finding the vein for blood extraction, especially in those cases where identification by the human eye is a complex task. Vein recognition is performed based on the Gaussian temperature distribution in the area of the vein, while the calibration between projector and thermographic camera is developed through feature extraction and pattern recognition. The method is validated through its application to a set of volunteers of different ages and genders, covering different conditions of body temperature and vein depth to demonstrate the applicability and reproducibility of the method.

  2. Information Pre-Processing using Domain Meta-Ontology and Rule Learning System

    NASA Astrophysics Data System (ADS)

    Ranganathan, Girish R.; Biletskiy, Yevgen

    Around the globe, extraordinary numbers of documents are being created by Enterprises and by users outside these Enterprises. The documents created in the Enterprises constitute the main focus of the present chapter. These documents are used for a large amount of machine processing. When using these documents for machine processing, a lack of semantics of the information in them may cause misinterpretation, thereby inhibiting the productiveness of computer assisted analytical work. Hence, it would be profitable for Enterprises to use well defined domain ontologies which will serve as rich sources of semantics for the information in the documents. These domain ontologies can be created manually, semi-automatically or fully automatically. The focus of this chapter is to propose an intermediate solution which will enable relatively easy creation of these domain ontologies. The process of extracting and capturing domain ontologies from these voluminous documents requires extensive involvement of domain experts and the application of ontology learning methods that are substantially labor intensive; therefore, some intermediate solutions which would assist in capturing domain ontologies must be developed. This chapter proposes a solution in this direction that involves building a meta-ontology which serves as an intermediate information source for the main domain ontology and as a rapid approach to conceptualizing a domain of interest from a huge amount of source documents. This meta-ontology can be populated with ontological concepts, attributes and relations from documents, and then refined in order to form a better domain ontology either through automatic ontology learning methods or some other relevant ontology building approach.

  3. An automatic method to detect and track the glottal gap from high speed videoendoscopic images.

    PubMed

    Andrade-Miranda, Gustavo; Godino-Llorente, Juan I; Moro-Velázquez, Laureano; Gómez-García, Jorge Andrés

    2015-10-29

    The image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis is based not only on direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis relies on a previous accurate identification of the glottal gap, which is the most challenging step for a further automatic assessment of vocal fold vibration. In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest (ROI) that is adapted over time and combines active contours and the watershed transform for the final delineation of the glottis; an automatic procedure to synthesize different videokymograms (VKGs) is also proposed. Thanks to the ROI implementation, the technique is robust to camera shifting, and objective tests prove the effectiveness and performance of the approach in the most challenging scenario, namely when there is an inappropriate closure of the vocal folds. The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI and in the use of watershed merging combined with active contours for glottis delimitation. Additionally, an automatic procedure to synthesize multiline VKGs by identifying the glottal main axis is developed.

  4. Image Processing

    NASA Technical Reports Server (NTRS)

    1987-01-01

    A new spinoff product was derived from Geospectra Corporation's expertise in processing LANDSAT data in a software package. Called ATOM (for Automatic Topographic Mapping), it's capable of digitally extracting elevation information from stereo photos taken by spaceborne cameras. ATOM offers a new dimension of realism in applications involving terrain simulations, producing extremely precise maps of an area's elevations at a lower cost than traditional methods. ATOM has a number of applications involving defense training simulations and offers utility in architecture, urban planning, forestry, petroleum and mineral exploration.

  5. Open-Source Programming for Automated Generation of Graphene Raman Spectral Maps

    NASA Astrophysics Data System (ADS)

    Vendola, P.; Blades, M.; Pierre, W.; Jedlicka, S.; Rotkin, S. V.

    Raman microscopy is a useful tool for studying the structural characteristics of graphene deposited onto substrates. However, extracting useful information from the Raman spectra requires data processing and 2D map generation. An existing home-built confocal Raman microscope was optimized for graphene samples and programmed to automatically generate Raman spectral maps across a specified area. In particular, an open source data collection scheme was generated to allow the efficient collection and analysis of the Raman spectral data for future use. NSF ECCS-1509786.

  6. Automated kidney detection for 3D ultrasound using scan line searching

    NASA Astrophysics Data System (ADS)

    Noll, Matthias; Nadolny, Anne; Wesarg, Stefan

    2016-04-01

    Ultrasound (U/S) is a fast and inexpensive imaging modality that is used for the examination of various anatomical structures, e.g. the kidneys. One important task for automatic organ tracking or computer-aided diagnosis is the identification of the organ region. During this process the exact information about the transducer location and orientation is usually unavailable. This renders the implementation of such automatic methods exceedingly challenging. In this work we introduce a new automatic method for the detection of the kidney in 3D U/S images. This novel technique analyses the U/S image data along virtual scan lines, searching for the characteristic texture changes that occur when entering and leaving the symmetric tissue regions of the renal cortex. A subsequent feature accumulation along a second scan direction produces a 2D heat map of renal cortex candidates, from which the kidney location is extracted in two steps. First, the strongest candidate as well as its counterpart are extracted by heat map intensity ranking and renal cortex size analysis. This process exploits the heat map gap caused by the renal pelvis region. Substituting renal pelvis detection with this combined cortex tissue feature increases the detection robustness. In contrast to model-based methods that generate characteristic pattern matches, our method is simpler and therefore faster. An evaluation performed on 61 3D U/S data sets showed that in the 55 cases with no or only minor shadowing, the kidney location could be correctly identified.

  7. Automatic differential analysis of NMR experiments in complex samples.

    PubMed

    Margueritte, Laure; Markov, Petar; Chiron, Lionel; Starck, Jean-Philippe; Vonthron-Sénécheau, Catherine; Bourjot, Mélanie; Delsuc, Marc-André

    2018-06-01

    Liquid state nuclear magnetic resonance (NMR) is a powerful tool for the analysis of complex mixtures of unknown molecules. This capacity has been used in many analytical approaches: metabolomics, identification of active compounds in natural extracts, and characterization of species. Such studies require the acquisition of many diverse NMR measurements on series of samples. Although acquisition can easily be performed automatically, the number of NMR experiments involved in these studies increases very rapidly, and this data avalanche requires resorting to automatic processing and analysis. We present here a program that allows the autonomous, unsupervised processing of a large corpus of 1D, 2D, and diffusion-ordered spectroscopy experiments from a series of samples acquired in different conditions. The program provides all the signal processing steps, as well as peak-picking and bucketing of 1D and 2D spectra; the program and its components are fully available. In an experiment mimicking the search for a bioactive species in a natural extract, we use it for the automatic detection of small amounts of artemisinin added to a series of plant extracts and for the generation of the spectral fingerprint of this molecule. This program, called Plasmodesma, is a novel tool that should be useful for deciphering complex mixtures, particularly in the discovery of biologically active natural products from plant extracts, but also in drug discovery or metabolomics studies. Copyright © 2017 John Wiley & Sons, Ltd.

  8. New auto-segment method of cerebral hemorrhage

    NASA Astrophysics Data System (ADS)

    Wang, Weijiang; Shen, Tingzhi; Dang, Hua

    2007-12-01

    A novel method for the automatic segmentation of cerebral hemorrhage (CH) in computerized tomography (CT) images is presented in this paper, which uses an expert system that models human knowledge about the CH segmentation problem. The algorithm adopts a series of dedicated steps and extracts easily overlooked CH features that can be found from statistics over a large number of real CH images, such as region area, region CT number, region smoothness, and statistical relationships among CH regions. A seven-step extraction mechanism ensures that these CH features are obtained correctly and efficiently. Using these CH features, a decision tree that models human knowledge about the CH segmentation problem is built, which ensures the rationality and accuracy of the algorithm. Finally, experiments were conducted to verify the correctness and reasonableness of the automatic segmentation, and the good accuracy and fast speed make it practical for wide application.

  9. Automatic detection of typical dust devils from Mars landscape images

    NASA Astrophysics Data System (ADS)

    Ogohara, Kazunori; Watanabe, Takeru; Okumura, Susumu; Hatanaka, Yuji

    2018-02-01

    This paper presents an improved algorithm for automatic detection of Martian dust devils that successfully extracts tiny bright dust devils and obscured large dust devils from two subtracted landscape images. These dust devils are frequently observed using visible cameras onboard landers or rovers. Nevertheless, previous research on automated detection of dust devils has not focused on these common types of dust devils, but on dust devils that appear on images to be irregularly bright and large. In this study, we detect these common dust devils automatically using two kinds of parameter sets for thresholding when binarizing subtracted images. We automatically extract dust devils from 266 images taken by the Spirit rover to evaluate our algorithm. Taking dust devils detected by visual inspection to be ground truth, the precision, recall and F-measure values are 0.77, 0.86, and 0.81, respectively.
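
    A schematic, non-authoritative version of the frame-differencing idea described above: subtract two co-registered landscape frames, binarize the difference with two different threshold settings (one tuned for tiny bright devils, one for faint large devils), and keep connected blobs above a minimum area. The arrays and thresholds are invented for illustration and are not the published parameters.

```python
# Frame differencing + dual thresholding + connected-component filtering (sketch).
import numpy as np
from scipy import ndimage

def detect_candidates(frame_a, frame_b, thresh, min_area):
    diff = frame_b.astype(float) - frame_a.astype(float)
    binary = diff > thresh                         # brightness increase between frames
    labels, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labels, index=np.arange(1, n + 1))
    keep_ids = np.where(sizes >= min_area)[0] + 1  # labels of large-enough blobs
    return np.isin(labels, keep_ids)

rng = np.random.default_rng(1)
a = rng.normal(100, 2, (200, 200))
b = a.copy()
b[50:60, 80:90] += 15                              # synthetic bright dust devil
mask_small = detect_candidates(a, b, thresh=8, min_area=20)    # tiny bright devils
mask_large = detect_candidates(a, b, thresh=3, min_area=400)   # faint, large devils
print(mask_small.sum(), mask_large.sum())
```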

  10. The impact of OCR accuracy on automated cancer classification of pathology reports.

    PubMed

    Zuccon, Guido; Nguyen, Anthony N; Bergheim, Anton; Wickman, Sandra; Grayson, Narelle

    2012-01-01

    To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classifications from a human-amended version of the OCR reports. The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of free-text pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
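
    Character accuracy figures of this kind are typically computed from an edit distance against a corrected reference. The sketch below shows one common way of doing so; it is an assumption, not the study's evaluation code.

```python
# Character accuracy from Levenshtein edit distance against a reference text.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_accuracy(ocr_text: str, reference: str) -> float:
    errors = levenshtein(ocr_text, reference)
    return 1.0 - errors / max(len(reference), 1)

print(character_accuracy("Adenocarcinoma of the c0lon",
                         "Adenocarcinoma of the colon"))
```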

  11. Laser-based structural sensing and surface damage detection

    NASA Astrophysics Data System (ADS)

    Guldur, Burcu

    Damage due to age or accumulated damage from hazards on existing structures poses a worldwide problem. In order to evaluate the current status of aging, deteriorating and damaged structures, it is vital to accurately assess the present conditions. It is possible to capture the in situ condition of structures by using laser scanners that create dense three-dimensional point clouds. This research investigates the use of high resolution three-dimensional terrestrial laser scanners with image capturing abilities as tools to capture geometric range data of complex scenes for structural engineering applications. Laser scanning technology is continuously improving, with commonly available scanners now capturing over 1,000,000 texture-mapped points per second with an accuracy of ~2 mm. However, automatically extracting meaningful information from point clouds remains a challenge, and the current state-of-the-art requires significant user interaction. The first objective of this research is to use widely accepted point cloud processing steps such as registration, feature extraction, segmentation, surface fitting and object detection to divide laser scanner data into meaningful object clusters and then apply several damage detection methods to these clusters. This required establishing a process for extracting important information from raw laser-scanned data sets such as the location, orientation and size of objects in a scanned region, and the location of damaged regions on a structure. For this purpose, first a methodology for processing range data to identify objects in a scene is presented and then, once the objects from the model library are correctly detected and fitted into the captured point cloud, these fitted objects are compared with the as-is point cloud of the investigated object to locate defects on the structure. The algorithms are demonstrated on synthetic scenes and validated on range data collected from test specimens and test-bed bridges. The second objective of this research is to combine useful information extracted from laser scanner data with color information, which provides information in the fourth dimension that enables detection of damage types such as cracks, corrosion, and related surface defects that are generally difficult to detect using only laser scanner data; moreover, the color information also helps to track volumetric changes on structures such as spalling. Although using images with varying resolution to detect cracks is an extensively researched topic, damage detection using laser scanners with and without color images is a new research area that holds many opportunities for enhancing the current practice of visual inspections. The aim is to combine the best features of laser scans and images to create an automatic and effective surface damage detection method, which will reduce the need for skilled labor during visual inspections and allow automatic documentation of related information. This work enables developing surface damage detection strategies that integrate existing condition rating criteria for a wide range of damage types that are collected under three main categories: small deformations already existing on the structure (cracks); damage types that induce larger deformations, but where the initial topology of the structure has not changed appreciably (e.g., bent members); and large deformations where localized changes in the topology of the structure have occurred (e.g., rupture, discontinuities and spalling). The effectiveness of the developed damage detection algorithms is validated by comparing the detection results with measurements taken from test specimens and test-bed bridges.

  12. Automatic Content Creation for Games to Train Students Distinguishing Similar Chinese Characters

    NASA Astrophysics Data System (ADS)

    Lai, Kwong-Hung; Leung, Howard; Tang, Jeff K. T.

    In learning Chinese, many students often have the problem of mixing up similar characters. This can cause misunderstanding and miscommunication in daily life. It is thus important for students learning the Chinese language to be able to distinguish similar characters and understand their proper usage. In this paper, we propose a game style framework in which the game content in identifying similar Chinese characters in idioms and words is created automatically. Our prior work on analyzing students’ Chinese handwriting can be applied in the similarity measure of Chinese characters. We extend this work by adding the component of radical extraction to speed up the search process. Experimental results show that the proposed method is more accurate and faster in finding more similar Chinese characters compared with the baseline method without considering the radical information.

  13. Content-based cell pathology image retrieval by combining different features

    NASA Astrophysics Data System (ADS)

    Zhou, Guangquan; Jiang, Lu; Luo, Limin; Bao, Xudong; Shu, Huazhong

    2004-04-01

    Content-based color cell pathology image retrieval is one of the newest computer image processing applications in medicine. Recently, some algorithms have been developed to achieve this goal. Because of the particularity of cell pathology images, the results of image retrieval based on a single feature are not satisfactory. A new method for pathology image retrieval that combines color, texture and morphologic features to search cell images is proposed. Firstly, nucleus regions of leukocytes in images are automatically segmented by the K-means clustering method. Then, individual leukocyte regions are detected using threshold-based segmentation and mathematical morphology. Color, texture and morphologic features are extracted from each leukocyte to represent the main attributes in the search query. The features are then normalized because the numerical value ranges and physical meanings of the extracted features differ. Finally, a relevance feedback mechanism is introduced so that the system can automatically adjust the weights of the different features and improve the retrieval results according to the feedback information. Retrieval results using the proposed method fit closely with human perception and are better than those obtained with methods based on a single feature.
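
    A minimal sketch of the first segmentation step described above, assuming the darkest K-means colour cluster corresponds to stained nuclei; the cluster count and that selection rule are illustrative assumptions, not the paper's settings.

```python
# K-means colour clustering to produce a rough nucleus mask (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans

def segment_nuclei(rgb_image: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    h, w, _ = rgb_image.shape
    pixels = rgb_image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
    # assume the darkest cluster centre corresponds to stained nuclei
    nucleus_cluster = np.argmin(km.cluster_centers_.sum(axis=1))
    return (km.labels_ == nucleus_cluster).reshape(h, w)

img = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)   # stand-in image
mask = segment_nuclei(img)
print(mask.shape, mask.sum())
```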

  14. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Nowadays, automatic multidocument text summarization systems can successfully retrieve summary sentences from the input documents, but they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a new timestamp approach combined with a Naïve Bayesian classification approach for multidocument text summarization. The timestamp gives the summary an ordered structure, which yields a more coherent summary, and it extracts the more relevant information from the multiple documents. A scoring strategy is also used to score words according to their frequency. Linguistic quality is estimated in terms of readability and comprehensibility. In order to show the efficiency of the proposed method, this paper presents a comparison between the proposed method and the existing MEAD algorithm. The timestamp procedure is also applied to the MEAD algorithm and the results are compared with those of the proposed method. The results show that the proposed method requires less time than the existing MEAD algorithm to execute the summarization process. Moreover, the proposed method yields better precision, recall, and F-score than the existing clustering with lexical chaining approach. PMID:27034971
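
    The word-frequency scoring mentioned above can be illustrated with a bare-bones extractive scorer that ranks sentences by the average corpus frequency of their words. This is a generic sketch, not the paper's system, and the sample documents are invented.

```python
# Frequency-based sentence scoring for extractive multidocument summarisation (sketch).
from collections import Counter
import re

def score_sentences(documents, top_n=3):
    sentences = [s.strip() for doc in documents
                 for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    words = [w.lower() for s in sentences for w in re.findall(r"[a-zA-Z]+", s)]
    freq = Counter(words)
    # score = mean frequency of a sentence's words; higher means more central content
    scored = [(sum(freq[w.lower()] for w in re.findall(r"[a-zA-Z]+", s))
               / max(len(s.split()), 1), s) for s in sentences]
    return [s for _, s in sorted(scored, reverse=True)[:top_n]]

docs = ["The flood damaged the bridge. Repairs start Monday.",
        "Officials said the flood was the worst in a decade. The bridge will reopen soon."]
print(score_sentences(docs, top_n=2))
```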

  15. Contour matching for a fish recognition and migration-monitoring system

    NASA Astrophysics Data System (ADS)

    Lee, Dah-Jye; Schoenberger, Robert B.; Shiozawa, Dennis; Xu, Xiaoqian; Zhan, Pengcheng

    2004-12-01

    Fish migration is being monitored year round to provide valuable information for the study of behavioral responses of fish to environmental variations. However, currently all monitoring is done by human observers. An automatic fish recognition and migration monitoring system is more efficient and can provide more accurate data. Such a system includes automatic fish image acquisition, contour extraction, fish categorization, and data storage. Shape is a very important characteristic and shape analysis and shape matching are studied for fish recognition. Previous work focused on finding critical landmark points on fish shape using curvature function analysis. Fish recognition based on landmark points has shown satisfying results. However, the main difficulty of this approach is that landmark points sometimes cannot be located very accurately. Whole shape matching is used for fish recognition in this paper. Several shape descriptors, such as Fourier descriptors, polygon approximation and line segments, are tested. A power cepstrum technique has been developed in order to improve the categorization speed using contours represented in tangent space with normalized length. Design and integration including image acquisition, contour extraction and fish categorization are discussed in this paper. Fish categorization results based on shape analysis and shape matching are also included.
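
    Among the shape descriptors listed above, Fourier descriptors are straightforward to compute. The sketch below derives scale-normalised descriptor magnitudes from a closed (x, y) contour; it is a generic formulation, not the authors' code, and the synthetic contour is only a stand-in for an extracted fish outline.

```python
# Fourier descriptors of a closed contour (generic, illustrative).
import numpy as np

def fourier_descriptors(contour_xy: np.ndarray, n_coeffs: int = 16) -> np.ndarray:
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]     # complex boundary signature
    coeffs = np.fft.fft(z)
    mags = np.abs(coeffs[1:n_coeffs + 1])            # drop DC term (translation)
    return mags / (np.abs(coeffs[1]) + 1e-12)        # normalise by 1st harmonic (scale)

t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
fishlike = np.column_stack([np.cos(t) * (1 + 0.3 * np.cos(3 * t)),
                            0.6 * np.sin(t)])        # synthetic closed outline
print(fourier_descriptors(fishlike)[:5])
```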

  16. Extracting Product Features and Opinion Words Using Pattern Knowledge in Customer Reviews

    PubMed Central

    Lynn, Khin Thidar

    2013-01-01

    With the development of e-commerce and web technology, most online merchant sites allow customers to write comments about the products they purchase. Customer reviews express opinions about products or services and are collectively referred to as customer feedback data. Opinion extraction from customer reviews is becoming an interesting area of research, motivating the development of automatic opinion mining applications for users. Therefore, efficient methods and techniques are needed to extract opinions from reviews. In this paper, we propose a novel idea to find opinion words or phrases for each feature from customer reviews in an efficient way. Our focus is to obtain patterns of opinion words/phrases about product features from the review text through adjectives, adverbs, verbs, and nouns. The extracted features and opinions are useful for generating a meaningful summary that can provide a significant informative resource to help users as well as merchants track the most suitable choice of product. PMID:24459430

  17. Extracting product features and opinion words using pattern knowledge in customer reviews.

    PubMed

    Htay, Su Su; Lynn, Khin Thidar

    2013-01-01

    With the development of e-commerce and web technology, most online merchant sites allow customers to write comments about the products they purchase. Customer reviews express opinions about products or services and are collectively referred to as customer feedback data. Opinion extraction from customer reviews is becoming an interesting area of research, motivating the development of automatic opinion mining applications for users. Therefore, efficient methods and techniques are needed to extract opinions from reviews. In this paper, we propose a novel idea to find opinion words or phrases for each feature from customer reviews in an efficient way. Our focus is to obtain patterns of opinion words/phrases about product features from the review text through adjectives, adverbs, verbs, and nouns. The extracted features and opinions are useful for generating a meaningful summary that can provide a significant informative resource to help users as well as merchants track the most suitable choice of product.
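
    As a crude illustration of the adjective/adverb/noun patterns both of these records describe, the sketch below pairs each adjective (optionally preceded by an intensifying adverb) with the nearest following noun using NLTK part-of-speech tags. The single pattern rule is a simplification for illustration, not the authors' full pattern set, and the usual NLTK tokenizer and tagger resources are assumed to be installed.

```python
# Pairing opinion words (adjectives, with optional adverbs) with product features
# (nouns) via POS tags. Requires NLTK's standard 'punkt' tokenizer and
# 'averaged_perceptron_tagger' resources to be downloaded beforehand.
import nltk

def extract_feature_opinions(review: str):
    tagged = nltk.pos_tag(nltk.word_tokenize(review))
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("JJ"):                              # adjective = opinion candidate
            opinion = word
            if i > 0 and tagged[i - 1][1].startswith("RB"):
                opinion = tagged[i - 1][0] + " " + opinion    # keep intensifying adverb
            for w2, t2 in tagged[i + 1:]:                     # nearest following noun = feature
                if t2.startswith("NN"):
                    pairs.append((w2, opinion))
                    break
    return pairs

print(extract_feature_opinions(
    "This camera has excellent picture quality and a very long battery life."))
```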

  18. Dynamics of processing invisible faces in the brain: automatic neural encoding of facial expression information.

    PubMed

    Jiang, Yi; Shannon, Robert W; Vizueta, Nathalie; Bernat, Edward M; Patrick, Christopher J; He, Sheng

    2009-02-01

    The fusiform face area (FFA) and the superior temporal sulcus (STS) are suggested to process facial identity and facial expression information respectively. We recently demonstrated a functional dissociation between the FFA and the STS as well as correlated sensitivity of the STS and the amygdala to facial expressions using an interocular suppression paradigm [Jiang, Y., He, S., 2006. Cortical responses to invisible faces: dissociating subsystems for facial-information processing. Curr. Biol. 16, 2023-2029.]. In the current event-related brain potential (ERP) study, we investigated the temporal dynamics of facial information processing. Observers viewed neutral, fearful, and scrambled face stimuli, either visibly or rendered invisible through interocular suppression. Relative to scrambled face stimuli, intact visible faces elicited larger positive P1 (110-130 ms) and larger negative N1 or N170 (160-180 ms) potentials at posterior occipital and bilateral occipito-temporal regions respectively, with the N170 amplitude significantly greater for fearful than neutral faces. Invisible intact faces generated a stronger signal than scrambled faces at 140-200 ms over posterior occipital areas whereas invisible fearful faces (compared to neutral and scrambled faces) elicited a significantly larger negative deflection starting at 220 ms along the STS. These results provide further evidence for cortical processing of facial information without awareness and elucidate the temporal sequence of automatic facial expression information extraction.

  19. Crowdsourcing the Measurement of Interstate Conflict

    PubMed Central

    2016-01-01

    Much of the data used to measure conflict is extracted from news reports. This is typically accomplished using either expert coders to quantify the relevant information or machine coders to automatically extract data from documents. Although expert coding is costly, it produces quality data. Machine coding is fast and inexpensive, but the data are noisy. To diminish the severity of this tradeoff, we introduce a method for analyzing news documents that uses crowdsourcing, supplemented with computational approaches. The new method is tested on documents about Militarized Interstate Disputes, and its accuracy ranges between about 68 and 76 percent. This is shown to be a considerable improvement over automated coding, and to cost less and be much faster than expert coding. PMID:27310427

  20. Quantitative, simultaneous, and collinear eye-tracked, high dynamic range optical coherence tomography at 850 and 1060 nm

    NASA Astrophysics Data System (ADS)

    Mooser, Matthias; Burri, Christian; Stoller, Markus; Luggen, David; Peyer, Michael; Arnold, Patrik; Meier, Christoph; Považay, Boris

    2017-07-01

    Ocular optical coherence tomography at the wavelength ranges of 850 and 1060 nm has been integrated with a confocal scanning laser ophthalmoscope eye-tracker as a clinical, commercial-class system. Collinear optics enables an exact overlap of the different channels to produce precisely overlapping depth-scans for evaluating the similarities and differences between the wavelengths and to extract additional physiologic information. A reliable segmentation algorithm utilizing graph cuts has been implemented and applied to automatically extract retinal and choroidal shape in cross-sections and volumes. The device has been tested in normal subjects and in pathologies, including a cross-sectional and longitudinal study of myopia progression and control with a duplicate instrument in Asian children.

  1. Sulci segmentation using geometric active contours

    NASA Astrophysics Data System (ADS)

    Torkaman, Mahsa; Zhu, Liangjia; Karasev, Peter; Tannenbaum, Allen

    2017-02-01

    Sulci are groove-like regions lying in the depth of the cerebral cortex between gyri, which together form a folded appearance in human and mammalian brains. Sulci play an important role in the structural analysis of the brain, morphometry (i.e., the measurement of brain structures), anatomical labeling and landmark-based registration [1]. Moreover, sulcal morphological changes are related to cortical thickness, whose measurement may provide useful information for studying a variety of psychiatric disorders. Manually extracting sulci requires complying with complex protocols, which makes the procedure both tedious and error prone [2]. In this paper, we describe an automatic procedure, employing geometric active contours, which extracts the sulci. Sulcal boundaries are obtained by minimizing an energy functional whose minimum is attained at the boundary of the given sulci.
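
    The abstract does not spell out its energy functional; for orientation, a standard edge-based (geodesic) active contour energy of the kind commonly minimised for this type of boundary extraction is shown below. This is a textbook formulation offered as background, not the authors' exact functional.

```latex
% A standard geodesic active contour energy (background assumption, not the paper's
% exact functional). C(s) is the evolving closed curve, I the image, and g an
% edge-stopping function that vanishes near strong image gradients.
E(C) = \int_0^1 g\!\left(\lvert \nabla I(C(s)) \rvert\right)\, \lvert C'(s) \rvert \, ds,
\qquad
g(r) = \frac{1}{1 + r^2}
```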

  2. Automatic Classification of High Resolution Satellite Imagery - a Case Study for Urban Areas in the Kingdom of Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Maas, A.; Alrajhi, M.; Alobeid, A.; Heipke, C.

    2017-05-01

    Updating topographic geospatial databases is often performed based on current remotely sensed images. To automatically extract the object information (labels) from the images, supervised classifiers are employed. Decisions to be taken in this process concern the definition of the classes which should be recognised, the features to describe each class and the training data necessary in the learning part of classification. With a view to large scale topographic databases for fast developing urban areas in the Kingdom of Saudi Arabia, we conducted a case study which investigated the following two questions: (a) which set of features is best suited for the classification? (b) what is the added value of height information, e.g. derived from stereo imagery? Using stereoscopic GeoEye and Ikonos satellite data we investigate these two questions based on our research on label-tolerant classification using logistic regression and partly incorrect training data. We show that between five and ten features can be recommended to obtain a stable solution, that height information consistently improves the overall classification accuracy by about 5%, and that label noise can be successfully modelled and thus only marginally influences the classification results.

  3. Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.

    PubMed

    Becker, Matthias; Böckmann, Britta

    2016-01-01

    Automatic extraction of medical concepts from medical reports and their classification with semantic standards is useful for standardization and for clinical research. This paper presents an approach to UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for German-language text, i.e. whether it can identify UMLS concepts and map them to SNOMED-CT. The German UMLS database and German OpenNLP models extended the natural language processing pipeline, so the pipeline can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset translated into German was used. The implemented algorithms were tested on a set of 199 German reports, obtaining an average F1 measure of 0.36 without German stemming or pre- and post-processing of the reports.

  4. Computer-aided screening system for cervical precancerous cells based on field emission scanning electron microscopy and energy dispersive x-ray images and spectra

    NASA Astrophysics Data System (ADS)

    Jusman, Yessi; Ng, Siew-Cheok; Hasikin, Khairunnisa; Kurnia, Rahmadi; Osman, Noor Azuan Bin Abu; Teoh, Kean Hooi

    2016-10-01

    The capability of field emission scanning electron microscopy and energy dispersive x-ray spectroscopy (FE-SEM/EDX) to scan material structures at the microlevel and characterize the material through its elemental properties has inspired this research, which has developed an FE-SEM/EDX-based cervical cancer screening system. The developed computer-aided screening system consisted of two parts: automatic feature extraction and classification. For feature extraction, an algorithm was introduced that extracts discriminant features from both the images and the spectra of cervical cells in the FE-SEM/EDX data. The system automatically extracted two types of features based on FE-SEM/EDX images and FE-SEM/EDX spectra. Textural features were extracted from the FE-SEM/EDX image using a gray level co-occurrence matrix technique, while the FE-SEM/EDX spectral features were calculated from peak heights and the corrected area under the peaks. A discriminant analysis technique was employed to predict the cervical precancerous stage in three classes: normal, low-grade squamous intraepithelial lesion (LSIL), and high-grade squamous intraepithelial lesion (HSIL). The capability of the developed screening system was tested using 700 FE-SEM/EDX spectra (300 normal, 200 LSIL, and 200 HSIL cases). The accuracy, sensitivity, and specificity were 98.2%, 99.0%, and 98.0%, respectively.
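
    The texture step above uses gray level co-occurrence matrices; a minimal scikit-image example of that kind of feature extraction is sketched below. Function names follow scikit-image ≥ 0.19, and the toy image, distances and angles are placeholders rather than the study's settings.

```python
# GLCM texture features with scikit-image (illustrative sketch).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

image = np.random.randint(0, 64, size=(128, 128), dtype=np.uint8)   # stand-in FE-SEM image
glcm = graycomatrix(image, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=64, symmetric=True, normed=True)

# Average each Haralick-style property over the chosen distances and angles.
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)
```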

  5. Automatic machine learning based prediction of cardiovascular events in lung cancer screening data

    NASA Astrophysics Data System (ADS)

    de Vos, Bob D.; de Jong, Pim A.; Wolterink, Jelmer M.; Vliegenthart, Rozemarijn; Wielingen, Geoffrey V. F.; Viergever, Max A.; Išgum, Ivana

    2015-03-01

    Calcium burden determined in CT images acquired in lung cancer screening is a strong predictor of cardiovascular events (CVEs). This study investigated whether subjects undergoing such screening who are at risk of a CVE can be identified using automatic image analysis and subject characteristics. Moreover, the study examined whether these individuals can be identified using solely image information, or whether a combination of image and subject data is needed. A set of 3559 male subjects participating in the Dutch-Belgian lung cancer screening trial was included. Low-dose non-ECG-synchronized chest CT images acquired at baseline were analyzed (1834 scanned in the University Medical Center Groningen, 1725 in the University Medical Center Utrecht). Aortic and coronary calcifications were identified using previously developed automatic algorithms. A set of features describing the number, volume and size distribution of the detected calcifications was computed. The age of the participants was extracted from the image headers. Features describing the participants' smoking status, smoking history and past CVEs were obtained. CVEs that occurred within three years after the imaging were used as the outcome. Support vector machine classification was performed with different feature sets: either image features only, or a combination of image features and subject-related characteristics. Classification based solely on the image features resulted in an area under the ROC curve (Az) of 0.69. A combination of image and subject features resulted in an Az of 0.71. The results demonstrate that subjects undergoing lung cancer screening who are at risk of a CVE can be identified using automatic image analysis. Adding subject information slightly improved the performance.
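
    A synthetic sketch of the classification step described above: an RBF-kernel support vector machine trained on calcification-derived features plus age, scored by the area under the ROC curve. The data, feature layout and hyperparameters are invented for illustration and are not those of the study.

```python
# SVM classification of CVE risk from image-derived features, scored by ROC AUC (sketch).
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                     # e.g. calcium volume, count, size stats, age
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 1000) > 1).astype(int)   # CVE within 3 years

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X_tr, y_tr)
print("Az:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```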

  6. Automatic detection and recognition of signs from natural scenes.

    PubMed

    Chen, Xilin; Yang, Jie; Zhang, Jing; Waibel, Alex

    2004-01-01

    In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle the text in different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. The procedure can significantly improve text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera, and translate the recognized text into English.
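
    The feature chain described above (local intensity normalisation, Gabor transform, then LDA) can be sketched as follows; the filter parameters, patch size and toy data are assumptions, not the paper's configuration.

```python
# Intensity-normalised Gabor features followed by LDA (illustrative sketch).
import numpy as np
from skimage.filters import gabor
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gabor_features(patch: np.ndarray) -> np.ndarray:
    patch = (patch - patch.mean()) / (patch.std() + 1e-8)       # local intensity normalisation
    feats = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):      # four orientations
        real, imag = gabor(patch, frequency=0.25, theta=theta)
        feats += [real.mean(), real.var(), imag.mean(), imag.var()]
    return np.array(feats)

rng = np.random.default_rng(0)
patches = rng.normal(size=(40, 32, 32))            # stand-in character patches
labels = rng.integers(0, 4, size=40)               # 4 character classes
X = np.array([gabor_features(p) for p in patches])
lda = LinearDiscriminantAnalysis().fit(X, labels)
print(lda.transform(X).shape)                      # reduced, class-discriminative features
```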

  7. Seismpol: a Visual-Basic computer program for interactive and automatic earthquake waveform analysis

    NASA Astrophysics Data System (ADS)

    Patanè, Domenico; Ferrari, Ferruccio

    1997-11-01

    A Microsoft Visual-Basic computer program for waveform analysis of seismic signals is presented. The program combines interactive and automatic processing of digital signals using data recorded by three-component seismic stations. The analysis procedure can be used either for interactive earthquake analysis or for automatic on-line processing of seismic recordings. The algorithm works in the time domain using the Covariance Matrix Decomposition method (CMD), so that polarization characteristics may be computed continuously in real time and seismic phases can be identified and discriminated. Visual inspection of the particle motion in orthogonal planes of projection (hodograms) reduces the danger of misinterpretation derived from the application of the polarization filter. The choice of time window and frequency intervals improves the quality of the extracted polarization information. In fact, the program uses a band-pass Butterworth filter to process the signals in the frequency domain by decomposing a selected signal window into a series of narrow frequency bands. Significant results supported by well-defined polarizations and source azimuth estimates for P and S phases are also obtained for short-period seismic events (local microearthquakes).
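
    A minimal sketch of the covariance-based polarization analysis described above, in Python/numpy; the window length and the rectilinearity and azimuth definitions are common textbook choices, not necessarily Seismpol's exact formulation:

      # Sketch: time-domain polarization attributes of a three-component record
      # via eigen-decomposition of the covariance matrix in a sliding window.
      import numpy as np

      def polarization(z, n, e, win=50):
          """Return (rectilinearity, azimuth in degrees) per non-overlapping window."""
          out = []
          for i in range(0, len(z) - win, win):
              X = np.vstack([z[i:i+win], n[i:i+win], e[i:i+win]])
              C = np.cov(X)                          # 3x3 covariance matrix
              vals, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
              l1, l2, l3 = vals[2], vals[1], vals[0]
              rect = 1.0 - (l2 + l3) / (2.0 * l1)    # close to 1 for rectilinear (P-type) motion
              v = vecs[:, 2]                         # principal eigenvector, ordered (z, n, e)
              az = np.degrees(np.arctan2(v[2], v[1])) % 360.0   # azimuth from north
              out.append((rect, az))
          return out

      # Example on synthetic noise:
      rng = np.random.default_rng(1)
      z, n, e = rng.normal(size=(3, 2000))
      print(polarization(z, n, e)[:3])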

  8. Enhanced automatic artifact detection based on independent component analysis and Renyi's entropy.

    PubMed

    Mammone, Nadia; Morabito, Francesco Carlo

    2008-09-01

    Artifacts are disturbances that may occur during signal acquisition and may affect their processing. The aim of this paper is to propose a technique for automatically detecting artifacts from the electroencephalographic (EEG) recordings. In particular, a technique based on both Independent Component Analysis (ICA) to extract artifactual signals and on Renyi's entropy to automatically detect them is presented. This technique is compared to the widely known approach based on ICA and the joint use of kurtosis and Shannon's entropy. The novel processing technique is shown to detect on average 92.6% of the artifactual signals against the average 68.7% of the previous technique on the studied available database. Moreover, Renyi's entropy is shown to be able to detect muscle and very low frequency activity as well as to discriminate them from other kinds of artifacts. In order to achieve an efficient rejection of the artifacts while minimizing the information loss, future efforts will be devoted to the improvement of blind artifact separation from EEG in order to ensure a very efficient isolation of the artifactual activity from any signals deriving from other brain tasks.
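
    A minimal sketch of the ICA-plus-Renyi's-entropy idea, using scikit-learn's FastICA on synthetic data; the entropy estimator (histogram-based, order 2) and the outlier threshold are illustrative assumptions rather than the paper's exact criteria:

      # Sketch: decompose multichannel "EEG" with ICA, compute Renyi's entropy of
      # order 2 for each independent component, and flag components whose entropy
      # deviates strongly from the rest as candidate artifacts.
      import numpy as np
      from sklearn.decomposition import FastICA

      def renyi_entropy(x, order=2, bins=100):
          p, _ = np.histogram(x, bins=bins)
          p = p / p.sum()
          p = p[p > 0]
          return np.log(np.sum(p ** order)) / (1.0 - order)

      rng = np.random.default_rng(0)
      eeg = rng.normal(size=(19, 5000))                            # 19 channels of toy data
      eeg[3] += 5 * np.sign(np.sin(np.linspace(0, 60, 5000)))      # injected artifact

      ica = FastICA(n_components=19, random_state=0)
      sources = ica.fit_transform(eeg.T).T                         # components x samples

      H = np.array([renyi_entropy(s) for s in sources])
      flagged = np.where(np.abs(H - H.mean()) > 2 * H.std())[0]
      print("candidate artifactual components:", flagged)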

  9. Ganges-Brahmaputra-Meghna Delta Connectivity Analysis Using New Tools for the Automatic Extraction of Channel Networks from Remotely Sensed Imagery

    NASA Astrophysics Data System (ADS)

    Jarriel, T. M.; Isikdogan, F.; Passalacqua, P.; Bovik, A.

    2017-12-01

    River deltas are among the environmental ecosystems most threatened by climate change and anthropogenic activity. While their low elevation gradients and fertile soil have made them optimal for human inhabitation and diverse ecologic growth, these same characteristics also make them susceptible to the adverse effects of sea level rise, flooding, subsidence, and manmade structures such as dams, levees, and dikes. One particularly large and threatened delta, and the focus area of this study, is the Ganges-Brahmaputra-Meghna Delta (GBMD) on the southern coast of Bangladesh/West Bengal, India. In this study we analyze the GBMD channel network, identify areas of maximum change of the network, and use this information to predict how the network will respond under future scenarios. Landsat images of the delta from 1973 to 2017 are analyzed using new tools for the automatic extraction of channel networks from remotely sensed imagery [Isikdogan et al., 2017a; Isikdogan et al., 2017b]. The tools return channel width and channel centerline location at the resolution of the input imagery (30 m). Channel location variance over time is computed using the combined data from 1973 to 2017 and, based on this information, zones of highest change in the system are identified (Figure 1). Network metrics measuring characteristics of the delta's channels and islands are calculated for each year of the study and compared to the variance results in order to identify which metrics capture this change. These results provide both a method to identify zones of the GBMD that are currently experiencing the most change, as well as a means to predict which areas of the delta will experience network changes in the future. This information will be useful for informing coastal sustainability decisions about which areas of such a large and complex network should be the focus of remediation and mitigation efforts. Isikdogan, F., A. Bovik, P. Passalacqua (2017a), RivaMap: An Automated River Analysis and Mapping Engine, Remote Sensing of Environment, in press. Isikdogan, F., A. Bovik, P. Passalacqua (2017b), River Network Extraction by Deep Convolutional Neural Networks, IEEE Geoscience and Remote Sensing Letters, under review.

  10. Pre-processing liquid chromatography/high-resolution mass spectrometry data: extracting pure mass spectra by deconvolution from the invariance of isotopic distribution.

    PubMed

    Krishnan, Shaji; Verheij, Elwin E R; Bas, Richard C; Hendriks, Margriet W B; Hankemeier, Thomas; Thissen, Uwe; Coulier, Leon

    2013-05-15

    Mass spectra obtained by deconvolution of liquid chromatography/high-resolution mass spectrometry (LC/HRMS) data can be impaired by non-informative mass-to-charge (m/z) channels. This impairment of mass spectra can have a significant negative influence on further post-processing, such as quantification and identification. A metric derived from the knowledge of errors in isotopic distribution patterns, and from the quality of the signal within a pre-defined mass chromatogram block, has been developed to pre-select all informative m/z channels. This procedure results in the clean-up of deconvoluted mass spectra by maintaining the intensity counts from m/z channels that originate from a specific compound/molecular ion, for example, molecular ion, adducts, 13C-isotopes, multiply charged ions, and removing all m/z channels that are not related to the specific peak. The methodology has been successfully demonstrated for two sets of high-resolution LC/MS data. The approach described is therefore thought to be a useful tool in the automatic processing of LC/HRMS data. It clearly shows the advantages compared to other approaches like peak picking and de-isotoping in the sense that all information is retained while non-informative data is removed automatically. Copyright © 2013 John Wiley & Sons, Ltd.

  11. Morphological feature extraction for the classification of digital images of cancerous tissues.

    PubMed

    Thiran, J P; Macq, B

    1996-10-01

    This paper presents a new method for automatic recognition of cancerous tissues from an image of a microscopic section. Based on shape and size analysis of the observed cells, this method provides the physician with nonsubjective numerical values for four criteria of malignancy. The automatic approach is based on mathematical morphology, and more specifically on the use of geodesic operations. This technique is used first to remove the background noise from the image and then to segment the nuclei of the cells and to analyze their shape, size, and texture. From the values of the extracted criteria, an automatic classification of the image (cancerous or not) is finally performed.
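
    A minimal sketch of geodesic (morphological) reconstruction used to suppress background before segmenting nuclei, with scikit-image standing in for the paper's morphology operators; the test image and structuring-element sizes are arbitrary:

      # Sketch: reconstruction by dilation from an interior-minimum seed estimates the
      # background connected to the image border; subtracting it leaves bright objects,
      # which are then thresholded and cleaned with a morphological opening.
      import numpy as np
      from skimage import data, filters
      from skimage.morphology import reconstruction, disk, opening

      image = data.coins().astype(float)

      seed = np.copy(image)
      seed[1:-1, 1:-1] = image.min()                     # seed <= image everywhere
      background = reconstruction(seed, image, method='dilation')
      foreground = image - background                    # bright structures not tied to the border

      mask = foreground > filters.threshold_otsu(foreground)
      mask = opening(mask, disk(3))                      # remove small residual noise
      print("object pixels:", int(mask.sum()))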

  12. Fully Automatic Speech-Based Analysis of the Semantic Verbal Fluency Task.

    PubMed

    König, Alexandra; Linz, Nicklas; Tröger, Johannes; Wolters, Maria; Alexandersson, Jan; Robert, Phillipe

    2018-06-08

    Semantic verbal fluency (SVF) tests are routinely used in screening for mild cognitive impairment (MCI). In this task, participants name as many items as possible of a semantic category under a time constraint. Clinicians measure task performance manually by summing the number of correct words and errors. More fine-grained variables add valuable information to clinical assessment, but are time-consuming to obtain. Therefore, the aim of this study is to investigate whether automatic analysis of the SVF could provide these measures as accurately as manual annotation and thus support qualitative screening of neurocognitive impairment. SVF data were collected from 95 older people with MCI (n = 47), Alzheimer's or related dementias (ADRD; n = 24), and healthy controls (HC; n = 24). All data were annotated manually and automatically with clusters and switches. The obtained metrics were validated using a classifier to distinguish HC, MCI, and ADRD. Automatically extracted clusters and switches were highly correlated (r = 0.9) with manually established values, and performed as well on the classification task separating HC from persons with ADRD (area under the curve [AUC] = 0.939) and MCI (AUC = 0.758). The results show that it is possible to automate fine-grained analyses of SVF data for the assessment of cognitive decline. © 2018 S. Karger AG, Basel.
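
    A minimal sketch of counting clusters and switches from an SVF word sequence; the word-to-subcategory mapping and the cluster definition (consecutive words sharing a subcategory) are simplified assumptions rather than the study's annotation scheme:

      # Sketch: clusters = runs of two or more semantically related consecutive words,
      # switches = transitions between runs.
      def clusters_and_switches(words, subcategory):
          runs, run_len = [], 1
          for prev, cur in zip(words, words[1:]):
              if subcategory.get(cur) == subcategory.get(prev):
                  run_len += 1
              else:
                  runs.append(run_len)
                  run_len = 1
          runs.append(run_len)
          clusters = sum(1 for r in runs if r >= 2)
          switches = len(runs) - 1
          return clusters, switches

      subcat = {"cow": "farm", "pig": "farm", "horse": "farm",
                "lion": "wild", "tiger": "wild", "dog": "pet"}
      print(clusters_and_switches(["cow", "pig", "lion", "tiger", "dog", "horse"], subcat))
      # -> (2, 3): two clusters (cow/pig, lion/tiger) and three switches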

  13. A method for real-time implementation of HOG feature extraction

    NASA Astrophysics Data System (ADS)

    Luo, Hai-bo; Yu, Xin-rong; Liu, Hong-mei; Ding, Qing-hai

    2011-08-01

    Histogram of oriented gradients (HOG) is an efficient feature extraction scheme, and HOG descriptors are widely used in computer vision and image processing for biometrics, target tracking, automatic target detection (ATD), automatic target recognition (ATR), etc. However, the computation of HOG feature extraction is unsuitable for hardware implementation because it includes complicated operations. In this paper, an optimal design method and a theoretical framework for real-time HOG feature extraction based on an FPGA are proposed. The main principle is as follows: first, a parallel gradient computing unit circuit based on a parallel pipeline structure was designed. Second, the calculation of the arctangent and square root operations was simplified. Finally, a histogram generator based on a parallel pipeline structure was designed to calculate the histogram of each sub-region. Experimental results showed that HOG extraction can be implemented within one pixel period by these computing units.
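
    To illustrate the per-cell orientation histogram that such a pipeline parallelizes, a minimal software sketch in numpy (no block normalization; the cell size and bin count are arbitrary, and this is not the paper's FPGA design):

      # Sketch: magnitude-weighted gradient-orientation histograms per cell, the core of HOG.
      import numpy as np

      def hog_cells(img, cell=8, bins=9):
          gy, gx = np.gradient(img.astype(float))
          mag = np.hypot(gx, gy)
          ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0           # unsigned orientation
          h, w = img.shape
          H = np.zeros((h // cell, w // cell, bins))
          for i in range(h // cell):
              for j in range(w // cell):
                  m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
                  a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
                  idx = (a / (180.0 / bins)).astype(int) % bins  # hard binning
                  np.add.at(H[i, j], idx, m)                     # magnitude-weighted votes
          return H

      img = np.random.default_rng(0).random((64, 64))
      print(hog_cells(img).shape)   # (8, 8, 9)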

  14. Automatic generation of investigator bibliographies for institutional research networking systems.

    PubMed

    Johnson, Stephen B; Bales, Michael E; Dine, Daniel; Bakken, Suzanne; Albert, Paul J; Weng, Chunhua

    2014-10-01

    Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93 while Scopus had 0.85 and name-based queries had 0.31. ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. Automatic generation of investigator bibliographies for institutional research networking systems

    PubMed Central

    Johnson, Stephen B.; Bales, Michael E.; Dine, Daniel; Bakken, Suzanne; Albert, Paul J.; Weng, Chunhua

    2014-01-01

    Objective Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. Methods ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. Results About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss’ kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93 while Scopus had 0.85 and name-based queries had 0.31. Discussion ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators. PMID:24694772

  16. HYPOTrace: image analysis software for measuring hypocotyl growth and shape demonstrated on Arabidopsis seedlings undergoing photomorphogenesis.

    PubMed

    Wang, Liya; Uilecan, Ioan Vlad; Assadi, Amir H; Kozmik, Christine A; Spalding, Edgar P

    2009-04-01

    Analysis of time series of images can quantify plant growth and development, including the effects of genetic mutations (phenotypes) that give information about gene function. Here is demonstrated a software application named HYPOTrace that automatically extracts growth and shape information from electronic gray-scale images of Arabidopsis (Arabidopsis thaliana) seedlings. Key to the method is the iterative application of adaptive local principal components analysis to extract a set of ordered midline points (medial axis) from images of the seedling hypocotyl. Pixel intensity is weighted to avoid the medial axis being diverted by the cotyledons in areas where the two come in contact. An intensity feature useful for terminating the midline at the hypocotyl apex was isolated in each image by subtracting the baseline with a robust local regression algorithm. Applying the algorithm to time series of images of Arabidopsis seedlings responding to light resulted in automatic quantification of hypocotyl growth rate, apical hook opening, and phototropic bending with high spatiotemporal resolution. These functions are demonstrated here on wild-type, cryptochrome1, and phototropin1 seedlings for the purpose of showing that HYPOTrace generated expected results and to show how much richer the machine-vision description is compared to methods more typical in plant biology. HYPOTrace is expected to benefit seedling development research, particularly in the photomorphogenesis field, by replacing many tedious, error-prone manual measurements with a precise, largely automated computational tool.

  17. Towards semi-automatic rock mass discontinuity orientation and set analysis from 3D point clouds

    NASA Astrophysics Data System (ADS)

    Guo, Jiateng; Liu, Shanjun; Zhang, Peina; Wu, Lixin; Zhou, Wenhui; Yu, Yinan

    2017-06-01

    Obtaining accurate information on rock mass discontinuities for deformation analysis and the evaluation of rock mass stability is important. Obtaining measurements for high and steep zones with the traditional compass method is difficult. Photogrammetry, three-dimensional (3D) laser scanning and other remote sensing methods have gradually become mainstream methods. In this study, a method that is based on a 3D point cloud is proposed to semi-automatically extract rock mass structural plane information. The original data are pre-treated prior to segmentation by removing outlier points. The next step is to segment the point cloud into different point subsets. Various parameters, such as the normal, dip/direction and dip, can be calculated for each point subset after obtaining the equation of the best fit plane for the relevant point subset. A cluster analysis (a point subset that satisfies some conditions and thus forms a cluster) is performed based on the normal vectors by introducing the firefly algorithm (FA) and the fuzzy c-means (FCM) algorithm. Finally, clusters that belong to the same discontinuity sets are merged and coloured for visualization purposes. A prototype system is developed based on this method to extract the points of the rock discontinuity from a 3D point cloud. A comparison with existing software shows that this method is feasible. This method can provide a reference for rock mechanics, 3D geological modelling and other related fields.
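
    A minimal sketch of the per-subset plane fitting and dip/dip-direction computation described above; the coordinate convention (x = east, y = north, z = up) and the synthetic point subset are assumptions:

      # Sketch: fit a plane to a segmented point subset by SVD and convert the
      # upward-pointing normal into dip direction (azimuth of steepest descent)
      # and dip angle.
      import numpy as np

      def dip_and_direction(points):
          """points: (N, 3) array of x (east), y (north), z (up) coordinates."""
          centered = points - points.mean(axis=0)
          _, _, vt = np.linalg.svd(centered, full_matrices=False)
          n = vt[-1]                                  # normal = direction of least variance
          if n[2] < 0:                                # force the normal to point upward
              n = -n
          dip = np.degrees(np.arccos(n[2] / np.linalg.norm(n)))
          dip_dir = np.degrees(np.arctan2(n[0], n[1])) % 360.0
          return dip_dir, dip

      rng = np.random.default_rng(0)
      xy = rng.uniform(-1, 1, size=(200, 2))
      z = 0.3 * xy[:, 0] + 0.1 * xy[:, 1] + 0.01 * rng.normal(size=200)
      print(dip_and_direction(np.column_stack([xy, z])))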

  18. Learning to File: Reconfiguring Information and Information Work in the Early Twentieth Century.

    PubMed

    Robertson, Craig

    2017-01-01

    This article uses textbooks and advertisements to explore the formal and informal ways in which people were introduced to vertical filing in the early twentieth century. Through the privileging of "system" an ideal mode of paperwork emerged in which a clerk could "grasp" information simply by hand without having to understand or comprehend its content. A file clerk's hands and fingers became central to the representation and teaching of filing. In this way, filing offered an example of a distinctly modern form of information work. Filing textbooks sought to enhance dexterity as the rapid handling of paper came to represent information as something that existed in discrete units, in bits that could be easily extracted. Advertisements represented this mode of information work in its ideal form when they frequently erased the worker or reduced him or her to hands, as "instant" filing became "automatic" filing, with the filing cabinet presented as a machine.

  19. Attention-Based Recurrent Temporal Restricted Boltzmann Machine for Radar High Resolution Range Profile Sequence Recognition.

    PubMed

    Zhang, Yifan; Gao, Xunzhang; Peng, Xuan; Ye, Jiaqi; Li, Xiang

    2018-05-16

    High Resolution Range Profile (HRRP) recognition has attracted great interest in the field of Radar Automatic Target Recognition (RATR). However, traditional HRRP recognition methods fail to model high-dimensional sequential data efficiently and have poor robustness to noise. To deal with these problems, a novel stochastic neural network model named Attention-based Recurrent Temporal Restricted Boltzmann Machine (ARTRBM) is proposed in this paper. The RTRBM is utilized to extract discriminative features and the attention mechanism is adopted to select the major features. The RTRBM is efficient at modeling high-dimensional HRRP sequences because it can extract the information of temporal and spatial correlation between adjacent HRRPs. The attention mechanism, which has been used in sequential recognition tasks including machine translation and relation classification, makes the model pay more attention to the features that matter most for recognition. Therefore, the combination of the RTRBM and the attention mechanism makes our model effective at extracting more internally related features and choosing the important parts of the extracted features. Additionally, the model performs well on noise-corrupted HRRP data. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that our proposed model outperforms other traditional methods, which indicates that ARTRBM extracts, selects, and utilizes the correlation information between adjacent HRRPs effectively and is suitable for high-dimensional or noise-corrupted data.

  20. The design of automatic software testing module for civil aviation information system

    NASA Astrophysics Data System (ADS)

    Qi, Qi; Sun, Yang

    2018-05-01

    In this paper, a practical design is developed to address the urgent need for an automatic testing module in civil aviation information systems. First, the background and significance of the automatic testing module for civil aviation information systems are expounded, and the current state of research on automatic testing modules, along with the advantages and disadvantages of related software, is analyzed. Then, the requirements of the automatic testing module are studied from three aspects: macro-level demand, functional requirements and non-functional requirements. Finally, a design for the automatic testing module of the civil aviation information system is presented, covering four aspects: module structure, core functions, database and security.

  1. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach.

    PubMed

    Rinaldi, Fabio; Schneider, Gerold; Kaljurand, Kaarel; Hess, Michael; Andronis, Christos; Konstandi, Ourania; Persidis, Andreas

    2007-02-01

    The number of new discoveries (as published in the scientific literature) in the biomedical area is growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and thus the extraction of the core information becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, limiting the amount of human intervention normally needed to gather those results. This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus. We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated Arabidopsis thaliana circadian rhythms (ATCR) corpus. We show how a deep-linguistic approach (contrary to common belief) can be used in a real-world text mining application, offering high-precision relation extraction while at the same time retaining sufficient recall.
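
    A minimal sketch of dependency-based relation extraction in the spirit of the approach above; spaCy stands in for the paper's deep-linguistic parser, and the trigger-verb list and example sentence are illustrative assumptions (the exact output depends on the parser model):

      # Sketch: extract (subject, trigger verb, object) triples from a dependency parse.
      # Assumes the small English model is installed: python -m spacy download en_core_web_sm
      import spacy

      nlp = spacy.load("en_core_web_sm")
      TRIGGERS = {"interact", "bind", "activate", "inhibit", "phosphorylate"}

      def extract_relations(text):
          triples = []
          for sent in nlp(text).sents:
              for tok in sent:
                  if tok.pos_ == "VERB" and tok.lemma_ in TRIGGERS:
                      subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                      objects = [c for c in tok.children if c.dep_ == "dobj"]
                      # handle "interacts with X": take the object of the attached preposition
                      for prep in (c for c in tok.children if c.dep_ == "prep"):
                          objects += [c for c in prep.children if c.dep_ == "pobj"]
                      for s in subjects:
                          for o in objects:
                              triples.append((s.text, tok.lemma_, o.text))
          return triples

      print(extract_relations("TIM interacts with PER and inhibits CLK."))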

  2. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    NASA Astrophysics Data System (ADS)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.
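
    A minimal sketch of the tie-point measurement step using standard optical-flow matching (OpenCV's pyramidal Lucas-Kanade tracker); the corner-detection and flow parameters are illustrative, not the FMV-GTB settings:

      # Sketch: detect corners in one frame and track them into the next frame.
      import cv2
      import numpy as np

      def tie_points(frame_a, frame_b, max_pts=500):
          gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
          gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
          p0 = cv2.goodFeaturesToTrack(gray_a, maxCorners=max_pts,
                                       qualityLevel=0.01, minDistance=7)
          p1, status, _ = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, p0, None,
                                                   winSize=(21, 21), maxLevel=3)
          good = status.ravel() == 1
          return p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)

      # Example with two synthetic frames (the second is the first shifted by 2 pixels):
      a = (np.random.default_rng(0).random((240, 320, 3)) * 255).astype(np.uint8)
      b = np.roll(a, shift=2, axis=1)
      src, dst = tie_points(a, b)
      print(len(src), "tie points; mean shift ~", (dst - src).mean(axis=0))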

  3. Automatic quantitative analysis of in-stent restenosis using FD-OCT in vivo intra-arterial imaging.

    PubMed

    Mandelias, Kostas; Tsantis, Stavros; Spiliopoulos, Stavros; Katsakiori, Paraskevi F; Karnabatidis, Dimitris; Nikiforidis, George C; Kagadis, George C

    2013-06-01

    A new segmentation technique is implemented for automatic lumen area extraction and stent strut detection in intravascular optical coherence tomography (OCT) images for the purpose of quantitative analysis of in-stent restenosis (ISR). In addition, a user-friendly graphical user interface (GUI) is developed based on the employed algorithm toward clinical use. Four clinical datasets of frequency-domain OCT scans of the human femoral artery were analyzed. First, a segmentation method based on fuzzy C-means (FCM) clustering and the wavelet transform (WT) was applied toward inner luminal contour extraction. Subsequently, stent strut positions were detected by incorporating metrics derived from the local maxima of the wavelet transform into the FCM membership function. The inner lumen contour and the positions of the stent struts were extracted with high precision. Compared to manual segmentation by an expert physician, the automatic lumen contour delineation had an average overlap value of 0.917 ± 0.065 for all OCT images included in the study. The strut detection procedure achieved an overall accuracy of 93.80% and successfully identified 9.57 ± 0.5 struts for every OCT image. Processing time was confined to approximately 2.5 s per OCT frame. A new fast and robust automatic segmentation technique combining FCM and WT for lumen border extraction and strut detection in intravascular OCT images was designed and implemented. The proposed algorithm, integrated in a GUI, represents a step forward toward the employment of automated quantitative analysis of ISR in clinical practice.
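
    A minimal sketch of the fuzzy C-means update equations at the core of such a segmentation, in plain numpy and applied to raw intensities only (the paper's wavelet-derived metrics are omitted):

      # Sketch: alternate between the FCM membership and center updates until convergence.
      import numpy as np

      def fcm(x, c=2, m=2.0, iters=100, tol=1e-5, seed=0):
          """x: (N,) data values. Returns cluster centers and memberships of shape (N, c)."""
          rng = np.random.default_rng(seed)
          u = rng.random((len(x), c))
          u /= u.sum(axis=1, keepdims=True)
          for _ in range(iters):
              um = u ** m
              centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
              d = np.abs(x[:, None] - centers[None, :]) + 1e-12
              u_new = 1.0 / (d ** (2.0 / (m - 1.0)))
              u_new /= u_new.sum(axis=1, keepdims=True)
              converged = np.abs(u_new - u).max() < tol
              u = u_new
              if converged:
                  break
          return centers, u

      pixels = np.concatenate([np.random.normal(0.2, 0.05, 1000),
                               np.random.normal(0.8, 0.05, 1000)])
      centers, u = fcm(pixels)
      print("centers:", np.round(np.sort(centers), 3))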

  4. Realtime automatic metal extraction of medical x-ray images for contrast improvement

    NASA Astrophysics Data System (ADS)

    Prangl, Martin; Hellwagner, Hermann; Spielvogel, Christian; Bischof, Horst; Szkaliczki, Tibor

    2006-03-01

    This paper focuses on an approach for real-time metal extraction from x-ray images taken by modern x-ray machines such as C-arms. Such machines are used for vessel diagnostics, surgical interventions, as well as cardiology, neurology and orthopedic examinations. They are very fast in taking images from different angles. For this reason, manual adjustment of contrast is infeasible, and automatic adjustment algorithms have been applied to try to select the optimal radiation dose for contrast adjustment. Problems occur when metallic objects, e.g., a prosthesis or a screw, are in the absorption area of interest. In this case, the automatic adjustment mostly fails because the dark, metallic objects lead the algorithm to overdose the x-ray tube. This outshining effect results in overexposed images and bad contrast. To overcome this limitation, metallic objects have to be detected and extracted from the images that are taken as input for the adjustment algorithm. In this paper, we present a real-time solution for extracting metallic objects from x-ray images. We explore the characteristic features of metallic objects in x-ray images and their distinction from bone fragments, which form the basis for successful object segmentation and classification. Subsequently, we present our edge-based real-time approach for successful and fast automatic segmentation and classification of metallic objects. Finally, experimental results on the effectiveness and performance of our approach, based on a large number of input image data sets, are presented.

  5. Automatic extraction and visualization of object-oriented software design metrics

    NASA Astrophysics Data System (ADS)

    Lakshminarayana, Anuradha; Newman, Timothy S.; Li, Wei; Talburt, John

    2000-02-01

    Software visualization is a graphical representation of software characteristics and behavior. Certain modes of software visualization can be useful in isolating problems and identifying unanticipated behavior. In this paper we present a new approach to aid understanding of object-oriented software through 3D visualization of software metrics that can be extracted from the design phase of software development. The focus of the paper is a metric extraction method and a new collection of glyphs for multi-dimensional metric visualization. Our approach utilizes the extensibility interface of a popular CASE tool to access and automatically extract the metrics from Unified Modeling Language class diagrams. Following the extraction of the design metrics, a 3D visualization of these metrics is generated for each class in the design, utilizing intuitively meaningful 3D glyphs that are representative of the ensemble of metrics. Extraction and visualization of design metrics can aid software developers in the early study and understanding of design complexity.

  6. Automatic Identification & Classification of Surgical Margin Status from Pathology Reports Following Prostate Cancer Surgery

    PubMed Central

    D’Avolio, Leonard W.; Litwin, Mark S.; Rogers, Selwyn O.; Bui, Alex A. T.

    2007-01-01

    Prostate cancer removal surgeries that result in tumor found at the surgical margin, otherwise known as a positive surgical margin, have a significantly higher chance of biochemical recurrence and clinical progression. To support clinical outcomes assessment a system was designed to automatically identify, extract, and classify key phrases from pathology reports describing this outcome. Heuristics and boundary detection were used to extract phrases. Phrases were then classified using support vector machines into one of three classes: ‘positive (involved) margins,’ ‘negative (uninvolved) margins,’ and ‘not-applicable or definitive.’ A total of 851 key phrases were extracted from a sample of 782 reports produced between 1996 and 2006 from two major hospitals. Despite differences in reporting style, at least 1 sentence containing a diagnosis was extracted from 780 of the 782 reports (99.74%). Of the 851 sentences extracted, 97.3% contained diagnoses. Overall accuracy of automated classification of extracted sentences into the three categories was 97.18%. PMID:18693818
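
    A minimal sketch of classifying extracted phrases with a support vector machine over TF-IDF features, using scikit-learn; the training sentences and labels are invented examples, not data from the study:

      # Sketch: three-class margin-status classifier (positive / negative / not applicable).
      from sklearn.pipeline import make_pipeline
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC

      sentences = [
          "Tumor is present at the inked surgical margin.",
          "Carcinoma extends to the resection margin.",
          "Surgical margins are free of tumor.",
          "Margins are negative for carcinoma.",
          "Margin status cannot be assessed.",
          "No definitive statement regarding margins.",
      ]
      labels = ["positive", "positive", "negative", "negative", "n/a", "n/a"]

      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
      clf.fit(sentences, labels)
      print(clf.predict(["The tumor reaches the inked margin."]))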

  7. Reconciling disparate information in continuity of care documents: Piloting a system to consolidate structured clinical documents.

    PubMed

    Hosseini, Masoud; Jones, Josette; Faiola, Anthony; Vreeman, Daniel J; Wu, Huanmei; Dixon, Brian E

    2017-10-01

    Due to the nature of information generation in health care, clinical documents contain duplicate and sometimes conflicting information. Recent implementation of Health Information Exchange (HIE) mechanisms in which clinical summary documents are exchanged among disparate health care organizations can proliferate duplicate and conflicting information. To reduce information overload, a system to automatically consolidate information across multiple clinical summary documents was developed for an HIE network. The system receives any number of Continuity of Care Documents (CCDs) and outputs a single, consolidated record. To test the system, a randomly sampled corpus of 522 CCDs representing 50 unique patients was extracted from a large HIE network. The automated methods were compared to manual consolidation of information for three key sections of the CCD: problems, allergies, and medications. Manual consolidation of 11,631 entries was completed in approximately 150 h. The same data were automatically consolidated in 3.3 min. The system successfully consolidated 99.1% of problems, 87.0% of allergies, and 91.7% of medications. Almost all of the inaccuracies were caused by issues involving the use of standardized terminologies within the documents to represent individual information entries. This study represents a novel, tested tool for de-duplication and consolidation of CDA documents, which is a major step toward improving information access and the interoperability among information systems. While more work is necessary, automated systems like the one evaluated in this study will be necessary to meet the informatics needs of providers and health systems in the future. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Text de-identification for privacy protection: a study of its impact on clinical text information content.

    PubMed

    Meystre, Stéphane M; Ferrández, Óscar; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H

    2014-08-01

    As more and more electronic clinical information is becoming easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has only barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical notes informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems, while formatting was only modified by one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information. To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that automated text de-identification's impact on clinical information is small, but not negligible, and that improved clinical acronym and eponym disambiguation could significantly reduce this impact. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Automatic lumen segmentation in IVOCT images using binary morphological reconstruction

    PubMed Central

    2013-01-01

    Background Atherosclerosis causes millions of deaths annually and yields billions in expenses around the world. Intravascular Optical Coherence Tomography (IVOCT) is a medical imaging modality which displays high resolution images of coronary cross-sections. Nonetheless, quantitative information can only be obtained with segmentation; consequently, more adequate diagnostics, therapies and interventions can be provided. Since it is a relatively new modality, many different segmentation methods, available in the literature for other modalities, could be successfully applied to IVOCT images, improving accuracy and usefulness. Method An automatic lumen segmentation approach, based on the Wavelet Transform and Mathematical Morphology, is presented. The methodology is divided into three main parts. First, the preprocessing stage attenuates undesirable information and enhances important information. Second, in the feature extraction block, the wavelet transform is associated with an adapted version of Otsu thresholding; hence, tissue information is discriminated and binarized. Finally, binary morphological reconstruction improves the binary information and constructs the binary lumen object. Results The evaluation was carried out by segmenting 290 challenging images from human and pig coronaries, and rabbit iliac arteries; the outcomes were compared with gold standards made by experts. The resultant accuracy was: True Positive (%) = 99.29 ± 2.96, False Positive (%) = 3.69 ± 2.88, False Negative (%) = 0.71 ± 2.96, Max False Positive Distance (mm) = 0.1 ± 0.07, Max False Negative Distance (mm) = 0.06 ± 0.1. Conclusions In conclusion, by segmenting a number of IVOCT images with various features, the proposed technique was shown to be robust and more accurate than published studies; in addition, the method is completely automatic, providing a new tool for IVOCT segmentation. PMID:23937790

  10. Pattern recognition applied to seismic signals of Llaima volcano (Chile): An evaluation of station-dependent classifiers

    NASA Astrophysics Data System (ADS)

    Curilem, Millaray; Huenupan, Fernando; Beltrán, Daniel; San Martin, Cesar; Fuentealba, Gustavo; Franco, Luis; Cardona, Carlos; Acuña, Gonzalo; Chacón, Max; Khan, M. Salman; Becerra Yoma, Nestor

    2016-04-01

    Automatic pattern recognition applied to seismic signals from volcanoes may assist seismic monitoring by reducing the workload of analysts, allowing them to focus on more challenging activities, such as producing reports, implementing models, and understanding volcanic behaviour. In a previous work, we proposed a structure for automatic classification of seismic events at Llaima volcano, one of the most active volcanoes in the Southern Andes, located in the Araucanía Region of Chile. A database of events taken from three monitoring stations on the volcano was used to create a classification structure, independent of which station provided the signal. The database included three types of volcanic events: tremor, long period, and volcano-tectonic, plus a contrast group containing other types of seismic signals. In the present work, we maintain the same classification scheme, but we consider the station information separately in order to assess whether the complementary information provided by different stations improves the performance of the classifier in recognising seismic patterns. This paper proposes two strategies for combining the information from the stations: i) combining the features extracted from the signals from each station, and ii) combining the classifiers of each station. In the first case, the features extracted from the signals from each station are combined to form the input for a single classification structure. In the second, a decision stage combines the results of the classifiers for each station to give a unique output. The results confirm that the station-dependent strategies that combine the features and the classifiers from several stations improve the classification performance, and that the combination of the features provides the best performance. The results show an average improvement of 9% in classification accuracy when compared with the station-independent method.

  11. An automated procedure for detection of IDP's dwellings using VHR satellite imagery

    NASA Astrophysics Data System (ADS)

    Jenerowicz, Malgorzata; Kemper, Thomas; Soille, Pierre

    2011-11-01

    This paper presents the results of the estimation of dwelling structures in the Al Salam IDP Camp, Southern Darfur, based on Very High Resolution multispectral satellite images and obtained through Mathematical Morphology analysis. A series of image processing procedures, feature extraction methods and textural analyses have been applied in order to provide reliable information about dwelling structures. One of the issues in this context is the similarity of the spectral response of thatched dwellings' roofs and their surroundings in IDP camps, where the exploitation of multispectral information is crucial. This study shows the advantage of the automatic extraction approach and highlights the importance of detailed spatial and spectral information analysis based on a multi-temporal dataset. The additional fusion of the high-resolution panchromatic band with the lower resolution multispectral bands of the WorldView-2 satellite has a positive influence on the results and can thereby be useful for humanitarian aid agencies, providing support for decisions and population estimates, especially in situations when frequent revisits by space imaging systems are the only possibility for continued monitoring.

  12. A Visual Analytics Framework for Identifying Topic Drivers in Media Events.

    PubMed

    Lu, Yafeng; Wang, Hong; Landis, Steven; Maciejewski, Ross

    2017-09-14

    Media data has been the subject of large scale analysis with applications of text mining being used to provide overviews of media themes and information flows. Such information extracted from media articles has also shown its contextual value of being integrated with other data, such as criminal records and stock market pricing. In this work, we explore linking textual media data with curated secondary textual data sources through user-guided semantic lexical matching for identifying relationships and data links. In this manner, critical information can be identified and used to annotate media timelines in order to provide a more detailed overview of events that may be driving media topics and frames. These linked events are further analyzed through an application of causality modeling to model temporal drivers between the data series. Such causal links are then annotated through automatic entity extraction which enables the analyst to explore persons, locations, and organizations that may be pertinent to the media topic of interest. To demonstrate the proposed framework, two media datasets and an armed conflict event dataset are explored.

  13. ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files.

    PubMed

    Karthikeyan, Muthukumarasamy; Vyas, Renu

    2016-01-01

    Digital access to chemical journals has resulted in a vast array of molecular information that is now available in supplementary material files in PDF format. However, extracting this molecular information, generally from a PDF document format, is a daunting task. Here we present an approach to harvest 3D molecular data from the supporting information of scientific research articles that are normally available from publishers' resources. In order to demonstrate the feasibility of extracting truly computable molecules from PDF file formats in a fast and efficient manner, we have developed a Java based application, namely ChemEngine. This program recognizes textual patterns from the supplementary data and generates standard molecular structure data (bond matrix, atomic coordinates) that can be subjected to a multitude of computational processes automatically. The methodology has been demonstrated via several case studies on different formats of coordinate data stored in supplementary information files, wherein ChemEngine selectively harvested the atomic coordinates and interpreted them as molecules with high accuracy. The reusability of the extracted molecular coordinate data was demonstrated by computing Single Point Energies that were in close agreement with the original computed data provided with the articles. It is envisaged that the methodology will enable large scale conversion of molecular information from supplementary files available in the PDF format into a collection of ready-to-compute molecular data to create an automated workflow for advanced computational processes. Software, along with source code and instructions, is available at https://sourceforge.net/projects/chemengine/files/?source=navbar.

  14. Multi-layer cube sampling for liver boundary detection in PET-CT images.

    PubMed

    Liu, Xinxin; Yang, Jian; Song, Shuang; Song, Hong; Ai, Danni; Zhu, Jianjun; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Liver metabolic information is considered a crucial diagnostic marker for the diagnosis of fever of unknown origin, and liver recognition is the basis for automatic extraction of this metabolic information. However, the poor quality of PET and CT images is a challenge for information extraction and target recognition in PET-CT images. Existing detection methods cannot meet the requirements of liver recognition in PET-CT images, which is a key problem in large-scale PET-CT image analysis. A novel texture feature descriptor called multi-layer cube sampling (MLCS) is developed for liver boundary detection in low-dose CT and PET images. The cube sampling feature is proposed for extracting more texture information, using a bi-centric voxel strategy. Neighbouring voxels are divided into three regions by the centre voxel and the reference voxel in the histogram, and the voxel distribution information is statistically classified as the texture feature. Multi-layer texture features are also used to improve the ability and adaptability of target recognition in volume data. The proposed feature is tested on PET and CT images for liver boundary detection. For the liver in the volume data, the mean detection rate (DR) and mean error rate (ER) reached 95.15 and 7.81% in low-quality PET images, and 83.10 and 21.08% in low-contrast CT images. The experimental results demonstrate that the proposed method is effective and robust for liver boundary detection.

  15. Automatic SAR/optical cross-matching for GCP monograph generation

    NASA Astrophysics Data System (ADS)

    Nutricato, Raffaele; Morea, Alberto; Nitti, Davide Oscar; La Mantia, Claudio; Agrimano, Luigi; Samarelli, Sergio; Chiaradia, Maria Teresa

    2016-10-01

    Ground Control Points (GCPs), automatically extracted from Synthetic Aperture Radar (SAR) images through 3D stereo analysis, can be effectively exploited for an automatic orthorectification of optical imagery if they can be robustly located in the basic optical images. The present study outlines a SAR/optical cross-matching procedure that allows a robust alignment of radar and optical images, and consequently derives automatically the corresponding sub-pixel position of the GCPs in the input optical image, expressed as fractional pixel/line image coordinates. The cross-matching is performed in two subsequent steps, in order to gradually achieve better precision. The first step is based on Mutual Information (MI) maximization between optical and SAR chips, while the second uses Normalized Cross-Correlation as the similarity metric. This work outlines the designed algorithmic solution and discusses the results derived over the urban area of Pisa (Italy), where more than ten COSMO-SkyMed Enhanced Spotlight stereo images with different beams and passes are available. The experimental analysis involves different satellite images, in order to evaluate the performance of the algorithm with respect to the optical spatial resolution. An assessment of the performance of the algorithm has been carried out, and errors are computed by measuring the distance between the GCP pixel/line position in the optical image, automatically estimated by the tool, and the "true" position of the GCP, visually identified by an expert user in the optical images.
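
    A minimal sketch of the mutual information similarity metric used in the first matching step, computed from a joint histogram; the image chips here are synthetic and the offset search itself is omitted:

      # Sketch: MI(a, b) = sum_xy p(x, y) * log( p(x, y) / (p(x) p(y)) ), estimated from a 2-D histogram.
      import numpy as np

      def mutual_information(a, b, bins=64):
          hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
          pxy = hist / hist.sum()
          px = pxy.sum(axis=1, keepdims=True)
          py = pxy.sum(axis=0, keepdims=True)
          nz = pxy > 0
          return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

      rng = np.random.default_rng(0)
      optical = rng.random((128, 128))
      sar_like = np.sqrt(optical) + 0.1 * rng.normal(size=optical.shape)  # nonlinearly related chip
      print("MI(optical, SAR-like):", round(mutual_information(optical, sar_like), 3))
      print("MI(optical, noise):   ", round(mutual_information(optical, rng.random((128, 128))), 3))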

  16. Development of an automatic cow body condition scoring using body shape signature and Fourier descriptors.

    PubMed

    Bercovich, A; Edan, Y; Alchanatis, V; Moallem, U; Parmet, Y; Honig, H; Maltz, E; Antler, A; Halachmi, I

    2013-01-01

    Body condition evaluation is a common tool to assess energy reserves of dairy cows and to estimate their fatness or thinness. This study presents a computer-vision tool that automatically estimates a cow's body condition score. Top-view images of 151 cows were collected on an Israeli research dairy farm using a digital still camera located at the entrance to the milking parlor. The cow's tailhead area and its contour were segmented and extracted automatically. Two types of features of the tailhead contour were extracted: (1) the angles and distances between 5 anatomical points; and (2) the cow signature, which is a 1-dimensional vector of the Euclidean distances from each point in the normalized tailhead contour to the shape center. Two methods were applied to describe the cow's signature and to reduce its dimension: (1) partial least squares regression, and (2) Fourier descriptors of the cow signature. Three prediction models were compared with manual scores of an expert. Results indicate that (1) it is possible to automatically extract and predict body condition from color images without any manual interference; and (2) Fourier descriptors of the cow's signature result in improved performance (R² = 0.77). Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
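
    A minimal sketch of the one-dimensional contour signature and its Fourier descriptors; the contour here is a synthetic ellipse and the number of retained harmonics is arbitrary:

      # Sketch: centroid-to-contour distance signature, described by the magnitudes of its
      # low-order Fourier coefficients (normalized by the DC term for scale invariance).
      import numpy as np

      def fourier_descriptors(contour, n_harmonics=10):
          """contour: (N, 2) array of ordered (x, y) boundary points."""
          center = contour.mean(axis=0)
          signature = np.linalg.norm(contour - center, axis=1)
          spectrum = np.fft.rfft(signature)
          return np.abs(spectrum[1:n_harmonics + 1]) / np.abs(spectrum[0])

      t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
      ellipse = np.column_stack([3 * np.cos(t), 2 * np.sin(t)])
      print(np.round(fourier_descriptors(ellipse), 4))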

  17. Automatic discovery of cell types and microcircuitry from neural connectomics

    PubMed Central

    Jonas, Eric; Kording, Konrad

    2015-01-01

    Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we developed a non-parametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists in a principled and probabilistically coherent manner, including connectivity, cell body location, and the spatial distribution of synapses. We show that the approach recovers known neuron types in the retina and enables predictions of connectivity, better than simpler algorithms. It also can reveal interesting structure in the nervous system of Caenorhabditis elegans and an old man-made microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches of automatically deriving anatomical insights from these emerging datasets. DOI: http://dx.doi.org/10.7554/eLife.04250.001 PMID:25928186

  18. Automatic discovery of cell types and microcircuitry from neural connectomics

    DOE PAGES

    Jonas, Eric; Kording, Konrad

    2015-04-30

    Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we developed a non-parametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists in a principled and probabilistically coherent manner, including connectivity, cell body location, and the spatial distribution of synapses. We show that the approach recovers known neuron types in the retina and enables predictions of connectivity, better than simpler algorithms. It also can reveal interesting structure in the nervous system of Caenorhabditis elegans and an old man-made microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches of automatically deriving anatomical insights from these emerging datasets.

  19. Study on online community user motif using web usage mining

    NASA Astrophysics Data System (ADS)

    Alphy, Meera; Sharma, Ajay

    2016-04-01

    Web usage mining is the application of data mining used to extract useful information from online communities. The World Wide Web contained at least 4.73 billion pages according to the Indexed Web and at least 228.52 million pages according to the Dutch Indexed Web on Thursday, 6 August 2015. It is difficult to find the needed data among these billions of web pages, which underscores the importance of web usage mining. Personalizing the search engine helps web users identify the most used data in an easy way; it reduces time consumption and enables automatic site search and automatic restoring of useful sites. This study reviews the techniques used for pattern discovery and analysis in web usage mining from 1996 to 2015, from early methods to the latest ones. Analyzing user motifs helps improve business, e-commerce, personalisation and website design.

  20. Automatic discovery of cell types and microcircuitry from neural connectomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jonas, Eric; Kording, Konrad

    Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we developed a non-parametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists in a principled and probabilistically coherent manner, including connectivity, cell body location, and the spatial distribution of synapses. We show that the approach recovers known neuron types in the retina and enables predictions of connectivity, better than simpler algorithms. It also can reveal interesting structure in the nervous system of Caenorhabditis elegans and an old man-made microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches of automatically deriving anatomical insights from these emerging datasets.

  1. A hybrid model for automatic identification of risk factors for heart disease.

    PubMed

    Yang, Hui; Garibaldi, Jonathan M

    2015-12-01

    Coronary artery disease (CAD) is the leading cause of death in both the UK and worldwide. The detection of related risk factors and tracking their progress over time is of great importance for early prevention and treatment of CAD. This paper describes an information extraction system that was developed to automatically identify risk factors for heart disease in medical records while the authors participated in the 2014 i2b2/UTHealth NLP Challenge. Our approaches rely on several natural language processing (NLP) techniques such as machine learning, rule-based methods, and dictionary-based keyword spotting to cope with the complicated clinical contexts inherent in a wide variety of risk factors. Our system achieved encouraging performance on the challenge test data with an overall micro-averaged F-measure of 0.915, which was competitive with the best system (F-measure of 0.927) of this challenge task. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Multi-Sensor Registration of Earth Remotely Sensed Imagery

    NASA Technical Reports Server (NTRS)

    LeMoigne, Jacqueline; Cole-Rhodes, Arlene; Eastman, Roger; Johnson, Kisha; Morisette, Jeffrey; Netanyahu, Nathan S.; Stone, Harold S.; Zavorin, Ilya; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    Assuming that approximate registration is given within a few pixels by a systematic correction system, we develop automatic image registration methods for multi-sensor data with the goal of achieving sub-pixel accuracy. Automatic image registration is usually defined by three steps: feature extraction, feature matching, and data resampling or fusion. Our previous work focused on image correlation methods based on the use of different features. In this paper, we study different feature matching techniques and present five algorithms where the features are either original gray levels or wavelet-like features, and the feature matching is based on gradient descent optimization, statistical robust matching, and mutual information. These algorithms are tested and compared on several multi-sensor datasets covering one of the EOS Core Sites, the Konza Prairie in Kansas, from four different sensors: IKONOS (4 m), Landsat-7/ETM+ (30 m), MODIS (500 m), and SeaWiFS (1000 m).

  3. Automatic evidence quality prediction to support evidence-based decision making.

    PubMed

    Sarker, Abeed; Mollá, Diego; Paris, Cécile

    2015-06-01

    Evidence-based medicine practice requires practitioners to obtain the best available medical evidence, and appraise the quality of the evidence when making clinical decisions. Primarily due to the plethora of electronically available data from the medical literature, the manual appraisal of the quality of evidence is a time-consuming process. We present a fully automatic approach for predicting the quality of medical evidence in order to aid practitioners at point-of-care. Our approach extracts relevant information from medical article abstracts and utilises data from a specialised corpus to apply supervised machine learning for the prediction of the quality grades. Following an in-depth analysis of the usefulness of features (e.g., publication types of articles), they are extracted from the text via rule-based approaches and from the meta-data associated with the articles, and then applied in the supervised classification model. We propose the use of a highly scalable and portable approach using a sequence of high precision classifiers, and introduce a simple evaluation metric called average error distance (AED) that simplifies the comparison of systems. We also perform elaborate human evaluations to compare the performance of our system against human judgments. We test and evaluate our approaches on a publicly available, specialised, annotated corpus containing 1132 evidence-based recommendations. Our rule-based approach performs exceptionally well at the automatic extraction of publication types of articles, with F-scores of up to 0.99 for high-quality publication types. For evidence quality classification, our approach obtains an accuracy of 63.84% and an AED of 0.271. The human evaluations show that the performance of our system, in terms of AED and accuracy, is comparable to the performance of humans on the same data. The experiments suggest that our structured text classification framework achieves evaluation results comparable to those of human performance. Our overall classification approach and evaluation technique are also highly portable and can be used for various evidence grading scales. Copyright © 2015 Elsevier B.V. All rights reserved.
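
    As a rough illustration of an average-error-distance style metric, the sketch below assumes AED is the mean absolute distance between predicted and true grades on an ordered scale; the grade scale and function name are illustrative, and the paper's exact definition should be consulted.

    ```python
    # Sketch of an average-error-distance (AED) style metric for ordinal grades,
    # assuming AED is the mean absolute distance between predicted and true grades.
    GRADE_ORDER = {"A": 0, "B": 1, "C": 2}  # illustrative SORT-like scale

    def average_error_distance(y_true: list[str], y_pred: list[str]) -> float:
        distances = [abs(GRADE_ORDER[t] - GRADE_ORDER[p]) for t, p in zip(y_true, y_pred)]
        return sum(distances) / len(distances)

    print(average_error_distance(["A", "B", "C", "B"], ["A", "C", "C", "A"]))  # 0.5
    ```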

  4. Automatically extracting the significant aspects evaluated in game reviews

    NASA Astrophysics Data System (ADS)

    Fong, Chiok Hoong; Ng, Yen Kaow

    2017-04-01

    Understanding the criteria (or "aspects") that reviewers use to evaluate games is important to game developers and publishers, since this gives them important input on how to improve their products. Techniques for the extraction of such aspects have been studied by others, albeit not specific to the gaming industry. In this paper we demonstrate an aspect extraction and analysis system specific to computer games. The system extracts game review texts from a list of known websites and automatically extracts candidate aspects from the review text using techniques from natural language processing and sentiment analysis. It then ranks the candidate aspects using the HITS algorithm. To evaluate the correctness of the extracted aspects, we used the system to calculate an overall score for each game by aggregating its highly-rated aspects, weighted by the importance of the respective aspects. The aggregated scores resulted in a ranking of games, which we compared to a known ranking from a popular website; the rankings showed overall consistency, which suggests that the system has extracted valuable aspects from the reviews. Using the extracted aspects, our system also facilitates the analysis of a game by evaluating how review articles have rated its performance in these aspects.
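
    Ranking candidate aspects with HITS can be sketched on a toy review-aspect graph, treating reviews as hubs and aspects as authorities. The graph, node names, and use of networkx below are illustrative assumptions, not the authors' implementation.

    ```python
    # Sketch of ranking candidate aspects with HITS on a toy bipartite graph:
    # reviews act as hubs, candidate aspects as authorities.
    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from([
        ("review1", "graphics"), ("review1", "story"),
        ("review2", "graphics"), ("review2", "controls"),
        ("review3", "graphics"), ("review3", "story"),
    ])  # edge: review mentions candidate aspect

    hubs, authorities = nx.hits(G, max_iter=200)
    aspect_scores = {n: s for n, s in authorities.items() if not n.startswith("review")}
    print(sorted(aspect_scores, key=aspect_scores.get, reverse=True))
    # ['graphics', 'story', 'controls']
    ```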

  5. Comparison of methods of DNA extraction for real-time PCR in a model of pleural tuberculosis.

    PubMed

    Santos, Ana; Cremades, Rosa; Rodríguez, Juan Carlos; García-Pachón, Eduardo; Ruiz, Montserrat; Royo, Gloria

    2010-01-01

    Molecular methods have been reported to have different sensitivities in the diagnosis of pleural tuberculosis and this may in part be caused by the use of different methods of DNA extraction. Our study compares nine DNA extraction systems in an experimental model of pleural tuberculosis. An inoculum of Mycobacterium tuberculosis was added to 23 pleural liquid samples with different characteristics. DNA was subsequently extracted using nine different methods (seven manual and two automatic) for analysis with real-time PCR. Only two methods were able to detect the presence of M. tuberculosis DNA in all the samples: extraction using columns (Qiagen) and automated extraction with the TNAI system (Roche). The automatic method is more expensive, but requires less time. Almost all the false negatives were because of the difficulty involved in extracting M. tuberculosis DNA, as in general, all the methods studied are capable of eliminating inhibitory substances that block the amplification reaction. The method of M. tuberculosis DNA extraction used affects the results of the diagnosis of pleural tuberculosis by molecular methods. DNA extraction systems that have been shown to be effective in pleural liquid should be used.

  6. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov

    PubMed Central

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V

    2016-01-01

    Objective: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. Methods: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. Results and Discussion: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. PMID:27013523

  7. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.

    PubMed

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua

    2016-07-01

    Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Analysis of Fundus Fluorescein Angiogram Based on the Hessian Matrix of Directional Curvelet Sub-bands and Distance Regularized Level Set Evolution.

    PubMed

    Soltanipour, Asieh; Sadri, Saeed; Rabbani, Hossein; Akhlaghi, Mohammad Reza

    2015-01-01

    This paper presents a new procedure for automatic extraction of the blood vessels and optic disk (OD) in fundus fluorescein angiograms (FFA). To extract blood vessel centerlines, the vessel extraction algorithm starts with the analysis of directional images obtained from sub-bands of the fast discrete curvelet transform (FDCT) in similar directions and at different scales. For this purpose, each directional image is processed using information from the first-order derivative and the eigenvalues of the Hessian matrix. The final vessel segmentation is obtained by iteratively applying a simple region growing algorithm, which merges centerline images with the contents of images resulting from a modified top-hat transform followed by bit-plane slicing. After extracting blood vessels from the FFA image, candidate regions for the OD are enhanced by removing blood vessels from the FFA image, using multi-structure-element morphology and modification of FDCT coefficients. Then, a Canny edge detector and Hough transform are applied to the reconstructed image to extract the boundary of candidate regions. In the next step, information about the main arc of the retinal vessels surrounding the OD region is used to extract the actual location of the OD. Finally, the OD boundary is detected by applying distance regularized level set evolution. The proposed method was tested on FFA images from the angiography unit of Isfahan Feiz Hospital, comprising 70 FFA images from different diabetic retinopathy stages. The experimental results show an accuracy of more than 93% for vessel segmentation and more than 87% for OD boundary extraction.

  9. Analysis of Fundus Fluorescein Angiogram Based on the Hessian Matrix of Directional Curvelet Sub-bands and Distance Regularized Level Set Evolution

    PubMed Central

    Soltanipour, Asieh; Sadri, Saeed; Rabbani, Hossein; Akhlaghi, Mohammad Reza

    2015-01-01

    This paper presents a new procedure for automatic extraction of the blood vessels and optic disk (OD) in fundus fluorescein angiograms (FFA). To extract blood vessel centerlines, the vessel extraction algorithm starts with the analysis of directional images obtained from sub-bands of the fast discrete curvelet transform (FDCT) in similar directions and at different scales. For this purpose, each directional image is processed using information from the first-order derivative and the eigenvalues of the Hessian matrix. The final vessel segmentation is obtained by iteratively applying a simple region growing algorithm, which merges centerline images with the contents of images resulting from a modified top-hat transform followed by bit-plane slicing. After extracting blood vessels from the FFA image, candidate regions for the OD are enhanced by removing blood vessels from the FFA image, using multi-structure-element morphology and modification of FDCT coefficients. Then, a Canny edge detector and Hough transform are applied to the reconstructed image to extract the boundary of candidate regions. In the next step, information about the main arc of the retinal vessels surrounding the OD region is used to extract the actual location of the OD. Finally, the OD boundary is detected by applying distance regularized level set evolution. The proposed method was tested on FFA images from the angiography unit of Isfahan Feiz Hospital, comprising 70 FFA images from different diabetic retinopathy stages. The experimental results show an accuracy of more than 93% for vessel segmentation and more than 87% for OD boundary extraction. PMID:26284170

  10. Automated Assessment of Child Vocalization Development Using LENA

    ERIC Educational Resources Information Center

    Richards, Jeffrey A.; Xu, Dongxin; Gilkerson, Jill; Yapanel, Umit; Gray, Sharmistha; Paul, Terrance

    2017-01-01

    Purpose: To produce a novel, efficient measure of children's expressive vocal development on the basis of automatic vocalization assessment (AVA), child vocalizations were automatically identified and extracted from audio recordings using Language Environment Analysis (LENA) System technology. Method: Assessment was based on full-day audio…

  11. Automated renal histopathology: digital extraction and quantification of renal pathology

    NASA Astrophysics Data System (ADS)

    Sarder, Pinaki; Ginley, Brandon; Tomaszewski, John E.

    2016-03-01

    The branch of pathology concerned with excess blood serum proteins being excreted in the urine pays particular attention to the glomerulus, a small intertwined bunch of capillaries located at the beginning of the nephron. Normal glomeruli allow a moderate amount of blood proteins to be filtered; proteinuric glomeruli allow large amounts of blood proteins to be filtered. Diagnosis of proteinuric diseases requires time-intensive manual examination of the structural compartments of the glomerulus from renal biopsies. Pathological examination includes cellularity of individual compartments, Bowman's and luminal space segmentation, cellular morphology, glomerular volume, capillary morphology, and more. Long examination times may lead to increased diagnosis time and/or reduced precision of the diagnostic process. Automatic quantification holds strong potential to reduce renal diagnostic time. We have developed a computational pipeline capable of automatically segmenting relevant features from renal biopsies. Our method first segments glomerular compartments from renal biopsies by isolating regions with high nuclear density. Gabor texture segmentation is used to accurately define glomerular boundaries. Bowman's and luminal spaces are segmented using morphological operators. Nuclei structures are segmented using color deconvolution, morphological processing, and bottleneck detection. Average computation time of feature extraction for a typical biopsy, comprising ~12 glomeruli, is ~69 s using an Intel(R) Core(TM) i7-4790 CPU, and is ~65X faster than manual processing. Using images from rat renal tissue samples, automatic glomerular structural feature estimation was reproducibly demonstrated for 15 biopsy images, which contained 148 individual glomeruli images. The proposed method holds immense potential to enhance information available while making clinical diagnoses.
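
    The nuclei-segmentation step (color deconvolution followed by thresholding and morphological clean-up) can be sketched as follows. The stain-separation matrix, sample image, and thresholding choices are illustrative stand-ins for the authors' pipeline, which additionally uses bottleneck detection.

    ```python
    # Sketch of nuclei segmentation via color deconvolution and morphology;
    # skimage's built-in stain matrix and sample image stand in for the
    # authors' deconvolution settings and renal biopsy data.
    from skimage import color, data, filters, morphology

    rgb = data.immunohistochemistry()          # sample stained-tissue image
    hed = color.rgb2hed(rgb)                   # color deconvolution
    hematoxylin = hed[:, :, 0]                 # nuclear stain channel

    mask = hematoxylin > filters.threshold_otsu(hematoxylin)
    mask = morphology.remove_small_objects(mask, min_size=50)
    mask = morphology.binary_opening(mask, morphology.disk(2))

    print(f"nuclei pixels: {int(mask.sum())} of {mask.size}")
    ```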

  12. Chemometric strategy for automatic chromatographic peak detection and background drift correction in chromatographic data.

    PubMed

    Yu, Yong-Jie; Xia, Qiao-Ling; Wang, Sheng; Wang, Bing; Xie, Fu-Wei; Zhang, Xiao-Bing; Ma, Yun-Ming; Wu, Hai-Long

    2014-09-12

    Peak detection and background drift correction (BDC) are the key stages in using chemometric methods to analyze chromatographic fingerprints of complex samples. This study developed a novel chemometric strategy for simultaneous automatic chromatographic peak detection and BDC. A robust statistical method was used for intelligent estimation of the instrumental noise level, coupled with the first-order derivative of the chromatographic signal, to automatically extract chromatographic peaks in the data. A local curve-fitting strategy was then employed for BDC. Simulated and real liquid chromatographic data were designed with various kinds of background drift and degrees of peak overlap to verify the performance of the proposed strategy. The underlying chromatographic peaks can be automatically detected and reasonably integrated by this strategy. Meanwhile, chromatograms with BDC can be precisely obtained. The proposed method was used to analyze a complex gas chromatography dataset that monitored quality changes in plant extracts during a storage procedure. Copyright © 2014 Elsevier B.V. All rights reserved.
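
    A minimal sketch of derivative-assisted peak detection with a robust noise estimate is shown below; the MAD-based threshold and the synthetic chromatogram are illustrative assumptions, not the paper's statistical model.

    ```python
    # Sketch of peak detection guided by a robust noise estimate of the
    # first-derivative signal; thresholds and data are illustrative.
    import numpy as np
    from scipy.signal import find_peaks

    def detect_peaks(signal: np.ndarray) -> np.ndarray:
        deriv = np.gradient(signal)
        # Robust noise level of the derivative via median absolute deviation.
        noise = 1.4826 * np.median(np.abs(deriv - np.median(deriv)))
        # Keep peaks whose prominence clearly exceeds the noise level.
        peaks, _ = find_peaks(signal, prominence=20 * noise)
        return peaks

    t = np.linspace(0, 60, 3000)
    chromatogram = (np.exp(-((t - 15) ** 2) / 0.5)
                    + 0.6 * np.exp(-((t - 40) ** 2) / 0.8)
                    + 0.02 * t                                    # baseline drift
                    + 0.01 * np.random.default_rng(1).standard_normal(t.size))
    print(detect_peaks(chromatogram))  # indices near the two injected peaks
    ```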

  13. Automatic digital image analysis for identification of mitotic cells in synchronous mammalian cell cultures.

    PubMed

    Eccles, B A; Klevecz, R R

    1986-06-01

    Mitotic frequency in a synchronous culture of mammalian cells was determined fully automatically and in real time using low-intensity phase-contrast microscopy and a Newvicon video camera connected to an EyeCom III image processor. Image samples, at a frequency of one per minute for 50 hours, were analyzed by first extracting the high-frequency picture components, then thresholding and probing for annular objects indicative of putative mitotic cells. Both the extraction of high-frequency components and the recognition of rings of varying radii and discontinuities employed novel algorithms. Spatial and temporal relationships between annuli were examined to discern the occurrences of mitoses, and such events were recorded in a computer data file. At present, the automatic analysis is suited for random cell proliferation rate measurements or cell cycle studies. The automatic identification of mitotic cells as described here provides a measure of the average proliferative activity of the cell population as a whole and eliminates more than eight hours of manual review per time-lapse video recording.

  14. Automatic Author Profiling of Online Chat Logs

    DTIC Science & Technology

    2007-03-01

    (Available record text is a table-of-contents excerpt listing age-classification experiments: all test data, and extracted test data comparing teens with users in their 20s, 30s, 40s, and 50s, including binary age classification with a prior.)

  15. Automatic indexing of scanned documents: a layout-based approach

    NASA Astrophysics Data System (ADS)

    Esser, Daniel; Schuster, Daniel; Muthmann, Klemens; Berger, Michael; Schill, Alexander

    2012-01-01

    Archiving official written documents such as invoices, reminders, and account statements is becoming increasingly important in both business and private settings. Creating appropriate index entries for document archives, such as the sender's name, creation date, or document number, is tedious manual work. We present a novel approach to automatic indexing of documents based on generic positional extraction of index terms. For this purpose we apply knowledge of document templates, stored in a common full-text search index, to find index positions that were successfully extracted in the past.

  16. The Extraction of Terrace in the Loess Plateau Based on radial method

    NASA Astrophysics Data System (ADS)

    Liu, W.; Li, F.

    2016-12-01

    Terraces on the Loess Plateau are a typical artificial landform and an important soil and water conservation measure; locating and automatically extracting them would simplify land use investigation. Existing methods of terrace extraction fall into visual interpretation and automatic extraction. The manual method is used in land use investigation, but it is time-consuming and laborious. Researchers have put forward several automatic extraction methods. For example, the Fourier transform method can recognize terraces and find their accurate position from the frequency-domain image, but it is strongly affected by linear objects oriented in the same direction as the terraces. Texture analysis is simple and has a wide range of applications in image processing, but it cannot recognize terrace edges. Object-oriented classification is a newer image classification method, but when applied to terrace extraction, fragmented polygons are the most serious problem and their geological meaning is difficult to interpret. To locate terraces, we use high-resolution remote sensing imagery and extract and analyze the gray values of the pixels that radial lines pass through. During recognition, we first use DEM analysis or manual selection to roughly determine the positions of peak points; second, we cast radial lines in all directions from each peak point; finally, we extract the gray values of the pixels along each radial and analyze their variation to determine whether a terrace exists. To obtain accurate terrace positions, terrace discontinuity, extension direction, ridge width, the image processing algorithm, remote sensing image illumination, and other influencing factors were fully considered when designing the algorithms.
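
    The radial sampling step can be sketched as follows; the function name, number of radials, and interpolation choice are illustrative assumptions. A terrace would appear as a roughly periodic oscillation of gray values along a profile.

    ```python
    # Sketch of sampling gray values along radial lines cast from a candidate
    # peak point; array names and parameters are illustrative.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def radial_profiles(image: np.ndarray, center: tuple[float, float],
                        n_rays: int = 36, length: int = 200) -> np.ndarray:
        """Return an (n_rays, length) array of gray values along each radial."""
        cy, cx = center
        radii = np.arange(length)
        profiles = []
        for theta in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
            ys = cy + radii * np.sin(theta)
            xs = cx + radii * np.cos(theta)
            # Bilinear interpolation of the image at the sampled coordinates.
            profiles.append(map_coordinates(image, [ys, xs], order=1, mode="nearest"))
        return np.stack(profiles)

    img = np.random.default_rng(0).random((512, 512)).astype(np.float32)
    print(radial_profiles(img, center=(256, 256)).shape)  # (36, 200)
    ```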

  17. 47 CFR 87.5 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... station providing communication between a control tower and aircraft. Automatic dependent surveillance... relevant information about the aircraft. Automatic terminal information service-broadcast (ATIS-B). The automatic provision of current, routine information to arriving and departing aircraft throughout a 24-hour...

  18. 47 CFR 87.5 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... station providing communication between a control tower and aircraft. Automatic dependent surveillance... relevant information about the aircraft. Automatic terminal information service-broadcast (ATIS-B). The automatic provision of current, routine information to arriving and departing aircraft throughout a 24-hour...

  19. Collaborative Manufacturing Management in Networked Supply Chains

    NASA Astrophysics Data System (ADS)

    Pouly, Michel; Naciri, Souleiman; Berthold, Sébastien

    ERP systems provide information management and analysis to industrial companies and support their planning activities. They are currently mostly based on theoretical values (averages) of parameters rather than on actual shop floor data, which disturbs the planning algorithms. On the other hand, sharing data between manufacturers, suppliers and customers is becoming very important to ensure reactivity towards market variability. This paper proposes software solutions to address these requirements and methods to automatically capture the necessary corresponding shop floor information. In order to share data produced by different legacy systems along the collaborative networked supply chain, we propose to use the Generic Product Model developed by Hitachi to extract, translate and store heterogeneous ERP data.

  20. Cross-beam coherence of infrasonic signals at local and regional ranges.

    PubMed

    Alberts, W C Kirkpatrick; Tenney, Stephen M

    2017-11-01

    Signals collected by infrasound arrays require continuous analysis by skilled personnel or by automatic algorithms in order to extract useable information. Typical pieces of information gained by analysis of infrasonic signals collected by multiple sensor arrays are arrival time, line of bearing, amplitude, and duration. These can all be used, often with significant accuracy, to locate sources. A very important part of this chain is associating collected signals across multiple arrays. Here, a pairwise, cross-beam coherence method of signal association is described that allows rapid signal association for high signal-to-noise ratio events captured by multiple infrasound arrays at ranges exceeding 150 km. Methods, test cases, and results are described.
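
    A pairwise coherence check between two beamformed signals can be sketched with standard signal-processing tools; the sampling rate, frequency band, and association threshold below are illustrative assumptions, not the authors' parameters.

    ```python
    # Sketch of a pairwise coherence check between beamformed signals from two
    # arrays; the shared 2 Hz component stands in for a real infrasonic event.
    import numpy as np
    from scipy.signal import coherence

    fs = 100.0                                  # Hz, assumed sampling rate
    t = np.arange(6000) / fs                    # 60 s of data
    rng = np.random.default_rng(0)
    common = np.sin(2 * np.pi * 2.0 * t)        # shared infrasonic signal
    beam_a = common + 0.5 * rng.standard_normal(t.size)
    beam_b = common + 0.5 * rng.standard_normal(t.size)

    f, cxy = coherence(beam_a, beam_b, fs=fs, nperseg=256)
    band = (f >= 1.0) & (f <= 4.0)
    associated = cxy[band].max() > 0.8          # associate detections if coherent
    print(round(float(cxy[band].max()), 2), associated)
    ```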

  1. A humming retrieval system based on music fingerprint

    NASA Astrophysics Data System (ADS)

    Han, Xingkai; Cao, Baiyu

    2011-10-01

    In this paper, we propose an improved music information retrieval method utilizing a music fingerprint. The goal of this method is to represent music with compressed musical information. Based on selected MIDI files, which are generated automatically as our music target database, we evaluate the accuracy, effectiveness, and efficiency of this method. In this research we not only extract a feature sequence that can represent the file effectively from the query and the melody database, but also make it possible to retrieve the results in an innovative way. We investigate the influence of noise on the performance of our system. As the experimental results show, the retrieval accuracy reaches up to 91% in the absence of noise.

  2. Coupling flood forecasting and social media crowdsourcing

    NASA Astrophysics Data System (ADS)

    Kalas, Milan; Kliment, Tomas; Salamon, Peter

    2016-04-01

    Social and mainstream media monitoring is increasingly recognized as a valuable source of information in disaster management and response. Information on ongoing disasters can be detected in a very short time, and social media can complement traditional data feeds (ground and remote observation schemes). Probably the largest attempt to use social media in crisis management was the activation of the Digital Humanitarian Network by the United Nations Office for the Coordination of Humanitarian Affairs in response to Typhoon Yolanda: a network of volunteers performed rapid needs and damage assessment by tagging reports posted to social media, which were then used by machine learning classifiers as a training set to automatically identify tweets referring to both urgent needs and offers of help. In this work we present the potential of coupling a social media streaming and news monitoring application (GlobalFloodNews, www.globalfloodsystem.com) with a flood forecasting system (www.globalfloods.eu) and a geo-catalogue of OGC services discovered via the Google Search Engine (WMS, WFS, WCS, etc.) to provide a full suite of information to crisis management centers as fast as possible. In GlobalFloodNews we use advanced filtering of the real-time Twitter stream, where relevant information is automatically extracted using natural language and signal processing techniques. The keyword filters are adjusted and optimized automatically using machine learning algorithms as new reports are added to the system. To refine the search results, the forecasting system triggers an event-based search over the social media and OGC services relevant to crisis response (population distribution, critical infrastructure, hospitals, etc.). The current version of the system makes use of the USHAHIDI Crowdmap platform, which is designed to easily crowdsource information using multiple channels, including SMS, email, Twitter, and the web. We want to show the potential of monitoring floods at the global scale.

  3. GenePublisher: Automated analysis of DNA microarray data.

    PubMed

    Knudsen, Steen; Workman, Christopher; Sicheritz-Ponten, Thomas; Friis, Carsten

    2003-07-01

    GenePublisher, a system for automatic analysis of data from DNA microarray experiments, has been implemented with a web interface at http://www.cbs.dtu.dk/services/GenePublisher. Raw data are uploaded to the server together with a specification of the data. The server performs normalization, statistical analysis and visualization of the data. The results are run against databases of signal transduction pathways, metabolic pathways and promoter sequences in order to extract more information. The results of the entire analysis are summarized in report form and returned to the user.

  4. Automatic design of conformal cooling channels in injection molding tooling

    NASA Astrophysics Data System (ADS)

    Zhang, Yingming; Hou, Binkui; Wang, Qian; Li, Yang; Huang, Zhigao

    2018-02-01

    The generation of the cooling system plays an important role in injection molding design. A conformal cooling system can effectively improve molding efficiency and product quality. This paper provides a generic approach for building conformal cooling channels. The centrelines of these channels are generated in two steps. First, we extract conformal loops based on the geometric information of the product. Second, centrelines in a spiral shape are built by blending these loops. We devise algorithms to implement the entire design process. A case study verifies the feasibility of this approach.

  5. Analysis of 2D Phase Contrast MRI in Renal Arteries by Self Organizing Maps

    NASA Astrophysics Data System (ADS)

    Zöllner, Frank G.; Schad, Lothar R.

    We present an approach based on self organizing maps to segment renal arteries from 2D PC Cine MR images in order to measure blood velocity and flow. Such information is important in grading renal artery stenosis and supports decisions on surgical interventions such as percutaneous transluminal angioplasty. Results show that the renal arteries could be extracted automatically. The corresponding velocity profiles show high correlation (r=0.99) compared to those from manually delineated vessels. Furthermore, the method could detect possible blood flow patterns within the vessel.

  6. Towards fish-eye camera based in-home activity assessment.

    PubMed

    Bas, Erhan; Erdogmus, Deniz; Ozertem, Umut; Pavel, Misha

    2008-01-01

    Indoor localization, activity classification, and behavioral modeling are increasingly important for surveillance applications including independent living and remote health monitoring. In this paper, we study the suitability of fish-eye cameras (high-resolution CCD sensors with very-wide-angle lenses) for the purpose of monitoring people in indoor environments. The results indicate that these sensors are very useful for automatic activity monitoring and people tracking. We identify practical and mathematical problems related to information extraction from these video sequences and identify future directions to solve these issues.

  7. Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection

    PubMed Central

    Lossio-Ventura, Juan Antonio; Hogan, William; Modave, François; Hicks, Amanda; Hanna, Josh; Guo, Yi; He, Zhe; Bian, Jiang

    2017-01-01

    Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves their health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, which surpasses state-of-the-art results. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993. PMID:28503356

  8. Characterizing rainfall in the Tenerife island

    NASA Astrophysics Data System (ADS)

    Díez-Sierra, Javier; del Jesus, Manuel; Losada Rodriguez, Inigo

    2017-04-01

    In many locations, rainfall data are collected through networks of meteorological stations. The data collection process is nowadays automated in many places, leading to the development of big databases of rainfall data covering extensive areas of territory. However, managers, decision makers and engineering consultants tend not to extract most of the information contained in these databases due to the lack of specific software tools for their exploitation. Here we present the modeling and development effort put in place in the Tenerife island in order to develop MENSEI-L, a software tool capable of automatically analyzing a complete rainfall database to simplify the extraction of information from observations. MENSEI-L makes use of weather type information derived from atmospheric conditions to separate the complete time series into homogeneous groups where statistical distributions are fitted. Normal and extreme regimes are obtained in this manner. MENSEI-L is also able to complete missing data in the time series and to generate synthetic stations by using Kriging techniques. These techniques also serve to generate the spatial regimes of precipitation, both normal and extreme ones. MENSEI-L makes use of weather type information to also provide a stochastic three-day probability forecast for rainfall.

  9. Automatic Generation of Conditional Diagnostic Guidelines.

    PubMed

    Baldwin, Tyler; Guo, Yufan; Syeda-Mahmood, Tanveer

    2016-01-01

    The diagnostic workup for many diseases can be extraordinarily nuanced, and as such reference material text often contains extensive information regarding when it is appropriate to have a patient undergo a given procedure. In this work we employ a three-task pipeline for the extraction of statements indicating the conditions under which a procedure should be performed, given a suspected diagnosis. First, we identify each instance in the text where a procedure is being recommended. Next we examine the context around these recommendations to extract conditional statements that dictate the conditions under which the recommendation holds. Finally, coreferring recommendations across the document are linked to produce a full recommendation summary. Results indicate that each underlying task can be performed with above-baseline performance, and the output can be used to produce concise recommendation summaries.

  10. Roi Detection and Vessel Segmentation in Retinal Image

    NASA Astrophysics Data System (ADS)

    Sabaz, F.; Atila, U.

    2017-11-01

    Diabetes affects the structure of the eye and can ultimately lead to loss of vision. Depending on the stage of the disease, called diabetic retinopathy, problems range from blurred vision to sudden loss of vision. Automated detection of vessels in retinal images is useful for diagnosing eye diseases, classifying disease, and supporting other clinical studies. The shape and structure of the vessels give information about the severity and stage of the disease. Automatic and fast detection of vessels allows a quick diagnosis and an earlier start of treatment. ROI detection and vessel extraction methods for retinal images are presented in this study. It is shown that the Frangi filter used in image processing can be successfully applied to the detection and extraction of vessels.
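
    A minimal sketch of Frangi-filter vessel enhancement on a fundus-like image is shown below; the sample image, scale range, and Otsu threshold are illustrative assumptions rather than the study's exact settings.

    ```python
    # Sketch of Frangi vessel enhancement followed by a simple threshold;
    # skimage's retina sample image stands in for a fundus photograph.
    from skimage import color, data, filters

    fundus = color.rgb2gray(data.retina())
    vesselness = filters.frangi(fundus, sigmas=range(1, 6), black_ridges=True)
    vessel_mask = vesselness > filters.threshold_otsu(vesselness)
    print(f"vessel pixels: {int(vessel_mask.sum())}")
    ```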

  11. Playing biology's name game: identifying protein names in scientific text.

    PubMed

    Hanisch, Daniel; Fluck, Juliane; Mevissen, Heinz-Theodor; Zimmer, Ralf

    2003-01-01

    A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet, the basis for most extraction algorithms, i.e. the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general purpose name dictionary and an accompanying automatic curation procedure based on a simple token model of protein names. We designed an efficient search algorithm to analyze all abstracts in MEDLINE in a reasonable amount of time on standard computers. The parameters of our method are optimized using machine learning techniques. Used in conjunction, these ingredients lead to good search performance. A supplementary web page is available at http://cartan.gmd.de/ProMiner/.
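
    Token-level dictionary matching, the core of such a name recognizer, can be sketched as follows; the tiny synonym dictionary and function name are illustrative, not the curated dictionary or optimized search algorithm described above.

    ```python
    # Sketch of token-level dictionary matching for protein/gene names;
    # the synonym dictionary is illustrative only.
    PROTEIN_SYNONYMS = {
        "TP53": ["tp53", "p53", "tumor protein 53"],
        "EGFR": ["egfr", "erbb1", "epidermal growth factor receptor"],
    }

    def find_protein_names(text: str) -> list[tuple[str, str]]:
        tokens = text.lower().split()
        matches = []
        for canonical, synonyms in PROTEIN_SYNONYMS.items():
            for syn in synonyms:
                syn_tokens = syn.split()
                n = len(syn_tokens)
                hit = any(tokens[i:i + n] == syn_tokens
                          for i in range(len(tokens) - n + 1))
                if hit:
                    matches.append((canonical, syn))
                    break  # report the first matching synonym per protein
        return matches

    print(find_protein_names(
        "Mutations in p53 and the epidermal growth factor receptor are common"))
    # [('TP53', 'p53'), ('EGFR', 'epidermal growth factor receptor')]
    ```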

  12. Analysis of ketamine and norketamine in urine by automatic solid-phase extraction (SPE) and positive ion chemical ionization-gas chromatography-mass spectrometry (PCI-GC-MS).

    PubMed

    Kim, Eun-mi; Lee, Ju-seon; Choi, Sang-kil; Lim, Mi-ae; Chung, Hee-sun

    2008-01-30

    Ketamine (KT) is widely abused for its hallucinogenic effects and has also been misused as a "date-rape" drug in recent years. An analytical method using positive ion chemical ionization-gas chromatography-mass spectrometry (PCI-GC-MS) with an automatic solid-phase extraction (SPE) apparatus was studied for the determination of KT and its major metabolite, norketamine (NK), in urine. Six ketamine-suspected urine samples were provided by the police. For the study of KT metabolism, KT was administered to SD rats i.p. at a single dose of 5, 10 and 20 mg/kg, respectively, and urine samples were collected 24, 48 and 72 h after administration. For the detection of KT and NK, urine samples were extracted on an automatic SPE apparatus (RapidTrace, Zymark) with a mixed-mode cartridge, Drug-Clean (200 mg, Alltech). The identification of KT and NK was performed by PCI-GC-MS. m/z 238 (M+1) and 220 for KT, m/z 224 (M+1) and 207 for NK, and m/z 307 (M+1) for Cocaine-D(3) as internal standard were extracted from the full-scan mass spectrum, and the underlined ions were used for quantitation. Extracted calibration curves were linear from 50 to 1000 ng/mL for KT and NK with correlation coefficients exceeding 0.99. The limit of detection (LOD) was 25 ng/mL for KT and NK. The limit of quantitation (LOQ) was 50 ng/mL for KT and NK. The recoveries of KT and NK at three different concentrations (86, 430 and 860 ng/mL) were 53.1 to 79.7% and 45.7 to 83.0%, respectively. The intra- and inter-day run precisions (CV) for KT and NK were less than 15.0%, and the accuracies (bias) for KT and NK were also less than 15% at the three concentration levels (86, 430 and 860 ng/mL). The analytical method was also applied to the six real KT-suspected urine specimens and to urine from KT-administered rats, and the concentrations of KT and NK were determined. Dehydronorketamine (DHNK) was also confirmed in these urine samples; however, its concentration was not calculated. SPE is simple and needs less organic solvent than liquid-liquid extraction (LLE), and PCI-GC-MS can offer both qualitative and quantitative information for urinalysis of KT in forensic analysis.

  13. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  14. Second Iteration of Photogrammetric Pipeline to Enhance the Accuracy of Image Pose Estimation

    NASA Astrophysics Data System (ADS)

    Nguyen, T. G.; Pierrot-Deseilligny, M.; Muller, J.-M.; Thom, C.

    2017-05-01

    In the classical photogrammetric processing pipeline, automatic tie point extraction plays a key role in the quality of the achieved results. Image tie points are crucial to pose estimation and have a significant influence on the precision of the calculated orientation parameters. Therefore, both the relative and absolute orientations of the 3D model can be affected. By improving the precision of image tie point measurement, one can enhance the quality of image orientation. The quality of image tie points is influenced by several factors such as multiplicity, measurement precision, and distribution in the 2D images as well as in the 3D scene. In complex acquisition scenarios such as indoor applications and oblique aerial images, tie point extraction is limited when only image information can be exploited. Hence, we propose a method which improves the precision of pose estimation in complex scenarios by adding a second iteration to the classical processing pipeline. The result of a first iteration is used as a priori information to guide the extraction of new tie points with better quality. Evaluated with multiple case studies, the proposed method shows its validity and its high potential for precision improvement.

  15. Dilated contour extraction and component labeling algorithm for object vector representation

    NASA Astrophysics Data System (ADS)

    Skourikhine, Alexei N.

    2005-08-01

    Object boundary extraction from binary images is important for many applications, e.g., image vectorization, automatic interpretation of images containing segmentation results, printed and handwritten documents and drawings, maps, and AutoCAD drawings. Efficient and reliable contour extraction is also important for pattern recognition due to its impact on shape-based object characterization and recognition. The presented contour tracing and component labeling algorithm produces dilated (sub-pixel) contours associated with corresponding regions. The algorithm has the following features: (1) it always produces non-intersecting, non-degenerate contours, including the case of one-pixel wide objects; (2) it associates the outer and inner (i.e., around hole) contours with the corresponding regions during the process of contour tracing in a single pass over the image; (3) it maintains desired connectivity of object regions as specified by 8-neighbor or 4-neighbor connectivity of adjacent pixels; (4) it avoids degenerate regions in both background and foreground; (5) it allows an easy augmentation that will provide information about the containment relations among regions; (6) it has a time complexity that is dominantly linear in the number of contour points. This early component labeling (contour-region association) enables subsequent efficient object-based processing of the image information.

  16. Development of Mobile Mapping System for 3D Road Asset Inventory.

    PubMed

    Sairam, Nivedita; Nagarajan, Sudhagar; Ornitz, Scott

    2016-03-12

    Asset Management is an important component of an infrastructure project. A significant cost is involved in maintaining and updating asset information. Data collection is the most time-consuming task in the development of an asset management system. In order to reduce the time and cost involved in data collection, this paper proposes a low-cost Mobile Mapping System equipped with a laser scanner and cameras. First, the feasibility of low-cost sensors for 3D asset inventory is discussed by deriving appropriate sensor models. Then, through calibration procedures, the respective alignments of the laser scanner, cameras, Inertial Measurement Unit and GPS (Global Positioning System) antenna are determined. The efficiency of this Mobile Mapping System is evaluated by mounting it on a truck and a golf cart. Using the derived sensor models, geo-referenced images and 3D point clouds are obtained. After validating the quality of the derived data, the paper provides a framework to extract road assets both automatically and manually using techniques implementing RANSAC plane fitting and edge extraction algorithms. Then the scope of such extraction techniques is discussed, along with a sample GIS (Geographic Information System) database structure for unified 3D asset inventory.
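
    RANSAC plane fitting, used here for road-surface extraction, can be sketched on a synthetic point cloud; the iteration count, inlier tolerance, and data are illustrative assumptions, not the paper's parameters.

    ```python
    # Sketch of RANSAC plane fitting on a synthetic road point cloud.
    import numpy as np

    def ransac_plane(points: np.ndarray, iters: int = 200, tol: float = 0.05):
        """Return (normal, d, inlier_mask) for the best plane n.x + d = 0."""
        rng = np.random.default_rng(42)
        best_inliers = np.zeros(len(points), dtype=bool)
        best_plane = (np.array([0.0, 0.0, 1.0]), 0.0)
        for _ in range(iters):
            p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(p2 - p1, p3 - p1)
            norm = np.linalg.norm(normal)
            if norm < 1e-9:
                continue  # degenerate (collinear) sample
            normal /= norm
            d = -normal @ p1
            inliers = np.abs(points @ normal + d) < tol
            if inliers.sum() > best_inliers.sum():
                best_inliers, best_plane = inliers, (normal, d)
        return best_plane[0], best_plane[1], best_inliers

    # Synthetic scene: a flat road surface plus scattered off-plane clutter.
    rng = np.random.default_rng(0)
    road = np.c_[rng.uniform(0, 10, (500, 2)), rng.normal(0, 0.01, 500)]
    clutter = rng.uniform(0, 10, (100, 3))
    normal, d, inliers = ransac_plane(np.vstack([road, clutter]))
    print(inliers.sum())  # most of the 500 road points are recovered as inliers
    ```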

  17. Development of Mobile Mapping System for 3D Road Asset Inventory

    PubMed Central

    Sairam, Nivedita; Nagarajan, Sudhagar; Ornitz, Scott

    2016-01-01

    Asset Management is an important component of an infrastructure project. A significant cost is involved in maintaining and updating asset information. Data collection is the most time-consuming task in the development of an asset management system. In order to reduce the time and cost involved in data collection, this paper proposes a low-cost Mobile Mapping System equipped with a laser scanner and cameras. First, the feasibility of low-cost sensors for 3D asset inventory is discussed by deriving appropriate sensor models. Then, through calibration procedures, the respective alignments of the laser scanner, cameras, Inertial Measurement Unit and GPS (Global Positioning System) antenna are determined. The efficiency of this Mobile Mapping System is evaluated by mounting it on a truck and a golf cart. Using the derived sensor models, geo-referenced images and 3D point clouds are obtained. After validating the quality of the derived data, the paper provides a framework to extract road assets both automatically and manually using techniques implementing RANSAC plane fitting and edge extraction algorithms. Then the scope of such extraction techniques is discussed, along with a sample GIS (Geographic Information System) database structure for unified 3D asset inventory. PMID:26985897

  18. An automatic dose verification system for adaptive radiotherapy for helical tomotherapy

    NASA Astrophysics Data System (ADS)

    Mo, Xiaohu; Chen, Mingli; Parnell, Donald; Olivera, Gustavo; Galmarini, Daniel; Lu, Weiguo

    2014-03-01

    Purpose: During a typical 5-7 week course of external beam radiotherapy, there are potential differences between the planned and actual patient anatomy and positioning, caused for example by patient weight loss or variations in treatment setup. The discrepancies between planned and delivered doses resulting from these differences could be significant, especially in IMRT, where dose distributions tightly conform to target volumes while avoiding organs-at-risk. We developed an automatic system to monitor delivered dose using daily imaging. Methods: For each treatment, a merged image is generated by registering the daily pre-treatment setup image and the planning CT using treatment position information extracted from the Tomotherapy archive. The treatment dose is then computed on this merged image using our in-house convolution-superposition based dose calculator implemented on GPU. The deformation field between the merged and planning CT is computed using the Morphon algorithm. The planning structures and treatment doses are subsequently warped for analysis and dose accumulation. All results are saved in DICOM format with private tags and organized in a database. Due to the overwhelming amount of information generated, a customizable tolerance system is used to flag potential treatment errors or significant anatomical changes. A web-based system and a DICOM-RT viewer were developed for reporting and reviewing the results. Results: More than 30 patients were analysed retrospectively. Our in-house dose calculator achieved a 97% gamma passing rate, evaluated with a 2% dose difference and 2 mm distance-to-agreement against the Tomotherapy-calculated dose, which is considered sufficient for adaptive radiotherapy purposes. Evaluation of the deformable registration through visual inspection showed acceptable and consistent results, except for cases with large or unrealistic deformation. Our automatic flagging system was able to catch significant patient setup errors or anatomical changes. Conclusions: We developed an automatic dose verification system that quantifies treatment doses and provides the necessary information for adaptive planning without impeding clinical workflows.
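
    A 1-D, grid-point (no interpolation) version of the 2%/2 mm gamma analysis can be sketched as follows; clinical gamma evaluation is performed on full 3-D dose grids, so the profiles, spacing, and helper name below are illustrative assumptions.

    ```python
    # Sketch of a 1-D gamma-index pass-rate calculation (2% dose difference /
    # 2 mm distance-to-agreement) on synthetic dose profiles.
    import numpy as np

    def gamma_pass_rate(ref_dose, eval_dose, spacing_mm=1.0, dd=0.02, dta_mm=2.0):
        x = np.arange(len(ref_dose)) * spacing_mm
        dd_abs = dd * ref_dose.max()            # global dose-difference criterion
        pass_count = 0
        for xi, di in zip(x, ref_dose):
            gamma_sq = ((x - xi) / dta_mm) ** 2 + ((eval_dose - di) / dd_abs) ** 2
            if gamma_sq.min() <= 1.0:
                pass_count += 1
        return pass_count / len(ref_dose)

    x = np.arange(100) * 1.0
    ref = np.exp(-((x - 50) ** 2) / 200.0)      # reference dose profile
    shifted = np.exp(-((x - 51) ** 2) / 200.0)  # delivered profile, shifted 1 mm
    print(f"gamma pass rate: {gamma_pass_rate(ref, shifted):.2f}")
    ```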

  19. Applying a weighted random forests method to extract karst sinkholes from LiDAR data

    NASA Astrophysics Data System (ADS)

    Zhu, Junfeng; Pierskalla, William P.

    2016-02-01

    Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface at high resolution and high density and has shown great potential to drastically improve the location and delineation of sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of depressions commonly generated from LiDAR data. In this study, we applied random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through a visual inspection and field verification process. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced, with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% for the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and make it more tractable to map sinkholes from LiDAR data over large areas. However, the random forests method cannot totally replace manual procedures such as visual inspection and field verification.
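
    A class-weighted random forest of the kind described can be sketched with scikit-learn; the synthetic features, labels, and use of class_weight="balanced" are illustrative assumptions standing in for the study's 11 predictors and weighting scheme.

    ```python
    # Sketch of a class-weighted random forest for sinkhole / non-sinkhole
    # classification on an imbalanced, synthetic dataset.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000
    # Toy stand-ins for depression geometry features (depth, area, circularity...).
    X = rng.random((n, 11))
    y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n) > 1.2).astype(int)
    print(f"sinkhole fraction: {y.mean():.2f}")   # imbalanced classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                                 random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
    ```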

  20. Challenges for automatically extracting molecular interactions from full-text articles.

    PubMed

    McIntosh, Tara; Curran, James R

    2009-09-24

    The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
