Sample records for improving information extraction

  1. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

    Constructing a virtual international strategy environment requires many kinds of information, such as economic, political, military, diplomatic, cultural, and scientific information. It is therefore very important to build an efficient automatic information extraction, classification, recombination, and analysis management system as the foundation and a core component of the military strategy hall. This paper first uses an improved Boost algorithm to classify the obtained initial information, and then applies a strategy intelligence extraction algorithm to extract strategy intelligence from the initial information to help strategists analyze it.

  2. [Application of regular expression in extracting key information from Chinese medicine literatures about re-evaluation of post-marketing surveillance].

    PubMed

    Wang, Zhifei; Xie, Yanming; Wang, Yongyan

    2011-10-01

    Computerized extraction of information from the Chinese medicine literature seems more convenient than hand searching: it can simplify the search process and improve accuracy. Among the many automated extraction methods in increasing use, regular expressions are particularly well suited to extracting useful information for research. This article focuses on the application of regular expressions to information extraction from the Chinese medicine literature. Two practical examples are reported, in which regular expressions are used to extract the "case number (non-terminology)" and the "efficacy rate (subgroups for related information identification)", illustrating how such information can be extracted from the Chinese medicine literature with this research method.
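
    A minimal sketch of the regular-expression idea described above, written in Python. The original work operates on Chinese-language literature; the patterns and the sample sentence below are hypothetical English analogues, not the authors' actual expressions.

    ```python
    import re

    # Hypothetical patterns for two of the target fields: case numbers and efficacy rates.
    CASE_NUMBER = re.compile(r'(\d+)\s*(?:cases|patients)\b', re.IGNORECASE)
    EFFICACY_RATE = re.compile(r'(?:total\s+)?eff(?:icacy|ective)\s+rate\s*(?:was|of)?\s*([\d.]+)\s*%', re.IGNORECASE)

    def extract_key_fields(abstract: str) -> dict:
        """Pull case counts and efficacy rates out of free-text abstracts."""
        cases = [int(m.group(1)) for m in CASE_NUMBER.finditer(abstract)]
        rates = [float(m.group(1)) for m in EFFICACY_RATE.finditer(abstract)]
        return {"case_numbers": cases, "efficacy_rates_pct": rates}

    print(extract_key_fields("A total of 120 cases were enrolled; the total effective rate was 91.7%."))
    ```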

  3. Extracting information from the text of electronic medical records to improve case detection: a systematic review

    PubMed Central

    Carroll, John A; Smith, Helen E; Scott, Donia; Cassell, Jackie A

    2016-01-01

    Background Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall). PMID:26911811

  4. a Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information

    NASA Astrophysics Data System (ADS)

    Lian, Shizhong; Chen, Jiangping; Luo, Minghai

    2016-06-01

    Water information cannot be accurately extracted from some TM images because true information is lost to cloud cover and missing data stripes. Water is continuously distributed under natural conditions; thus, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different kinds of disturbing information from clouds and missing data stripes are simulated, and water information is extracted from the simulated images using global histogram matching, local histogram matching, and the probability-based statistical method. Experiments show that a smaller Areal Error and a higher Boundary Recall can be obtained using this method compared with the conventional methods.
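
    A rough sketch of the probability-based fill-in idea, under stated assumptions: `ndwi` is a water-index band, `missing` marks cloud or stripe pixels, and gaps are filled from the local water probability estimated over valid neighbours only. This is illustrative and not the authors' exact estimator.

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    def extract_water(ndwi: np.ndarray, missing: np.ndarray,
                      thresh: float = 0.0, win: int = 15) -> np.ndarray:
        water = (ndwi > thresh) & ~missing          # initial water mask on valid pixels
        valid = (~missing).astype(float)
        # local probability of water, estimated from valid neighbours only
        p_water = uniform_filter(water.astype(float), win) / np.maximum(
            uniform_filter(valid, win), 1e-6)
        filled = water.copy()
        filled[missing] = p_water[missing] > 0.5    # fill gaps where neighbours are mostly water
        return filled
    ```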

  5. The algorithm of fast image stitching based on multi-feature extraction

    NASA Astrophysics Data System (ADS)

    Yang, Chunde; Wu, Ge; Shi, Jing

    2018-05-01

    This paper proposes an improved image registration method combining Hu invariant moment contour information with feature point detection, aiming to solve problems of the traditional image stitching algorithm such as a time-consuming feature point extraction process, an overload of redundant invalid information, and inefficiency. First, pixel neighborhoods are used to extract contour information, with the Hu invariant moment employed as the similarity measure to extract SIFT feature points within those similar regions. Then the Euclidean distance is replaced with the Hellinger kernel function to improve the initial matching efficiency and reduce mismatched points, and the affine transformation matrix between the images is estimated. Finally, a local color mapping method is adopted to correct uneven exposure, and an improved multiresolution fusion algorithm fuses the mosaic images to achieve seamless stitching. Experimental results confirm the high accuracy and efficiency of the method proposed in this paper.
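
    The Hellinger-kernel matching step can be illustrated with the common "RootSIFT" trick: L1-normalise each SIFT descriptor and take its square root, so that Euclidean distance between the transformed vectors corresponds to the Hellinger kernel. The sketch below (OpenCV) covers only matching and affine estimation; the contour pre-selection, color mapping, and multiresolution blending from the paper are omitted.

    ```python
    import cv2
    import numpy as np

    def root_sift(desc: np.ndarray, eps: float = 1e-7) -> np.ndarray:
        desc = desc / (desc.sum(axis=1, keepdims=True) + eps)   # L1 normalisation
        return np.sqrt(desc)                                    # Hellinger mapping

    def estimate_affine(img1, img2):
        sift = cv2.SIFT_create()
        k1, d1 = sift.detectAndCompute(img1, None)
        k2, d2 = sift.detectAndCompute(img2, None)
        d1, d2 = root_sift(d1), root_sift(d2)
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        good = [m for m, n in matcher.knnMatch(d1, d2, k=2) if m.distance < 0.7 * n.distance]
        src = np.float32([k1[m.queryIdx].pt for m in good])
        dst = np.float32([k2[m.trainIdx].pt for m in good])
        A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)  # affine transform between views
        return A
    ```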

  6. A Real-Time System for Lane Detection Based on FPGA and DSP

    NASA Astrophysics Data System (ADS)

    Xiao, Jing; Li, Shutao; Sun, Bin

    2016-12-01

    This paper presents a real-time lane detection system comprising an edge detection and improved Hough Transform based lane detection algorithm and its hardware implementation on a field programmable gate array (FPGA) and digital signal processor (DSP). Firstly, gradient amplitude and direction information are combined to extract lane edge information. Then, this information is used to determine the region of interest. Finally, the lanes are extracted using the improved Hough Transform. The image processing module of the system consists of the FPGA and DSP. In particular, the algorithms implemented on the FPGA are pipelined and processed in parallel so that the system can run in real time, while the DSP performs lane line extraction and display using the improved Hough Transform. The experimental results show that the proposed system is able to detect lanes efficiently and effectively under different road conditions.
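
    A software analogue of the pipeline described above (gradient-based edges, region-of-interest restriction, Hough transform), written with OpenCV for illustration; the FPGA/DSP implementation itself is hardware-specific and not reproduced here.

    ```python
    import cv2
    import numpy as np

    def detect_lanes(frame_bgr: np.ndarray):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)                       # gradient-based edge map
        h, w = edges.shape
        roi = np.zeros_like(edges)
        roi[int(0.5 * h):, :] = 255                            # keep the lower half as the ROI
        edges = cv2.bitwise_and(edges, roi)
        lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                                threshold=40, minLineLength=40, maxLineGap=20)
        return [] if lines is None else [l[0] for l in lines]  # (x1, y1, x2, y2) per segment
    ```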

  7. Using input feature information to improve ultraviolet retrieval in neural networks

    NASA Astrophysics Data System (ADS)

    Sun, Zhibin; Chang, Ni-Bin; Gao, Wei; Chen, Maosi; Zempila, Melina

    2017-09-01

    In neural networks, training/prediction accuracy and algorithm efficiency can be improved significantly via accurate input feature extraction. In this study, spatial features of several important factors for retrieving surface ultraviolet (UV) radiation are extracted. An extreme learning machine (ELM) is used with the extracted features to retrieve surface UV over the continental United States for 2014. The results indicate that additional input weights can improve the learning capacity of neural networks.
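
    For readers unfamiliar with extreme learning machines, a minimal single-hidden-layer ELM looks like the sketch below: random input weights and a closed-form least-squares solution for the output weights. The feature matrix `X` would hold the extracted spatial UV predictors (an assumption; the paper's exact inputs are not reproduced).

    ```python
    import numpy as np

    class ELM:
        def __init__(self, n_hidden: int = 200, seed: int = 0):
            self.n_hidden = n_hidden
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights
            self.b = self.rng.normal(size=self.n_hidden)
            H = np.tanh(X @ self.W + self.b)                            # hidden-layer activations
            self.beta = np.linalg.pinv(H) @ y                           # closed-form output weights
            return self

        def predict(self, X):
            return np.tanh(X @ self.W + self.b) @ self.beta
    ```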

  8. Longitudinal Analysis of New Information Types in Clinical Notes

    PubMed Central

    Zhang, Rui; Pakhomov, Serguei; Melton, Genevieve B.

    2014-01-01

    It is increasingly recognized that redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous, significant, and may negatively impact the secondary use of these notes for research and patient care. We investigated several automated methods to distinguish redundant from relevant new information in clinical reports. These methods may provide a valuable approach to extract clinically pertinent information and further improve the accuracy of clinical information extraction systems. In this study, we used UMLS semantic types to extract several types of new information, including problems, medications, and laboratory information. Automatically identified new information correlated highly with manual reference standard annotations. Methods to identify different types of new information can potentially help to build more robust information extraction systems for clinical researchers, as well as aid clinicians and researchers in navigating clinical notes more effectively and quickly identifying information pertaining to changes in health states. PMID:25717418

  9. Feature extraction via KPCA for classification of gait patterns.

    PubMed

    Wu, Jianning; Wang, Jue; Liu, Li

    2007-06-01

    Automated recognition of gait pattern change is important in medical diagnostics as well as in the early identification of at-risk gait in the elderly. We evaluated the use of Kernel-based Principal Component Analysis (KPCA) to extract more gait features (i.e., to obtain more significant amounts of information about human movement) and thus to improve the classification of gait patterns. 3D gait data of 24 young and 24 elderly participants were acquired using an OPTOTRAK 3020 motion analysis system during normal walking, and a total of 36 gait spatio-temporal and kinematic variables were extracted from the recorded data. KPCA was used first for nonlinear feature extraction, and its effect on subsequent classification was then evaluated in combination with learning algorithms such as support vector machines (SVMs). Cross-validation test results indicated that the proposed technique spreads the information about the gait's kinematic structure across more nonlinear principal components, thus providing additional discriminatory information that improves gait classification performance. The feature extraction ability of KPCA was only slightly affected by the choice of kernel function (polynomial versus radial basis function). The combination of KPCA and SVM identified young-elderly gait patterns with 91% accuracy, a markedly improved performance compared with the combination of PCA and SVM. These results suggest that nonlinear feature extraction by KPCA improves the classification of young-elderly gait patterns, and holds considerable potential for future applications in direct dimensionality reduction and interpretation of multiple gait signals.
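
    The KPCA-plus-SVM combination can be sketched in a few lines with scikit-learn; `X` would hold the 36 spatio-temporal and kinematic gait variables and `y` the young/elderly labels (names assumed, data not reproduced here).

    ```python
    from sklearn.decomposition import KernelPCA
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def gait_classifier_score(X, y, n_components: int = 10, kernel: str = "rbf") -> float:
        model = make_pipeline(
            StandardScaler(),
            KernelPCA(n_components=n_components, kernel=kernel),  # nonlinear feature extraction
            SVC(kernel="rbf", C=1.0),
        )
        return cross_val_score(model, X, y, cv=5).mean()          # cross-validated accuracy
    ```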

  10. Improving Information Extraction and Translation Using Component Interactions

    DTIC Science & Technology

    2008-01-01

    [Excerpt fragments] 7. Case study on monolingual interaction; 7.1 Improving name tagging by ... The interactions described above focused on the monolingual analysis pipeline. Huang and Vogel (2002) presented a cross-lingual joint inference example to improve the extracted named entity translation dictionary and the entity annotation in a bilingual training corpus. They used a more ...

  11. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction.

    PubMed

    Jonnalagadda, Siddhartha; Gonzalez, Graciela

    2010-11-13

    BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. The tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested the tool's impact on the task of protein-protein interaction (PPI) extraction: it improved the F-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net.

  12. Semantic Information Extraction of Lanes Based on Onboard Camera Videos

    NASA Astrophysics Data System (ADS)

    Tang, L.; Deng, T.; Ren, C.

    2018-04-01

    In the field of autonomous driving, semantic information about lanes is very important. This paper proposes a method for automatically detecting lanes and extracting their semantic information from onboard camera videos. The proposed method first detects lane edges from the grayscale gradient direction and fits them with an improved Probabilistic Hough transform; it then uses the vanishing-point principle to calculate the geometric position of the lanes, and applies decision-tree classification to lane characteristics to extract lane semantic information. In the experiment, 216 road video images captured by a camera mounted on a moving vehicle were used to detect lanes and extract lane semantic information. The results show that the proposed method can accurately identify lane semantics from video images.

  13. Integrating Information Extraction Agents into a Tourism Recommender System

    NASA Astrophysics Data System (ADS)

    Esparcia, Sergio; Sánchez-Anguix, Víctor; Argente, Estefanía; García-Fornes, Ana; Julián, Vicente

    Recommender systems face several problems. On the one hand, information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be worthwhile to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques to automatically extract and classify information from the Web. Its goal is to keep the system updated and to obtain information about third-party services that are not offered by service providers inside the system.

  14. A rapid extraction of landslide disaster information research based on GF-1 image

    NASA Astrophysics Data System (ADS)

    Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na

    2015-08-01

    In recent years, landslide disasters have occurred frequently as a result of seismic activity, bringing great harm to people's lives and attracting close attention from the state and wide concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing imagery, with its rich texture and geometric information, can effectively improve extraction accuracy. It is therefore feasible to extract earthquake-triggered landslides with serious surface damage and large scale. Taking Wenchuan County as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, with the estimation of scale parameter (ESP) tool used to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution imagery and selecting spectral, textural, geometric, and landform features of the image, extraction rules are established to extract landslide disaster information. The extraction results identify 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.

  15. Architecture and data processing alternatives for the TSE computer. Volume 2: Extraction of topological information from an image by the Tse computer

    NASA Technical Reports Server (NTRS)

    Jones, J. R.; Bodenheimer, R. E.

    1976-01-01

    A simple programmable Tse processor organization and arithmetic operations necessary for extraction of the desired topological information are described. Hardware additions to this organization are discussed along with trade-offs peculiar to the tse computing concept. An improved organization is presented along with the complementary software for the various arithmetic operations. The performance of the two organizations is compared in terms of speed, power, and cost. Software routines developed to extract the desired information from an image are included.

  16. Morphometric information to reduce the semantic gap in the characterization of microscopic images of thyroid nodules.

    PubMed

    Macedo, Alessandra A; Pessotti, Hugo C; Almansa, Luciana F; Felipe, Joaquim C; Kimura, Edna T

    2016-07-01

    Several medical-image processing systems support the extraction of image attributes but omit information that further characterizes the images. For example, morphometry can be applied to find new information about the visual content of an image, and this additional information may yield knowledge. The results of such mappings can then be applied to recognize exam patterns, improving the accuracy of image retrieval and allowing a better interpretation of exam results. Although successfully applied to breast lesion images, the morphometric approach is still poorly explored for thyroid lesions because of the high subjectivity of thyroid examinations. This paper presents a theoretical-practical study, considering Computer Aided Diagnosis (CAD) and Morphometry, to reduce the semantic discontinuity between medical image features and human interpretation of image content. The proposed method aggregates the content of microscopic images characterized by morphometric information and other image attributes extracted by traditional object extraction algorithms; it carries out segmentation, feature extraction, image labeling, and classification. Morphometric analysis was included as an object extraction method in order to verify the improvement in accuracy for automatic classification of microscopic images. To validate this proposal and verify the utility of morphometric information for characterizing thyroid images, a CAD system was created to classify real thyroid image exams into Papillary Cancer, Goiter, and Non-Cancer. Results showed that morphometric information can improve the accuracy and precision of image retrieval and the interpretation of results in computer-aided diagnosis; for example, when all the extractors were combined with the morphometric information, the CAD system had its best performance (70% precision in Papillary cases). The results indicate that morphometric information from images can help reduce the semantic discontinuity between human interpretation and image characterization. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. Analysis of Technique to Extract Data from the Web for Improved Performance

    NASA Astrophysics Data System (ADS)

    Gupta, Neena; Singh, Manish

    2010-11-01

    The World Wide Web is rapidly guiding the world into an amazing electronic world, where everyone can publish anything in electronic form and extract almost any information. Extraction of information from semi-structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts records from HTML files automatically, and ontologies can achieve a high degree of accuracy in this task. We analyze OBDE (Ontology-Based Data Extraction), a data extraction method that automatically extracts query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.

  18. Extracting the Information Backbone in Online System

    PubMed Central

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society, and many solutions such as recommender systems have been proposed to filter out irrelevant information. In the literature, researchers have mainly been dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while overlooking the influence of the topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. Exploiting this “less can be more” feature, we design algorithms that improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining time-aware and topology-aware link removal algorithms to extract the backbone that contains the essential information for the recommender systems. From a practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both its effectiveness and efficiency. PMID:23690946

  19. MedXN: an open source medication extraction and normalization tool for clinical text

    PubMed Central

    Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang

    2014-01-01

    Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
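
    A highly simplified illustration of the decompose-then-normalise idea: dictionary lookup for the drug name plus regular expressions for its attributes. The real MedXN uses the full RxNorm vocabulary inside Apache UIMA; the mini-dictionary and patterns below are assumptions for illustration only.

    ```python
    import re

    RXNORM_MINI = {"metformin": "6809", "aspirin": "1191"}   # name -> RxCUI (illustrative subset)
    STRENGTH = re.compile(r'(\d+(?:\.\d+)?)\s*(mg|mcg|g)\b', re.IGNORECASE)
    FREQUENCY = re.compile(r'\b(once|twice|three times)\s+daily\b', re.IGNORECASE)

    def parse_medication(mention: str) -> dict:
        name = next((n for n in RXNORM_MINI if n in mention.lower()), None)
        strength = STRENGTH.search(mention)
        freq = FREQUENCY.search(mention)
        return {
            "name": name,
            "rxcui": RXNORM_MINI.get(name),
            "strength": strength.group(0) if strength else None,
            "frequency": freq.group(0) if freq else None,
        }

    print(parse_medication("Metformin 500 mg twice daily"))
    ```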

  20. An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information.

    PubMed

    Kumar, Shiu; Sharma, Alok; Tsunoda, Tatsuhiko

    2017-12-28

    Common spatial pattern (CSP) has been an effective technique for feature extraction in electroencephalography (EEG) based brain computer interfaces (BCIs). However, motor imagery EEG signal feature extraction using CSP generally depends to a great extent on the selection of the frequency bands. In this study, we propose a mutual information based frequency band selection approach. The idea of the proposed method is to utilize the information from all the available channels to effectively select the most discriminative filter banks. CSP features are extracted from multiple overlapping sub-bands. An additional sub-band is introduced that covers the wide frequency band (7-30 Hz), and two different types of features are extracted using CSP and common spatio-spectral pattern techniques, respectively. Mutual information is then computed from the extracted features of each of these bands, and the top filter banks are selected for further processing. Linear discriminant analysis is applied to the features extracted from each of the filter banks. The scores are fused together, and classification is done using a support vector machine. The proposed method is evaluated using BCI Competition III dataset IVa, BCI Competition IV dataset I, and BCI Competition IV dataset IIb, and it outperformed all other competing methods, achieving the lowest misclassification rate and the highest kappa coefficient on all three datasets. By introducing a wide sub-band and using mutual information to select the most discriminative sub-bands, the proposed method shows improvement in motor imagery EEG signal classification.
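
    A simplified sketch of the band-selection step: band-pass each EEG trial into overlapping sub-bands plus the wide 7-30 Hz band, compute log band-power features (a stand-in for the CSP features used in the paper), and rank sub-bands by mutual information with the class labels. Band edges and the feature choice are assumptions.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt
    from sklearn.feature_selection import mutual_info_classif

    BANDS = [(4, 8), (8, 12), (12, 16), (16, 20), (20, 24), (24, 28), (7, 30)]

    def band_scores(X, y, fs: int = 250):
        """X: (n_trials, n_channels, n_samples), y: (n_trials,) class labels."""
        scores = []
        for lo, hi in BANDS:
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            Xf = sosfiltfilt(sos, X, axis=-1)
            feats = np.log(Xf.var(axis=-1))           # per-channel log band power
            mi = mutual_info_classif(feats, y).sum()  # mutual information with the labels
            scores.append(mi)
        return sorted(zip(BANDS, scores), key=lambda t: -t[1])  # most discriminative bands first
    ```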

  1. Research on Optimal Observation Scale for Damaged Buildings after Earthquake Based on Optimal Feature Space

    NASA Astrophysics Data System (ADS)

    Chen, J.; Chen, W.; Dou, A.; Li, W.; Sun, Y.

    2018-04-01

    A new information extraction method for damaged buildings, rooted in an optimal feature space, is put forward on the basis of the traditional object-oriented method. In this new method, the ESP (estimation of scale parameter) tool is used to optimize image segmentation. The distance matrix and minimum separation distance of all kinds of surface features are then calculated through sample selection to find the optimal feature space, which is finally applied to extract damaged buildings from post-earthquake imagery. The overall extraction accuracy reaches 83.1 % and the kappa coefficient 0.813. Compared with the traditional object-oriented method, the new method greatly improves extraction accuracy and efficiency and has good potential for wider use in damaged-building information extraction. In addition, the new method can be applied to images of damaged buildings at different resolutions and, through accuracy evaluation, used to seek the optimal observation scale. The optimal observation scale for damaged buildings is estimated to be between 1 m and 1.2 m, which provides a reference for future information extraction of damaged buildings.

  2. eGARD: Extracting associations between genomic anomalies and drug responses from text

    PubMed Central

    Rao, Shruti; McGarvey, Peter; Wu, Cathy; Madhavan, Subha; Vijay-Shanker, K.

    2017-01-01

    Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially those that describe therapeutic implications of biomarkers and therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for ‘best-fit’ therapies and readily generate hypotheses for new clinical trials. PMID:29261751

  3. Automated generation of individually customized visualizations of diagnosis-specific medical information using novel techniques of information extraction

    NASA Astrophysics Data System (ADS)

    Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang

    2005-04-01

    Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient's medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and ultimately, outcomes.

  4. Acquiring 3-D information about thick objects from differential interference contrast images using texture extraction

    NASA Astrophysics Data System (ADS)

    Sierra, Heidy; Brooks, Dana; Dimarzio, Charles

    2010-07-01

    The extraction of 3-D morphological information about thick objects is explored in this work. We extract this information from 3-D differential interference contrast (DIC) images by applying a texture detection method. Texture extraction methods have been successfully used in different applications to study biological samples. A 3-D texture image is obtained by applying a local entropy-based texture extraction method. The use of this method to detect regions of blastocyst mouse embryos that are used in assisted reproduction techniques such as in vitro fertilization is presented as an example. Results demonstrate the potential of using texture detection methods to improve morphological analysis of thick samples, which is relevant to many biomedical and biological studies. Fluorescence and optical quadrature microscope phase images are used for validation.
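
    The local entropy texture measure mentioned above is available off the shelf in scikit-image; a minimal 2-D sketch (applied slice by slice to a 3-D DIC stack) might look like this.

    ```python
    import numpy as np
    from skimage.filters.rank import entropy
    from skimage.morphology import disk
    from skimage.util import img_as_ubyte

    def texture_map(slice_2d: np.ndarray, radius: int = 5) -> np.ndarray:
        # rescale to [0, 1] and convert to 8-bit, as required by the rank filter
        img = img_as_ubyte((slice_2d - slice_2d.min()) / (np.ptp(slice_2d) + 1e-9))
        return entropy(img, disk(radius))   # high values = texture-rich (e.g. cellular) regions
    ```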

  5. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    NASA Astrophysics Data System (ADS)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-04-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy, and the WPD energy spectrum is extracted to comprehensively characterize the properties of the machinery running states. Then, the mixed-domain feature set is input into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract the intrinsic structure information of both labeled and unlabeled state samples, thereby overcoming the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning. Simultaneously, class discrimination information is integrated into the dimension-reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are input into a pattern recognition algorithm to achieve running state identification. The effectiveness of the proposed method is verified by a running state identification case on a gearbox, and the results confirm the improved accuracy of running state identification.

  6. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters

    PubMed Central

    Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei

    2016-01-01

    Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images, and the accuracy of remote sensing thematic information depends on this extraction. On the basis of WorldView-2 high-resolution data, this study applied an optimal-segmentation-parameters method to object-oriented image segmentation and high-resolution image information extraction through the following processes. Firstly, the best combination of bands and weights was determined for the information extraction of the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed using control variables and a combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features, and the high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme. PMID:27362762

  7. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters.

    PubMed

    Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei

    2016-01-01

    Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images, and the accuracy of remote sensing thematic information depends on this extraction. On the basis of WorldView-2 high-resolution data, this study applied an optimal-segmentation-parameters method to object-oriented image segmentation and high-resolution image information extraction through the following processes. Firstly, the best combination of bands and weights was determined for the information extraction of the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed using control variables and a combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features, and the high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme.

  8. An anaesthesia information management system as a tool for a quality assurance program: 10 years of experience.

    PubMed

    Motamed, Cyrus; Bourgain, Jean Louis

    2016-06-01

    Anaesthesia Information Management Systems (AIMS) generate large amounts of data, which might be useful for quality assurance programs. This study was designed to highlight the multiple contributions of our AIMS in extracting quality indicators over a period of 10 years. The study was conducted from 2002 to 2011. Two methods were used to extract anaesthesia indicators: manual extraction of individual files for monitoring neuromuscular relaxation, and structured query language (SQL) extraction for the other indicators, namely postoperative nausea and vomiting (PONV), pain and sedation scores, pain-related medications, and postoperative hypothermia. For each indicator, a program of information/meetings and adaptation/suggestions for operating room and PACU personnel was initiated to improve quality assurance, while data were extracted each year. The study included 77,573 patients. The mean overall completeness of data for the initial years ranged from 55 to 85% and was indicator-dependent; completeness then improved to 95% for the last 5 years. The incidence of neuromuscular monitoring was initially 67% and then increased to 95% (P<0.05). The rate of pharmacological reversal remained around 53% throughout the study. Regarding the SQL data, an improvement in severe postoperative pain and PONV scores was observed throughout the study, while mild postoperative hypothermia remained a challenge despite efforts at improvement. The AIMS permitted the follow-up of certain indicators through manual sampling and of many more via SQL extraction in a sustained and non-time-consuming way across the years. However, it requires competent and especially dedicated resources to handle the database. Copyright © 2016 Société française d'anesthésie et de réanimation (Sfar). Published by Elsevier Masson SAS. All rights reserved.
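
    The kind of SQL roll-up described above can be illustrated as follows; the table and column names are assumptions, not the actual AIMS schema, and the example simply aggregates one indicator (PONV) per year.

    ```python
    import sqlite3

    QUERY = """
    SELECT strftime('%Y', a.anesthesia_date)                      AS year,
           AVG(CASE WHEN p.ponv_score >= 1 THEN 1.0 ELSE 0.0 END) AS ponv_rate
    FROM   anesthesia_case a
    JOIN   pacu_observation p ON p.case_id = a.case_id
    GROUP  BY year
    ORDER  BY year;
    """

    def yearly_ponv_rates(db_path: str):
        with sqlite3.connect(db_path) as conn:
            return conn.execute(QUERY).fetchall()   # [(year, rate), ...]
    ```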

  9. Support patient search on pathology reports with interactive online learning based data extraction.

    PubMed

    Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng

    2015-01-01

    Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on the extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
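
    A generic sketch of the interactive online-learning loop (not IDEAL-X itself): the model proposes a label for each report snippet, the user confirms or corrects it, and the corrected example immediately updates the model. The label set and the helper `ask_user` are assumptions.

    ```python
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vectorizer = HashingVectorizer(n_features=2 ** 18)
    model = SGDClassifier(loss="log_loss")
    CLASSES = ["diagnosis", "genetic_marker", "procedure", "other"]   # assumed label set

    def process_reports(snippets, ask_user):
        for text in snippets:
            X = vectorizer.transform([text])
            fitted = hasattr(model, "classes_")
            suggestion = model.predict(X)[0] if fitted else None      # pre-annotate for review
            label = ask_user(text, suggestion)                        # user confirms or corrects
            if fitted:
                model.partial_fit(X, [label])                         # incremental update
            else:
                model.partial_fit(X, [label], classes=CLASSES)
    ```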

  10. Improving the Accuracy of Attribute Extraction using the Relatedness between Attribute Values

    NASA Astrophysics Data System (ADS)

    Bollegala, Danushka; Tani, Naoki; Ishizuka, Mitsuru

    Extracting attribute-values related to entities from web texts is an important step in numerous web related tasks such as information retrieval, information extraction, and entity disambiguation (namesake disambiguation). For example, for a search query that contains a personal name, we can not only return documents that contain that personal name, but if we have attribute-values such as the organization for which that person works, we can also suggest documents that contain information related to that organization, thereby improving the user's search experience. Despite numerous potential applications of attribute extraction, it remains a challenging task due to the inherent noise in web data -- often a single web page contains multiple entities and attributes. We propose a graph-based approach to select the correct attribute-values from a set of candidate attribute-values extracted for a particular entity. First, we build an undirected weighted graph in which, attribute-values are represented by nodes, and the edge that connects two nodes in the graph represents the degree of relatedness between the corresponding attribute-values. Next, we find the maximum spanning tree of this graph that connects exactly one attribute-value for each attribute-type. The proposed method outperforms previously proposed attribute extraction methods on a dataset that contains 5000 web pages.
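
    The graph step can be illustrated with networkx: nodes are candidate attribute-values, edge weights are relatedness scores, and a maximum spanning tree keeps the most mutually related candidates. (The paper additionally constrains the tree to exactly one value per attribute type; that constraint is omitted here, and the scores below are made up.)

    ```python
    import networkx as nx

    relatedness = {
        ("Google", "software engineer"): 0.9,
        ("Google", "Mountain View"): 0.8,
        ("software engineer", "Mountain View"): 0.5,
        ("Google", "lawyer"): 0.1,
        ("software engineer", "lawyer"): 0.05,
        ("Mountain View", "lawyer"): 0.05,
    }

    G = nx.Graph()
    for (u, v), w in relatedness.items():
        G.add_edge(u, v, weight=w)

    tree = nx.maximum_spanning_tree(G)          # most strongly related attribute-values
    print(sorted(tree.edges(data="weight")))
    ```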

  11. Terrain Extraction by Integrating Terrestrial Laser Scanner Data and Spectral Information

    NASA Astrophysics Data System (ADS)

    Lau, C. L.; Halim, S.; Zulkepli, M.; Azwan, A. M.; Tang, W. L.; Chong, A. K.

    2015-10-01

    The extraction of true terrain points from unstructured laser point cloud data is an important process for producing an accurate digital terrain model (DTM). However, most spatial filtering methods use only geometric data to discriminate terrain points from non-terrain points. Point cloud filtering can also be improved by using the spectral information available from some scanners. Therefore, the objective of this study is to investigate the effectiveness of using the three channels (red, green and blue) of the colour image captured by the built-in digital camera available in some Terrestrial Laser Scanners (TLS) for terrain extraction. In this study, data acquisition was conducted at a mini replica landscape at Universiti Teknologi Malaysia (UTM), Skudai campus, using a Leica ScanStation C10. The spectral information of the coloured point clouds from selected sample classes was extracted for spectral analysis, and coloured points falling within the corresponding preset spectral thresholds were identified as belonging to that specific feature class. This terrain extraction process was implemented in custom Matlab code. Results demonstrate that a passive image of higher spectral resolution is required to improve the output, because the low quality of the colour images captured by the sensor leads to low separability in spectral reflectance. In conclusion, this study shows that spectral information can be used as a parameter for terrain extraction.
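
    A minimal sketch of the spectral filtering idea: keep only points whose RGB values fall inside a preset window for a target class. The thresholds below are placeholders; in the study they were derived from the sampled classes (and the original processing was done in Matlab rather than Python).

    ```python
    import numpy as np

    def filter_by_colour(points_xyzrgb: np.ndarray,
                         rgb_lo=(80, 60, 40), rgb_hi=(160, 130, 110)) -> np.ndarray:
        """points_xyzrgb: (N, 6) array of x, y, z, r, g, b per point."""
        rgb = points_xyzrgb[:, 3:6]
        mask = np.all((rgb >= rgb_lo) & (rgb <= rgb_hi), axis=1)
        return points_xyzrgb[mask]    # candidate points for the target (terrain) class
    ```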

  12. Information Graphic Classification, Decomposition and Alternative Representation

    ERIC Educational Resources Information Center

    Gao, Jinglun

    2012-01-01

    This thesis work is mainly focused on two problems related to improving accessibility of information graphics for visually impaired users. The first problem is automated analysis of information graphics for information extraction and the second problem is multi-modal representations for accessibility. Information graphics are graphical…

  13. Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

    PubMed Central

    Zheng, Shuai; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A

    2017-01-01

    Background Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Objective Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. Methods A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Results Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. Conclusions IDEAL-X adopts a unique online machine learning–based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. PMID:28487265

  14. A extract method of mountainous area settlement place information from GF-1 high resolution optical remote sensing image under semantic constraints

    NASA Astrophysics Data System (ADS)

    Guo, H., II

    2016-12-01

    Spatial distribution information on mountainous settlement places is of great significance for earthquake emergency work because most of the key earthquake-hazardous areas of China are located in mountainous areas. Remote sensing has the advantages of large coverage and low cost, and it is an important way to obtain the spatial distribution of mountainous settlement places. At present, fully considering geometric, spectral, and texture information, most studies have applied object-oriented methods to extract settlement information; in this article, semantic constraints are added on top of the object-oriented method. The experimental data are one scene from the domestic high-resolution satellite GF-1, with a resolution of 2 meters. The main processing consists of three steps: the first is pre-treatment, including orthorectification and image fusion; the second is object-oriented information extraction, including image segmentation and information extraction; the last is removing erroneous elements under semantic constraints. To formulate these semantic constraints, the distribution characteristics of mountainous settlement places must be analyzed and the spatial-logical relations between settlement places and other objects must be considered. The accuracy assessment shows that the extraction accuracy of the object-oriented method is 49% and rises to 86% after the semantic constraints are applied. As can be seen from these figures, the extraction method under semantic constraints can effectively improve the accuracy of mountainous settlement information extraction. The results show that it is feasible to extract mountainous settlement information from GF-1 imagery, demonstrating the practicality of domestic high-resolution optical remote sensing imagery for earthquake emergency preparedness.

  15. Structured reports of videofluoroscopic swallowing studies have the potential to improve overall report quality compared to free text reports.

    PubMed

    Schoeppe, Franziska; Sommer, Wieland H; Haack, Mareike; Havel, Miriam; Rheinwald, Marika; Wechtenbruch, Juliane; Fischer, Martin R; Meinel, Felix G; Sabel, Bastian O; Sommer, Nora N

    2018-01-01

    To compare free text (FTR) and structured reports (SR) of videofluoroscopic swallowing studies (VFSS) and evaluate satisfaction of referring otolaryngologists and speech therapists. Both standard FTR and SR of 26 patients with VFSS were acquired. A dedicated template focusing on oropharyngeal phases was created for SR using online software with clickable decision-trees and concomitant generation of semantically structured reports. All reports were evaluated regarding overall quality and content, information extraction and clinical decision support (10-point Likert scale (0 = I completely disagree, 10 = I completely agree)). Two otorhinolaryngologists and two speech therapists evaluated FTR and SR. SR received better ratings than FTR in all items. SR were perceived to contain more details on the swallowing phases (median rating: 10 vs. 5; P < 0.001), penetration and aspiration (10 vs. 5; P < 0.001) and facilitated information extraction compared to FTR (10 vs. 4; P < 0.001). Overall quality was rated significantly higher in SR than FTR (P < 0.001). SR of VFSS provide more detailed information and facilitate information extraction. SR better assist in clinical decision-making, might enhance the quality of the report and, thus, are recommended for the evaluation of VFSS. • Structured reports on videofluoroscopic exams of deglutition lead to improved report quality. • Information extraction is facilitated when using structured reports based on decision trees. • Template-based reports add more value to clinical decision-making than free text reports. • Structured reports receive better ratings by speech therapists and otolaryngologists. • Structured reports on videofluoroscopic exams may improve the comparability between exams.

  16. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    PubMed

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.

  17. Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease

    PubMed Central

    Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.

    1998-01-01

    The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this mark-up process is time consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.

  18. Feasibility of Extracting Key Elements from ClinicalTrials.gov to Support Clinicians' Patient Care Decisions.

    PubMed

    Kim, Heejun; Bian, Jiantao; Mostafa, Javed; Jonnalagadda, Siddhartha; Del Fiol, Guilherme

    2016-01-01

    Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort. Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov. Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information. Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians.

  19. Fully Convolutional Network Based Shadow Extraction from GF-2 Imagery

    NASA Astrophysics Data System (ADS)

    Li, Z.; Cai, G.; Ren, H.

    2018-04-01

    There are many shadows in high spatial resolution satellite images, especially in urban areas. Although shadows in imagery severely affect land cover and land use information extraction, they provide auxiliary information for building extraction, which is hard to achieve with satisfactory accuracy through image classification alone. This paper focuses on building shadow extraction, designing a fully convolutional network and training it on samples collected from GF-2 satellite imagery of the urban region of Changchun city. By means of spatial filtering and the calculation of adjacency relationships along the sunlight direction, small patches caused by vegetation or bridges are eliminated from the preliminarily extracted shadows, and the building shadows are finally separated. The building shadow information extracted by the proposed method was compared with the results of traditional object-oriented supervised classification algorithms, showing that the deep learning approach can improve accuracy to a large extent.
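
    For orientation, a fully convolutional network for per-pixel shadow prediction can be as small as the PyTorch sketch below; the layer sizes and the 4-band input are assumptions, not the authors' architecture.

    ```python
    import torch.nn as nn

    class TinyFCN(nn.Module):
        """Per-pixel shadow logits for multispectral patches (illustrative only)."""
        def __init__(self, in_bands: int = 4):
            super().__init__()
            self.encode = nn.Sequential(
                nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.decode = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),  # upsample back to input size
                nn.Conv2d(32, 1, 1),                                 # 1-channel shadow logit map
            )

        def forward(self, x):
            return self.decode(self.encode(x))
    ```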

  20. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

    PubMed Central

    Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-01-01

    Objective The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. Materials and Methods We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. Results We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Discussion Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Conclusion Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. PMID:26911818
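
    A toy sketch of the final mapping step, assuming a hypothetical in-memory gazetteer in place of the external geographical databases; it prefers a more specific place name found in the article text over the coarser GenBank metadata. Names and coordinates are illustrative.

    ```python
    import re

    # Toy gazetteer; a real system would query external geographical databases.
    GAZETTEER = {
        "anhui": (31.86, 117.28),
        "guangdong": (23.13, 113.26),
        "hong kong": (22.32, 114.17),
    }

    def geocode_loih(metadata_location, article_text):
        """Prefer a more specific place name found in the article text;
        fall back to the GenBank metadata location."""
        for place, latlon in GAZETTEER.items():
            if re.search(r"\b" + re.escape(place) + r"\b", article_text, re.I):
                return place, latlon
        return metadata_location, GAZETTEER.get(metadata_location.lower())

    print(geocode_loih("China",
                       "Samples were collected from poultry farms in Anhui province."))
    ```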

  1. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records.

    PubMed

    Tahsin, Tasnia; Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-09-01

    The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. Extraction of edge-based and region-based features for object recognition

    NASA Astrophysics Data System (ADS)

    Coutts, Benjamin; Ravi, Srinivas; Hu, Gongzhu; Shrikhande, Neelima

    1993-08-01

    One of the central problems of computer vision is object recognition. A catalogue of model objects is described as a set of features such as edges and surfaces. The same features are extracted from the scene and matched against the models for object recognition. Edges and surfaces extracted from scenes are often noisy and imperfect. In this paper, algorithms are described for improving low-level edge and surface features. Existing edge extraction algorithms are applied to the intensity image to obtain edge features. Initial edges are traced by following the direction of the current contour. These are improved by using corresponding depth and intensity information for decision making at branch points. Surface fitting routines are applied to the range image to obtain planar surface patches. A region-growing algorithm is developed that starts with a coarse segmentation and uses quadric surface fitting to iteratively merge adjacent regions into quadric surfaces based on approximate orthogonal distance regression. The surface information obtained is returned to the edge extraction routine to detect and remove false edges. This process repeats until no more merging or edge improvement can take place. Both synthetic (with Gaussian noise) and real images containing multiple-object scenes have been tested using the merging criteria. Results appear quite encouraging.

  3. Developing a hybrid dictionary-based bio-entity recognition technique.

    PubMed

    Song, Min; Yu, Hwanjo; Han, Wook-Shin

    2015-01-01

    Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall.
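
    A small sketch of dictionary matching that tolerates spelling variants via edit distance, in the spirit of the edit-distance step described above; the toy dictionary and distance threshold are illustrative.

    ```python
    def edit_distance(a, b):
        """Classic dynamic-programming Levenshtein distance."""
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
        return dp[-1]

    DICTIONARY = {"interleukin-2", "tnf-alpha", "p53"}   # toy bio-entity dictionary

    def match_entity(token, max_dist=1):
        """Dictionary lookup that tolerates small spelling variations."""
        token = token.lower()
        best = min(DICTIONARY, key=lambda e: edit_distance(token, e))
        return best if edit_distance(token, best) <= max_dist else None

    print(match_entity("interleukin2"))   # small variant still maps to the entry
    ```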

  4. Developing a hybrid dictionary-based bio-entity recognition technique

    PubMed Central

    2015-01-01

    Background Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. Results The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. Conclusions The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall. PMID:26043907

  5. Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies.

    PubMed

    Zheng, Shuai; Lu, James J; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A; Wang, Fusheng

    2017-05-09

    Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. ©Shuai Zheng, James J Lu, Nima Ghasemzadeh, Salim S Hayek, Arshed A Quyyumi, Fusheng Wang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 09.05.2017.
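
    A minimal sketch of an online-learning extraction loop of the kind described (not the IDEAL-X implementation): each document is vectorized, a value is predicted, the reviewer's confirmation or correction is fed back with partial_fit, and later documents benefit from the updated model. The target values and feature hashing are assumptions.

    ```python
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vectorizer = HashingVectorizer(n_features=2**16)
    model = SGDClassifier()                          # online linear classifier
    CLASSES = ["normal", "reduced"]                  # hypothetical target values

    def process_document(text, get_user_feedback):
        """Predict, let the user accept or correct, then update the model."""
        X = vectorizer.transform([text])
        try:
            predicted = model.predict(X)[0]
        except Exception:                            # model not fitted yet
            predicted = CLASSES[0]
        confirmed = get_user_feedback(predicted)     # reviewer feedback as ground truth
        model.partial_fit(X, [confirmed], classes=CLASSES)
        return confirmed

    # Example: the reviewer corrects the first prediction to "reduced".
    print(process_document("LVEF is estimated at 25 percent.", lambda p: "reduced"))
    ```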

  6. An effective image classification method with the fusion of invariant feature and a new color descriptor

    NASA Astrophysics Data System (ADS)

    Mansourian, Leila; Taufik Abdullah, Muhamad; Nurliyana Abdullah, Lili; Azman, Azreen; Mustaffa, Mas Rina

    2017-02-01

    Pyramid Histogram of Words (PHOW) combines Bag of Visual Words (BoVW) with spatial pyramid matching (SPM) in order to add location information to the extracted features. However, PHOW features are extracted from various color spaces without extracting color information individually, which means they discard color information, an important characteristic of any image that is motivated by human vision. This article concatenates the PHOW Multi-Scale Dense Scale Invariant Feature Transform (MSDSIFT) histogram with a proposed color histogram to improve the performance of existing image classification algorithms. Performance evaluation on several datasets proves that the new approach outperforms other existing, state-of-the-art methods.

  7. Comparison of Three Information Sources for Smoking Information in Electronic Health Records

    PubMed Central

    Wang, Liwei; Ruan, Xiaoyang; Yang, Ping; Liu, Hongfang

    2016-01-01

    OBJECTIVE The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage. PMID:27980387
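
    A simplified sketch of pattern-based smoking status extraction with heuristic ordering, in the spirit of the NLP module described above; the patterns and precedence rules are illustrative, not the study's actual rule set.

    ```python
    import re

    CURRENT = re.compile(r"\b(smokes|current(ly)? smok\w+|\d+\s*pack[- ]?years?)\b", re.I)
    FORMER  = re.compile(r"\b(former smoker|quit smoking|ex[- ]smoker)\b", re.I)
    NEVER   = re.compile(r"\b(never smoked|non[- ]?smoker|denies (tobacco|smoking))\b", re.I)

    def smoking_status(note):
        """Heuristic ordering: explicit 'never'/'former' statements override
        generic mentions of smoking."""
        if NEVER.search(note):
            return "never"
        if FORMER.search(note):
            return "former"
        if CURRENT.search(note):
            return "current"
        return "unknown"

    print(smoking_status("Patient is an ex-smoker, quit smoking 10 years ago."))
    ```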

  8. Advanced image collection, information extraction, and change detection in support of NN-20 broad area search and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Petrie, G.M.; Perry, E.M.; Kirkham, R.R.

    1997-09-01

    This report describes the work performed at the Pacific Northwest National Laboratory (PNNL) for the U.S. Department of Energy's Office of Nonproliferation and National Security, Office of Research and Development (NN-20). The work supports the NN-20 Broad Area Search and Analysis, a program initiated by NN-20 to improve the detection and classification of undeclared weapons facilities. Ongoing PNNL research activities are described in three main components: image collection, information processing, and change analysis. The Multispectral Airborne Imaging System, which was developed to collect georeferenced imagery in the visible through infrared regions of the spectrum and flown on a light aircraft platform, will supply current land use conditions. The image information extraction software (dynamic clustering and end-member extraction) uses imagery, like the multispectral data collected by the PNNL multispectral system, to efficiently generate landcover information. The advanced change detection uses a priori (benchmark) information, current landcover conditions, and user-supplied rules to rank suspect areas by probable risk of undeclared facilities or proliferation activities. These components, both separately and combined, provide important tools for improving the detection of undeclared facilities.

  9. Research on improved edge extraction algorithm of rectangular piece

    NASA Astrophysics Data System (ADS)

    He, Yi-Bin; Zeng, Ya-Jun; Chen, Han-Xin; Xiao, San-Xia; Wang, Yan-Wei; Huang, Si-Yu

    Traditional edge detection operators such as the Prewitt operator, LoG operator and Canny operator cannot meet the requirements of modern industrial measurement. This paper proposes an image edge detection algorithm based on an improved morphological gradient. It detects edges using structural elements, which deal directly with the characteristic information of the image. By choosing structural elements of different shapes and sizes and using them together, the ideal image edge information can be detected. The experimental results show that the algorithm extracts edges well from noisy images, producing clearer and more detailed edges than previous edge detection algorithms.
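
    A brief OpenCV sketch of edge detection with morphological gradients computed from structural elements of several shapes and sizes, keeping the strongest response per pixel; the synthetic test image and kernel choices are assumptions, not the paper's exact configuration.

    ```python
    import cv2
    import numpy as np

    # Synthetic grayscale image of a bright rectangular piece on a dark background.
    img = np.zeros((120, 160), dtype=np.uint8)
    cv2.rectangle(img, (40, 30), (120, 90), 200, thickness=-1)
    img = cv2.GaussianBlur(img, (5, 5), 0) + np.random.randint(0, 10, img.shape, dtype=np.uint8)

    # Structural elements of different shapes and sizes used together.
    kernels = [
        cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)),
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)),
        cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3)),
    ]
    gradients = [cv2.morphologyEx(img, cv2.MORPH_GRADIENT, k) for k in kernels]
    edges = np.maximum.reduce(gradients)             # strongest response per pixel
    print(edges.max(), edges.shape)
    ```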

  10. A novel speech-processing strategy incorporating tonal information for cochlear implants.

    PubMed

    Lan, N; Nie, K B; Gao, S K; Zeng, F G

    2004-05-01

    Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance in cochlear implant users who speak tonal languages, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variations of four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
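
    A small numpy sketch of fundamental-frequency estimation by autocorrelation for one voiced frame, illustrating the F0-extraction step of such a strategy; the frame length and pitch range are assumptions, and the published algorithm may differ.

    ```python
    import numpy as np

    def estimate_f0(frame, fs, fmin=80.0, fmax=400.0):
        """Estimate F0 for one frame from the strongest autocorrelation peak
        within the plausible pitch-lag range."""
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(fs / fmax), int(fs / fmin)
        lag = lo + np.argmax(ac[lo:hi])
        return fs / lag

    fs = 16000
    t = np.arange(0, 0.03, 1 / fs)
    tone = np.sin(2 * np.pi * 220 * t)            # synthetic 220 Hz "voiced" frame
    print(round(estimate_f0(tone, fs), 1))        # close to 220 Hz
    ```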

  11. Using Information from the Electronic Health Record to Improve Measurement of Unemployment in Service Members and Veterans with mTBI and Post-Deployment Stress

    PubMed Central

    Dillahunt-Aspillaga, Christina; Finch, Dezon; Massengale, Jill; Kretzmer, Tracy; Luther, Stephen L.; McCart, James A.

    2014-01-01

    Objective The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). Design Retrospective cohort study using data from selected progress notes stored in the EHR. Setting Post-deployment Rehabilitation and Evaluation Program (PREP), an in-patient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. Participants Service members and Veterans with TBI who participated in the PREP program (N = 60). Main Outcome Measures Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. Results Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals along with information about cognitive, physical, and behavioral symptoms that may affect return-to-work were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. Conclusions Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort. PMID:25541956

  12. 78 FR 59053 - Agency Information Collection Activities: Notice of an Extension of an Information Collection...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-25

    ... ecosystems, and human and wildlife health. These invaders extract a huge cost. The current annual..., monitoring of invading populations; improving understanding of the ecology of invaders and factors in the...

  13. Feasibility of Extracting Key Elements from ClinicalTrials.gov to Support Clinicians’ Patient Care Decisions

    PubMed Central

    Kim, Heejun; Bian, Jiantao; Mostafa, Javed; Jonnalagadda, Siddhartha; Del Fiol, Guilherme

    2016-01-01

    Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort. Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov. Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information. Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians. PMID:28269867

  14. Automated extraction of family history information from clinical notes.

    PubMed

    Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S; Winden, Tamara J; Carter, Elizabeth W; Melton, Genevieve B

    2014-01-01

    Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication ("indicator phrases"), the latter of which was used to establish relationships between observations and family members. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications.

  15. Automated Extraction of Family History Information from Clinical Notes

    PubMed Central

    Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S.; Winden, Tamara J.; Carter, Elizabeth W.; Melton, Genevieve B.

    2014-01-01

    Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication (“indicator phrases”), the latter of which was used to establish relationships between observations and family members. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications. PMID:25954443

  16. Thinking graphically: Connecting vision and cognition during graph comprehension.

    PubMed

    Ratwani, Raj M; Trafton, J Gregory; Boehm-Davis, Deborah A

    2008-03-01

    Task analytic theories of graph comprehension account for the perceptual and conceptual processes required to extract specific information from graphs. Comparatively, the processes underlying information integration have received less attention. We propose a new framework for information integration that highlights visual integration and cognitive integration. During visual integration, pattern recognition processes are used to form visual clusters of information; these visual clusters are then used to reason about the graph during cognitive integration. In 3 experiments, the processes required to extract specific information and to integrate information were examined by collecting verbal protocol and eye movement data. Results supported the task analytic theories for specific information extraction and the processes of visual and cognitive integration for integrative questions. Further, the integrative processes scaled up as graph complexity increased, highlighting the importance of these processes for integration in more complex graphs. Finally, based on this framework, design principles to improve both visual and cognitive integration are described. PsycINFO Database Record (c) 2008 APA, all rights reserved

  17. Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications

    NASA Technical Reports Server (NTRS)

    Edwards, David E.; Haimes, Robert

    1999-01-01

    An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.

  18. Perceptual learning modules in mathematics: enhancing students' pattern recognition, structure extraction, and fluency.

    PubMed

    Kellman, Philip J; Massey, Christine M; Son, Ji Y

    2010-04-01

    Learning in educational settings emphasizes declarative and procedural knowledge. Studies of expertise, however, point to other crucial components of learning, especially improvements produced by experience in the extraction of information: perceptual learning (PL). We suggest that such improvements characterize both simple sensory and complex cognitive, even symbolic, tasks through common processes of discovery and selection. We apply these ideas in the form of perceptual learning modules (PLMs) to mathematics learning. We tested three PLMs, each emphasizing different aspects of complex task performance, in middle and high school mathematics. In the MultiRep PLM, practice in matching function information across multiple representations improved students' abilities to generate correct graphs and equations from word problems. In the Algebraic Transformations PLM, practice in seeing equation structure across transformations (but not solving equations) led to dramatic improvements in the speed of equation solving. In the Linear Measurement PLM, interactive trials involving extraction of information about units and lengths produced successful transfer to novel measurement problems and fraction problem solving. Taken together, these results suggest (a) that PL techniques have the potential to address crucial, neglected dimensions of learning, including discovery and fluent processing of relations; (b) PL effects apply even to complex tasks that involve symbolic processing; and (c) appropriately designed PL technology can produce rapid and enduring advances in learning. Copyright © 2009 Cognitive Science Society, Inc.

  19. The LifeWatch approach to the exploration of distributed species information

    PubMed Central

    Fuentes, Daniel; Fiore, Nicola

    2014-01-01

    Abstract This paper introduces a new method of automatically extracting, integrating and presenting information regarding species from the most relevant online taxonomic resources. First, the information is extracted and joined using data wrappers and integration solutions. Then, an analytical tool is used to provide a visual representation of the data. The information is then integrated into a user friendly content management system. The proposal has been implemented using data from the Global Biodiversity Information Facility (GBIF), the Catalogue of Life (CoL), the World Register of Marine Species (WoRMS), the Integrated Taxonomic Information System (ITIS) and the Global Names Index (GNI). The approach improves data quality, avoiding taxonomic and nomenclature errors whilst increasing the availability and accessibility of the information. PMID:25589865

  20. Applications of bioactive material from snakehead fish (Channa striata) for repairing of learning-memory capability and motoric activity: a case study of physiological aging and aging-caused oxidative stress in rats

    NASA Astrophysics Data System (ADS)

    Sunarno, Sunarno; Muflichatun Mardiati, Siti; Rahadian, Rully

    2018-05-01

    Physiological aging and aging due to oxidative stress are major factors causing accelerated brain aging. Aging is characterized by a decline in the function of the hippocampus, which is linked to reduced learning-memory capability and motoric activity. The objective of this research is to obtain important information about the mechanisms of brain antiaging associated with improvement of hippocampus function, including learning-memory capability and motoric activity as well as the mitochondrial ultrastructure profile of hippocampal cornu ammonis cells, after treatment with snakehead fish extract. Snakehead fish in Rawa Pening, Semarang District, are thought to be a potential endemic resource containing bioactive antiaging material that can prevent aging or improve the function of the hippocampus. This research was conducted using a completely randomized design consisting of four treatments with five replications. The treatments comprised rats with physiological aging or aging due to oxidative stress, with and without treatment with meat extract of snakehead fish. The research was divided into two stages: determining learning-memory capability and determining motoric activity. The measured parameters were time to find feed, distance traveled, stereotypy time, ambulatory time, and resting time. The results showed that snakehead fish meat extract may improve hippocampus function, both in physiological aging and in aging due to oxidative stress. In the learning and memory tests, rats in both aging conditions that received the meat extract of snakehead fish found feed in the fourth arm of the maze faster than untreated rats. Similarly, measurements of distance traveled, stereotypy time, ambulatory time, and resting time showed that rats that received the meat extract of snakehead fish performed better than untreated rats. In conclusion, the meat extract of snakehead fish can be used as an antiaging material to improve the function of the hippocampus, improve learning-memory capability and motoric activity, and prevent aging. These findings are expected to provide comprehensive information for the development of antiaging research as an effort to improve public health, learning-memory capability, and motoric activity.

  1. Drug drug interaction extraction from the literature using a recursive neural network

    PubMed Central

    Lim, Sangrak; Lee, Kyubum

    2018-01-01

    Detecting drug-drug interactions (DDI) is important because information on DDIs can help prevent adverse effects from drug combinations. Since there are many new DDI-related papers published in the biomedical domain, manually extracting DDI information from the literature is a laborious task. However, text mining can be used to find DDIs in the biomedical literature. Among the recently developed neural networks, we use a Recursive Neural Network to improve the performance of DDI extraction. Our recursive neural network model uses a position feature, a subtree containment feature, and an ensemble method to improve the performance of DDI extraction. Compared with the state-of-the-art models, the DDI detection and type classifiers of our model performed 4.4% and 2.8% better, respectively, on the DDIExtraction Challenge’13 test data. We also validated our model on the PK DDI corpus that consists of two types of DDIs data: in vivo DDI and in vitro DDI. Compared with the existing model, our detection classifier performed 2.3% and 6.7% better on in vivo and in vitro data respectively. The results of our validation demonstrate that our model can automatically extract DDIs better than existing models. PMID:29373599

  2. A research of road centerline extraction algorithm from high resolution remote sensing images

    NASA Astrophysics Data System (ADS)

    Zhang, Yushan; Xu, Tingfa

    2017-09-01

    Satellite remote sensing technology has become one of the most effective methods for land surface monitoring in recent years, due to its advantages such as short period, large scale and rich information. Meanwhile, road extraction is an important field in the applications of high resolution remote sensing images. An intelligent and automatic road extraction algorithm with high precision has great significance for transportation, road network updating and urban planning. The fuzzy c-means (FCM) clustering segmentation algorithms have been used in road extraction, but the traditional algorithms did not consider spatial information. An improved fuzzy c-means clustering algorithm combined with spatial information (SFCM) is proposed in this paper, which is proved to be effective for noisy image segmentation. Firstly, the image is segmented using the SFCM. Secondly, the segmentation result is processed by mathematical morphology to remove the joint regions. Thirdly, the road centerlines are extracted by morphology thinning and burr trimming. The average integrity of the centerline extraction algorithm is 97.98%, the average accuracy is 95.36% and the average quality is 93.59%. Experimental results show that the proposed method in this paper is effective for road centerline extraction.
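
    A short scikit-image sketch of the thinning step that follows segmentation: morphological clean-up of the binary road mask and skeletonization to a one-pixel centerline. The synthetic mask stands in for the SFCM segmentation output.

    ```python
    import numpy as np
    from skimage.morphology import skeletonize, remove_small_objects

    # Synthetic horizontal road strip standing in for the SFCM road class.
    road_mask = np.zeros((60, 200), dtype=bool)
    road_mask[25:35, 10:190] = True

    clean = remove_small_objects(road_mask, min_size=50)   # morphological clean-up
    centerline = skeletonize(clean)                        # thin the road to 1 px

    print(int(centerline.sum()), "centerline pixels")
    ```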

  3. The Design of Case Products’ Shape Form Information Database Based on NURBS Surface

    NASA Astrophysics Data System (ADS)

    Liu, Xing; Liu, Guo-zhong; Xu, Nuo-qi; Zhang, Wei-she

    2017-07-01

    In order to improve computer-aided design of product shapes, applying Non-Uniform Rational B-Spline (NURBS) curves and surfaces to the representation of the product shape helps designers design the product effectively. On the basis of contour extraction from typical product images and the use of Pro/Engineer (Pro/E) to extract the geometric features of a scanned mold, and in order to structure an information database of value points, control points and knot vector parameters, this paper puts forward a unified method of using NURBS curves and surfaces to describe a product's geometric shape and using MATLAB to simulate it when products have the same or similar function. A case study of an electric vehicle's front cover illustrates the process of accessing the geometric shape information of a case product. This method can not only greatly reduce the volume of the information database, but also improve the effectiveness of computer-aided geometric innovation modeling.

  4. Extracting the Textual and Temporal Structure of Supercomputing Logs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jain, S; Singh, I; Chandra, A

    2009-05-26

    Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format makes it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
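
    A toy sketch of the textual-clustering idea: mask the variable fields of each log message so that messages sharing a syntactic structure collapse to the same template and can be grouped. The masking patterns and sample messages are illustrative.

    ```python
    import re
    from collections import defaultdict

    def template_of(message):
        """Reduce a log message to its syntactic skeleton by masking the
        variable fields (hex ids, node names, numbers)."""
        msg = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
        msg = re.sub(r"\bnode\d+\b", "<NODE>", msg)
        msg = re.sub(r"\d+", "<NUM>", msg)
        return msg

    logs = [
        "node17 memory error at 0x3fa2 count 4",
        "node03 memory error at 0x19bc count 9",
        "job 5521 finished in 730 seconds",
    ]

    groups = defaultdict(list)
    for line in logs:
        groups[template_of(line)].append(line)

    for template, members in groups.items():
        print(len(members), template)
    ```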

  5. Tags Extraction from Spatial Documents in Search Engines

    NASA Astrophysics Data System (ADS)

    Borhaninejad, S.; Hakimpour, F.; Hamzei, E.

    2015-12-01

    Nowadays, selective access to information on the Web is provided by search engines, but when the data include spatial information the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information that lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database and user interface. In the crawler component, GML documents are discovered and their text is parsed for information extraction and storage. The database component is responsible for indexing the information collected by the crawlers. Finally, the user interface component provides the interaction between the system and the user. We have implemented this system as a pilot on an application server as a simulation of the Web. As a spatial search engine, our system provides search capability over GML documents, and thus takes an important step toward improving the efficiency of search engines.

  6. Building an automated SOAP classifier for emergency department reports.

    PubMed

    Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

    2012-02-01

    Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F(1) scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks. Copyright © 2011. Published by Elsevier Inc.
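
    A minimal scikit-learn sketch of a sentence-level SOAP classifier using bag-of-words features; the four training sentences are made up, and the published classifiers used a richer linguistic feature set.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative training set; real training data would be annotated
    # emergency department sentences.
    sentences = [
        "Patient complains of chest pain since this morning.",     # subjective
        "Blood pressure 150/90, afebrile.",                        # objective
        "Likely musculoskeletal pain, rule out ACS.",               # assessment
        "Start ibuprofen and follow up in one week.",               # plan
    ]
    labels = ["S", "O", "A", "P"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(sentences, labels)
    print(clf.predict(["Order a chest x-ray and discharge home."]))
    ```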

  7. Interventions to improve patient comprehension in informed consent for medical and surgical procedures: a systematic review.

    PubMed

    Schenker, Yael; Fernandez, Alicia; Sudore, Rebecca; Schillinger, Dean

    2011-01-01

    Patient understanding in clinical informed consent is often poor. Little is known about the effectiveness of interventions to improve comprehension or the extent to which such interventions address different elements of understanding in informed consent. Purpose. To systematically review communication interventions to improve patient comprehension in informed consent for medical and surgical procedures. Data Sources. A systematic literature search of English-language articles in MEDLINE (1949-2008) and EMBASE (1974-2008) was performed. In addition, a published bibliography of empirical research on informed consent and the reference lists of all eligible studies were reviewed. Study Selection. Randomized controlled trials and controlled trials with nonrandom allocation were included if they compared comprehension in informed consent for a medical or surgical procedure. Only studies that used a quantitative, objective measure of understanding were included. All studies addressed informed consent for a needed or recommended procedure in actual patients. Data Extraction. Reviewers independently extracted data using a standardized form. All results were compared, and disagreements were resolved by consensus. Data Synthesis. Forty-four studies were eligible. Intervention categories included written information, audiovisual/multimedia, extended discussions, and test/feedback techniques. The majority of studies assessed patient understanding of procedural risks; other elements included benefits, alternatives, and general knowledge about the procedure. Only 6 of 44 studies assessed all 4 elements of understanding. Interventions were generally effective in improving patient comprehension, especially regarding risks and general knowledge. Limitations. Many studies failed to include adequate description of the study population, and outcome measures varied widely. Conclusions. A wide range of communication interventions improve comprehension in clinical informed consent. Decisions to enhance informed consent should consider the importance of different elements of understanding, beyond procedural risks, as well as feasibility and acceptability of the intervention to clinicians and patients. Conceptual clarity regarding the key elements of informed consent knowledge will help to focus improvements and standardize evaluations.

  8. Interventions to Improve Patient Comprehension in Informed Consent for Medical and Surgical Procedures: A Systematic Review

    PubMed Central

    Schenker, Yael; Fernandez, Alicia; Sudore, Rebecca; Schillinger, Dean

    2017-01-01

    Background Patient understanding in clinical informed consent is often poor. Little is known about the effectiveness of interventions to improve comprehension or the extent to which such interventions address different elements of understanding in informed consent. Purpose To systematically review communication interventions to improve patient comprehension in informed consent for medical and surgical procedures. Data Sources A systematic literature search of English-language articles in MEDLINE (1949–2008) and EMBASE (1974–2008) was performed. In addition, a published bibliography of empirical research on informed consent and the reference lists of all eligible studies were reviewed. Study Selection Randomized controlled trials and controlled trials with non-random allocation were included if they compared comprehension in informed consent for a medical or surgical procedure. Only studies that used a quantitative, objective measure of understanding were included. All studies addressed informed consent for a needed or recommended procedure in actual patients. Data Extraction Reviewers independently extracted data using a standardized form. All results were compared, and disagreements were resolved by consensus. Data Synthesis Forty-four studies were eligible. Intervention categories included written information, audiovisual/multimedia, extended discussions, and test/feedback techniques. The majority of studies assessed patient understanding of procedural risks; other elements included benefits, alternatives, and general knowledge about the procedure. Only 6 of 44 studies assessed all 4 elements of understanding. Interventions were generally effective in improving patient comprehension, especially regarding risks and general knowledge. Limitations Many studies failed to include adequate description of the study population, and outcome measures varied widely. Conclusions A wide range of communication interventions improve comprehension in clinical informed consent. Decisions to enhance informed consent should consider the importance of different elements of understanding, beyond procedural risks, as well as feasibility and acceptability of the intervention to clinicians and patients. Conceptual clarity regarding the key elements of informed consent knowledge will help to focus improvements and standardize evaluations. PMID:20357225

  9. Automated extraction of radiation dose information for CT examinations.

    PubMed

    Cook, Tessa S; Zimmerman, Stefan; Maidment, Andrew D A; Kim, Woojin; Boonn, William W

    2010-11-01

    Exposure to radiation as a result of medical imaging is currently in the spotlight, receiving attention from Congress as well as the lay press. Although scanner manufacturers are moving toward including effective dose information in the Digital Imaging and Communications in Medicine headers of imaging studies, there is a vast repository of retrospective CT data at every imaging center that stores dose information in an image-based dose sheet. As such, it is difficult for imaging centers to participate in the ACR's Dose Index Registry. The authors have designed an automated extraction system to query their PACS archive and parse CT examinations to extract the dose information stored in each dose sheet. First, an open-source optical character recognition program processes each dose sheet and converts the information to American Standard Code for Information Interchange (ASCII) text. Each text file is parsed, and radiation dose information is extracted and stored in a database which can be queried using an existing pathology and radiology enterprise search tool. Using this automated extraction pipeline, it is possible to perform dose analysis on the >800,000 CT examinations in the PACS archive and generate dose reports for all of these patients. It is also possible to more effectively educate technologists, radiologists, and referring physicians about exposure to radiation from CT by generating report cards for interpreted and performed studies. The automated extraction pipeline enables compliance with the ACR's reporting guidelines and greater awareness of radiation dose to patients, thus resulting in improved patient care and management. Copyright © 2010 American College of Radiology. Published by Elsevier Inc. All rights reserved.
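
    A small sketch of the parsing step that follows optical character recognition, assuming a hypothetical dose-sheet layout; real dose sheets vary by vendor, so the column names and regular expression are illustrative only.

    ```python
    import re

    # Hypothetical OCR output of one CT dose sheet.
    ocr_text = """
    Series  Type     CTDIvol(mGy)  DLP(mGy-cm)
    2       Helical  12.45         497.80
    3       Helical  8.10          310.25
    """

    row = re.compile(r"^\s*(\d+)\s+\w+\s+([\d.]+)\s+([\d.]+)\s*$", re.M)
    records = [
        {"series": int(s), "ctdi_vol_mGy": float(c), "dlp_mGy_cm": float(d)}
        for s, c, d in row.findall(ocr_text)
    ]
    total_dlp = sum(r["dlp_mGy_cm"] for r in records)
    print(records, total_dlp)
    ```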

  10. Automatic seizure detection based on the combination of newborn multi-channel EEG and HRV information

    NASA Astrophysics Data System (ADS)

    Mesbah, Mostefa; Balakrishnan, Malarvili; Colditz, Paul B.; Boashash, Boualem

    2012-12-01

    This article proposes a new method for newborn seizure detection that uses information extracted from both multi-channel electroencephalogram (EEG) and a single channel electrocardiogram (ECG). The aim of the study is to assess whether additional information extracted from ECG can improve the performance of seizure detectors based solely on EEG. Two different approaches were used to combine this extracted information. The first approach, known as feature fusion, involves combining features extracted from EEG and heart rate variability (HRV) into a single feature vector prior to feeding it to a classifier. The second approach, called classifier or decision fusion, is achieved by combining the independent decisions of the EEG and the HRV-based classifiers. Tested on recordings obtained from eight newborns with identified EEG seizures, the proposed neonatal seizure detection algorithms achieved 95.20% sensitivity and 88.60% specificity for the feature fusion case and 95.20% sensitivity and 94.30% specificity for the classifier fusion case. These results are considerably better than those involving classifiers using EEG only (80.90%, 86.50%) or HRV only (85.70%, 84.60%).
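
    A compact sketch contrasting the two fusion approaches on synthetic data: feature fusion concatenates EEG and HRV features before a single classifier, while decision fusion averages the outputs of separately trained classifiers. The feature dimensions and classifier choice are assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_eeg = rng.normal(size=(200, 12))   # stand-ins for EEG features per epoch
    X_hrv = rng.normal(size=(200, 4))    # stand-ins for HRV features per epoch
    y = rng.integers(0, 2, size=200)     # seizure / non-seizure labels

    # Feature fusion: concatenate both feature sets before a single classifier.
    feat_fusion = LogisticRegression(max_iter=1000).fit(np.hstack([X_eeg, X_hrv]), y)

    # Decision fusion: train separate classifiers and combine their probabilities.
    clf_eeg = LogisticRegression(max_iter=1000).fit(X_eeg, y)
    clf_hrv = LogisticRegression(max_iter=1000).fit(X_hrv, y)
    p = 0.5 * clf_eeg.predict_proba(X_eeg)[:, 1] + 0.5 * clf_hrv.predict_proba(X_hrv)[:, 1]
    decision = (p > 0.5).astype(int)

    print(feat_fusion.score(np.hstack([X_eeg, X_hrv]), y), (decision == y).mean())
    ```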

  11. Extraction of land cover change information from ENVISAT-ASAR data in Chengdu Plain

    NASA Astrophysics Data System (ADS)

    Xu, Wenbo; Fan, Jinlong; Huang, Jianxi; Tian, Yichen; Zhang, Yong

    2006-10-01

    Land cover data are essential to most global change research objectives, including the assessment of current environmental conditions and the simulation of future environmental scenarios that ultimately lead to public policy development. Chinese Academy of Sciences generated a nationwide land cover database in order to carry out the quantification and spatial characterization of land use/cover changes (LUCC) in 1990s. In order to improve the reliability of the database, we will update the database anytime. But it is difficult to obtain remote sensing data to extract land cover change information in large-scale. It is hard to acquire optical remote sensing data in Chengdu plain, so the objective of this research was to evaluate multitemporal ENVISAT advanced synthetic aperture radar (ASAR) data for extracting land cover change information. Based on the fieldwork and the nationwide 1:100000 land cover database, the paper assesses several land cover changes in Chengdu plain, for example: crop to buildings, forest to buildings, and forest to bare land. The results show that ENVISAT ASAR data have great potential for the applications of extracting land cover change information.

  12. Real time system design of motor imagery brain-computer interface based on multi band CSP and SVM

    NASA Astrophysics Data System (ADS)

    Zhao, Li; Li, Xiaoqin; Bian, Yan

    2018-04-01

    Motor imagery (MI) is an effective method to promote the recovery of limb function in patients after stroke. An online MI brain-computer interface (BCI) system that applies MI can enhance the patient's participation and accelerate the recovery process. The traditional method processes the electroencephalogram (EEG) induced by MI with common spatial patterns (CSP), which extracts information from a single frequency band. In order to further improve the classification accuracy of the system, information from two characteristic frequency bands is extracted. The effectiveness of the proposed feature extraction method is verified by off-line analysis of competition data and by analysis of the online system.
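
    A numpy/scipy sketch of one band's CSP computation via a generalized eigendecomposition of the class covariance matrices, followed by log-variance features; repeating it per frequency band and concatenating the features would give the multi-band input to an SVM. The trial shapes and number of filter pairs are assumptions.

    ```python
    import numpy as np
    from scipy.linalg import eigh

    def csp_filters(trials_a, trials_b, n_pairs=2):
        """Common spatial patterns for two classes of trials, each of shape
        (n_trials, n_channels, n_samples)."""
        cov = lambda trials: np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)
        Ca, Cb = cov(trials_a), cov(trials_b)
        vals, vecs = eigh(Ca, Ca + Cb)                 # generalized eigendecomposition
        order = np.argsort(vals)
        picks = np.r_[order[:n_pairs], order[-n_pairs:]]
        return vecs[:, picks].T                        # (2*n_pairs, n_channels)

    def log_variance_features(trial, filters):
        Z = filters @ trial
        var = Z.var(axis=1)
        return np.log(var / var.sum())

    rng = np.random.default_rng(1)
    left  = rng.normal(size=(20, 8, 250))   # 20 trials, 8 channels, 250 samples
    right = rng.normal(size=(20, 8, 250))
    W = csp_filters(left, right)
    print(log_variance_features(left[0], W))
    ```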

  13. Extracting Information from Narratives: An Application to Aviation Safety Reports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Posse, Christian; Matzke, Brett D.; Anderson, Catherine M.

    2005-05-12

    Aviation safety reports are the best available source of information about why a flight incident happened. However, stream of consciousness permeates the narratives, making automation of the information extraction task difficult. We propose an approach and infrastructure based on a common pattern specification language to capture relevant information via normalized template expression matching in context. Template expression matching handles variants of multi-word expressions. Normalization improves the likelihood of correct hits by standardizing and cleaning the vocabulary used in narratives. Checking for the presence of negative modifiers in the proximity of a potential hit reduces the chance of false hits. We present the above approach in the context of a specific application, which is the extraction of human performance factors from NASA ASRS reports. While knowledge infusion from experts plays a critical role during the learning phase, early results show that in a production mode, the automated process provides information that is consistent with analyses by human subjects.
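
    A toy sketch of template-expression matching with a negative-modifier proximity check, illustrating the approach; the fatigue pattern, negator list and context window are invented for the example.

    ```python
    import re

    FATIGUE = re.compile(r"\b(fatigu\w+|tired|exhausted)\b", re.I)
    NEGATORS = {"no", "not", "denies", "without"}

    def find_factor(narrative, window=3):
        """Report a hit only if no negating modifier occurs within a few
        words before the matched expression."""
        tokens = narrative.lower().split()
        for m in FATIGUE.finditer(narrative):
            idx = len(narrative[:m.start()].split())
            context = tokens[max(0, idx - window):idx]
            if not NEGATORS & set(context):
                return m.group(0)
        return None

    print(find_factor("The first officer reported he was fatigued after the long duty day."))
    print(find_factor("The captain was not fatigued at the time of the event."))
    ```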

  14. Information Retrieval Using Hadoop Big Data Analysis

    NASA Astrophysics Data System (ADS)

    Motwani, Deepak; Madan, Madan Lal

    This paper concerns big data analysis, the cognitive operation of probing huge amounts of information in an attempt to uncover hidden patterns. Through big data analytics, organizations in both the public and private sectors have made a strategic determination to turn big data into competitive advantage. The primary task of extracting value from big data gives rise to a process that pulls information from multiple different sources; this process is known as extract, transform and load (ETL). This paper extracts information from log files and research papers, reducing the effort required for pattern finding and summarization of documents from several sources. The work helps in better understanding basic Hadoop concepts and improves the user experience for research. In this paper, we propose an approach for analyzing log files to find concise information, which is useful and time saving, by using Hadoop. Our proposed approach will be applied to different research papers in a specific domain to obtain summarized content for further improvement and for producing new content.

  15. Integration of forward-looking infrared (FLIR) and traffic information for moving obstacle detection with integrity

    NASA Astrophysics Data System (ADS)

    Zhu, Zhen; Vana, Sudha; Bhattacharya, Sumit; Uijt de Haag, Maarten

    2009-05-01

    This paper discusses the integration of Forward-looking Infrared (FLIR) and traffic information from, for example, the Automatic Dependent Surveillance - Broadcast (ADS-B) or the Traffic Information Service-Broadcast (TIS-B). The goal of this integration method is to obtain an improved state estimate of a moving obstacle within the Field-of-View of the FLIR with added integrity. The focus of the paper will be on the approach phase of the flight. The paper will address methods to extract moving objects from the FLIR imagery and geo-reference these objects using outputs of both the onboard Global Positioning System (GPS) and the Inertial Navigation System (INS). The proposed extraction method uses a priori airport information and terrain databases. Furthermore, state information from the traffic information sources will be extracted and integrated with the state estimates from the FLIR. Finally, a method will be addressed that performs a consistency check between both sources of traffic information. The methods discussed in this paper will be evaluated using flight test data collected with a Gulfstream V in Reno, NV (GVSITE) and simulated ADS-B.

  16. FEX: A Knowledge-Based System For Planimetric Feature Extraction

    NASA Astrophysics Data System (ADS)

    Zelek, John S.

    1988-10-01

    Topographical planimetric features include natural surfaces (rivers, lakes) and man-made surfaces (roads, railways, bridges). In conventional planimetric feature extraction, a photointerpreter manually interprets and extracts features from imagery on a stereoplotter. Visual planimetric feature extraction is a very labour intensive operation. The advantages of automating feature extraction include: time and labour savings; accuracy improvements; and planimetric data consistency. FEX (Feature EXtraction) combines techniques from image processing, remote sensing and artificial intelligence for automatic feature extraction. The feature extraction process co-ordinates the information and knowledge in a hierarchical data structure. The system simulates the reasoning of a photointerpreter in determining the planimetric features. Present efforts have concentrated on the extraction of road-like features in SPOT imagery. Keywords: Remote Sensing, Artificial Intelligence (AI), SPOT, image understanding, knowledge base, apars.

  17. Knowledge Acquisition of Generic Queries for Information Retrieval

    PubMed Central

    Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.

    2002-01-01

    Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.

  18. Research of building information extraction and evaluation based on high-resolution remote-sensing imagery

    NASA Astrophysics Data System (ADS)

    Cao, Qiong; Gu, Lingjia; Ren, Ruizhi; Wang, Lang

    2016-09-01

    Building extraction is currently an important application of high-resolution remote sensing imagery. Quite a few algorithms are available for detecting building information; however, most of them still have obvious disadvantages, such as ignoring spectral information or trading off extraction rate against extraction accuracy. The purpose of this research is to develop an effective method to detect building information in Chinese GF-1 data. Firstly, image preprocessing is used to normalize the image and image enhancement is used to highlight the useful information in the image. Secondly, multi-spectral information is analyzed. Subsequently, an improved morphological building index (IMBI) based on remote sensing imagery is proposed to obtain candidate building objects. Furthermore, in order to refine building objects and remove false objects, post-processing (e.g., shape features, the vegetation index and the water index) is employed. To validate the effectiveness of the proposed algorithm, the omission error (OE), commission error (CE), overall accuracy (OA) and Kappa coefficient are used in the final evaluation. The proposed method can not only effectively use spectral information and other basic features, but also avoid extracting excessive interference details from high-resolution remote sensing images. Compared to the original MBI algorithm, the proposed method reduces the OE by 33.14%; at the same time, the Kappa coefficient increases by 16.09%. In experiments, IMBI achieved satisfactory results and outperformed other algorithms in terms of both accuracy and visual inspection.
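
    For reference, the evaluation metrics named in the abstract (omission error, commission error, overall accuracy and the Kappa coefficient) can be computed from a binary building/non-building confusion matrix as in the sketch below; the example counts are invented for illustration.

    ```python
    def building_metrics(tp, fp, fn, tn):
        """Accuracy metrics for a binary building-extraction confusion matrix."""
        total = tp + fp + fn + tn
        oe = fn / (tp + fn)                 # omission error: missed buildings
        ce = fp / (tp + fp)                 # commission error: false buildings
        oa = (tp + tn) / total              # overall accuracy
        # Kappa: agreement beyond what would be expected by chance.
        pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
        kappa = (oa - pe) / (1.0 - pe)
        return {"OE": oe, "CE": ce, "OA": oa, "Kappa": kappa}

    print(building_metrics(tp=820, fp=90, fn=110, tn=3980))
    ```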

  19. Change Detection in High-Resolution Remote Sensing Images Using Levene-Test and Fuzzy Evaluation

    NASA Astrophysics Data System (ADS)

    Wang, G. H.; Wang, H. B.; Fan, W. F.; Liu, Y.; Liu, H. J.

    2018-04-01

    High-resolution remote sensing images possess complex spatial structure and rich texture information. Accordingly, this paper presents a new change detection method based on the Levene test and fuzzy evaluation. The method first obtains map-spots by segmenting two overlapping, pre-processed images and extracts features such as spectrum and texture. The change information of all map-spots is then evaluated with the Levene test to obtain candidate changed regions; hue information (the H component) is extracted through the IHS transform, and change vector analysis is conducted in combination with the texture information. Finally, a threshold is determined by an iterative method, the membership degrees of the candidate changed regions are calculated, and the final changed regions are determined. Experimental results on multi-temporal ZY-3 high-resolution images of an area in Jiangsu Province show that, by extracting map-spots with larger differences as candidate changed regions, the Levene test decreases the computing load, improves the precision of change detection, and shows better fault tolerance for unchanged regions with relatively large differences. The combination of hue-texture features and the fuzzy evaluation method can effectively reduce omissions and deficiencies and improve the precision of change detection.
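
    A minimal sketch of the candidate-region screening step, assuming each map-spot is represented by the pixel values it covers in the two dates: scipy's Levene test flags map-spots whose two-date distributions differ significantly as candidate changed regions. The data and significance level are illustrative, not the paper's settings.

    ```python
    import numpy as np
    from scipy.stats import levene

    def candidate_changed_spots(spots_t1, spots_t2, alpha=0.05):
        """Return indices of map-spots whose two-date pixel samples fail the Levene test."""
        candidates = []
        for i, (a, b) in enumerate(zip(spots_t1, spots_t2)):
            _, p_value = levene(a, b)
            if p_value < alpha:          # significant difference -> possible change
                candidates.append(i)
        return candidates

    rng = np.random.default_rng(0)
    spots_t1 = [rng.normal(100, 5, 200), rng.normal(80, 4, 150)]
    spots_t2 = [rng.normal(100, 5, 200), rng.normal(80, 12, 150)]   # second spot changed
    print(candidate_changed_spots(spots_t1, spots_t2))               # -> [1]
    ```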

  20. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies

    PubMed Central

    Ni, Yan; Su, Mingming; Qiu, Yunping; Jia, Wei

    2017-01-01

    ADAP-GC is an automated computational pipeline for untargeted, GC-MS-based metabolomics studies. It takes raw mass spectrometry data as input and carries out a sequence of data processing steps including construction of extracted ion chromatograms, detection of chromatographic peak features, deconvolution of co-eluting compounds, and alignment of compounds across samples. Despite the increased accuracy from the original version to version 2.0 in terms of extracting metabolite information for identification and quantitation, ADAP-GC 2.0 requires appropriate specification of a number of parameters and has difficulty in extracting information of compounds that are in low concentration. To overcome these two limitations, ADAP-GC 3.0 was developed to improve both the robustness and sensitivity of compound detection. In this paper, we report how these goals were achieved and compare ADAP-GC 3.0 against three other software tools including ChromaTOF, AnalyzerPro, and AMDIS that are widely used in the metabolomics community. PMID:27461032

  1. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies.

    PubMed

    Ni, Yan; Su, Mingming; Qiu, Yunping; Jia, Wei; Du, Xiuxia

    2016-09-06

    ADAP-GC is an automated computational pipeline for untargeted, GC/MS-based metabolomics studies. It takes raw mass spectrometry data as input and carries out a sequence of data processing steps including construction of extracted ion chromatograms, detection of chromatographic peak features, deconvolution of coeluting compounds, and alignment of compounds across samples. Despite the increased accuracy from the original version to version 2.0 in terms of extracting metabolite information for identification and quantitation, ADAP-GC 2.0 requires appropriate specification of a number of parameters and has difficulty in extracting information on compounds that are in low concentration. To overcome these two limitations, ADAP-GC 3.0 was developed to improve both the robustness and sensitivity of compound detection. In this paper, we report how these goals were achieved and compare ADAP-GC 3.0 against three other software tools including ChromaTOF, AnalyzerPro, and AMDIS that are widely used in the metabolomics community.

  2. Multiple kernel learning in protein-protein interaction extraction from biomedical literature.

    PubMed

    Yang, Zhihao; Tang, Nan; Zhang, Xiao; Lin, Hongfei; Li, Yanpeng; Yang, Zhiwei

    2011-03-01

    Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions are expanding rapidly, making it increasingly difficult for interaction database administrators, who are responsible for content input and maintenance, to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to automatic extraction of PPI information from biomedical literature. We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and dependency path tree to capture richer contextual information. Our experimental results show that the combination of the SPT and dependency path tree extensions improves performance by almost 0.7 percentage units in F-score and 2 percentage units in area under the receiver operating characteristic curve (AUC). Combining two or more appropriately weighted individual kernels further improves performance. In both individual-corpus and cross-corpus evaluations, our combined kernel achieves state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the AImed corpus. As different kernels calculate the similarity between two sentences from different aspects, our combined kernel reduces the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, thus allowing each kernel to contribute incrementally to the performance improvement. In addition, the SPT and dependency path tree extensions can improve performance by including richer context information. Copyright © 2010 Elsevier B.V. All rights reserved.
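
    The weighted linear kernel combination described above can be sketched with scikit-learn's support for precomputed kernel matrices; the toy data, the two stand-in base kernels, and the fixed weights are illustrative assumptions rather than the authors' actual feature-based, tree, graph and POS-path kernels.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

    # Toy feature vectors standing in for sentence representations.
    X_train = np.random.RandomState(0).rand(40, 10)
    y_train = np.array([0, 1] * 20)
    X_test = np.random.RandomState(1).rand(10, 10)

    def combined_kernel(A, B, weights=(0.6, 0.4)):
        """Weighted linear combination of two base kernels (stand-ins for the individual kernels)."""
        return weights[0] * linear_kernel(A, B) + weights[1] * rbf_kernel(A, B, gamma=0.5)

    clf = SVC(kernel="precomputed")
    clf.fit(combined_kernel(X_train, X_train), y_train)
    predictions = clf.predict(combined_kernel(X_test, X_train))
    print(predictions)
    ```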

  3. PKDE4J: Entity and relation extraction for public knowledge discovery.

    PubMed

    Song, Min; Kim, Won Chul; Lee, Dahee; Heo, Go Eun; Kang, Keun Young

    2015-10-01

    Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora, finding that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Photoacoustic phasoscopy super-contrast imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gao, Fei; Feng, Xiaohua; Zheng, Yuanjin, E-mail: yjzheng@ntu.edu.sg

    2014-05-26

    Phasoscopy is a recently proposed concept correlating electromagnetic (EM) absorption and scattering properties based on energy conservation. Phase information can be extracted from the EM-absorption-induced acoustic wave and the scattered EM wave for biological tissue characterization. In this paper, an imaging modality termed photoacoustic phasoscopy imaging (PAPS) is proposed and verified experimentally based on the phasoscopy concept with laser illumination. Both the endogenous photoacoustic wave and the scattered photons are collected simultaneously to extract the phase information. The PAPS images are then reconstructed on a vessel-mimicking phantom and ex vivo porcine tissues and show significantly improved contrast compared with conventional photoacoustic imaging.

  5. PDF text classification to leverage information extraction from publication reports.

    PubMed

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges to the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
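
    A toy version of a rule-based multi-pass sieve classifier for PDF text snippets is sketched below: each sieve labels the snippets it is confident about and passes the rest on, and anything unclaimed falls through to BODYTEXT. The specific rules are invented for illustration and are far simpler than those in the paper.

    ```python
    import re

    def sieve_metadata(text):
        return "METADATA" if re.search(r"doi:|©|received \d{1,2} \w+ \d{4}", text, re.I) else None

    def sieve_title(text):
        words = text.split()
        return "TITLE" if 0 < len(words) <= 20 and text == text.title() else None

    def sieve_abstract(text):
        return "ABSTRACT" if text.lower().startswith("abstract") else None

    def sieve_semistructure(text):
        # Tables and figure captions often start with a label like "Table 1" or "Figure 2".
        return "SEMISTRUCTURE" if re.match(r"(table|figure)\s+\d+", text, re.I) else None

    SIEVES = [sieve_metadata, sieve_title, sieve_abstract, sieve_semistructure]

    def classify_snippet(text):
        """Run the snippet through each sieve in order; default to BODYTEXT."""
        for sieve in SIEVES:
            label = sieve(text)
            if label is not None:
                return label
        return "BODYTEXT"

    print(classify_snippet("Table 2 Baseline characteristics of participants"))
    print(classify_snippet("Patients were randomised in a 1:1 ratio to either arm."))
    ```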

  6. Program review presentation to Level 1, Interagency Coordination Committee

    NASA Technical Reports Server (NTRS)

    1982-01-01

    Progress in the development of crop inventory technology is reported. Specific topics include the results of a thematic mapper analysis, variable selection studies/early season estimator improvements, the agricultural information system simulator, large unit proportion estimation, and development of common features for multi-satellite information extraction.

  7. Road and Roadside Feature Extraction Using Imagery and LIDAR Data for Transportation Operation

    NASA Astrophysics Data System (ADS)

    Ural, S.; Shan, J.; Romero, M. A.; Tarko, A.

    2015-03-01

    Transportation agencies require up-to-date, reliable, and feasibly acquired information on road geometry and on features within proximity to the roads as input for evaluating and prioritizing new or improvement road projects. The information needed for a robust evaluation of road projects includes road centerline, width, and extent together with the average grade, cross-sections, and obstructions near the travelled way. Remote sensing offers a large collection of data and well-established tools for acquiring this information and extracting the aforementioned road features at various levels and scopes. Even with many remote sensing data and methods available for road extraction, transportation operation requires more than the centerlines. Acquiring information that is spatially coherent at the operational level for the entire road system is challenging and needs multiple data sources to be integrated. In the presented study, we established a framework that used data from multiple sources, including one-foot resolution color infrared orthophotos, airborne LiDAR point clouds, and existing spatially non-accurate ancillary road networks. We were able to extract 90.25% of a total of 23.6 miles of road networks together with estimated road width, average grade along the road, and cross sections at specified intervals. We also extracted buildings and vegetation within a predetermined proximity to the extracted road extent; 90.6% of 107 existing buildings were correctly identified, with a 31% false detection rate.

  8. Multisensor multiresolution data fusion for improvement in classification

    NASA Astrophysics Data System (ADS)

    Rubeena, V.; Tiwari, K. C.

    2016-04-01

    The rapid advancements in technology have facilitated easy availability of multisensor and multiresolution remote sensing data. Multisensor, multiresolution data contain complementary information, and fusion of such data may yield application-dependent, significant information which may otherwise remain trapped within. The present work aims at improving classification by fusing features of coarse-resolution (1 m) hyperspectral LWIR and fine-resolution (20 cm) RGB data. The classification map comprises eight classes: Road, Trees, Red Roof, Grey Roof, Concrete Roof, Vegetation, Bare Soil and Unclassified. The processing methodology for the hyperspectral LWIR data comprises dimensionality reduction, resampling of the data by an interpolation technique to register the two images at the same spatial resolution, and extraction of spatial features to improve classification accuracy. For the fine-resolution RGB data, a vegetation index is computed for classifying the vegetation class and a morphological building index is calculated for buildings. In order to extract textural features, occurrence and co-occurrence statistics are considered and the features are extracted from all three bands of the RGB data. After feature extraction, Support Vector Machines (SVMs) are used for training and classification. To increase the classification accuracy, post-processing steps are applied, such as removal of spurious (salt-and-pepper) noise, followed by majority-voting filtering within objects for better object classification.
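
    The texture step described here, grey-level co-occurrence statistics feeding an SVM, can be sketched with scikit-image and scikit-learn as below. The window-based extraction on random patches and the two properties chosen (contrast, homogeneity) are illustrative assumptions; note also that recent scikit-image releases spell these functions graycomatrix/graycoprops (older ones use greycomatrix/greycoprops).

    ```python
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.svm import SVC

    def glcm_features(window, levels=16):
        """Contrast and homogeneity from a grey-level co-occurrence matrix of one image window."""
        q = (window / (256 // levels)).astype(np.uint8)          # quantize to fewer grey levels
        glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                            symmetric=True, normed=True)
        return [graycoprops(glcm, "contrast")[0, 0], graycoprops(glcm, "homogeneity")[0, 0]]

    rng = np.random.default_rng(0)
    windows = [rng.integers(0, 256, (32, 32)) for _ in range(20)]   # stand-in image patches
    labels = rng.integers(0, 2, 20)                                  # stand-in class labels

    X = np.array([glcm_features(w) for w in windows])
    clf = SVC().fit(X, labels)
    print(clf.predict(X[:5]))
    ```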

  9. Extraction of actionable information from crowdsourced disaster data.

    PubMed

    Kiatpanont, Rungsun; Tanlamai, Uthai; Chongstitvatana, Prabhas

    Natural disasters cause enormous damage to countries all over the world. To deal with these common problems, different activities are required for disaster management at each phase of the crisis. There are three groups of activities: (1) make sense of the situation and determine how best to deal with it, (2) deploy the necessary resources, and (3) harmonize as many parties as possible, using the most effective communication channels. Current technological improvements and developments now enable people to act as real-time information sources. As a result, inundation with crowdsourced data poses a real challenge for a disaster manager. The problem is how to extract the valuable information from a gigantic data pool in the shortest possible time so that the information is still useful and actionable. This research proposed an actionable-data-extraction process to deal with the challenge. Twitter was selected as a test case because messages posted on Twitter are publicly available. Hashtags, an easy and very efficient technique, were also used to differentiate information. A quantitative approach to extracting useful information from the tweets was supported and verified by interviews with disaster managers from many leading organizations in Thailand to understand their missions. Classification of the information extracted from the collected tweets was first performed manually, and the tweets were then used to train a machine learning algorithm to classify future tweets. One particularly useful, significant, and primary category was the request for help. The support vector machine algorithm was used to validate the results of the extraction process on 13,696 sample tweets, with over 74 percent accuracy. The results confirmed that the machine learning technique could significantly and practically assist with disaster management by dealing with crowdsourced data.
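
    A minimal sketch of the tweet-classification step, assuming a manually labelled set where the positive class is the "request for help" category: TF-IDF features feed a linear support vector machine, mirroring the SVM validation described in the abstract. The example tweets and labels are invented.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative training set: 1 = request for help, 0 = other disaster-related content.
    tweets = [
        "Trapped on the roof, water rising, please send a boat #flood",
        "We need drinking water and food at the temple shelter #flood",
        "Road to the city centre is closed due to flooding",
        "Heavy rain expected again tomorrow, stay safe everyone",
    ]
    labels = [1, 1, 0, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(tweets, labels)

    # Likely classified as a request for help given the overlapping vocabulary.
    print(model.predict(["please help us, we are stuck and need rescue #flood"]))
    ```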

  10. Second Iteration of Photogrammetric Pipeline to Enhance the Accuracy of Image Pose Estimation

    NASA Astrophysics Data System (ADS)

    Nguyen, T. G.; Pierrot-Deseilligny, M.; Muller, J.-M.; Thom, C.

    2017-05-01

    In the classical photogrammetric processing pipeline, automatic tie point extraction plays a key role in the quality of the achieved results. The image tie points are crucial to pose estimation and have a significant influence on the precision of the calculated orientation parameters. Therefore, both the relative and absolute orientations of the 3D model can be affected. By improving the precision of image tie point measurement, one can enhance the quality of image orientation. The quality of image tie points is influenced by several factors such as multiplicity, measurement precision and distribution in the 2D images as well as in the 3D scene. In complex acquisition scenarios such as indoor applications and oblique aerial images, tie point extraction is limited when only image information can be exploited. Hence, we propose a method which improves the precision of pose estimation in complex scenarios by adding a second iteration to the classical processing pipeline. The result of the first iteration is used as a priori information to guide the extraction of new tie points with better quality. Evaluated on multiple case studies, the proposed method shows its validity and its high potential for precision improvement.

  11. Uncovering the essential links in online commercial networks

    NASA Astrophysics Data System (ADS)

    Zeng, Wei; Fang, Meiling; Shao, Junming; Shang, Mingsheng

    2016-09-01

    Recommender systems are designed to effectively support individuals' decision-making process on various web sites. A recommender system can be naturally represented by a user-object bipartite network, where a link indicates that a user has collected an object. Recently, research on the information backbone has attracted researchers' interest; the backbone is a sub-network with fewer nodes and links that nevertheless carries most of the relevant information. With the backbone, a system can generate satisfactory recommendations while saving much computing resource. In this paper, we propose an enhanced topology-aware method to extract the information backbone in the bipartite network, based mainly on the information of neighboring users and objects. Our backbone extraction method enables recommender systems to achieve more than 90% of the accuracy of the top-L recommendation while consuming only 20% of the links. The experimental results show that our method outperforms the alternative backbone extraction methods. Moreover, the structure of the information backbone is studied in detail. Finally, we highlight that the information backbone is one of the most important properties of the bipartite network, with which one can significantly improve the efficiency of the recommender system.

  12. [An Extraction and Recognition Method of the Distributed Optical Fiber Vibration Signal Based on EMD-AWPP and HOSA-SVM Algorithm].

    PubMed

    Zhang, Yanjun; Liu, Wen-zhe; Fu, Xing-hu; Bi, Wei-hong

    2016-02-01

    Given that traditional signal processing methods cannot effectively distinguish different vibration intrusion signals, a feature extraction and recognition method for vibration information is proposed based on EMD-AWPP and HOSA-SVM, for high-precision signal recognition in distributed fiber-optic intrusion detection systems. When dealing with different types of vibration, the method first utilizes an adaptive wavelet processing algorithm based on empirical mode decomposition to reduce the influence of abnormal values in the sensing signal and improve the accuracy of signal feature extraction. Not only is the low-frequency part of the signal decomposed, but the details in the high-frequency part are also handled better through time-frequency localization. Secondly, the bispectrum and bicoherence spectrum are used to accurately extract feature vectors that characterize the different types of intrusion vibration. Finally, with a BPNN as the reference model, an SVM whose recognition parameters are tuned by particle swarm optimization distinguishes the signals of different intrusion vibrations, which gives the identification model stronger adaptive and self-learning ability and overcomes shortcomings such as the tendency to fall into local optima. Simulation results showed that this new method can effectively extract the feature vectors of the sensing information, eliminate the influence of random noise and reduce the effects of outliers for different types of intrusion source. The predicted categories agree with the true categories, and the accuracy of vibration identification can reach above 95%, so it outperforms the BPNN recognition algorithm and effectively improves the accuracy of the information analysis.

  13. Active surface model improvement by energy function optimization for 3D segmentation.

    PubMed

    Azimifar, Zohreh; Mohaddesi, Mahsa

    2015-04-01

    This paper proposes an optimized and efficient active surface model obtained by improving the energy functions, searching method, neighborhood definition and resampling criterion. Extracting an accurate surface of the desired object from a number of 3D images using active surface and deformable models plays an important role in computer vision, especially in medical image processing. Different powerful segmentation algorithms have been suggested to address the limitations associated with model initialization, poor convergence to surface concavities and slow convergence rate. This paper proposes a method to improve one of the strongest and most recent segmentation algorithms, namely the Decoupled Active Surface (DAS) method. We consider the gradient of a wavelet-edge-extracted image and local phase coherence as external energy to extract more information from images, and we use a curvature integral as internal energy to focus on extracting high-curvature regions. Similarly, we use resampling of points and a line search for point selection to improve the accuracy of the algorithm. We further employ an estimation of the desired object as an initialization for the active surface model. A number of tests and experiments have been done, and the results show improvements in extracted surface accuracy and computational time compared with the best and most recent active surface models. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Cognition-Based Approaches for High-Precision Text Mining

    ERIC Educational Resources Information Center

    Shannon, George John

    2017-01-01

    This research improves the precision of information extraction from free-form text via the use of cognitive-based approaches to natural language processing (NLP). Cognitive-based approaches are an important, and relatively new, area of research in NLP and search, as well as linguistics. Cognitive approaches enable significant improvements in both…

  15. Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources.

    PubMed

    Mao, Jin; Moore, Lisa R; Blank, Carrine E; Wu, Elvis Hsin-Hui; Ackerman, Marcia; Ranade, Sonali; Cui, Hong

    2016-12-13

    The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students. MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.

  16. A Framework for Land Cover Classification Using Discrete Return LiDAR Data: Adopting Pseudo-Waveform and Hierarchical Segmentation

    NASA Technical Reports Server (NTRS)

    Jung, Jinha; Pasolli, Edoardo; Prasad, Saurabh; Tilton, James C.; Crawford, Melba M.

    2014-01-01

    Acquiring current, accurate land-use information is critical for monitoring and understanding the impact of anthropogenic activities on natural environments. Remote sensing technologies are of increasing importance because of their capability to acquire information for large areas in a timely manner, enabling decision makers to be more effective in complex environments. Although optical imagery has been demonstrated to be successful for land cover classification, active sensors, such as light detection and ranging (LiDAR), have distinct capabilities that can be exploited to improve classification results. However, the potential of LiDAR data for land cover classification has not been fully exploited. Moreover, spatial-spectral classification has recently gained significant attention since classification accuracy can be improved by extracting additional information from the neighboring pixels. Although spatial information has been widely used for spectral data, less attention has been given to LiDAR data. In this work, a new framework for land cover classification using discrete return LiDAR data is proposed. Pseudo-waveforms are generated from the LiDAR data and processed by hierarchical segmentation. Spatial features are extracted in a region-based way using a new unsupervised strategy for multiple pruning of the segmentation hierarchy. The proposed framework is validated experimentally on a real dataset acquired in an urban area. Better classification results are exhibited by the proposed framework compared to the cases in which basic LiDAR products such as the digital surface model and intensity image are used. Moreover, the proposed region-based feature extraction strategy results in improved classification accuracies in comparison with a more traditional window-based approach.

  17. Establishment of Application Guidance for OTC non-Kampo Crude Drug Extract Products in Japan

    PubMed Central

    Somekawa, Layla; Maegawa, Hikoichiro; Tsukada, Shinsuke; Nakamura, Takatoshi

    2017-01-01

    Currently, there are no standardized regulatory systems for herbal medicinal products worldwide. Communication and sharing of knowledge between different regulatory systems will lead to mutual understanding and might help identify topics which deserve further discussion in the establishment of common standards. Regulatory information on traditional herbal medicinal products in Japan is updated by the establishment of Application Guidance for over-the-counter non-Kampo Crude Drug Extract Products. We would like to report on updated regulatory information on the new Application Guidance. Methods for comparison of Crude Drug Extract formulation and standard decoction and criteria for application and the key points to consider for each criterion are indicated in the guidance. Establishment of the guidance contributes to improvements in public health. We hope that the regulatory information about traditional herbal medicinal products in Japan will be of contribution to tackling the challenging task of regulating traditional herbal products worldwide. PMID:28894633

  18. Using Process Redesign and Information Technology to Improve Procurement

    DTIC Science & Technology

    1994-04-01

    contractor. Many large-volume contractors have automated order processing tied to accounting, manufacturing, and shipping subsystems. Currently...the contractor must receive the mailed order, analyze it, extract pertinent information, and enter that information into the automated order ... processing system. Almost all orders for small purchases are unilateral documents that do not require acceptance or acknowledgment by the contractor. For

  19. ccML, a new mark-up language to improve ISO/EN 13606-based electronic health record extracts practical edition.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Cáceres, Jesús; Somolinos, Roberto; Pascual, Mario; Martínez, Ignacio; Salvador, Carlos H; Monteagudo, José Luis

    2013-01-01

    The objective of this paper is to introduce a new language called ccML, designed to provide convenient pragmatic information to applications using the ISO/EN13606 reference model (RM), such as electronic health record (EHR) extracts editors. EHR extracts are presently built using the syntactic and semantic information provided in the RM and constrained by archetypes. The ccML extra information enables the automation of the medico-legal context information edition, which is over 70% of the total in an extract, without modifying the RM information. ccML is defined using a W3C XML schema file. Valid ccML files complement the RM with additional pragmatics information. The ccML language grammar is defined using formal language theory as a single-type tree grammar. The new language is tested using an EHR extracts editor application as proof-of-concept system. Seven ccML PVCodes (predefined value codes) are introduced in this grammar to cope with different realistic EHR edition situations. These seven PVCodes have different interpretation strategies, from direct look up in the ccML file itself, to more complex searches in archetypes or system precomputation. The possibility to declare generic types in ccML gives rise to ambiguity during interpretation. The criterion used to overcome ambiguity is that specificity should prevail over generality. The opposite would make the individual specific element declarations useless. A new mark-up language ccML is introduced that opens up the possibility of providing applications using the ISO/EN13606 RM with the necessary pragmatics information to be practical and realistic.

  20. Consent for third molar tooth extractions in Australia and New Zealand: a review of current practice.

    PubMed

    Badenoch-Jones, E K; Lynham, A J; Loessner, D

    2016-06-01

    Informed consent is the legal requirement to educate a patient about a proposed medical treatment or procedure so that he or she can make informed decisions. The purpose of the study was to examine the current practice for obtaining informed consent for third molar tooth extractions (wisdom teeth) by oral and maxillofacial surgeons in Australia and New Zealand. An online survey was sent to 180 consultant oral and maxillofacial surgeons in Australia and New Zealand. Surgeons were asked to answer (yes/no) whether they routinely warned of a specific risk of third molar tooth extraction in their written consent. Seventy-one replies were received (39%). The only risks that surgeons agreed should be routinely included in written consent were a general warning of infection (not alveolar osteitis), inferior alveolar nerve damage (temporary and permanent) and lingual nerve damage (temporary and permanent). There is significant variability among Australian and New Zealand oral and maxillofacial surgeons regarding risk disclosure for third molar tooth extractions. We aim to improve consistency in consent for third molar extractions by developing an evidence-based consent form. © 2016 Australian Dental Association.

  1. Sophia: A Expedient UMLS Concept Extraction Annotator.

    PubMed

    Divita, Guy; Zeng, Qing T; Gundlapalli, Adi V; Duvall, Scott; Nebeker, Jonathan; Samore, Matthew H

    2014-01-01

    An opportunity exists for meaningful concept extraction and indexing from large corpora of clinical notes in the Veterans Affairs (VA) electronic medical record. Currently available tools such as MetaMap, cTAKES and HITex do not scale up to address this big data need. Sophia, a rapid UMLS concept extraction annotator, was developed to fulfill a mandate and address extraction where high throughput is needed while preserving performance. We report on the development, testing and benchmarking of Sophia against MetaMap and cTAKES. Sophia demonstrated improved performance on recall as compared to cTAKES and MetaMap (0.71 vs 0.66 and 0.38). The overall f-score was similar to cTAKES and an improvement over MetaMap (0.53 vs 0.57 and 0.43). With regard to speed of processing records, we noted Sophia to be several fold faster than cTAKES and the scaled-out MetaMap service. Sophia offers a viable alternative for high-throughput information extraction tasks.

  2. Sophia: A Expedient UMLS Concept Extraction Annotator

    PubMed Central

    Divita, Guy; Zeng, Qing T; Gundlapalli, Adi V.; Duvall, Scott; Nebeker, Jonathan; Samore, Matthew H.

    2014-01-01

    An opportunity exists for meaningful concept extraction and indexing from large corpora of clinical notes in the Veterans Affairs (VA) electronic medical record. Currently available tools such as MetaMap, cTAKES and HITex do not scale up to address this big data need. Sophia, a rapid UMLS concept extraction annotator, was developed to fulfill a mandate and address extraction where high throughput is needed while preserving performance. We report on the development, testing and benchmarking of Sophia against MetaMap and cTAKES. Sophia demonstrated improved performance on recall as compared to cTAKES and MetaMap (0.71 vs 0.66 and 0.38). The overall f-score was similar to cTAKES and an improvement over MetaMap (0.53 vs 0.57 and 0.43). With regard to speed of processing records, we noted Sophia to be several fold faster than cTAKES and the scaled-out MetaMap service. Sophia offers a viable alternative for high-throughput information extraction tasks. PMID:25954351

  3. The Importance of Data Quality in Using Health Information Exchange (HIE) Networks to Improve Health Outcomes: Case Study of a HIE Extracted Dataset of Patients with Congestive Heart Failure Participating in a Regional HIE

    ERIC Educational Resources Information Center

    Cartron-Mizeracki, Marie-Astrid

    2016-01-01

    Expenditures on health information technology (HIT) for healthcare organizations are growing exponentially and the value of it is the subject of criticism and skepticism. Because HIT is viewed as capable of improving major health care indicators, the government offers incentives to health care providers and organizations to implement solutions.…

  4. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

    PubMed

    Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela

    2015-05-01

    Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
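
    A simplified sketch of the word-embedding cluster features the abstract describes: word vectors (here random stand-ins for embeddings pretrained on unlabeled posts) are grouped with k-means, and each token's cluster ID becomes a categorical feature that a sequence labeller such as a CRF can consume alongside lexical features. The vocabulary, vector dimension and cluster count are illustrative assumptions, not ADRMine's actual settings.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for pretrained word embeddings learned from unlabeled social-media posts.
    vocab = ["headache", "migraine", "nausea", "tablet", "capsule", "took", "started"]
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(len(vocab), 50))

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
    cluster_of = dict(zip(vocab, kmeans.labels_))

    def token_features(tokens, i):
        """Lexical features plus the embedding-cluster ID for token i, CRF-style."""
        word = tokens[i].lower()
        return {
            "word": word,
            "is_title": tokens[i].istitle(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "embedding_cluster": str(cluster_of.get(word, -1)),
        }

    sentence = ["Took", "the", "tablet", "and", "got", "a", "headache"]
    print([token_features(sentence, i) for i in range(len(sentence))])
    ```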

  5. Passive Polarimetric Information Processing for Target Classification

    NASA Astrophysics Data System (ADS)

    Sadjadi, Firooz; Sadjadi, Farzad

    Polarimetric sensing is an area of active research in a variety of applications. In particular, the use of polarization diversity has been shown to improve performance in automatic target detection and recognition. Within the diverse scope of polarimetric sensing, the field of passive polarimetric sensing is of particular interest. This chapter presents several new methods for gathering information using such passive techniques. One method extracts three-dimensional (3D) information and surface properties using one or more sensors. Another method extracts scene-specific algebraic expressions that remain unchanged under polarization transformations (such as along the transmission path to the sensor).

  6. Feasibility of approaches combining sensor and source features in brain-computer interface.

    PubMed

    Ahn, Minkyu; Hong, Jun Hee; Jun, Sung Chan

    2012-02-15

    Brain-computer interface (BCI) provides a new channel for communication between the brain and computers through brain signals. Cost-effective EEG provides good temporal resolution, but its spatial resolution is poor and sensor information is blurred by inherent noise. To overcome these issues, spatial filtering and feature extraction techniques have been developed. Source imaging, the transformation of sensor signals into the source space through a source localizer, has gained attention as a new approach for BCI. It has been reported that source imaging yields some improvement in BCI performance. However, there exists no thorough investigation of how source imaging information overlaps with, and is complementary to, sensor information. We hypothesize that information from the source space may overlap with, as well as be exclusive to, information from the sensor space. If this hypothesis is true, we can extract more information from the sensor and source spaces together, thereby contributing to more accurate BCI systems. In this work, features from each space (sensor or source) and two strategies combining sensor and source features are assessed. The information distribution among the sensor, source, and combined spaces is discussed through a Venn diagram for 18 motor imagery datasets. An additional 5 motor imagery datasets from the BCI Competition III site were examined. The results showed that the addition of source information yielded about 3.8% classification improvement for the 18 motor imagery datasets and an average accuracy of 75.56% for the BCI Competition data. Our proposed approach is promising, and improved performance may be possible with a better head model. Copyright © 2011 Elsevier B.V. All rights reserved.

  7. Interrogating Bronchoalveolar Lavage Samples via Exclusion-Based Analyte Extraction.

    PubMed

    Tokar, Jacob J; Warrick, Jay W; Guckenberger, David J; Sperger, Jamie M; Lang, Joshua M; Ferguson, J Scott; Beebe, David J

    2017-06-01

    Although average survival rates for lung cancer have improved, earlier and better diagnosis remains a priority. One promising approach to assisting earlier and safer diagnosis of lung lesions is bronchoalveolar lavage (BAL), which provides a sample of lung tissue as well as proteins and immune cells from the vicinity of the lesion, yet diagnostic sensitivity remains a challenge. Reproducible isolation of lung epithelia and multianalyte extraction have the potential to improve diagnostic sensitivity and provide new information for developing personalized therapeutic approaches. We present the use of a recently developed exclusion-based, solid-phase-extraction technique called SLIDE (Sliding Lid for Immobilized Droplet Extraction) to facilitate analysis of BAL samples. We developed a SLIDE protocol for lung epithelial cell extraction and biomarker staining of patient BALs, testing both EpCAM and Trop2 as capture antigens. We characterized captured cells using TTF1 and p40 as immunostaining biomarkers of adenocarcinoma and squamous cell carcinoma, respectively. We achieved up to 90% (EpCAM) and 84% (Trop2) extraction efficiency of representative tumor cell lines. We then used the platform to process two patient BAL samples in parallel within the same sample plate to demonstrate feasibility and observed that Trop2-based extraction potentially extracts more target cells than EpCAM-based extraction.

  8. Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

    PubMed

    Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong

    2016-01-01

    Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning is to choose the most informative documents for the supervised learning in order to reduce the amount of required manual annotations. Previous works of active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. They also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition. We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.

  9. Performance measure that indicates geometry sufficiency of state highways : volume II -- clear zones and cross-section information extraction.

    DOT National Transportation Integrated Search

    2015-03-01

    The evaluation method employed for proposed corridor projects by the Indiana Department of Transportation (INDOT) considers road geometry improvements by a generalized categorization. A new method which consi...

  10. Aircraft Segmentation in SAR Images Based on Improved Active Shape Model

    NASA Astrophysics Data System (ADS)

    Zhang, X.; Xiong, B.; Kuang, G.

    2018-04-01

    In SAR image interpretation, aircraft are important targets that attract much attention. However, it is far from easy to segment an aircraft from the background completely and precisely in SAR images. Because of their complex structure, different kinds of electromagnetic scattering take place on aircraft surfaces. As a result, aircraft targets usually appear inhomogeneous and disconnected. It is a good idea to extract an aircraft target with the active shape model (ASM), since the combined geometric information constrains variations of the shape during contour evolution. However, the linear dimensionality reduction used in the classic ASM makes the model rigid, which causes trouble when segmenting different types of aircraft. To address this problem, an improved ASM based on ISOMAP is proposed in this paper. The ISOMAP algorithm is used to extract the shape information of the training set and make the model flexible enough to deal with different aircraft. Experiments based on real SAR data show that the proposed method achieves an obvious improvement in accuracy.

  11. MULTISCALE TENSOR ANISOTROPIC FILTERING OF FLUORESCENCE MICROSCOPY FOR DENOISING MICROVASCULATURE.

    PubMed

    Prasath, V B S; Pelapur, R; Glinskii, O V; Glinsky, V V; Huxley, V H; Palaniappan, K

    2015-04-01

    Fluorescence microscopy images are contaminated by noise and improving image quality without blurring vascular structures by filtering is an important step in automatic image analysis. The application of interest here is to automatically extract the structural components of the microvascular system with accuracy from images acquired by fluorescence microscopy. A robust denoising process is necessary in order to extract accurate vascular morphology information. For this purpose, we propose a multiscale tensor with anisotropic diffusion model which progressively and adaptively updates the amount of smoothing while preserving vessel boundaries accurately. Based on a coherency enhancing flow with planar confidence measure and fused 3D structure information, our method integrates multiple scales for microvasculature preservation and noise removal membrane structures. Experimental results on simulated synthetic images and epifluorescence images show the advantage of our improvement over other related diffusion filters. We further show that the proposed multiscale integration approach improves denoising accuracy of different tensor diffusion methods to obtain better microvasculature segmentation.

  12. Text-mining and information-retrieval services for molecular biology

    PubMed Central

    Krallinger, Martin; Valencia, Alfonso

    2005-01-01

    Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455

  13. Fusion of infrared polarization and intensity images based on improved toggle operator

    NASA Astrophysics Data System (ADS)

    Zhu, Pan; Ding, Lei; Ma, Xiaoqing; Huang, Zhanhua

    2018-01-01

    Integration of infrared polarization and intensity images is a new topic in infrared image understanding and interpretation. The abundant infrared details and targets from the infrared image and the salient edge and shape information from the polarization image should be preserved, or even enhanced, in the fused result. In this paper, a new fusion method for infrared polarization and intensity images is proposed based on an improved multi-scale toggle operator with spatial scale, which can effectively extract the feature information of the source images and greatly reduce redundancy among the different scales. Firstly, the multi-scale image features of the infrared polarization and intensity images are extracted at different scale levels by the improved multi-scale toggle operator. Secondly, the redundancy of the features among different scales is reduced by using the spatial scale. Thirdly, the final image features are combined by simply adding all scales of feature images together, and a base image is calculated by applying a mean-value weighting method to the smoothed source images. Finally, the fused image is obtained by importing the combined image features into the base image with a suitable strategy. Both objective assessment and subjective visual inspection of the experimental results indicate that the proposed method performs better in preserving detail and edge information as well as improving image contrast.
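
    For context, a basic single-scale morphological toggle contrast operator, the building block that the paper extends to multiple scales, can be written with scikit-image as below. Defining the feature map as the difference between the toggled and original images is a common convention and an assumption here, not the authors' exact formulation.

    ```python
    import numpy as np
    from skimage.morphology import dilation, erosion, disk

    def toggle_operator(image, radius=2):
        """Classic toggle contrast: each pixel snaps to dilation or erosion, whichever is closer."""
        selem = disk(radius)
        dil = dilation(image, selem).astype(np.int32)
        ero = erosion(image, selem).astype(np.int32)
        img = image.astype(np.int32)
        toggled = np.where(dil - img < img - ero, dil, ero)
        return toggled.astype(image.dtype)

    def toggle_features(image, radius=2):
        """A simple feature map: how much the toggle operator changes each pixel."""
        return np.abs(toggle_operator(image, radius).astype(np.int32) - image.astype(np.int32))

    demo = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
    print(toggle_features(demo).max())
    ```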

  14. Use of feature extraction techniques for the texture and context information in ERTS imagery: Spectral and textural processing of ERTS imagery. [classification of Kansas land use

    NASA Technical Reports Server (NTRS)

    Haralick, R. H. (Principal Investigator); Bosley, R. J.

    1974-01-01

    The author has identified the following significant results. A procedure was developed to extract cross-band textural features from ERTS MSS imagery. Evolving from a single image texture extraction procedure which uses spatial dependence matrices to measure relative co-occurrence of nearest neighbor grey tones, the cross-band texture procedure uses the distribution of neighboring grey tone N-tuple differences to measure the spatial interrelationships, or co-occurrences, of the grey tone N-tuples present in a texture pattern. In both procedures, texture is characterized in such a way as to be invariant under linear grey tone transformations. However, the cross-band procedure complements the single image procedure by extracting texture information and spectral information contained in ERTS multi-images. Classification experiments show that when used alone, without spectral processing, the cross-band texture procedure extracts more information than the single image texture analysis. Results show an improvement in average correct classification from 86.2% to 88.8% for ERTS image no. 1021-16333 with the cross-band texture procedure. However, when used together with spectral features, the single image texture plus spectral features perform better than the cross-band texture plus spectral features, with an average correct classification of 93.8% and 91.6%, respectively.

  15. Model of experts for decision support in the diagnosis of leukemia patients.

    PubMed

    Corchado, Juan M; De Paz, Juan F; Rodríguez, Sara; Bajo, Javier

    2009-07-01

    Recent advances in the field of biomedicine, specifically in the field of genomics, have led to an increase in the information available for conducting expression analysis. Expression analysis is a technique used in transcriptomics, a branch of genomics that deals with the study of messenger ribonucleic acid (mRNA) and the extraction of information contained in the genes. This increase in information is reflected in the exon arrays, which require the use of new techniques in order to extract the information. The purpose of this study is to provide a tool based on a mixture of experts model that allows the analysis of the information contained in the exon arrays, from which automatic classifications for decision support in diagnoses of leukemia patients can be made. The proposed model integrates several cooperative algorithms characterized by their efficiency in data processing, filtering, classification and knowledge extraction. The Cancer Institute of the University of Salamanca is making an effort to develop tools to automate the evaluation of data and to facilitate the analysis of information. This proposal is a step forward in this direction and the first step toward the development of a mixture of experts tool that integrates different cognitive and statistical approaches to deal with the analysis of exon arrays. The mixture of experts model presented within this work provides great capacities for learning and adaptation to the characteristics of the problem in consideration, using novel algorithms in each of the stages of the analysis process that can be easily configured and combined, and provides results that notably improve those provided by the existing methods for exon array analysis. The material used consists of data from exon arrays provided by the Cancer Institute that contain samples from leukemia patients. The methodology used consists of a system based on a mixture of experts. Each one of the experts incorporates novel artificial intelligence techniques that improve the process of carrying out various tasks such as pre-processing, filtering, classification and extraction of knowledge. This article will detail the manner in which individual experts are combined so that together they generate a system capable of extracting knowledge, thus permitting patients to be classified in an automatic and efficient manner that is also comprehensible for medical personnel. The system has been tested in a real setting and has been used for classifying patients who suffer from different forms of leukemia at various stages. Personnel from the Cancer Institute supervised and participated throughout the testing period. Preliminary results are promising, notably improving the results obtained with previously used tools. The medical staff from the Cancer Institute considers the tools that have been developed to be positive and very useful in a supporting capacity for carrying out their daily tasks. Additionally, the mixture of experts supplies a tool for the extraction of the necessary information in order to explain the associations that have been made in simple terms. That is, it permits the extraction of knowledge for each classification made and generalized in order to be used in subsequent classifications. This allows for a large amount of learning and adaptation within the proposed system.

  16. Implementation and Initial Testing of Advanced Processing and Analysis Algorithms for Correlated Neutron Counting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santi, Peter Angelo; Cutler, Theresa Elizabeth; Favalli, Andrea

    In order to improve the accuracy and capabilities of neutron multiplicity counting, additional quantifiable information is needed in order to address the assumptions that are present in the point model. Extracting and utilizing higher order moments (Quads and Pents) from the neutron pulse train represents the most direct way of extracting additional information from the measurement data to allow for an improved determination of the physical properties of the item of interest. The extraction of higher order moments from a neutron pulse train required the development of advanced dead time correction algorithms which could correct for dead time effects in all of the measurement moments in a self-consistent manner. In addition, advanced analysis algorithms have been developed to address specific assumptions that are made within the current analysis model, namely that all neutrons are created at a single point within the item of interest, and that all neutrons that are produced within an item are created with the same energy distribution. This report will discuss the current status of implementation and initial testing of the advanced dead time correction and analysis algorithms that have been developed in an attempt to utilize higher order moments to improve the capabilities of correlated neutron measurement techniques.

  17. The identification of clinically important elements within medical journal abstracts: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results (PECODR).

    PubMed

    Dawes, Martin; Pluye, Pierre; Shea, Laura; Grad, Roland; Greenberg, Arlene; Nie, Jian-Yun

    2007-01-01

    Information retrieval in primary care is becoming more difficult as the volume of medical information held in electronic databases expands. The lexical structure of this information might permit automatic indexing and improved retrieval. To determine the possibility of identifying the key elements of clinical studies, namely Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results (PECODR), from abstracts of medical journals. We used a convenience sample of 20 synopses from the journal Evidence-Based Medicine (EBM) and their matching original journal article abstracts obtained from PubMed. Three independent primary care professionals identified PECODR-related extracts of text. Rules were developed to define each PECODR element and the selection process of characters, words, phrases and sentences. From the extracts of text related to PECODR elements, potential lexical patterns that might help identify those elements were proposed and assessed using NVivo software. A total of 835 PECODR-related text extracts containing 41,263 individual text characters were identified from 20 EBM journal synopses. There were 759 extracts in the corresponding PubMed abstracts containing 31,947 characters. PECODR elements were found in nearly all abstracts and synopses with the exception of duration. There was agreement on 86.6% of the extracts from the 20 EBM synopses and 85.0% on the corresponding PubMed abstracts. After consensus this rose to 98.4% and 96.9% respectively. We found potential text patterns in the Comparison, Outcome and Results elements of both EBM synopses and PubMed abstracts. Some phrases and words are used frequently and are specific for these elements in both synopses and abstracts. Results suggest a PECODR-related structure exists in medical abstracts and that there might be lexical patterns specific to these elements. More sophisticated computer-assisted lexical-semantic analysis might refine these results, and pave the way to automating PECODR indexing, and improve information retrieval in primary care.
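
    The lexical patterns reported for the Comparison, Outcome and Results elements suggest a regular-expression approach in the spirit of the sketch below; the patterns shown (effect-size phrases, confidence intervals, p-values) are hypothetical stand-ins, not the patterns identified in the study.

```python
import re

# Hypothetical lexical patterns for the Results element of an abstract:
# effect-size phrases, confidence intervals and p-values (illustrative only).
RESULTS_PATTERNS = [
    r"\b(relative risk|risk ratio|odds ratio|hazard ratio)\b[^;)]*?\d+(\.\d+)?",
    r"\b95%\s*(confidence interval|CI)\b[^;)]*",
    r"\bp\s*[<=]\s*0?\.\d+",
]

def find_results_phrases(abstract):
    """Return text spans that look like Results-element statements."""
    hits = []
    for pattern in RESULTS_PATTERNS:
        for m in re.finditer(pattern, abstract, flags=re.IGNORECASE):
            hits.append(m.group(0))
    return hits

text = ("The intervention reduced events (relative risk 0.78, "
        "95% CI 0.64 to 0.95; p = 0.01) over 12 months.")
print(find_results_phrases(text))
```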

  18. ccML, a new mark-up language to improve ISO/EN 13606-based electronic health record extracts practical edition

    PubMed Central

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Cáceres, Jesús; Somolinos, Roberto; Pascual, Mario; Martínez, Ignacio; Salvador, Carlos H; Monteagudo, José Luis

    2013-01-01

    Objective The objective of this paper is to introduce a new language called ccML, designed to provide convenient pragmatic information to applications using the ISO/EN13606 reference model (RM), such as electronic health record (EHR) extracts editors. EHR extracts are presently built using the syntactic and semantic information provided in the RM and constrained by archetypes. The ccML extra information enables the automation of the medico-legal context information edition, which is over 70% of the total in an extract, without modifying the RM information. Materials and Methods ccML is defined using a W3C XML schema file. Valid ccML files complement the RM with additional pragmatics information. The ccML language grammar is defined using formal language theory as a single-type tree grammar. The new language is tested using an EHR extracts editor application as proof-of-concept system. Results Seven ccML PVCodes (predefined value codes) are introduced in this grammar to cope with different realistic EHR edition situations. These seven PVCodes have different interpretation strategies, from direct look up in the ccML file itself, to more complex searches in archetypes or system precomputation. Discussion The possibility to declare generic types in ccML gives rise to ambiguity during interpretation. The criterion used to overcome ambiguity is that specificity should prevail over generality. The opposite would make the individual specific element declarations useless. Conclusion A new mark-up language ccML is introduced that opens up the possibility of providing applications using the ISO/EN13606 RM with the necessary pragmatics information to be practical and realistic. PMID:23019241

  19. Improved Proteomic Analysis Following Trichloroacetic Acid Extraction of Bacillus anthracis Spore Proteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaiser, Brooke LD; Wunschel, David S.; Sydor, Michael A.

    2015-08-07

    Proteomic analysis of bacterial samples provides valuable information about cellular responses and functions under different environmental pressures. Proteomic analysis is dependent upon efficient extraction of proteins from bacterial samples without introducing bias toward extraction of particular protein classes. While no single method can recover 100% of the bacterial proteins, selected protocols can improve overall protein isolation, peptide recovery, or enrich for certain classes of proteins. The method presented here is technically simple and does not require specialized equipment such as a mechanical disrupter. Our data reveal that for particularly challenging samples, such as B. anthracis Sterne spores, trichloroacetic acid extraction improved the number of proteins identified within a sample compared to bead beating (714 vs 660, respectively). Further, TCA extraction enriched for 103 known spore specific proteins whereas bead beating resulted in 49 unique proteins. Analysis of C. botulinum samples grown to 5 days, composed of vegetative biomass and spores, showed a similar trend with improved protein yields and identification using our method compared to bead beating. Interestingly, easily lysed samples, such as B. anthracis vegetative cells, were equally as effectively processed via TCA and bead beating, but TCA extraction remains the easiest and most cost effective option. As with all assays, supplemental methods such as implementation of an alternative preparation method may provide additional insight to the protein biology of the bacteria being studied.

  20. Non-causal spike filtering improves decoding of movement intention for intracortical BCIs

    PubMed Central

    Masse, Nicolas Y.; Jarosiewicz, Beata; Simeral, John D.; Bacher, Daniel; Stavisky, Sergey D.; Cash, Sydney S.; Oakley, Erin M.; Berhanu, Etsub; Eskandar, Emad; Friehs, Gerhard; Hochberg, Leigh R.; Donoghue, John P.

    2014-01-01

    Background Multiple types of neural signals are available for controlling assistive devices through brain-computer interfaces (BCIs). Intracortically-recorded spiking neural signals are attractive for BCIs because they can in principle provide greater fidelity of encoded information compared to electrocorticographic (ECoG) signals and electroencephalograms (EEGs). Recent reports show that the information content of these spiking neural signals can be reliably extracted simply by causally band-pass filtering the recorded extracellular voltage signals and then applying a spike detection threshold, without relying on “sorting” action potentials. New method We show that replacing the causal filter with an equivalent non-causal filter increases the information content extracted from the extracellular spiking signal and improves decoding of intended movement direction. This method can be used for real-time BCI applications by using a 4 ms lag between recording and filtering neural signals. Results Across 18 sessions from two people with tetraplegia enrolled in the BrainGate2 pilot clinical trial, we found that threshold crossing events extracted using this non-causal filtering method were significantly more informative of each participant’s intended cursor kinematics compared to threshold crossing events derived from causally filtered signals. This new method decreased the mean angular error between the intended and decoded cursor direction by 9.7° for participant S3, who was implanted 5.4 years prior to this study, and by 3.5° for participant T2, who was implanted 3 months prior to this study. Conclusions Non-causally filtering neural signals prior to extracting threshold crossing events may be a simple yet effective way to condition intracortically recorded neural activity for direct control of external devices through BCIs. PMID:25128256
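
    A minimal sketch of the causal versus non-causal distinction follows, using scipy's lfilter (forward-only) and filtfilt (forward-backward, zero-phase) together with a robust threshold-crossing detector; the band edges, filter order and -4.5 sigma threshold are illustrative assumptions, and the 4 ms real-time lag described in the abstract is not modelled.

```python
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

fs = 30000                       # assumed sampling rate (Hz)
b, a = butter(4, [250 / (fs / 2), 5000 / (fs / 2)], btype="band")  # spike band

rng = np.random.default_rng(0)
raw = rng.normal(size=fs)        # one second of synthetic broadband "extracellular" noise

causal = lfilter(b, a, raw)      # forward-only filtering (phase lag), as in causal detection
noncausal = filtfilt(b, a, raw)  # forward-backward filtering (zero phase), the non-causal variant

def threshold_crossings(x, k=-4.5):
    """Indices where the signal first drops below k times a robust noise estimate."""
    thresh = k * np.median(np.abs(x)) / 0.6745
    below = x < thresh
    return np.flatnonzero(below[1:] & ~below[:-1]) + 1

print(len(threshold_crossings(causal)), len(threshold_crossings(noncausal)))
```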

  1. An improved feature extraction algorithm based on KAZE for multi-spectral image

    NASA Astrophysics Data System (ADS)

    Yang, Jianping; Li, Jun

    2018-02-01

    Multi-spectral images contain abundant spectral information and are widely used in fields such as resource exploration, meteorological observation and modern military applications. Image preprocessing, such as image feature extraction and matching, is indispensable when dealing with multi-spectral remote sensing images. Although feature matching algorithms based on a linear scale space, such as SIFT and SURF, are robust, their local accuracy cannot be guaranteed. Therefore, this paper proposes an improved KAZE algorithm, based on a nonlinear scale space, to raise the number of features and to enhance the matching rate by using the adjusted-cosine vector. The experimental results show that the number of features and the matching rate of the improved KAZE are remarkably higher than those of the original KAZE algorithm.
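
    A minimal sketch of a KAZE-based extraction and matching pipeline with OpenCV is shown below; the adjusted-cosine matching proposed in the paper is replaced here by a plain L2 ratio test, and the file names are hypothetical.

```python
import cv2

def kaze_match(path1, path2, ratio=0.8):
    """Detect KAZE features in two bands and match them with an L2 ratio test."""
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)

    kaze = cv2.KAZE_create()                      # nonlinear scale-space detector/descriptor
    kp1, des1 = kaze.detectAndCompute(img1, None)
    kp2, des2 = kaze.detectAndCompute(img2, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return kp1, kp2, good

# Example call (hypothetical file names for two spectral bands):
# kp1, kp2, good = kaze_match("band_nir.png", "band_red.png")
# print(len(good), "putative cross-band matches")
```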

  2. Deep Learning from EEG Reports for Inferring Underspecified Information

    PubMed Central

    Goodwin, Travis R.; Harabagiu, Sanda M.

    2017-01-01

    Secondary use of electronic health records (EHRs) often relies on the ability to automatically identify and extract information from EHRs. Unfortunately, EHRs are known to suffer from a variety of idiosyncrasies – most prevalently, they have been shown to often omit or underspecify information. Adapting traditional machine learning methods for inferring underspecified information relies on manually specifying features characterizing the specific information to recover (e.g. particular findings, test results, or physician’s impressions). By contrast, in this paper, we present a method for jointly (1) automatically extracting word- and report-level features and (2) inferring underspecified information from EHRs. Our approach accomplishes these two tasks jointly by combining recent advances in deep neural learning with access to textual data in electroencephalogram (EEG) reports. We evaluate the performance of our model on the problem of inferring the neurologist’s overall impression (normal or abnormal) from electroencephalogram (EEG) reports and report an accuracy of 91.4%, precision of 94.4%, recall of 91.2%, and F1 measure of 92.8% (a 40% improvement over the performance obtained using Doc2Vec). These promising results demonstrate the power of our approach, while error analysis reveals remaining obstacles as well as areas for future improvement. PMID:28815118

  3. Improving the extraction of crisis information in the context of flood, fire, and landslide rapid mapping using SAR and optical remote sensing data

    NASA Astrophysics Data System (ADS)

    Martinis, Sandro; Clandillon, Stephen; Twele, André; Huber, Claire; Plank, Simon; Maxant, Jérôme; Cao, Wenxi; Caspard, Mathilde; May, Stéphane

    2016-04-01

    Optical and radar satellite remote sensing have proven to provide essential crisis information in the case of natural disasters, humanitarian relief activities and civil security issues in a growing number of cases, through mechanisms such as the Copernicus Emergency Management Service (EMS) of the European Commission or the International Charter 'Space and Major Disasters'. The aforementioned programs and initiatives make use of satellite-based rapid mapping services aimed at delivering reliable and accurate crisis information after natural hazards. Although these services are increasingly operational, they need to be continuously updated and improved through research and development (R&D) activities. The principal objective of ASAPTERRA (Advancing SAR and Optical Methods for Rapid Mapping), the ESA-funded R&D project described here, is to improve, automate and, hence, speed up geo-information extraction procedures in the context of natural hazards response. This is performed through the development, implementation, testing and validation of novel image processing methods using optical and Synthetic Aperture Radar (SAR) data. The methods are mainly developed based on data from the German radar satellites TerraSAR-X and TanDEM-X, the French satellite missions Pléiades-1A/1B as well as the ESA missions Sentinel-1/2, with the aim to better characterize the potential and limitations of these sensors and their synergy. The resulting algorithms and techniques are evaluated in real case applications during rapid mapping activities. The project is focussed on three types of natural hazards: floods, landslides and fires. Within this presentation an overview of the main methodological developments in each topic is given and demonstrated in selected test areas. The following developments are presented in the context of flood mapping: a fully automated Sentinel-1 based processing chain for detecting open flood surfaces, a method for the improved detection of flooded vegetation in Sentinel-1 data using Entropy/Alpha decomposition, unsupervised Wishart classification and object-based post-classification, as well as semi-automatic approaches for extracting inundated areas and flood traces in rural and urban areas from VHR and HR optical imagery using machine learning techniques. Methodological developments related to fires are the implementation of fast and robust methods for mapping burnt scars using change detection procedures with SAR (Sentinel-1, TerraSAR-X) and HR optical (e.g. SPOT, Sentinel-2) data, as well as the extraction of 3D surface and volume change information from Pléiades stereo-pairs. In the context of landslides, fast and transferable change detection procedures based on SAR (TerraSAR-X) and optical (SPOT) data, as well as methods for extracting the extent of landslides based only on polarimetric VHR SAR (TerraSAR-X) data, are presented.
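
    As a hedged illustration of the open-water detection step only, the sketch below applies a global Otsu threshold to a synthetic backscatter tile; the operational Sentinel-1 chain described above involves considerably more (tiling, auxiliary data, post-classification), so this is not the project's algorithm.

```python
import numpy as np
from skimage.filters import threshold_otsu

def open_water_mask(sigma0_db):
    """Flag low-backscatter pixels as open water using a global Otsu threshold."""
    t = threshold_otsu(sigma0_db)   # threshold derived from the image histogram
    return sigma0_db < t            # water appears dark (low backscatter) in SAR

# Synthetic stand-in for a Sentinel-1 backscatter tile in dB (land plus a flooded strip).
rng = np.random.default_rng(1)
land = rng.normal(-8.0, 2.0, size=(200, 200))
water = rng.normal(-20.0, 1.5, size=(200, 50))
tile = np.hstack([land, water])
print("water fraction:", open_water_mask(tile).mean())
```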

  4. Video indexing based on image and sound

    NASA Astrophysics Data System (ADS)

    Faudemay, Pascal; Montacie, Claude; Caraty, Marie-Jose

    1997-10-01

    Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel and multi-modal query interface to query sound, image and script through 'pull' and 'push' queries. We then summarize the segmentation phase, which needs information from the image channel. Detection of critical segments is proposed; it should speed up both automatic and manual indexing. We then present an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and script alignment with speech. We present experimental results for these various techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was experimented with on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, 'Ivanhoe'. For good-quality sound segments, error rates are low enough for use in indexing applications. Major issues are the processing of sound segments with noise or music, and performance improvement through the use of appropriate, low-cost architectures or networks of workstations.

  5. Semantic Location Extraction from Crowdsourced Data

    NASA Astrophysics Data System (ADS)

    Koswatte, S.; Mcdougall, K.; Liu, X.

    2016-06-01

    Crowdsourced Data (CSD) has recently received increased attention in many application areas, including disaster management. Convenience of production and use, data currency and abundance are some of the key reasons for this high interest. Conversely, quality issues like incompleteness, credibility and relevancy prevent the direct use of such data in important applications like disaster management. Moreover, the availability of location information in CSD is problematic, as it remains very low on many crowdsourced platforms such as Twitter. Also, the recorded location is mostly related to the mobile device or user location and often does not represent the event location. In CSD, the event location is discussed descriptively in the comments in addition to the recorded location (which is generated by means of the mobile device's GPS or the mobile communication network). This study attempts to semantically extract CSD location information with the help of an ontological gazetteer and other available resources. The 2011 Queensland flood tweets and Ushahidi Crowd Map data were semantically analysed to extract location information with the support of the Queensland Gazetteer, which was converted to an ontological gazetteer, and a global gazetteer. Some preliminary results show that the use of ontologies and semantics can improve the accuracy of place name identification in CSD and the process of location information extraction.
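
    A toy sketch of gazetteer-based location extraction from a crowdsourced message is given below; the place names, coordinates and matching strategy are illustrative assumptions and do not reflect the ontological gazetteer built in the study.

```python
import re

# Toy gazetteer: place name -> (lat, lon); entries are illustrative only.
GAZETTEER = {
    "brisbane": (-27.47, 153.03),
    "ipswich": (-27.61, 152.76),
    "toowoomba": (-27.56, 151.95),
}

def extract_locations(message):
    """Return gazetteer places mentioned in a crowdsourced message."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return {t: GAZETTEER[t] for t in tokens if t in GAZETTEER}

tweet = "Major flooding near Ipswich and Toowoomba, several roads cut."
print(extract_locations(tweet))
```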

  6. 78 FR 77705 - Proposed Agency Information Collection Activity: Nonindigenous Aquatic Species Sighting Reporting...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-24

    ... species, valued ecosystems, and human and wildlife health. These invaders extract a huge cost, an...; and improving understanding of the ecology of invaders and factors in the resistance of habitats to...

  7. Enhancing biomedical text summarization using semantic relation extraction.

    PubMed

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  8. Text mining for adverse drug events: the promise, challenges, and state of the art.

    PubMed

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H

    2014-10-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources-such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs-that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

  9. Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art

    PubMed Central

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H.

    2014-01-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. Text mining is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text-mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance. PMID:25151493

  10. Stimulus encoding and feature extraction by multiple sensory neurons.

    PubMed

    Krahe, Rüdiger; Kreiman, Gabriel; Gabbiani, Fabrizio; Koch, Christof; Metzner, Walter

    2002-03-15

    Neighboring cells in topographical sensory maps may transmit similar information to the next higher level of processing. How information transmission by groups of nearby neurons compares with the performance of single cells is a very important question for understanding the functioning of the nervous system. To tackle this problem, we quantified stimulus-encoding and feature extraction performance by pairs of simultaneously recorded electrosensory pyramidal cells in the hindbrain of weakly electric fish. These cells constitute the output neurons of the first central nervous stage of electrosensory processing. Using random amplitude modulations (RAMs) of a mimic of the fish's own electric field within behaviorally relevant frequency bands, we found that pyramidal cells with overlapping receptive fields exhibit strong stimulus-induced correlations. To quantify the encoding of the RAM time course, we estimated the stimuli from simultaneously recorded spike trains and found significant improvements over single spike trains. The quality of stimulus reconstruction, however, was still inferior to the one measured for single primary sensory afferents. In an analysis of feature extraction, we found that spikes of pyramidal cell pairs coinciding within a time window of a few milliseconds performed significantly better at detecting upstrokes and downstrokes of the stimulus compared with isolated spikes and even spike bursts of single cells. Coincident spikes can thus be considered "distributed bursts." Our results suggest that stimulus encoding by primary sensory afferents is transformed into feature extraction at the next processing stage. There, stimulus-induced coincident activity can improve the extraction of behaviorally relevant features from the stimulus.

  11. An improved active contour model for glacial lake extraction

    NASA Astrophysics Data System (ADS)

    Zhao, H.; Chen, F.; Zhang, M.

    2017-12-01

    The active contour model is a widely used method in visual tracking and image segmentation. Driven by an objective function, the initial curve defined in an active contour model evolves to a stable condition - the desired result in a given image. As a typical region-based active contour model, the C-V model performs well on weak boundary detection and has good noise resistance, which shows great potential for glacial lake extraction. Glacial lakes are a sensitive indicator of global climate change, so accurately delineating glacial lake boundaries is essential to evaluate the hydrologic and living environment. However, the current methods for glacial lake extraction, mainly water index methods and recognition/classification methods, are difficult to apply directly to large-scale glacial lake extraction due to the diversity of glacial lakes and the many confounding factors in the image, such as image noise, shadows, snow and ice. Given the aforementioned advantages of the C-V model and the difficulties of glacial lake extraction, we introduce the signed pressure force function to improve the C-V model and adapt it to glacial lake extraction. To inspect the glacial lake extraction results, three typical glacial lake development sites were selected, including the Altai Mountains, the central Himalayas and south-eastern Tibet; Landsat 8 OLI imagery was used as the experimental data source and Google Earth imagery as the reference data for verifying the results. The experimental results suggest that the improved active contour model we propose can effectively discriminate glacial lakes from complex backgrounds with a high Kappa coefficient of 0.895, especially for some small glacial lakes that constitute weak information in the image. Our findings provide a new approach to improving accuracy when small glacial lakes make up a large proportion of the scene, and open the possibility of automated glacial lake mapping over large areas.
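
    For orientation, the sketch below runs the standard morphological Chan-Vese (C-V) segmentation from scikit-image on a synthetic lake-like image; the signed pressure force modification proposed in the abstract is not included, and the iteration count and smoothing value are arbitrary.

```python
import numpy as np
from skimage.segmentation import morphological_chan_vese

# Synthetic single-band image: a dark, roughly circular "lake" on brighter terrain.
rng = np.random.default_rng(0)
img = rng.normal(0.7, 0.05, size=(128, 128))
rr, cc = np.ogrid[:128, :128]
img[(rr - 64) ** 2 + (cc - 64) ** 2 < 20 ** 2] = 0.2

# Standard morphological Chan-Vese evolution from a checkerboard initialization.
mask = morphological_chan_vese(img, 50, init_level_set="checkerboard", smoothing=2)
print("segmented pixels:", int(mask.sum()))
```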

  12. Reprint of “Non-causal spike filtering improves decoding of movement intention for intracortical BCIs”☆

    PubMed Central

    Masse, Nicolas Y.; Jarosiewicz, Beata; Simeral, John D.; Bacher, Daniel; Stavisky, Sergey D.; Cash, Sydney S.; Oakley, Erin M.; Berhanu, Etsub; Eskandar, Emad; Friehs, Gerhard; Hochberg, Leigh R.; Donoghue, John P.

    2015-01-01

    Background Multiple types of neural signals are available for controlling assistive devices through brain–computer interfaces (BCIs). Intracortically recorded spiking neural signals are attractive for BCIs because they can in principle provide greater fidelity of encoded information compared to electrocorticographic (ECoG) signals and electroencephalograms (EEGs). Recent reports show that the information content of these spiking neural signals can be reliably extracted simply by causally band-pass filtering the recorded extracellular voltage signals and then applying a spike detection threshold, without relying on “sorting” action potentials. New method We show that replacing the causal filter with an equivalent non-causal filter increases the information content extracted from the extracellular spiking signal and improves decoding of intended movement direction. This method can be used for real-time BCI applications by using a 4 ms lag between recording and filtering neural signals. Results Across 18 sessions from two people with tetraplegia enrolled in the BrainGate2 pilot clinical trial, we found that threshold crossing events extracted using this non-causal filtering method were significantly more informative of each participant’s intended cursor kinematics compared to threshold crossing events derived from causally filtered signals. This new method decreased the mean angular error between the intended and decoded cursor direction by 9.7° for participant S3, who was implanted 5.4 years prior to this study, and by 3.5° for participant T2, who was implanted 3 months prior to this study. PMID:25681017

  13. Investigation related to multispectral imaging systems

    NASA Technical Reports Server (NTRS)

    Nalepka, R. F.; Erickson, J. D.

    1974-01-01

    A summary of technical progress made during a five year research program directed toward the development of operational information systems based on multispectral sensing and the use of these systems in earth-resource survey applications is presented. Efforts were undertaken during this program to: (1) improve the basic understanding of the many facets of multispectral remote sensing, (2) develop methods for improving the accuracy of information generated by remote sensing systems, (3) improve the efficiency of data processing and information extraction techniques to enhance the cost-effectiveness of remote sensing systems, (4) investigate additional problems having potential remote sensing solutions, and (5) apply the existing and developing technology for specific users and document and transfer that technology to the remote sensing community.

  14. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation led us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org

  15. An Active Learning Framework for Hyperspectral Image Classification Using Hierarchical Segmentation

    NASA Technical Reports Server (NTRS)

    Zhang, Zhou; Pasolli, Edoardo; Crawford, Melba M.; Tilton, James C.

    2015-01-01

    Augmenting spectral data with spatial information for image classification has recently gained significant attention, as classification accuracy can often be improved by extracting spatial information from neighboring pixels. In this paper, we propose a new framework in which active learning (AL) and hierarchical segmentation (HSeg) are combined for spectral-spatial classification of hyperspectral images. The spatial information is extracted from a best segmentation obtained by pruning the HSeg tree using a new supervised strategy. The best segmentation is updated at each iteration of the AL process, thus taking advantage of informative labeled samples provided by the user. The proposed strategy incorporates spatial information in two ways: 1) concatenating the extracted spatial features and the original spectral features into a stacked vector and 2) extending the training set using a self-learning-based semi-supervised learning (SSL) approach. Finally, the two strategies are combined within an AL framework. The proposed framework is validated with two benchmark hyperspectral datasets. Higher classification accuracies are obtained by the proposed framework with respect to five other state-of-the-art spectral-spatial classification approaches. Moreover, the effectiveness of the proposed pruning strategy is also demonstrated relative to the approaches based on a fixed segmentation.

  16. [Effect of Astragalus membranaceus var. mongholicus seed extracts on seed germination and seedling growth of different Codonopsis pilosula cultivars].

    PubMed

    Guo, Feng-Xia; Wu, Zhi-Jiang; Chen, Yuan; Xi, Zhuo-Xia; Zhang, Xiao-Hu; Yao, Li-Rong; Chen, Xiang

    2012-11-01

    To reveal the allelopathic effect of Astragalus membranaceus var. mongholicus seeds and provide information for intercrop production. The A. membranaceus var. mongholicus seeds were soaked in distilled water for different times (12, 24, 36, 48, 60 h), and the seed extracts were then used to study their effects on the seed germination, seedling growth and development of two Codonopsis pilosula cultivars. The A. membranaceus var. mongholicus seeds contained some allelopathic compounds. Their soaking liquid had a significant influence on the seed germination and seedling growth of C. pilosula. The seed germination rate, germination power, germination index and vigor index of the two C. pilosula cultivars were improved and then inhibited as the soaking time was extended. The extract soaked for 24 h significantly improved the germination traits, but the extract soaked for 60 h showed different degrees of vigor inhibition. Seed extracts soaked for between 12 and 60 h all significantly improved the above-ground growth of C. pilosula but significantly inhibited their radicle growth in length. With longer soaking times, the facilitation effect weakened and the inhibiting effect strengthened, most markedly in the C. pilosula cultivar Baitiaodangshen. The A. membranaceus var. mongholicus seeds contain allelopathic compounds, and the endogenous inhibitor can be extracted when intact seeds are soaked for more than 24 h in water, resulting in improvement of the seed germination rate. C. pilosula could be intercropped in A. membranaceus var. mongholicus fields; however, when intercropping, the intercrop proportion should vary with the cultivar.

  17. Remote Sensing Information Sciences Research Group: Santa Barbara Information Sciences Research Group, year 4

    NASA Technical Reports Server (NTRS)

    Estes, John E.; Smith, Terence; Star, Jeffrey L.

    1987-01-01

    Information Sciences Research Group (ISRG) research continues to focus on improving the type, quantity, and quality of information which can be derived from remotely sensed data. Particular focus is on the needs of the remote sensing research and application science community which will be served by the Earth Observing System (EOS) and Space Station, including associated polar and co-orbiting platforms. The areas of georeferenced information systems, machine assisted information extraction from image data, artificial intelligence and both natural and cultural vegetation analysis and modeling research will be expanded.

  18. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

    PubMed

    Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

    2015-06-06

    Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse-level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarked against curated database records. The analysis also demonstrates that incorporating discourse level analysis significantly improved the performance of extracting protein-mutation-disease associations. Future work includes the extension of MutD to full-text articles.

  19. [Technologies for Complex Intelligent Clinical Data Analysis].

    PubMed

    Baranov, A A; Namazova-Baranova, L S; Smirnov, I V; Devyatkin, D A; Shelmanov, A O; Vishneva, E A; Antonova, E V; Smirnov, V I

    2016-01-01

    The paper presents a system for intelligent analysis of clinical information. The authors describe methods implemented in the system for clinical information retrieval, intelligent diagnostics of chronic diseases, assessment of the importance of patient features, and detection of hidden dependencies between features. Results of the experimental evaluation of these methods are also presented. Healthcare facilities generate a large flow of both structured and unstructured data which contain important information about patients. Test results are usually retained as structured data, but some data are retained in the form of natural language texts (medical history, the results of physical examination, and the results of other examinations, such as ultrasound, ECG or X-ray studies). Many tasks arising in clinical practice can be automated by applying methods for intelligent analysis of the accumulated structured and unstructured data, which leads to improvement of healthcare quality. The aim of this work was the creation of a complex system for intelligent data analysis in a multi-disciplinary pediatric center. The authors propose methods for information extraction from clinical texts in Russian. The methods are based on deep linguistic analysis. They retrieve terms for diseases, symptoms, areas of the body and drugs. The methods can recognize additional attributes such as "negation" (indicates that the disease is absent), "no patient" (indicates that the disease refers to the patient's family member, but not to the patient), "severity of illness", "disease course" and "body region to which the disease refers". The authors use a set of hand-crafted templates and various techniques based on machine learning to retrieve information using a medical thesaurus. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine learning method for classification of patients with similar nosology and a method for determining the most informative patient features are also proposed. The authors processed anonymized health records from the pediatric center to evaluate the proposed methods. The results show the applicability of the information extracted from the texts for solving practical problems. The records of patients with allergic, glomerular and rheumatic diseases were used for experimental assessment of the automatic diagnosis method. The authors also determined the most appropriate machine learning methods for classification of patients for each group of diseases, as well as the most informative disease signs. It was found that using additional information extracted from clinical texts, together with structured data, helps to improve the quality of diagnosis of chronic diseases. The authors also obtained characteristic combinations of disease signs. The proposed methods have been implemented in the intelligent data processing system of a multidisciplinary pediatric center. The experimental results show the ability of the system to improve the quality of pediatric healthcare.

  20. Figure Text Extraction in Biomedical Literature

    PubMed Central

    Kim, Daehyun; Yu, Hong

    2011-01-01

    Background Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. Methodology We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. Results/Conclusions The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction. In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search. PMID:21249186

  1. Figure text extraction in biomedical literature.

    PubMed

    Kim, Daehyun; Yu, Hong

    2011-01-13

    Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction. In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search.
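
    A minimal OCR sketch in the spirit of the preprocessing step described above is shown below, using OpenCV for upsampling and binarization and pytesseract for recognition; the file name is hypothetical and FigTExT's lexicon-based text correction is not reproduced.

```python
import cv2
import pytesseract

def figure_text(path):
    """Minimal OCR pass over a figure image with simple preprocessing."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)    # upsample small labels
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarize
    return pytesseract.image_to_string(img)

# Example call (hypothetical file name):
# print(figure_text("figure_panel_a.png"))
```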

  2. Improving PERSIANN-CCS rain estimation using probabilistic approach and multi-sensors information

    NASA Astrophysics Data System (ADS)

    Karbalaee, N.; Hsu, K. L.; Sorooshian, S.; Kirstetter, P.; Hong, Y.

    2016-12-01

    This presentation discusses recently implemented approaches to improve rainfall estimation from Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network-Cloud Classification System (PERSIANN-CCS). PERSIANN-CCS is an infrared (IR) based algorithm being integrated into IMERG (Integrated Multi-Satellite Retrievals for the Global Precipitation Measurement mission, GPM) to create a precipitation product at 0.1 x 0.1 degree resolution over the domain 50N to 50S every 30 minutes. Although PERSIANN-CCS has high spatial and temporal resolution, it overestimates or underestimates rainfall due to some limitations. PERSIANN-CCS estimates rainfall based on information extracted from IR channels at three temperature threshold levels (220, 235, and 253 K). Because the algorithm relies only on infrared data to estimate rainfall indirectly, it misses rainfall from warm clouds and produces false estimates for non-precipitating cold clouds. In this research, the effectiveness of using other GOES satellite channels, such as the visible and water vapor channels, has been investigated. By using multiple sensors, precipitation can be estimated based on information extracted from multiple channels. Also, instead of using an exponential function to estimate rainfall from cloud top temperature, a probabilistic method has been used. Using probability distributions of precipitation rates instead of deterministic values has improved rainfall estimation for different types of clouds.

  3. The application of machine vision in fire protection system

    NASA Astrophysics Data System (ADS)

    Rong, Jiang

    2018-04-01

    Based on previous research, this paper introduces wavelet theory, collects scene information through the video system, and calculates the key information needed by the fire protection system. That is, the algorithm collects the video information, extracts the flame color characteristics and smoke characteristics, and processes them as the corresponding characteristic information. The alarm system sets a corresponding alarm threshold; when this threshold is exceeded, the system raises an alarm. This fire detection method, combining flame color characteristics and smoke characteristics, improves not only the accuracy of the judgment but also its efficiency. Experiments show that the scheme is feasible.

  4. Ultrasound guided fluorescence molecular tomography with improved quantification by an attenuation compensated born-normalization and in vivo preclinical study of cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Baoqiang; Berti, Romain; Abran, Maxime

    2014-05-15

    Ultrasound imaging, having the advantages of low cost and non-invasiveness over MRI and X-ray CT, has been reported by several studies as an adequate complement to fluorescence molecular tomography with the prospect of improving localization and quantification of fluorescent molecular targets in vivo. Based on previous work, an improved dual-modality fluorescence-ultrasound imaging system was developed and then validated in an imaging study with a preclinical tumor model. Ultrasound imaging and a profilometer were used to obtain the anatomical prior information and the 3D surface, respectively, to precisely extract the tissue boundary on both sides of the sample in order to achieve improved fluorescence reconstruction. Furthermore, a pattern-based fluorescence reconstruction on the detection side was incorporated to enable dimensional reduction of the dataset while keeping the useful information for reconstruction. Because of its putative role in the current imaging geometry and the chosen reconstruction technique, we developed an attenuation-compensated Born-normalization method to reduce attenuation effects and cancel out experimental factors when collecting quantitative fluorescence datasets over a large area. Results of both simulation and phantom studies demonstrated that fluorescent targets could be recovered accurately and quantitatively using this reconstruction mechanism. Finally, an in vivo experiment confirmed that the imaging system, combined with the proposed image reconstruction approach, was able to extract both functional and anatomical information, thereby improving quantification and localization of molecular targets.

  5. A Novel Multilayer Correlation Maximization Model for Improving CCA-Based Frequency Recognition in SSVEP Brain-Computer Interface.

    PubMed

    Jiao, Yong; Zhang, Yu; Wang, Yu; Wang, Bei; Jin, Jing; Wang, Xingyu

    2018-05-01

    Multiset canonical correlation analysis (MsetCCA) has been successfully applied to optimize reference signals by extracting common features from multiple sets of electroencephalogram (EEG) data for steady-state visual evoked potential (SSVEP) recognition in brain-computer interface applications. To avoid extracting possible noise components as common features, this study proposes a sophisticated extension of MsetCCA, called the multilayer correlation maximization (MCM) model, for further improving SSVEP recognition accuracy. MCM combines the advantages of both CCA and MsetCCA by carrying out three layers of correlation maximization. The first layer extracts the stimulus frequency-related information using CCA between EEG samples and sine-cosine reference signals. The second layer learns reference signals by extracting common features with MsetCCA. The third layer re-optimizes the reference signal set using CCA with the sine-cosine reference signals again. An experimental study was carried out to validate the effectiveness of the proposed MCM model in comparison with the standard CCA and MsetCCA algorithms. The superior performance of MCM demonstrates its promising potential for the development of an improved SSVEP-based brain-computer interface.
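
    The first-layer step, CCA between multi-channel EEG and sine-cosine references, can be sketched as below with scikit-learn; the sampling rate, harmonics and candidate frequencies are illustrative assumptions, and the MsetCCA and re-optimization layers of MCM are not implemented.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_cca_score(eeg, freq, fs, n_harmonics=2):
    """Canonical correlation between multi-channel EEG and sine-cosine references."""
    t = np.arange(eeg.shape[0]) / fs
    ref = np.column_stack(
        [f(2 * np.pi * (h + 1) * freq * t)
         for h in range(n_harmonics) for f in (np.sin, np.cos)]
    )
    u, v = CCA(n_components=1).fit_transform(eeg, ref)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

# Pick the candidate stimulus frequency with the largest canonical correlation.
fs = 250
eeg = np.random.randn(2 * fs, 8)            # 2 s of 8-channel EEG (stand-in data)
scores = {f: ssvep_cca_score(eeg, f, fs) for f in (8.0, 10.0, 12.0, 15.0)}
print(max(scores, key=scores.get), scores)
```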

  6. Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

    PubMed Central

    Sun, Wencheng; Li, Yangyang; Liu, Fang; Fang, Shengqun; Wang, Guoyan

    2018-01-01

    Currently, medical institutes generally use EMRs to record patients' conditions, including diagnostic information, procedures performed, and treatment results. The EMR has been recognized as a valuable resource for large-scale analysis. However, EMR data are characterized by diversity, incompleteness, redundancy, and privacy concerns, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and thus the data mining results. Different types of data require different processing technologies. Most structured data need classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contain more health information and require more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the EMR processing pipeline and analyzes the key techniques. In addition, we make an in-depth study of applications developed based on text mining, together with the open challenges and research issues for future work. PMID:29849998
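
    As a toy illustration of the NER and RE steps mentioned in the review, the sketch below uses dictionary lookup and sentence-level co-occurrence; real systems rely on statistical models and clinical vocabularies, and the dictionaries and the "treats?" relation label here are invented for the example.

```python
import re

# Toy dictionaries standing in for clinical vocabularies (illustrative only).
PROBLEMS = {"pneumonia", "hypertension", "diabetes"}
TREATMENTS = {"amoxicillin", "lisinopril", "metformin"}

def extract_relations(note):
    """Dictionary-based NER plus naive sentence-level relation extraction."""
    relations = []
    for sentence in re.split(r"[.!?]", note.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        for problem in PROBLEMS & tokens:
            for treatment in TREATMENTS & tokens:
                relations.append((treatment, "treats?", problem))
    return relations

note = "Started amoxicillin for community-acquired pneumonia. Continue metformin."
print(extract_relations(note))
```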

  7. Ionization Electron Signal Processing in Single Phase LArTPCs II. Data/Simulation Comparison and Performance in MicroBooNE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Adams, C.; et al.

    The single-phase liquid argon time projection chamber (LArTPC) provides a large amount of detailed information in the form of fine-grained drifted ionization charge from particle traces. To fully utilize this information, the deposited charge must be accurately extracted from the raw digitized waveforms via a robust signal processing chain. Enabled by the ultra-low noise levels associated with cryogenic electronics in the MicroBooNE detector, the precise extraction of ionization charge from the induction wire planes in a single-phase LArTPC is qualitatively demonstrated on MicroBooNE data with event display images, and quantitatively demonstrated via waveform-level and track-level metrics. Improved performance of induction plane calorimetry is demonstrated through the agreement of extracted ionization charge measurements across different wire planes for various event topologies. In addition to the comprehensive waveform-level comparison of data and simulation, a calibration of the cryogenic electronics response is presented and solutions to various MicroBooNE-specific TPC issues are discussed. This work presents an important improvement in LArTPC signal processing, the foundation of reconstruction and therefore physics analyses in MicroBooNE.

  8. Informatics in radiology: automated Web-based graphical dashboard for radiology operational business intelligence.

    PubMed

    Nagy, Paul G; Warnock, Max J; Daly, Mark; Toland, Christopher; Meenan, Christopher D; Mezrich, Reuben S

    2009-11-01

    Radiology departments today are faced with many challenges to improve operational efficiency, performance, and quality. Many organizations rely on antiquated, paper-based methods to review their historical performance and understand their operations. With increased workloads, geographically dispersed image acquisition and reading sites, and rapidly changing technologies, this approach is increasingly untenable. A Web-based dashboard was constructed to automate the extraction, processing, and display of indicators and thereby provide useful and current data for twice-monthly departmental operational meetings. The feasibility of extracting specific metrics from clinical information systems was evaluated as part of a longer-term effort to build a radiology business intelligence architecture. Operational data were extracted from clinical information systems and stored in a centralized data warehouse. Higher-level analytics were performed on the centralized data, a process that generated indicators in a dynamic Web-based graphical environment that proved valuable in discussion and root cause analysis. Results aggregated over a 24-month period since implementation suggest that this operational business intelligence reporting system has provided significant data for driving more effective management decisions to improve productivity, performance, and quality of service in the department.

  9. Remote sensing and extractable biological resources

    NASA Technical Reports Server (NTRS)

    Cronin, L. E.

    1972-01-01

    The nature and quantity of extractable biological resources available in the Chesapeake Bay are discussed. The application of miniaturized radio sensors to track the movement of fish and birds is described. The specific uses of remote sensors for detecting and mapping areas of algae, red tide, thermal pollution, and vegetation beds are presented. The necessity for obtaining information on the physical, chemical, and meteorological features of the entire bay in order to provide improved resources management is emphasized.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiskoot, R.J.J.

    Accurate and reliable sampling systems are imperative when confirming natural gas' commercial value. Buyers and sellers need accurate hydrocarbon-composition information to conduct fair sale transactions. Because poor sample extraction, preparation, or analysis can invalidate the sale, more attention should be directed toward improving representative sampling. All sampling components should be considered, i.e., gas types, line pressure and temperature, equipment maintenance and service needs, etc. The paper discusses gas sampling, design considerations (location, probe type, extraction devices, controller, and receivers), operating requirements, and system integration.

  11. The utility of an automated electronic system to monitor and audit transfusion practice.

    PubMed

    Grey, D E; Smith, V; Villanueva, G; Richards, B; Augustson, B; Erber, W N

    2006-05-01

    Transfusion laboratories with transfusion committees have a responsibility to monitor transfusion practice and generate improvements in clinical decision-making and red cell usage. However, this can be problematic and expensive because data cannot be readily extracted from most laboratory information systems. To overcome this problem, we developed and introduced a system to electronically extract and collate extensive amounts of data from two laboratory information systems and to link it with ICD10 clinical codes in a new database using standard information technology. Three data files were generated from two laboratory information systems, ULTRA (version 3.2) and TM, using standard information technology scripts. These were patient pre- and post-transfusion haemoglobin, blood group and antibody screen, and cross-match and transfusion data. These data, together with ICD10 codes for surgical cases, were imported into an MS ACCESS database and linked by means of a unique laboratory number. Queries were then run to extract the relevant information, which was processed in Microsoft Excel for graphical presentation. We assessed the utility of this data extraction system to audit transfusion practice in a 600-bed adult tertiary hospital over an 18-month period. A total of 52 MB of data were extracted from the two laboratory information systems for the 18-month period and, together with 2.0 MB of theatre ICD10 data, enabled case-specific transfusion information to be generated. The audit evaluated 15,992 blood group and antibody screens, 25,344 cross-matched red cell units and 15,455 transfused red cell units. Data evaluated included cross-matched to transfusion ratios and pre- and post-transfusion haemoglobin levels for a range of clinical diagnoses. Data showed significant differences between clinical units and by ICD10 code. This method to electronically extract large amounts of data and link it with clinical databases has provided a powerful and sustainable tool for monitoring transfusion practice. It has been successfully used to identify areas requiring education, training and clinical guidance and allows for comparison with national haemoglobin-based transfusion guidelines.
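
    The linkage step described above (laboratory extracts joined to theatre ICD10 codes via a unique laboratory number, then summarized as cross-matched to transfused ratios) can be sketched with generic tooling as below. The column names, values, and use of pandas are assumptions for illustration, not the hospital's actual ULTRA/TM export format or MS ACCESS implementation.

```python
import pandas as pd

# Hypothetical extracts keyed by a unique laboratory number (illustrative data).
haemoglobin = pd.DataFrame({"lab_no": [1, 2], "pre_hb": [78, 92], "post_hb": [95, 101]})
crossmatch  = pd.DataFrame({"lab_no": [1, 2], "units_crossmatched": [4, 2],
                            "units_transfused": [2, 2]})
icd10       = pd.DataFrame({"lab_no": [1, 2], "icd10": ["O72.1", "S72.0"]})

# Link the three files on the laboratory number.
linked = haemoglobin.merge(crossmatch, on="lab_no").merge(icd10, on="lab_no")

# Cross-matched to transfused (C:T) ratio per ICD10 code, one of the audit metrics.
ct_ratio = (linked.groupby("icd10")[["units_crossmatched", "units_transfused"]].sum()
                  .assign(ct=lambda d: d.units_crossmatched / d.units_transfused))
print(ct_ratio)
```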

  12. Induced lexico-syntactic patterns improve information extraction from online medical forums.

    PubMed

    Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D

    2014-01-01

    To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
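
    A much-simplified sketch of the bootstrapping idea (label sentences with a seed dictionary, induce frequent surface contexts, then apply them to harvest new terms) follows; the seed terms, sentences, two-word context window, and frequency cut-off are all invented, and are far cruder than the dependency-based lexico-syntactic patterns the authors induce.

```python
import re
from collections import Counter

# Illustrative seed dictionary and forum-style sentences (all invented).
seed_drugs = {"metformin", "cinnamon pills", "insulin"}
sentences = [
    "I started taking metformin last month",
    "my doctor said to try insulin instead",
    "I started taking cinnamon pills for my sugar",
    "I started taking chromium supplements recently",
]

# Step 1: induce surface patterns from the two words preceding each seed term.
patterns = Counter()
for s in sentences:
    for term in seed_drugs:
        if term in s:
            left = s.split(term)[0].strip().split()[-2:]
            patterns[" ".join(left)] += 1

# Step 2: keep patterns seen more than once and apply them to find new candidate terms.
frequent = [p for p, c in patterns.items() if c > 1]
new_terms = set()
for s in sentences:
    for p in frequent:
        m = re.search(re.escape(p) + r"\s+(\w+(?:\s\w+)?)", s)
        if m and not any(m.group(1).startswith(t.split()[0]) for t in seed_drugs):
            new_terms.add(m.group(1))

print(new_terms)   # e.g. {'chromium supplements'}
```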

  13. Remote Sensing Information Sciences Research Group, Santa Barbara Information Sciences Research Group, year 3

    NASA Technical Reports Server (NTRS)

    Estes, J. E.; Smith, T.; Star, J. L.

    1986-01-01

    Research continues to focus on improving the type, quantity, and quality of information which can be derived from remotely sensed data. The focus is on remote sensing and applications for the Earth Observing System (Eos) and Space Station, including associated polar and co-orbiting platforms. The remote sensing research activities are being expanded, integrated, and extended into the areas of global science, georeferenced information systems, machine-assisted information extraction from image data, and artificial intelligence. The accomplishments in these areas are examined.

  14. A tutorial on information retrieval: basic terms and concepts

    PubMed Central

    Zhou, Wei; Smalheiser, Neil R; Yu, Clement

    2006-01-01

    This informal tutorial is intended for investigators and students who would like to understand the workings of information retrieval systems, including the most frequently used search engines: PubMed and Google. Having a basic knowledge of the terms and concepts of information retrieval should improve the efficiency and productivity of searches. As well, this knowledge is needed in order to follow current research efforts in biomedical information retrieval and text mining that are developing new systems not only for finding documents on a given topic, but extracting and integrating knowledge across documents. PMID:16722601

  15. Location of core diagnostic information across various sequences in brain MRI and implications for efficiency of MRI scanner utilization.

    PubMed

    Sharma, Aseem; Chatterjee, Arindam; Goyal, Manu; Parsons, Matthew S; Bartel, Seth

    2015-04-01

    Targeting redundancy within MRI can improve its cost-effective utilization. We sought to quantify potential redundancy in our brain MRI protocols. In this retrospective review, we aggregated 207 consecutive adults who underwent brain MRI and reviewed their medical records to document clinical indication, core diagnostic information provided by MRI, and its clinical impact. Contributory imaging abnormalities constituted positive core diagnostic information whereas absence of imaging abnormalities constituted negative core diagnostic information. The senior author selected core sequences deemed sufficient for extraction of core diagnostic information. For validating core sequences selection, four readers assessed the relative ease of extracting core diagnostic information from the core sequences. Potential redundancy was calculated by comparing the average number of core sequences to the average number of sequences obtained. Scanning had been performed using 9.4±2.8 sequences over 37.3±12.3 minutes. Core diagnostic information was deemed extractable from 2.1±1.1 core sequences, with an assumed scanning time of 8.6±4.8 minutes, reflecting a potential redundancy of 74.5%±19.1%. Potential redundancy was least in scans obtained for treatment planning (14.9%±25.7%) and highest in scans obtained for follow-up of benign diseases (81.4%±12.6%). In 97.4% of cases, all four readers considered core diagnostic information to be either easily extractable from core sequences or the ease to be equivalent to that from the entire study. With only one MRI lacking clinical impact (0.48%), overutilization did not seem to contribute to potential redundancy. High potential redundancy that can be targeted for more efficient scanner utilization exists in brain MRI protocols.
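
    The potential-redundancy figure quoted above is, in essence, the fraction of acquired sequences beyond the core set, averaged over scans. The toy calculation below uses invented numbers and one plausible reading of the metric purely to make it concrete; it does not reproduce the study's data.

```python
# Per-scan potential redundancy: share of acquired sequences not needed for
# the core diagnostic information (numbers are illustrative).
scans = [
    {"sequences_obtained": 10, "core_sequences": 2},
    {"sequences_obtained": 8,  "core_sequences": 3},
    {"sequences_obtained": 9,  "core_sequences": 1},
]

redundancy = [1 - s["core_sequences"] / s["sequences_obtained"] for s in scans]
print([f"{r:.1%}" for r in redundancy])                       # per-scan values
print(f"mean potential redundancy: {sum(redundancy) / len(redundancy):.1%}")
```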

  16. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients.

    PubMed

    Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid

    2015-01-01

    In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict the severity of condition of Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based, feature extraction framework using disease diagnostic information. To reduce dimensionality, we identify disease clusters using co-occurrence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in the predictability of the severity of condition.
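
    A hedged sketch of the clustering idea follows: group diagnosis codes by how often they co-occur on the same discharge record, then use cluster membership as features. The codes, records, distance definition, and hierarchical clustering choice are illustrative assumptions and may differ from the paper's exact algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

codes = ["428.0", "584.9", "250.00", "401.9"]                 # example ICD-9-CM codes
records = [{"428.0", "584.9"}, {"428.0", "584.9", "401.9"},
           {"250.00", "401.9"}, {"250.00"}]                   # toy discharge records

# Pairwise co-occurrence counts between codes.
n = len(codes)
cooc = np.zeros((n, n))
for r in records:
    for i in range(n):
        for j in range(n):
            if i != j and codes[i] in r and codes[j] in r:
                cooc[i, j] += 1

# Convert co-occurrence to a distance and cluster hierarchically into two groups.
dist = 1.0 / (1.0 + cooc)
condensed = dist[np.triu_indices(n, 1)]
labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
print(dict(zip(codes, labels)))                               # cluster id per code
```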

  17. Application of Natural Language Processing and Network Analysis Techniques to Post-market Reports for the Evaluation of Dose-related Anti-Thymocyte Globulin Safety Patterns.

    PubMed

    Botsis, Taxiarchis; Foster, Matthew; Arya, Nina; Kreimeyer, Kory; Pandey, Abhishek; Arya, Deepa

    2017-04-26

    To evaluate the feasibility of automated dose and adverse event information retrieval in supporting the identification of safety patterns. We extracted all rabbit Anti-Thymocyte Globulin (rATG) reports submitted to the United States Food and Drug Administration Adverse Event Reporting System (FAERS) from the product's initial licensure on April 16, 1984, through February 8, 2016. We processed the narratives using the Medication Extraction (MedEx) and the Event-based Text-mining of Health Electronic Records (ETHER) systems and retrieved the appropriate medication, clinical, and temporal information. When necessary, the extracted information was manually curated. This process resulted in a high-quality dataset that was analyzed with the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA) to explore the association of rATG dosing with post-transplant lymphoproliferative disorder (PTLD). Although manual curation was necessary to improve the data quality, MedEx and ETHER supported the extraction of the appropriate information. We created a final dataset of 1,380 cases with complete information for rATG dosing and date of administration. Analysis in PANACEA found that PTLD was associated with cumulative doses of rATG >8 mg/kg, even in periods where most of the submissions to FAERS reported low doses of rATG. We demonstrated the feasibility of investigating a dose-related safety pattern for a particular product in FAERS using a set of automated tools.

  18. A novel approach to studying co-evolution of understanding and research: Family bereavement and the potential for organ donation as a case study

    PubMed Central

    Dicks, Sean G; Ranse, Kristen; Northam, Holly; van Haren, Frank MP; Boer, Douglas P

    2018-01-01

    A novel approach to data extraction and synthesis was used to explore the connections between research priorities, understanding and practice improvement associated with family bereavement in the context of the potential for organ donation. Conducting the review as a qualitative longitudinal study highlighted changes over time, and extraction of citation-related data facilitated an analysis of the interaction in this field. It was found that lack of ‘communication’ between researchers contributes to information being ‘lost’ and then later ‘rediscovered’. It is recommended that researchers should plan early for dissemination and practice improvement to ensure that research contributes to change. PMID:29399367

  19. Natural Language Processing in Radiology: A Systematic Review.

    PubMed

    Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A

    2016-05-01

    Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. © RSNA, 2016. Online supplemental material is available for this article.

  20. Enhancing Biomedical Text Summarization Using Semantic Relation Extraction

    PubMed Central

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance, and our results are better than those of the MEAD system, a well-known tool for text summarization. PMID:21887336

  1. Dimensionality-varied convolutional neural network for spectral-spatial classification of hyperspectral data

    NASA Astrophysics Data System (ADS)

    Liu, Wanjun; Liang, Xuejian; Qu, Haicheng

    2017-11-01

    Hyperspectral image (HSI) classification is one of the most popular topics in the remote sensing community. Traditional and deep learning-based classification methods have been proposed constantly in recent years. In order to improve classification accuracy and robustness, a dimensionality-varied convolutional neural network (DVCNN) is proposed in this paper. DVCNN is a novel deep architecture based on the convolutional neural network (CNN). The input of DVCNN is a set of 3D patches selected from the HSI which contain spectral-spatial joint information. In the feature extraction process, each patch is transformed into several different 1D vectors by 3D convolution kernels, which are able to extract features from spectral-spatial data. The rest of DVCNN is much the same as a general CNN and processes the 2D matrix constituted by all the 1D data, so DVCNN can not only extract more accurate and richer features than CNN, but also fuse spectral-spatial information to improve classification accuracy. Moreover, the robustness of the network on water-absorption bands is enhanced in the process of spectral-spatial fusion by 3D convolution, and the computation is simplified by the dimensionality-varied convolution. Experiments were performed on both the Indian Pines and Pavia University scene datasets, and the results showed that the classification accuracy of DVCNN improved by 32.87% on Indian Pines and 19.63% on the Pavia University scene compared with a spectral-only CNN. The maximum accuracy improvement of DVCNN over other state-of-the-art HSI classification methods was 13.72%, and the robustness of DVCNN to noise in the water-absorption bands was demonstrated.
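
    For concreteness, a rough PyTorch sketch of the spectral-spatial idea is given below: a 3D convolution over a (bands x height x width) patch mixes spectral and spatial information before a small classifier head. The layer sizes, patch size, and pooling head are assumptions for illustration and do not reproduce the authors' DVCNN architecture.

```python
import torch
import torch.nn as nn

# Toy spectral-spatial network (illustrative sizes, not the published DVCNN).
n_bands, patch = 103, 9                               # e.g. Pavia University has 103 bands

class TinySpectralSpatialNet(nn.Module):
    def __init__(self, n_classes=9):
        super().__init__()
        self.conv3d = nn.Conv3d(1, 8, kernel_size=(7, 3, 3))   # spectral x spatial kernel
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(8, n_classes)

    def forward(self, x):                             # x: (N, 1, bands, patch, patch)
        x = torch.relu(self.conv3d(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)

net = TinySpectralSpatialNet()
dummy = torch.randn(4, 1, n_bands, patch, patch)      # four 3D patches
print(net(dummy).shape)                               # -> torch.Size([4, 9])
```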

  2. An improved automated procedure for informal and temporary dwellings detection and enumeration, using mathematical morphology operators on VHR satellite data

    NASA Astrophysics Data System (ADS)

    Jenerowicz, Małgorzata; Kemper, Thomas

    2016-10-01

    Every year thousands of people are displaced by conflicts or natural disasters and often gather in large camps. Knowing how many people have gathered is crucial for an efficient relief operation. However, it is often difficult to collect exact information on the total population. This paper presents an improved morphological methodology for the estimation of dwelling structures located in several Internally Displaced Persons (IDP) camps, based on Very High Resolution (VHR) multispectral satellite imagery with pixel sizes of 1 meter or less, including GeoEye-1, WorldView-2, QuickBird-2, Ikonos-2, Pléiades-A and Pléiades-B. The main focus of this paper is the enhancement of the approach through selection of the feature extraction algorithm, and the improvement and automation of pre-processing and results verification. For the extraction of informal and temporary dwellings, high data quality has to be ensured, so the pre-processing has been extended to include input data hierarchy level assignment and the selection and evaluation of a data fusion method. The feature extraction algorithm follows the procedure presented in Jenerowicz, M., Kemper, T., 2011. Optical data are analysed in a cyclic approach comprising image segmentation and geometrical, textural and spectral class modeling, aiming at camp area identification. The successive steps of morphological processing have been combined in one stand-alone application for automatic dwelling detection and enumeration. Implemented in this way, these approaches can provide reliable and consistent results, independent of the imaging satellite type and study site location, providing decision support in emergency response for the humanitarian community such as the United Nations, the European Union and non-governmental relief organizations.

  3. Differential Nuclear and Mitochondrial DNA Preservation in Post-Mortem Teeth with Implications for Forensic and Ancient DNA Studies

    PubMed Central

    Higgins, Denice; Rohrlach, Adam B.; Kaidonis, John; Townsend, Grant; Austin, Jeremy J.

    2015-01-01

    Major advances in genetic analysis of skeletal remains have been made over the last decade, primarily due to improvements in post-DNA-extraction techniques. Despite this, a key challenge for DNA analysis of skeletal remains is the limited yield of DNA recovered from these poorly preserved samples. Enhanced DNA recovery by improved sampling and extraction techniques would allow further advancements. However, little is known about the post-mortem kinetics of DNA degradation and whether the rate of degradation varies between nuclear and mitochondrial DNA or across different skeletal tissues. This knowledge, along with information regarding ante-mortem DNA distribution within skeletal elements, would inform sampling protocols facilitating development of improved extraction processes. Here we present a combined genetic and histological examination of DNA content and rates of DNA degradation in the different tooth tissues of 150 human molars over short-medium post-mortem intervals. DNA was extracted from coronal dentine, root dentine, cementum and pulp of 114 teeth via a silica column method and the remaining 36 teeth were examined histologically. Real-time quantification assays based on two nuclear DNA fragments (67 bp and 156 bp) and one mitochondrial DNA fragment (77 bp) showed nuclear and mitochondrial DNA degraded exponentially, but at different rates, depending on post-mortem interval and soil temperature. In contrast to previous studies, we identified differential survival of nuclear and mtDNA in different tooth tissues. Furthermore, histological examination showed pulp and dentine were rapidly affected by loss of structural integrity, and pulp was completely destroyed in a relatively short time period. Conversely, cementum showed little structural change over the same time period. Finally, we confirm that targeted sampling of cementum from teeth buried for up to 16 months can provide a reliable source of nuclear DNA for STR-based genotyping using standard extraction methods, without the need for specialised equipment or large-volume demineralisation steps. PMID:25992635

  4. A color fusion method of infrared and low-light-level images based on visual perception

    NASA Astrophysics Data System (ADS)

    Han, Jing; Yan, Minmin; Zhang, Yi; Bai, Lianfa

    2014-11-01

    Color fusion images can be obtained through the fusion of infrared and low-light-level images, and contain the information of both. The fusion images can help observers to understand the multichannel images comprehensively. However, simple fusion may lose target information due to inconspicuous targets in long-distance infrared and low-light-level images; and if target extraction is adopted blindly, the perception of scene information will be affected seriously. To solve this problem, a new fusion method based on visual perception is proposed in this paper. The extraction of visual targets ("what" information) and a parallel processing mechanism are applied to traditional color fusion methods. The infrared and low-light-level color fusion images are achieved based on efficient learning of typical targets. Experimental results show the effectiveness of the proposed method. The fusion images achieved by our algorithm can not only improve the detection rate of targets, but also retain rich natural information of the scenes.

  5. Semantic information extracting system for classification of radiological reports in radiology information system (RIS)

    NASA Astrophysics Data System (ADS)

    Shi, Liehang; Ling, Tonghui; Zhang, Jianguo

    2016-03-01

    Radiologists currently use a variety of terminologies and standards in most hospitals in China, and there may even be multiple terminologies in use for different sections within one department. In this presentation, we introduce a medical semantic comprehension system (MedSCS) to extract semantic information about clinical findings and conclusions from free-text radiology reports so that the reports can be classified correctly based on medical term indexing standards such as RadLex or SNOMED-CT. Our system (MedSCS) is based on both rule-based methods and statistics-based methods, which improves the performance and the scalability of MedSCS. In order to evaluate the overall performance of the system and measure the accuracy of the outcomes, we developed computation methods to calculate precision rate, recall rate, F-score and exact confidence interval.
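
    For reference, the evaluation quantities mentioned above (precision, recall, F-score, and an exact confidence interval for a proportion) can be computed as in the sketch below; the counts are invented, and the Clopper-Pearson interval is one common reading of "exact confidence interval".

```python
from scipy.stats import beta

tp, fp, fn = 180, 20, 40                               # illustrative counts

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson exact interval for k successes out of n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

print(f"precision={precision:.3f}, recall={recall:.3f}, F1={f1:.3f}")
print("95% exact CI for precision:", tuple(round(x, 3) for x in exact_ci(tp, tp + fp)))
```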

  6. Multiplexed Sequence Encoding: A Framework for DNA Communication.

    PubMed

    Zakeri, Bijan; Carr, Peter A; Lu, Timothy K

    2016-01-01

    Synthetic DNA has great propensity for efficiently and stably storing non-biological information. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication (data encoding, data transfer, and data extraction) and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA while reducing the occurrence of DNA homopolymers, thereby improving synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system, Multiplexed Sequence Encoding (MuSE), which conceals messages between multiple distinct DNA molecules and requires a combination key to reveal them. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA.
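
    The sketch below is a toy illustration of why homopolymer runs matter when mapping text to DNA: it encodes a string two bits per base and breaks repeats with a simple rotation. It is not the authors' iKeys scheme, and the rotation shown is not uniquely decodable; real encodings use reversible codes.

```python
# Toy text-to-DNA mapping with naive homopolymer suppression (illustrative only).
BASES = "ACGT"

def encode(text: str) -> str:
    bits = "".join(f"{b:08b}" for b in text.encode("utf-8"))
    dna, prev = [], None
    for i in range(0, len(bits), 2):
        idx = int(bits[i:i + 2], 2)
        base = BASES[idx]
        if base == prev:                  # rotate on repeats to break homopolymer runs
            base = BASES[(idx + 1) % 4]   # note: this makes the toy code lossy
        dna.append(base)
        prev = base
    return "".join(dna)

print(encode("hi"))                       # a short oligo-like string with no adjacent repeats
```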

  7. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    PubMed

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of effort to understanding the interactions between proteins or RNA production. This information could strengthen current knowledge of drug reactions or the development of certain diseases. Nevertheless, the lack of explicit structure in the life science literature, one of the most important sources of this information, prevents computer-based systems from accessing it. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models and require large-scale annotated corpora to precisely estimate the models' parameters; however, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning for biomedical event extraction is a feasible solution and attracts growing interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used to describe the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold standard corpus. Experimental results show that more than a 2.2% improvement in F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system, and that the similarity between sentences can be precisely described by the hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
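
    One hedged reading of the "hidden topic" distance described above is sketched below: represent each sentence by a topic distribution and measure how close an un-annotated sentence is to the annotated ones. The sentences, topic count, and cosine distance are illustrative choices; the paper additionally uses sentence structure, and its exact distance definition may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_distances

annotated = ["protein A phosphorylates protein B",
             "gene X regulates the expression of gene Y"]      # toy annotated sentences
unannotated = ["kinase C phosphorylates substrate D"]           # toy un-annotated sentence

vec = CountVectorizer()
counts = vec.fit_transform(annotated + unannotated)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)                  # one topic distribution per sentence

# Distance of the un-annotated sentence to each annotated sentence in topic space.
d = cosine_distances(topics[-1:], topics[:-1])
print(d.round(3))
```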

  8. Scene text recognition in mobile applications by character descriptor and structure configuration.

    PubMed

    Yi, Chucai; Tian, Yingli

    2014-07-01

    Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and varied background interference. This paper proposes a method of scene text recognition from detected text regions. In text detection, our previously proposed algorithms are applied to obtain text regions from a scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure at each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction in smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides us some insight into algorithm design and performance improvement of scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme for text recognition is comparable with the best existing methods.

  9. A sensitive continuum analysis method for gamma ray spectra

    NASA Technical Reports Server (NTRS)

    Thakur, Alakh N.; Arnold, James R.

    1993-01-01

    In this work we examine ways to improve the sensitivity of the analysis procedure for gamma ray spectra with respect to small differences in the continuum (Compton) spectra. The method developed is applied to analyze gamma ray spectra obtained from planetary mapping by the Mars Observer spacecraft launched in September 1992. Calculated Mars simulation spectra and actual thick target bombardment spectra have been taken as test cases. The principle of the method rests on the extraction of continuum information from Fourier transforms of the spectra. We study how a better estimate of the spectrum from larger regions of the Mars surface will improve the analysis for smaller regions with poorer statistics. Estimation of signal within the continuum is done in the frequency domain which enables efficient and sensitive discrimination of subtle differences between two spectra. The process is compared to other methods for the extraction of information from the continuum. Finally we explore briefly the possible uses of this technique in other applications of continuum spectra.
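
    As a toy illustration of continuum estimation in the frequency domain (not the authors' specific estimator), the sketch below low-pass filters the Fourier transform of a synthetic spectrum so that a smooth continuum can be separated from a narrow line; the spectrum shape and frequency cut-off are invented.

```python
import numpy as np

channels = np.arange(1024)
continuum = 500.0 * np.exp(-channels / 400.0)                  # smooth Compton-like shape
peak = 80.0 * np.exp(-0.5 * ((channels - 300) / 3.0) ** 2)     # a narrow line on top
spectrum = np.random.poisson(continuum + peak).astype(float)   # counting statistics

F = np.fft.rfft(spectrum)
F[20:] = 0.0                                   # keep only the lowest-frequency components
continuum_est = np.fft.irfft(F, n=spectrum.size)

residual = spectrum - continuum_est            # the line stands out above the estimate
print(residual[295:306].round(1))
```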

  10. Framework for automatic information extraction from research papers on nanocrystal devices

    PubMed Central

    Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

    2015-01-01

    To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%); however, precision is better (75–97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system. PMID:26665057

  11. Framework for automatic information extraction from research papers on nanocrystal devices.

    PubMed

    Dieb, Thaer M; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

    2015-01-01

    To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called " NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.

  12. Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.

    PubMed

    Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B

    2017-01-01

    NLP-PIER (Natural Language Processing - Patient Information Extraction for Research) is a self-service platform with a search engine for clinical researchers to perform natural language processing (NLP) queries using clinical notes. We conducted user-centered testing of NLP-PIER's usability to inform future design decisions. Quantitative and qualitative data were analyzed. Our findings will be used to improve the usability of NLP-PIER.

  13. BilKristal 2.0: A tool for pattern information extraction from crystal structures

    NASA Astrophysics Data System (ADS)

    Okuyan, Erhan; Güdükbay, Uğur

    2014-01-01

    We present a revised version of the BilKristal tool of Okuyan et al. (2007). We converted the development environment into Microsoft Visual Studio 2005 in order to resolve compatibility issues. We added multi-core CPU support and improvements are made to graphics functions in order to improve performance. Discovered bugs are fixed and exporting functionality to a material visualization tool is added.

  14. Embedding strategies for effective use of information from multiple sequence alignments.

    PubMed Central

    Henikoff, S.; Henikoff, J. G.

    1997-01-01

    We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
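
    To make the PSSM idea concrete, the sketch below builds a position-specific scoring matrix from a toy alignment as log-odds against a uniform background; the alignment, pseudocount, and background are illustrative, and this is not the authors' embedding procedure itself.

```python
import numpy as np

alignment = ["ACDE", "ACDE", "ACEE", "GCDE"]       # four aligned family members (toy)
alphabet = sorted(set("".join(alignment)))
background = 1.0 / 20.0                            # uniform amino-acid background
pseudo = 1.0                                       # simple pseudocount

L = len(alignment[0])
pssm = np.zeros((len(alphabet), L))
for j in range(L):
    column = [seq[j] for seq in alignment]
    for i, aa in enumerate(alphabet):
        freq = (column.count(aa) + pseudo) / (len(column) + pseudo * len(alphabet))
        pssm[i, j] = np.log2(freq / background)    # log-odds score for residue aa at position j

for i, aa in enumerate(alphabet):
    print(aa, pssm[i].round(2))
```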

  15. Phrase-based Multimedia Information Extraction

    DTIC Science & Technology

    2006-07-01

    names with periods — J. K. Ramirez, T. Grant Smith, Lita S. Jones; names with commas — Hector Jones, Jr.; and conjoined names, such as Sherlock and Judy... Holmes. Using both the type and token metrics (described above), we tested these extensions and improvements to the name identification module on

  16. Time-dependent analysis of dosage delivery information for patient-controlled analgesia services.

    PubMed

    Kuo, I-Ting; Chang, Kuang-Yi; Juan, De-Fong; Hsu, Steen J; Chan, Chia-Tai; Tsou, Mei-Yung

    2018-01-01

    Pain relief plays an essential part in perioperative care and an important role in medical quality improvement. Patient-controlled analgesia (PCA) is a method that allows a patient to self-administer small boluses of analgesic to relieve subjective pain. PCA logs from the infusion pump consist of numerous text messages that record all events during therapy. Dosage information can be extracted from PCA logs to provide easily understood features. Analyzing dosage information over time is of great help in characterizing the variation of a patient's pain relief. To explore the trend of pain relief requirements, we developed a PCA dosage information generator (PCA DIG) to extract meaningful messages from PCA logs during the first 48 hours of therapy. PCA dosage information, including consumption, delivery, infusion rate, and the ratio between demand and delivery, is presented with corresponding values in 4 successive time frames. Time-dependent statistical analysis demonstrated that analgesia requirements decreased gradually over time. These findings are compatible with clinical observations and further provide valuable information about strategies to customize postoperative pain management.

  17. Waterbodies Extraction from LANDSAT8-OLI Imagery Using a Water Index-Guided Stochastic Fully-Connected Conditional Random Field Model and the Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Wang, X.; Xu, L.

    2018-04-01

    One of the most important applications of remote sensing classification is water extraction. The water index (WI) based on Landsat images is one of the most common ways to distinguish water bodies from other land surface features. However, conventional WI methods take into account spectral information only from a limited number of bands, and therefore their accuracy may be constrained in some areas that are covered with snow/ice, clouds, etc. An accurate and robust water extraction method is therefore key. The support vector machine (SVM), which uses spectral information from all bands, can reduce these classification errors to some extent. Nevertheless, SVM, which barely considers spatial information, is relatively sensitive to noise in local regions. Conditional random fields (CRF), which consider both spatial and spectral information, have proven able to compensate for these limitations. Hence, in this paper, we develop a systematic water extraction method that takes advantage of the complementarity between the SVM and a water index-guided stochastic fully-connected conditional random field (SVM-WIGSFCRF) to address the above issues. In addition, we comprehensively evaluate the reliability and accuracy of the proposed method using Landsat-8 Operational Land Imager (OLI) images of one test site. We assess the method's performance by calculating the following accuracy metrics: Omission Errors (OE) and Commission Errors (CE); Kappa coefficient (KP) and Total Error (TE). Experimental results show that the new method can improve target detection accuracy under complex and changeable environments.
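
    For concreteness, one standard water index for Landsat-8 OLI is McFeeters' NDWI = (Green - NIR) / (Green + NIR), computed from band 3 (green) and band 5 (NIR). The sketch below uses synthetic reflectance values; in the paper, such an index guides the CRF rather than serving as the final classifier.

```python
import numpy as np

green = np.array([[0.10, 0.08], [0.30, 0.32]])     # OLI band 3 surface reflectance (toy)
nir   = np.array([[0.30, 0.28], [0.05, 0.04]])     # OLI band 5 surface reflectance (toy)

ndwi = (green - nir) / (green + nir + 1e-12)       # small epsilon avoids division by zero
water_mask = ndwi > 0.0                            # a common, simple threshold
print(ndwi.round(2))
print(water_mask)
```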

  18. Time to consider sharing data extracted from trials included in systematic reviews.

    PubMed

    Wolfenden, Luke; Grimshaw, Jeremy; Williams, Christopher M; Yoong, Sze Lin

    2016-11-03

    While the debate regarding shared clinical trial data has shifted from whether such data should be shared to how this is best achieved, the sharing of data collected as part of systematic reviews has received little attention. In this commentary, we discuss the potential benefits of coordinated efforts to share data collected as part of systematic reviews. There are a number of potential benefits of systematic review data sharing. Shared information and data obtained as part of the systematic review process may reduce unnecessary duplication, reduce demand on trialist to service repeated requests from reviewers for data, and improve the quality and efficiency of future reviews. Sharing also facilitates research to improve clinical trial and systematic review methods and supports additional analyses to address secondary research questions. While concerns regarding appropriate use of data, costs, or the academic return for original review authors may impede more open access to information extracted as part of systematic reviews, many of these issues are being addressed, and infrastructure to enable greater access to such information is being developed. Embracing systems to enable more open access to systematic review data has considerable potential to maximise the benefits of research investment in undertaking systematic reviews.

  19. The 2D analytic signal for envelope detection and feature extraction on ultrasound images.

    PubMed

    Wachinger, Christian; Klein, Tassilo; Navab, Nassir

    2012-08-01

    The fundamental property of the analytic signal is the split of identity, meaning the separation of qualitative and quantitative information in the form of the local phase and the local amplitude, respectively. The structural representation given by the local phase, which is independent of brightness and contrast, is especially interesting for numerous image processing tasks. Recently, the extension of the analytic signal from 1D to 2D, covering also intrinsic 2D structures, was proposed. We show the advantages of this improved concept on ultrasound RF and B-mode images. Precisely, we use the 2D analytic signal for the envelope detection of RF data. This leads to advantages for the extraction of the information-bearing signal from the modulated carrier wave. We illustrate this, first, by visual assessment of the images, and second, by performing goodness-of-fit tests to a Nakagami distribution, indicating a clear improvement of statistical properties. The evaluation is performed for multiple window sizes and parameter estimation techniques. Finally, we show that the 2D analytic signal allows for an improved estimation of local features on B-mode images. Copyright © 2012 Elsevier B.V. All rights reserved.
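
    The 1D version of the concept is easy to demonstrate: the analytic signal of an RF line splits it into a local amplitude (the envelope) and a local phase. The sketch below uses a synthetic RF line and SciPy's Hilbert transform; it illustrates the 1D baseline only, not the paper's 2D extension.

```python
import numpy as np
from scipy.signal import hilbert

fs, f_carrier = 40e6, 5e6                           # sampling rate and carrier (illustrative)
t = np.arange(2048) / fs
envelope_true = np.exp(-((t - 25e-6) / 5e-6) ** 2)  # a single Gaussian "echo"
rf = envelope_true * np.cos(2 * np.pi * f_carrier * t)

analytic = hilbert(rf)                              # rf + i * Hilbert(rf)
envelope = np.abs(analytic)                         # local amplitude (quantitative part)
phase = np.unwrap(np.angle(analytic))               # local phase (qualitative part)

print(float(np.max(np.abs(envelope - envelope_true))))   # small reconstruction error
```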

  20. Extraction and fusion of spectral parameters for face recognition

    NASA Astrophysics Data System (ADS)

    Boisier, B.; Billiot, B.; Abdessalem, Z.; Gouton, P.; Hardeberg, J. Y.

    2011-03-01

    Many methods have been developed in image processing for face recognition, especially in recent years with the increase of biometric technologies. However, most of these techniques are used on grayscale images acquired in the visible range of the electromagnetic spectrum. The aims of our study are to improve existing tools and to develop new methods for face recognition. The techniques used take advantage of the different spectral ranges, the visible, optical infrared and thermal infrared, by either combining them or analyzing them separately in order to extract the most appropriate information for face recognition. We also verify the consistency of several keypoints extraction techniques in the Near Infrared (NIR) and in the Visible Spectrum.

  1. Extracellular voltage threshold settings can be tuned for optimal encoding of movement and stimulus parameters

    NASA Astrophysics Data System (ADS)

    Oby, Emily R.; Perel, Sagi; Sadtler, Patrick T.; Ruff, Douglas A.; Mischel, Jessica L.; Montez, David F.; Cohen, Marlene R.; Batista, Aaron P.; Chase, Steven M.

    2016-06-01

    Objective. A traditional goal of neural recording with extracellular electrodes is to isolate action potential waveforms of an individual neuron. Recently, in brain-computer interfaces (BCIs), it has been recognized that threshold crossing events of the voltage waveform also convey rich information. To date, the threshold for detecting threshold crossings has been selected to preserve single-neuron isolation. However, the optimal threshold for single-neuron identification is not necessarily the optimal threshold for information extraction. Here we introduce a procedure to determine the best threshold for extracting information from extracellular recordings. We apply this procedure in two distinct contexts: the encoding of kinematic parameters from neural activity in primary motor cortex (M1), and visual stimulus parameters from neural activity in primary visual cortex (V1). Approach. We record extracellularly from multi-electrode arrays implanted in M1 or V1 in monkeys. Then, we systematically sweep the voltage detection threshold and quantify the information conveyed by the corresponding threshold crossings. Main Results. The optimal threshold depends on the desired information. In M1, velocity is optimally encoded at higher thresholds than speed; in both cases the optimal thresholds are lower than are typically used in BCI applications. In V1, information about the orientation of a visual stimulus is optimally encoded at higher thresholds than is visual contrast. A conceptual model explains these results as a consequence of cortical topography. Significance. How neural signals are processed impacts the information that can be extracted from them. Both the type and quality of information contained in threshold crossings depend on the threshold setting. There is more information available in these signals than is typically extracted. Adjusting the detection threshold to the parameter of interest in a BCI context should improve our ability to decode motor intent, and thus enhance BCI control. Further, by sweeping the detection threshold, one can gain insights into the topographic organization of the nearby neural tissue.
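
    A conceptual sketch of the threshold-sweep procedure, with simulated data standing in for extracellular recordings, is given below; the noise model, deflection size, and the binned mutual-information estimate are illustrative assumptions rather than the authors' analysis.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n_trials, n_samples = 200, 1000
labels = rng.integers(0, 2, n_trials)                 # two simulated stimulus conditions
voltage = rng.normal(0, 1, (n_trials, n_samples))     # baseline noise, in units of SD
spikes = rng.random((n_trials, n_samples)) < 0.02     # ~2% of samples carry a deflection
voltage -= 3.0 * spikes * labels[:, None]             # condition-1 trials deflect more

for k in (-2.0, -2.5, -3.0, -3.5):                    # candidate detection thresholds
    counts = (voltage < k).sum(axis=1)                # threshold crossings per trial
    bins = np.quantile(counts, [0.25, 0.5, 0.75])
    mi = mutual_info_score(labels, np.digitize(counts, bins))
    print(f"threshold {k:+.1f} SD: mutual information {mi:.3f} nats")
```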

  2. Extracellular voltage threshold settings can be tuned for optimal encoding of movement and stimulus parameters

    PubMed Central

    Oby, Emily R; Perel, Sagi; Sadtler, Patrick T; Ruff, Douglas A; Mischel, Jessica L; Montez, David F; Cohen, Marlene R; Batista, Aaron P; Chase, Steven M

    2018-01-01

    Objective A traditional goal of neural recording with extracellular electrodes is to isolate action potential waveforms of an individual neuron. Recently, in brain–computer interfaces (BCIs), it has been recognized that threshold crossing events of the voltage waveform also convey rich information. To date, the threshold for detecting threshold crossings has been selected to preserve single-neuron isolation. However, the optimal threshold for single-neuron identification is not necessarily the optimal threshold for information extraction. Here we introduce a procedure to determine the best threshold for extracting information from extracellular recordings. We apply this procedure in two distinct contexts: the encoding of kinematic parameters from neural activity in primary motor cortex (M1), and visual stimulus parameters from neural activity in primary visual cortex (V1). Approach We record extracellularly from multi-electrode arrays implanted in M1 or V1 in monkeys. Then, we systematically sweep the voltage detection threshold and quantify the information conveyed by the corresponding threshold crossings. Main Results The optimal threshold depends on the desired information. In M1, velocity is optimally encoded at higher thresholds than speed; in both cases the optimal thresholds are lower than are typically used in BCI applications. In V1, information about the orientation of a visual stimulus is optimally encoded at higher thresholds than is visual contrast. A conceptual model explains these results as a consequence of cortical topography. Significance How neural signals are processed impacts the information that can be extracted from them. Both the type and quality of information contained in threshold crossings depend on the threshold setting. There is more information available in these signals than is typically extracted. Adjusting the detection threshold to the parameter of interest in a BCI context should improve our ability to decode motor intent, and thus enhance BCI control. Further, by sweeping the detection threshold, one can gain insights into the topographic organization of the nearby neural tissue. PMID:27097901

  3. Extracellular voltage threshold settings can be tuned for optimal encoding of movement and stimulus parameters.

    PubMed

    Oby, Emily R; Perel, Sagi; Sadtler, Patrick T; Ruff, Douglas A; Mischel, Jessica L; Montez, David F; Cohen, Marlene R; Batista, Aaron P; Chase, Steven M

    2016-06-01

    A traditional goal of neural recording with extracellular electrodes is to isolate action potential waveforms of an individual neuron. Recently, in brain-computer interfaces (BCIs), it has been recognized that threshold crossing events of the voltage waveform also convey rich information. To date, the threshold for detecting threshold crossings has been selected to preserve single-neuron isolation. However, the optimal threshold for single-neuron identification is not necessarily the optimal threshold for information extraction. Here we introduce a procedure to determine the best threshold for extracting information from extracellular recordings. We apply this procedure in two distinct contexts: the encoding of kinematic parameters from neural activity in primary motor cortex (M1), and visual stimulus parameters from neural activity in primary visual cortex (V1). We record extracellularly from multi-electrode arrays implanted in M1 or V1 in monkeys. Then, we systematically sweep the voltage detection threshold and quantify the information conveyed by the corresponding threshold crossings. The optimal threshold depends on the desired information. In M1, velocity is optimally encoded at higher thresholds than speed; in both cases the optimal thresholds are lower than are typically used in BCI applications. In V1, information about the orientation of a visual stimulus is optimally encoded at higher thresholds than is visual contrast. A conceptual model explains these results as a consequence of cortical topography. How neural signals are processed impacts the information that can be extracted from them. Both the type and quality of information contained in threshold crossings depend on the threshold setting. There is more information available in these signals than is typically extracted. Adjusting the detection threshold to the parameter of interest in a BCI context should improve our ability to decode motor intent, and thus enhance BCI control. Further, by sweeping the detection threshold, one can gain insights into the topographic organization of the nearby neural tissue.

  4. Biological network extraction from scientific literature: state of the art and challenges.

    PubMed

    Li, Chen; Liakata, Maria; Rebholz-Schuhmann, Dietrich

    2014-09-01

    Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but especially the full integration of relevant information from the scientific literature is still an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature by making use of rich linguistic features such as full text parsing, to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not fully solved and are required to further improve the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss challenges involved and identify directions for future research. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  5. Predicate Argument Structure Frames for Modeling Information in Operative Notes

    PubMed Central

    Wang, Yan; Pakhomov, Serguei; Melton, Genevieve B.

    2015-01-01

    The rich information about surgical procedures contained in operative notes is a valuable data source for improving the clinical evidence base and clinical research. In this study, we propose a set of Predicate Argument Structure (PAS) frames for surgical action verbs to assist in the creation of an information extraction (IE) system to automatically extract details about the techniques, equipment, and operative steps from operative notes. We created PropBank style PAS frames for the 30 top surgical action verbs based on examination of randomly selected sample sentences from 3,000 Laparoscopic Cholecystectomy notes. To assess completeness of the PAS frames to represent usage of same action verbs, we evaluated the PAS frames created on sample sentences from operative notes of 6 other gastrointestinal surgical procedures. Our results showed that the PAS frames created with one type of surgery can successfully denote the usage of the same verbs in operative notes of broader surgical categories. PMID:23920664

  6. A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.

    PubMed

    Zarei, Roozbeh; He, Jing; Siuly, Siuly; Zhang, Yanchun

    2017-07-01

    Feature extraction of EEG signals plays a significant role in brain-computer interfaces (BCIs), as it can significantly affect the performance and the computational time of the system. The main aim of the current work is to introduce an innovative algorithm for acquiring reliable discriminating features from EEG signals to improve classification performance and to reduce the time complexity. This study develops a robust feature extraction method combining principal component analysis (PCA) and the cross-covariance technique (CCOV) for the extraction of discriminatory information from mental states based on EEG signals in BCI applications. We apply the correlation-based variable selection method with best-first search on the extracted features to identify the best feature set for characterizing the distribution of mental state signals. To verify the robustness of the proposed feature extraction method, three machine learning techniques, multilayer perceptron neural networks (MLP), least squares support vector machines (LS-SVM), and logistic regression (LR), are employed on the obtained features. The proposed methods are evaluated on two publicly available datasets. Furthermore, we evaluate the performance of the proposed methods by comparing them with some recently reported algorithms. The experimental results show that all three classifiers achieve high performance (above 99% overall classification accuracy) for the proposed feature set. Among these classifiers, the MLP and LS-SVM methods yield the best performance for the obtained features. The average sensitivity, specificity and classification accuracy for these two classifiers are the same: 99.32%, 100%, and 99.66%, respectively, for BCI competition dataset IVa, and 100%, 100%, and 100% for BCI competition dataset IVb. The results also indicate that the proposed methods outperform the most recently reported methods by at least 0.25% average accuracy improvement on dataset IVa. The execution time results show that the proposed method has lower time complexity after feature selection. The proposed feature extraction method is very effective for obtaining representative information from mental-state EEG signals in BCI applications and for reducing the computational complexity of classifiers by reducing the number of extracted features. Copyright © 2017 Elsevier B.V. All rights reserved.
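
    As a rough illustration of the general recipe (cross-covariance features compressed by PCA and fed to a standard classifier), the sketch below uses synthetic EEG-like trials and scikit-learn. It is not the authors' pipeline; the reference trial, component counts, and classifier choice are placeholder assumptions, and a real experiment would fit the reference and the PCA on training folds only.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)

        # synthetic EEG: 120 trials x 8 channels x 128 samples, two mental states
        y = rng.integers(0, 2, size=120)
        X = rng.normal(size=(120, 8, 128)) + 0.4 * y[:, None, None] * np.sin(np.linspace(0, 6, 128))

        # cross-covariance of every trial with a fixed reference trial (channels x channels);
        # a real pipeline would compute the reference from training folds only
        reference = X[y == 0].mean(axis=0)
        ccov = np.array([trial @ reference.T / trial.shape[1] for trial in X])
        flat = ccov.reshape(len(X), -1)

        # PCA compresses the cross-covariance features before classification
        feats = PCA(n_components=10).fit_transform(flat)
        clf = LogisticRegression(max_iter=1000)
        print("cross-validated accuracy:", cross_val_score(clf, feats, y, cv=5).mean())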

  7. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning.

    PubMed

    Norouzzadeh, Mohammad Sadegh; Nguyen, Anh; Kosmala, Margaret; Swanson, Alexandra; Palmer, Meredith S; Packer, Craig; Clune, Jeff

    2018-06-19

    Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would improve our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into "big data" sciences. Motion-sensor "camera traps" enable collecting wildlife pictures inexpensively, unobtrusively, and frequently. However, extracting information from these pictures remains an expensive, time-consuming, manual task. We demonstrate that such information can be automatically extracted by deep learning, a cutting-edge type of artificial intelligence. We train deep convolutional neural networks to identify, count, and describe the behaviors of 48 species in the 3.2 million-image Snapshot Serengeti dataset. Our deep neural networks automatically identify animals with >93.8% accuracy, and we expect that number to improve rapidly in years to come. More importantly, if our system classifies only images it is confident about, our system can automate animal identification for 99.3% of the data while still performing at the same 96.6% accuracy as that of crowdsourced teams of human volunteers, saving >8.4 y (i.e., >17,000 h at 40 h/wk) of human labeling effort on this 3.2 million-image dataset. Those efficiency gains highlight the importance of using deep neural networks to automate data extraction from camera-trap images, reducing a roadblock for this widely used technology. Our results suggest that deep learning could enable the inexpensive, unobtrusive, high-volume, and even real-time collection of a wealth of information about vast numbers of animals in the wild. Copyright © 2018 the Author(s). Published by PNAS.
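
    The confidence-gating trade-off reported above (automating a large fraction of images while matching human accuracy on that fraction) can be reproduced on any set of classifier confidences. A minimal sketch with synthetic confidences and an arbitrary 0.9 threshold, not the published system:

        import numpy as np

        rng = np.random.default_rng(2)
        n, n_species = 10_000, 48
        true_label = rng.integers(0, n_species, size=n)

        # stand-ins for the network's top-1 confidence and prediction on each image
        confidence = rng.beta(5, 1, size=n)
        correct = rng.random(n) < confidence                     # higher confidence, more often right
        predicted = np.where(correct, true_label, (true_label + 1) % n_species)

        threshold = 0.9
        auto = confidence >= threshold                           # images labeled without a human
        print(f"automated fraction: {auto.mean():.1%}, "
              f"accuracy on automated subset: {(predicted[auto] == true_label[auto]).mean():.1%}")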

  8. Foreword to the Special Issue on Remote Sensing and Modeling of Surface Properties

    USDA-ARS?s Scientific Manuscript database

    CURRENTLY, the Numerical Weather Prediction (NWP) community is striving for better ways to extract information on the lower layer using current and future satellite systems to improve short-term to medium-range forecasts. The surface emissivity is highly variable and may cause biases in the forward ...

  9. Developing Communication in the Workplace for Non-Native English Speakers.

    ERIC Educational Resources Information Center

    Nichols, Pat; Watkins, Lisa

    This curriculum module contains materials for conducting a course designed to build oral and written English skills for nonnative speakers. The course focuses on increasing vocabulary, improving listening/speaking skills, extracting information from various written texts (such as memos, notes, business forms, manuals, letters), and developing…

  10. A judicious multiple hypothesis tracker with interacting feature extraction

    NASA Astrophysics Data System (ADS)

    McAnanama, James G.; Kirubarajan, T.

    2009-05-01

    The multiple hypotheses tracker (mht) is recognized as an optimal tracking method due to the enumeration of all possible measurement-to-track associations, which does not involve any approximation in its original formulation. However, its practical implementation is limited by the NP-hard nature of this enumeration. As a result, a number of maintenance techniques such as pruning and merging have been proposed to bound the computational complexity. It is possible to improve the performance of a tracker, mht or not, using feature information (e.g., signal strength, size, type) in addition to kinematic data. However, in most tracking systems, the extraction of features from the raw sensor data is typically independent of the subsequent association and filtering stages. In this paper, a new approach, called the Judicious Multi Hypotheses Tracker (jmht), whereby there is an interaction between feature extraction and the mht, is presented. The measure of the quality of feature extraction is input into measurement-to-track association while the prediction step feeds back the parameters to be used in the next round of feature extraction. The motivation for this forward and backward interaction between feature extraction and tracking is to improve the performance in both steps. This approach allows for a more rational partitioning of the feature space and removes unlikely features from the assignment problem. Simulation results demonstrate the benefits of the proposed approach.

  11. Quality tools and resources to support organisational improvement integral to high-quality primary care: a systematic review of published and grey literature.

    PubMed

    Janamian, Tina; Upham, Susan J; Crossland, Lisa; Jackson, Claire L

    2016-04-18

    To conduct a systematic review of the literature to identify existing online primary care quality improvement tools and resources to support organisational improvement related to the seven elements in the Primary Care Practice Improvement Tool (PC-PIT), with the identified tools and resources to progress to a Delphi study for further assessment of relevance and utility. Systematic review of the international published and grey literature. CINAHL, Embase and PubMed databases were searched in March 2014 for articles published between January 2004 and December 2013. GreyNet International and other relevant websites and repositories were also searched in March-April 2014 for documents dated between 1992 and 2012. All citations were imported into a bibliographic database. Published and unpublished tools and resources were included in the review if they were in English, related to primary care quality improvement and addressed any of the seven PC-PIT elements of a high-performing practice. Tools and resources that met the eligibility criteria were then evaluated for their accessibility, relevance, utility and comprehensiveness using a four-criteria appraisal framework. We used a data extraction template to systematically extract information from eligible tools and resources. A content analysis approach was used to explore the tools and resources and collate relevant information: name of the tool or resource, year and country of development, author, name of the organisation that provided access and its URL, accessibility information or problems, overview of each tool or resource and the quality improvement element(s) it addresses. If available, a copy of the tool or resource was downloaded into the bibliographic database, along with supporting evidence (published or unpublished) on its use in primary care. This systematic review identified 53 tools and resources that can potentially be provided as part of a suite of tools and resources to support primary care practices in improving the quality of their practice, to achieve improved health outcomes.

  12. Effect of two doses of ginkgo biloba extract (EGb 761) on the dual-coding test in elderly subjects.

    PubMed

    Allain, H; Raoul, P; Lieury, A; LeCoz, F; Gandon, J M; d'Arbigny, P

    1993-01-01

    The subjects of this double-blind study were 18 elderly men and women (mean age, 69.3 years) with slight age-related memory impairment. In a crossover-study design, each subject received placebo or an extract of Ginkgo biloba (EGb 761) (320 mg or 600 mg) 1 hour before performing a dual-coding test that measures the speed of information processing; the test consists of several coding series of drawings and words presented at decreasing times of 1920, 960, 480, 240, and 120 ms. The dual-coding phenomenon (a break point between coding verbal material and images) was demonstrated in all the tests. After placebo, the break point was observed at 960 ms and dual coding beginning at 1920 ms. After each dose of the ginkgo extract, the break point (at 480 ms) and dual coding (at 960 ms) were significantly shifted toward a shorter presentation time, indicating an improvement in the speed of information processing.

  13. Light Microscopy at Maximal Precision

    NASA Astrophysics Data System (ADS)

    Bierbaum, Matthew; Leahy, Brian D.; Alemi, Alexander A.; Cohen, Itai; Sethna, James P.

    2017-10-01

    Microscopy is the workhorse of the physical and life sciences, producing crisp images of everything from atoms to cells well beyond the capabilities of the human eye. However, the analysis of these images is frequently little more accurate than manual marking. Here, we revolutionize the analysis of microscopy images, extracting all the useful information theoretically contained in a complex microscope image. Using a generic, methodological approach, we extract the information by fitting experimental images with a detailed optical model of the microscope, a method we call parameter extraction from reconstructing images (PERI). As a proof of principle, we demonstrate this approach with a confocal image of colloidal spheres, improving measurements of particle positions and radii by 10-100 times over current methods and attaining the maximum possible accuracy. With this unprecedented accuracy, we measure nanometer-scale colloidal interactions in dense suspensions solely with light microscopy, a previously impossible feat. Our approach is generic and applicable to imaging methods from brightfield to electron microscopy, where we expect accuracies of 1 nm and 0.1 pm, respectively.
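
    The core of the approach is fitting a generative optical model to the measured image. The toy sketch below is only an illustration, not the PERI code: it fits a 2-D Gaussian "particle" model to a noisy synthetic image with SciPy and recovers position and radius with sub-pixel precision under these idealized assumptions.

        import numpy as np
        from scipy.optimize import least_squares

        yy, xx = np.mgrid[0:32, 0:32]

        def model(params):
            """Toy optical model: one Gaussian 'particle' on a flat background."""
            x0, y0, radius, amplitude, background = params
            return background + amplitude * np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * radius ** 2))

        rng = np.random.default_rng(3)
        truth = (15.3, 17.8, 2.4, 1.0, 0.1)
        image = model(truth) + rng.normal(0, 0.02, size=(32, 32))   # synthetic noisy image

        # fit the model to the image; the residual vector is the pixel-wise difference
        fit = least_squares(lambda p: (model(p) - image).ravel(), x0=(16.0, 16.0, 3.0, 0.8, 0.0))
        print("recovered x, y, radius:", np.round(fit.x[:3], 3))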

  14. Zea mays, Stigma maydis prevents and extenuates acetaminophen-perturbed oxidative onslaughts in rat hepatocytes.

    PubMed

    Saheed, Sabiu; Frans Hendrik, O'Neill; Tom, Ashafa Anofi Omotayo

    2016-11-01

    Zea mays L. (Poaceae) Stigma maydis is an underutilized product of corn cultivation finding therapeutic applications in oxidative stress-related disorders. This study investigated its aqueous extract against acetaminophen (APAP)-perturbed oxidative insults in rat hepatocytes. Hepatotoxic rats were orally pre- and post-treated with the extract (at 200 and 400 mg/kg body weight) and vitamin C (200 mg/kg body weight), respectively, for 14 days. Liver function, antioxidative and histological analyses were thereafter evaluated. The APAP-induced marked (p < 0.05) increases in the activities of alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, gamma glutamyl transferase and the concentrations of bilirubin, oxidized glutathione, protein carbonyls, malondialdehyde, conjugated dienes, lipid hydroperoxides and fragmented DNA were dose-dependently extenuated in the extract-treated animals. The extract also significantly (p < 0.05) improved the reduced activities of superoxide dismutase, catalase, glutathione reductase and glutathione peroxidase as well as total protein, albumin and glutathione concentrations in the hepatotoxic rats. These improvements may be attributed to the bioactive constituents as revealed by the gas chromatography-mass spectrometric chromatogram of the extract. The observed effects compared favourably with vitamin C and are informative of hepatoprotective and antioxidative attributes of the extract and were further supported by the histological analysis. The data from the present findings suggest that Stigma maydis aqueous extract is capable of preventing and ameliorating APAP-mediated oxidative hepatic damage via enhancement of antioxidant defence systems.

  15. Biological event composition

    PubMed Central

    2012-01-01

    Background: In recent years, biological event extraction has emerged as a key natural language processing task, aiming to address the information overload problem in accessing the molecular biology literature. The BioNLP shared task competitions have contributed to this recent interest considerably. The first competition (BioNLP'09) focused on extracting biological events from Medline abstracts from a narrow domain, while the theme of the latest competition (BioNLP-ST'11) was generalization and a wider range of text types, event types, and subject domains were considered. We view event extraction as a building block in larger discourse interpretation and propose a two-phase, linguistically grounded, rule-based methodology. In the first phase, a general, underspecified semantic interpretation is composed from syntactic dependency relations in a bottom-up manner. The notion of embedding underpins this phase and it is informed by a trigger dictionary and argument identification rules. Coreference resolution is also performed at this step, allowing extraction of inter-sentential relations. The second phase is concerned with constraining the resulting semantic interpretation by shared task specifications. We evaluated our general methodology on core biological event extraction and speculation/negation tasks in three main tracks of BioNLP-ST'11 (GENIA, EPI, and ID). Results: We achieved competitive results in the GENIA and ID tracks, while our results in the EPI track leave room for improvement. One notable feature of our system is that its performance across abstracts and article bodies is stable. Coreference resolution results in a minor improvement in system performance. Due to our interest in discourse-level elements, such as speculation/negation and coreference, we provide a more detailed analysis of our system performance in these subtasks. Conclusions: The results demonstrate the viability of a robust, linguistically oriented methodology, which clearly distinguishes general semantic interpretation from shared-task-specific aspects, for biological event extraction. Our error analysis pinpoints some shortcomings, which we plan to address in future work within our incremental system development methodology. PMID:22759461

  16. Rain/No-Rain Identification from Bispectral Satellite Information using Deep Neural Networks

    NASA Astrophysics Data System (ADS)

    Tao, Y.

    2016-12-01

    Satellite-based precipitation estimation products have the advantage of high resolution and global coverage. However, they still suffer from insufficient accuracy. To accurately estimate precipitation from satellite data, two aspects are most important: sufficient precipitation information in the satellite data, and proper methodologies to extract such information effectively. This study applies state-of-the-art machine learning methodologies to bispectral satellite information for Rain/No-Rain detection. Specifically, we use deep neural networks to extract features from the infrared and water vapor channels and connect them to precipitation identification. To evaluate the effectiveness of the methodology, we first apply it to the infrared data only (Model DL-IR only), the most commonly used input for satellite-based precipitation estimation. We then incorporate water vapor data (Model DL-IR + WV) to further improve the prediction performance. The radar Stage IV dataset is used as the ground measurement for parameter calibration. The operational product, Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks Cloud Classification System (PERSIANN-CCS), is used as a reference to compare the performance of both models in both winter and summer seasons. The experiments show significant improvement for both models in precipitation identification. The overall performance gains in the Critical Success Index (CSI) are 21.60% and 43.66% over the verification periods for the Model DL-IR only and Model DL-IR + WV models compared to PERSIANN-CCS, respectively. Moreover, specific case studies show that the water vapor channel information and the deep neural networks effectively help recover a large number of missing precipitation pixels under warm clouds while reducing false alarms under cold clouds.
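
    The Critical Success Index used in the comparison has a simple contingency-table definition: CSI = hits / (hits + misses + false alarms). A small sketch of how it might be computed from paired rain/no-rain masks (the arrays below are synthetic stand-ins for the satellite estimate and the Stage IV reference):

        import numpy as np

        def critical_success_index(predicted_rain, observed_rain):
            """CSI = hits / (hits + misses + false alarms) over boolean rain masks."""
            hits = np.sum(predicted_rain & observed_rain)
            misses = np.sum(~predicted_rain & observed_rain)
            false_alarms = np.sum(predicted_rain & ~observed_rain)
            return hits / (hits + misses + false_alarms)

        rng = np.random.default_rng(4)
        observed = rng.random((256, 256)) < 0.2                    # reference rain mask
        predicted = observed ^ (rng.random((256, 256)) < 0.05)     # perturbed copy as the estimate
        print(f"CSI = {critical_success_index(predicted, observed):.3f}")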

  17. Challenges in Managing Information Extraction

    ERIC Educational Resources Information Center

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…

  18. The Effects of Age and Set Size on the Fast Extraction of Egocentric Distance

    PubMed Central

    Gajewski, Daniel A.; Wallin, Courtney P.; Philbeck, John W.

    2016-01-01

    Angular direction is a source of information about the distance to floor-level objects that can be extracted from brief glimpses (near one's threshold for detection). Age and set size are two factors known to impact the viewing time needed to directionally localize an object, and these were posited to similarly govern the extraction of distance. The question here was whether viewing durations sufficient to support object detection (controlled for age and set size) would also be sufficient to support well-constrained judgments of distance. Regardless of viewing duration, distance judgments were more accurate (less biased towards underestimation) when multiple potential targets were presented, suggesting that the relative angular declinations between the objects are an additional source of useful information. Distance judgments were more precise with additional viewing time, but the benefit did not depend on set size and accuracy did not improve with longer viewing durations. The overall pattern suggests that distance can be efficiently derived from direction for floor-level objects. Controlling for age-related differences in the viewing time needed to support detection was sufficient to support distal localization but only when brief and longer glimpse trials were interspersed. Information extracted from longer glimpse trials presumably supported performance on subsequent trials when viewing time was more limited. This outcome suggests a particularly important role for prior visual experience in distance judgments for older observers. PMID:27398065

  19. Hand and goods judgment algorithm based on depth information

    NASA Astrophysics Data System (ADS)

    Li, Mingzhu; Zhang, Jinsong; Yan, Dan; Wang, Qin; Zhang, Ruiqi; Han, Jing

    2016-03-01

    A tablet computer with a depth camera and a color camera is mounted on a traditional shopping cart, and the two cameras capture the interior of the cart. In shopping cart monitoring it is very important to determine whether the customer's hand is holding goods as it moves into or out of the cart. This paper establishes a basic framework for judging whether a hand is empty. It includes a hand extraction process based on depth information, a skin color model built with WPCA (Weighted Principal Component Analysis), an algorithm for judging handheld products based on motion and skin color information, and a statistical post-processing step. Within this framework, the first step preserves the integrity of the hand information and effectively avoids interference from sleeves and other clutter; the second step accurately extracts skin color and eliminates interference from similar colors, is largely insensitive to lighting, and is fast and efficient; and the third step greatly reduces noise interference and improves accuracy.

  20. Brain extraction in partial volumes T2*@7T by using a quasi-anatomic segmentation with bias field correction.

    PubMed

    Valente, João; Vieira, Pedro M; Couto, Carlos; Lima, Carlos S

    2018-02-01

    Poor brain extraction in Magnetic Resonance Imaging (MRI) has negative consequences for several downstream steps, such as tissue segmentation and the related statistical measures or pattern recognition algorithms. Current state-of-the-art algorithms for brain extraction work on T1- and T2-weighted images and are not adequate for non-whole-brain images such as T2*FLASH@7T partial volumes. This paper proposes two new methods that work directly on T2*FLASH@7T partial volumes. The first is an improvement of the semi-automatic threshold-with-morphology approach adapted to incomplete volumes. The second uses an improved version of a current implementation of the fuzzy c-means algorithm with bias correction for brain segmentation. Under high inhomogeneity conditions the performance of the first method degrades, requiring user intervention, which is unacceptable. The second method performed well for all volumes and is entirely automatic. State-of-the-art algorithms for brain extraction are mainly semi-automatic, requiring correct initialization by the user and knowledge of the software; they cannot deal with partial volumes and/or need information from an atlas, which is not available for T2*FLASH@7T. Also, combined volumes suffer from manipulations such as re-sampling, which significantly deteriorates voxel intensity structures and makes segmentation difficult. The proposed method overcomes all these difficulties, reaching good brain extraction results using only T2*FLASH@7T volumes. This work will lead to improved automatic segmentation of brain lesions in T2*FLASH@7T volumes, which becomes more important when lesions such as those of cortical multiple sclerosis need to be detected. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Robust Tomography using Randomized Benchmarking

    NASA Astrophysics Data System (ADS)

    Silva, Marcus; Kimmel, Shelby; Johnson, Blake; Ryan, Colm; Ohki, Thomas

    2013-03-01

    Conventional randomized benchmarking (RB) can be used to estimate the fidelity of Clifford operations in a manner that is robust against preparation and measurement errors -- thus allowing for a more accurate and relevant characterization of the average error in Clifford gates compared to standard tomography protocols. Interleaved RB (IRB) extends this result to the extraction of error rates for individual Clifford gates. In this talk we will show how to combine multiple IRB experiments to extract all information about the unital part of any trace preserving quantum process. Consequently, one can compute the average fidelity to any unitary, not just the Clifford group, with tighter bounds than IRB. Moreover, the additional information can be used to design improvements in control. MS, BJ, CR and TO acknowledge support from IARPA under contract W911NF-10-1-0324.

  2. Profiling Lung Cancer Patients Using Electronic Health Records.

    PubMed

    Menasalvas Ruiz, Ernestina; Tuñas, Juan Manuel; Bermejo, Guzmán; Gonzalo Martín, Consuelo; Rodríguez-González, Alejandro; Zanin, Massimiliano; González de Pedro, Cristina; Méndez, Marta; Zaretskaia, Olga; Rey, Jesús; Parejo, Consuelo; Cruz Bermudez, Juan Luis; Provencio, Mariano

    2018-05-31

    Although Electronic Health Records contain a large amount of information about the patient's condition and response to treatment, which could potentially revolutionize clinical practice, such information is seldom considered due to the complexity of its extraction and analysis. We here report on a first integration of an NLP framework for the analysis of clinical records of lung cancer patients who used the telephone assistance service of a major Spanish hospital. We specifically show how some relevant data, about patient demographics and health condition, can be extracted, and how some relevant analyses can be performed, aimed at improving the usefulness of the service. We thus demonstrate that the use of EHR texts, and their integration into a data analysis framework, is technically feasible and worthy of further study.

  3. Using automatically extracted information from mammography reports for decision-support

    PubMed Central

    Bozkurt, Selen; Gimenez, Francisco; Burnside, Elizabeth S.; Gulkesen, Kemal H.; Rubin, Daniel L.

    2016-01-01

    Objective To evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision-support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. Materials and methods We built a system that uses an NLP information extraction system (which extract BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict diagnosis of breast cancer from radiology text reports and evaluated it with a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we compared the difference between using perfect (“reference standard”) structured inputs to the DSS (“RS-DSS”) vs NLP-derived inputs (“NLP-DSS”) on the output of the DSS using the concordance correlation coefficient. We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. Results The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004 ± 0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS to predict the correct BI-RADS final assessment category was 97.58%. Conclusion The accuracy of the information extracted from mammography reports using the NLP system was sufficient to provide accurate DSS results. We believe our system could ultimately reduce the variation in practice in mammography related to assessment of malignant lesions and improve management decisions. PMID:27388877

  4. Study on Low Illumination Simultaneous Polarization Image Registration Based on Improved SURF Algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Wanjun; Yang, Xu

    2017-12-01

    Registration of simultaneous polarization images is a prerequisite for subsequent image fusion operations. However, during all-weather acquisition the exposure time of the polarization camera must be kept unchanged, so polarization images captured under low illumination can be too dark for the SURF algorithm to extract feature points, making registration impossible; this paper therefore proposes an improved SURF algorithm. First, a luminance operator is used to raise the overall brightness of the low-illumination image. An integral image is then created, the Hessian matrix is used to extract interest points and determine the main orientation of each feature point, and Haar wavelet responses in the X and Y directions are computed to form the SURF descriptors. RANSAC is then used for precise matching; it eliminates wrong matches and improves the accuracy rate. Finally, the brightness of the registered polarization image is restored, so the polarization content of the image is not affected. Results show that the improved SURF algorithm performs well under low-illumination conditions.
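
    The described pipeline (brighten, detect and describe features, match, reject outliers with RANSAC) can be sketched with OpenCV. SURF lives in the non-free contrib module and is often unavailable, so this illustrative sketch substitutes ORB; the gamma value and file names are placeholders, and none of this is the authors' implementation.

        import cv2
        import numpy as np

        def brighten(image, gamma=0.5):
            """Simple gamma correction to lift a low-illumination image before detection."""
            table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
            return cv2.LUT(image, table)

        # placeholder file names for two simultaneously acquired polarization images
        img1 = brighten(cv2.imread("polar_0deg.png", cv2.IMREAD_GRAYSCALE))
        img2 = brighten(cv2.imread("polar_90deg.png", cv2.IMREAD_GRAYSCALE))

        detector = cv2.ORB_create(2000)                 # stand-in for SURF
        kp1, des1 = detector.detectAndCompute(img1, None)
        kp2, des2 = detector.detectAndCompute(img2, None)

        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

        # RANSAC rejects wrong correspondences before the transform is estimated
        H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        registered = cv2.warpPerspective(img1, H, img2.shape[::-1])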

  5. Evaluation of the application of ERTS-1 data to the regional land use planning process. [Northeast Wisconsin

    NASA Technical Reports Server (NTRS)

    Clapp, J. L. (Principal Investigator); Green, T., III; Hanson, G. F.; Kiefer, R. W.; Niemann, B. J., Jr.

    1974-01-01

    The author has identified the following significant results. Employing simple and economical extraction methods, ERTS can provide valuable data to the planners at the state or regional level with a frequency never before possible. Interactive computer methods of working directly with ERTS digital information show much promise for providing land use information at a more specific level, since the data format production rate of ERTS justifies improved methods of analysis.

  6. Can Natural Language Processing Improve the Efficiency of Vaccine Adverse Event Report Review?

    PubMed

    Baer, B; Nguyen, M; Woo, E J; Winiecki, S; Scott, J; Martin, D; Botsis, T; Ball, R

    2016-01-01

    Individual case review of spontaneous adverse event (AE) reports remains a cornerstone of medical product safety surveillance for industry and regulators. Previously we developed the Vaccine Adverse Event Text Miner (VaeTM) to offer automated information extraction and potentially accelerate the evaluation of large volumes of unstructured data and facilitate signal detection. To assess how the information extraction performed by VaeTM impacts the accuracy of a medical expert's review of the vaccine adverse event report. The "outcome of interest" (diagnosis, cause of death, second level diagnosis), "onset time," and "alternative explanations" (drug, medical and family history) for the adverse event were extracted from 1000 reports from the Vaccine Adverse Event Reporting System (VAERS) using the VaeTM system. We compared the human interpretation, by medical experts, of the VaeTM extracted data with their interpretation of the traditional full text reports for these three variables. Two experienced clinicians alternately reviewed text miner output and full text. A third clinician scored the match rate using a predefined algorithm; the proportion of matches and 95% confidence intervals (CI) were calculated. Review time per report was analyzed. Proportion of matches between the interpretation of the VaeTM extracted data, compared to the interpretation of the full text: 93% for outcome of interest (95% CI: 91-94%) and 78% for alternative explanation (95% CI: 75-81%). Extracted data on the time to onset was used in 14% of cases and was a match in 54% (95% CI: 46-63%) of those cases. When supported by structured time data from reports, the match for time to onset was 79% (95% CI: 76-81%). The extracted text averaged 136 (74%) fewer words, resulting in a mean reduction in review time of 50 (58%) seconds per report. Despite a 74% reduction in words, the clinical conclusion from VaeTM extracted data agreed with the full text in 93% and 78% of reports for the outcome of interest and alternative explanation, respectively. The limited amount of extracted time interval data indicates the need for further development of this feature. VaeTM may improve review efficiency, but further study is needed to determine if this level of agreement is sufficient for routine use.
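
    The reported match rates and 95% confidence intervals follow from standard proportion estimates. A small sketch using the normal approximation (the counts below are for illustration only, not the study data):

        import math

        def proportion_ci(matches, total, z=1.96):
            """Point estimate and approximate 95% confidence interval for a match proportion."""
            p = matches / total
            half_width = z * math.sqrt(p * (1 - p) / total)
            return p, max(0.0, p - half_width), min(1.0, p + half_width)

        p, low, high = proportion_ci(matches=930, total=1000)   # illustrative counts only
        print(f"match rate {p:.0%} (95% CI {low:.0%}-{high:.0%})")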

  7. Creation of New Patient Information Leaflets with Diabetes by Pharmacists and Assessment Conducted by Patients.

    PubMed

    Arai, Motoharu; Maeda, Kazuto; Satoh, Hiroki; Miki, Akiko; Sawada, Yasufumi

    2016-10-01

    We created a draft of new patient information leaflets to ensure patients' proper use of drugs, based on safety issues and improvement plans extracted and proposed in small group discussions (SGDs) with pharmacists. A total of 3 SGDs (participants: 15 pharmacists) were conducted with the aim of improving patient information leaflets for oral diabetes drugs. First, the advantages and disadvantages of the current leaflets, as well as requests for ideal patient information leaflets, were obtained from participants. From the patients' standpoint, the conventional leaflets were useful for understanding drug efficacy, adverse effects, and instructions for daily consumption of medicines, and for encouraging patients to re-check drugs at home and inform their family of the measures to be taken in the case of adverse effects. However, some disadvantages arose; for example, the instructions were difficult to read because of small lettering and illustrations and too much text, they were not tailored to individual patients, and descriptions of serious adverse effects caused patients much anxiety. We therefore created a draft of new patient information leaflets for patients with diabetes that are simpler and easier to understand and that use concise wording and impactful illustrations.

  8. About increasing informativity of diagnostic system of asynchronous electric motor by extracting additional information from values of consumed current parameter

    NASA Astrophysics Data System (ADS)

    Zhukovskiy, Y.; Korolev, N.; Koteleva, N.

    2018-05-01

    This article is devoted to expanding the possibilities of assessing the technical state of asynchronous electric drives from their current consumption, as well as increasing the information capacity of diagnostic methods, under conditions of limited access to equipment and incomplete information. Spectral analysis of the drive current can be supplemented by an analysis of the components of the Park's vector of the current. The evolution of the hodograph at the moment a defect appears and develops was studied using the example of current asymmetry in the phases of an induction motor. The result of the study is a set of new diagnostic parameters for the asynchronous electric drive. The research showed that the proposed diagnostic parameters allow the type and severity of a defect to be determined without stopping the equipment and taking it out of service for repair. Modern digital control and monitoring systems can use the proposed parameters, based on the stator current of the electrical machine, to improve the accuracy and reliability of diagnostic patterns and predictions of their changes, and thereby improve equipment maintenance. This approach can also be used in systems and objects subject to significant parasitic vibrations and unsteady loads. The useful information can be extracted in electric drive systems whose structure includes a power electronic converter.
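
    The Park's vector components mentioned above are a fixed linear combination of the instantaneous phase currents, and plotting one against the other gives the hodograph, which departs from a circle under asymmetry. A brief sketch with ideal synthetic currents (not the authors' diagnostic code); the distortion index at the end is only an illustrative asymmetry measure:

        import numpy as np

        def park_vector(ia, ib, ic):
            """Park's vector components from the instantaneous phase currents."""
            i_d = np.sqrt(2.0 / 3.0) * ia - ib / np.sqrt(6.0) - ic / np.sqrt(6.0)
            i_q = (ib - ic) / np.sqrt(2.0)
            return i_d, i_q

        t = np.linspace(0, 0.1, 5000)                     # 0.1 s of a 50 Hz supply
        w = 2 * np.pi * 50
        ia = 10.0 * np.sin(w * t)
        ib = 9.2 * np.sin(w * t - 2 * np.pi / 3)          # slight amplitude asymmetry in phase B
        ic = 10.0 * np.sin(w * t + 2 * np.pi / 3)

        i_d, i_q = park_vector(ia, ib, ic)
        radius = np.hypot(i_d, i_q)
        distortion = (radius.max() - radius.min()) / radius.mean()
        print(f"hodograph distortion index: {distortion:.3f}")   # near zero for a balanced machine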

  9. Information extraction system

    DOEpatents

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations, as well as relationships and events, from text documents is described herein.

  10. Educational System Efficiency Improvement Using Knowledge Discovery in Databases

    ERIC Educational Resources Information Center

    Lukaš, Mirko; Leškovic, Darko

    2007-01-01

    This study describes one possible way of using ICT in an education system. We treat the educational system much like a business company and develop an appropriate model for clustering the student population. Modern educational systems are forced to extract the most necessary and purposeful information from a large amount of available data. Clustering…

  11. Acquiring Information from Wider Scope to Improve Event Extraction

    DTIC Science & Technology

    2012-05-01


  12. A data-driven approach for quality assessment of radiologic interpretations.

    PubMed

    Hsu, William; Han, Simon X; Arnold, Corey W; Bui, Alex At; Enzmann, Dieter R

    2016-04-01

    Given the increasing emphasis on delivering high-quality, cost-efficient healthcare, improved methodologies are needed to measure the accuracy and utility of ordered diagnostic examinations in achieving the appropriate diagnosis. Here, we present a data-driven approach for performing automated quality assessment of radiologic interpretations using other clinical information (e.g., pathology) as a reference standard for individual radiologists, subspecialty sections, imaging modalities, and entire departments. Downstream diagnostic conclusions from the electronic medical record are utilized as "truth" to which upstream diagnoses generated by radiology are compared. The described system automatically extracts and compares patient medical data to characterize concordance between clinical sources. Initial results are presented in the context of breast imaging, matching 18 101 radiologic interpretations with 301 pathology diagnoses and achieving a precision and recall of 84% and 92%, respectively. The presented data-driven method highlights the challenges of integrating multiple data sources and the application of information extraction tools to facilitate healthcare quality improvement. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  13. Toward a complete dataset of drug-drug interaction information from publicly available sources.

    PubMed

    Ayvaz, Serkan; Horn, John; Hassanzadeh, Oktie; Zhu, Qian; Stan, Johann; Tatonetti, Nicholas P; Vilar, Santiago; Brochhausen, Mathias; Samwald, Matthias; Rastegar-Mojarad, Majid; Dumontier, Michel; Boyce, Richard D

    2015-06-01

    Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publically available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) Corpora, and 5 Bioinformatics/Pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
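
    Measuring overlap between interaction lists of this kind reduces to set operations on normalized drug pairs. A minimal sketch with tiny invented lists standing in for sources such as DrugBank or KEGG:

        def normalize(pairs):
            """Treat interactions as unordered pairs of lower-cased drug names."""
            return {frozenset((a.lower(), b.lower())) for a, b in pairs}

        source_a = normalize([("Warfarin", "Aspirin"), ("Simvastatin", "Fluconazole")])
        source_b = normalize([("aspirin", "warfarin"), ("Clarithromycin", "Simvastatin")])

        shared = source_a & source_b
        print(f"shared pairs: {len(shared)}, overlap with source A: {len(shared) / len(source_a):.0%}")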

  14. Improved CORF model of simple cell combined with non-classical receptive field and its application on edge detection

    NASA Astrophysics Data System (ADS)

    Sun, Xiao; Chai, Guobei; Liu, Wei; Bao, Wenzhuo; Zhao, Xiaoning; Ming, Delie

    2018-02-01

    Simple cells in primary visual cortex are believed to extract local edge information from a visual scene. In this paper, inspired by different receptive field properties and visual information flow paths of neurons, an improved Combination of Receptive Fields (CORF) model combined with non-classical receptive fields was proposed to simulate the responses of simple cell's receptive fields. Compared to the classical model, the proposed model is able to better imitate simple cell's physiologic structure with consideration of facilitation and suppression of non-classical receptive fields. And on this base, an edge detection algorithm as an application of the improved CORF model was proposed. Experimental results validate the robustness of the proposed algorithm to noise and background interference.

  15. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

    PubMed

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2016-10-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets ( e.g. , application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes.
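
    Graph-based label propagation, the optimization at the heart of the facet-assignment step, can be illustrated on a toy affinity graph. The sketch below iterates a standard propagation update with NumPy; the graph, seed labels, and damping factor are made up and it does not reproduce FacetGist itself.

        import numpy as np

        # toy symmetric affinity graph over six concept mentions
        W = np.array([[0, 1, 1, 0, 0, 0],
                      [1, 0, 1, 0, 0, 0],
                      [1, 1, 0, 1, 0, 0],
                      [0, 0, 1, 0, 1, 1],
                      [0, 0, 0, 1, 0, 1],
                      [0, 0, 0, 1, 1, 0]], dtype=float)

        # seeds: mention 0 is labeled with facet "technique", mention 5 with facet "dataset"
        Y = np.zeros((6, 2))
        Y[0, 0] = 1.0
        Y[5, 1] = 1.0

        P = np.diag(1.0 / W.sum(axis=1)) @ W        # row-normalized transition matrix
        F = Y.copy()
        for _ in range(50):
            F = 0.9 * P @ F + 0.1 * Y               # propagate, keep pulling seeds back to their labels

        print("facet assignment per mention:", F.argmax(axis=1))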

  16. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora

    PubMed Central

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2017-01-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes. PMID:28210517

  17. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas

    The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.

  18. Extraction of linear features on SAR imagery

    NASA Astrophysics Data System (ADS)

    Liu, Junyi; Li, Deren; Mei, Xin

    2006-10-01

    Linear features are usually extracted from SAR imagery by a few edge detectors derived from the contrast-ratio edge detector with a constant probability of false alarm. The Hough Transform, on the other hand, is an elegant way of extracting global features such as curve segments from binary edge images, and the Randomized Hough Transform can drastically reduce the computation time and memory usage of the HT. However, the Randomized Hough Transform wastes a large number of accumulator cells on invalid votes during random sampling. In this paper, we propose a new, almost automatic approach to extracting linear features from SAR imagery based on edge detection and the Randomized Hough Transform. The improved method makes full use of the directional information of each edge candidate point to avoid the invalid accumulation problem. The applied results are in good agreement with the theoretical study, and the main linear features in the SAR imagery were extracted automatically. The method saves storage space and computation time, which shows its effectiveness and applicability.
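
    Using each edge point's gradient direction means a single sampled point already fixes a line in (rho, theta) space, which is what avoids the wasted accumulator cells of pairwise random sampling. A small illustrative sketch with synthetic edge points (not the published algorithm):

        import numpy as np

        rng = np.random.default_rng(5)

        # synthetic edge points near the line x*cos(30deg) + y*sin(30deg) = 40
        theta_true, rho_true = np.deg2rad(30.0), 40.0
        t = rng.uniform(-50, 50, size=300)
        xs = rho_true * np.cos(theta_true) - t * np.sin(theta_true) + rng.normal(0, 0.5, 300)
        ys = rho_true * np.sin(theta_true) + t * np.cos(theta_true) + rng.normal(0, 0.5, 300)
        grad_dirs = theta_true + rng.normal(0, 0.02, 300)      # estimated edge normal directions

        # one vote per sampled point: its gradient direction fixes theta, the point fixes rho
        accumulator = {}
        for i in rng.choice(300, size=150, replace=False):
            theta = grad_dirs[i]
            rho = xs[i] * np.cos(theta) + ys[i] * np.sin(theta)
            key = (int(round(np.degrees(theta))), int(round(rho)))
            accumulator[key] = accumulator.get(key, 0) + 1

        (theta_deg, rho), votes = max(accumulator.items(), key=lambda kv: kv[1])
        print(f"detected line: theta = {theta_deg} deg, rho = {rho} ({votes} votes)")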

  19. Extracting noun phrases for all of MEDLINE.

    PubMed Central

    Bennett, N. A.; He, Q.; Powell, K.; Schatz, B. R.

    1999-01-01

    A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization; part-of-speech tagging; noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature. PMID:10566444
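
    The three modules (tokenization, part-of-speech tagging, noun phrase identification) map directly onto standard NLP tooling. A small illustrative sketch with NLTK and a simple chunk grammar, which is not the parser used in the study and requires the usual NLTK model downloads:

        import nltk

        # one-time model downloads (names vary by NLTK version):
        # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
        chunker = nltk.RegexpParser("NP: {<DT>?<JJ.*>*<NN.*>+}")   # determiner, adjectives, nouns

        def noun_phrases(text):
            tokens = nltk.word_tokenize(text)            # module 1: tokenization
            tagged = nltk.pos_tag(tokens)                # module 2: part-of-speech tagging
            tree = chunker.parse(tagged)                 # module 3: noun phrase identification
            return [" ".join(word for word, tag in subtree.leaves())
                    for subtree in tree.subtrees() if subtree.label() == "NP"]

        print(noun_phrases("Chronic obstructive pulmonary disease is a common inflammatory lung disease."))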

  20. Lip boundary detection techniques using color and depth information

    NASA Astrophysics Data System (ADS)

    Kim, Gwang-Myung; Yoon, Sung H.; Kim, Jung H.; Hur, Gi Taek

    2002-01-01

    This paper presents our approach to using a stereo camera to obtain 3-D image data for improving existing lip boundary detection techniques. We show that depth information as provided by our approach can be used to significantly improve boundary detection systems. Our system detects the face and mouth area in the image by using color, geometric location, and additional depth information for the face. Initially, color and depth information are used to localize the face. We can then determine the lip region from the intensity information and the detected eye locations. The system has successfully been used to extract approximate lip regions using RGB color information of the mouth area. Using color information alone is not robust, because the quality of the results may vary with lighting conditions, the background, and the subject's skin tone. To overcome this problem, we used a stereo camera to obtain 3-D facial images. 3-D data constructed from the depth information, along with the color information, can provide more accurate lip boundary detection results than color-only techniques.

  1. Efficacy and Safety of Ashwagandha (Withania somnifera (L.) Dunal) Root Extract in Improving Memory and Cognitive Functions.

    PubMed

    Choudhary, Dnyanraj; Bhattacharyya, Sauvik; Bose, Sekhar

    2017-11-02

    Cognitive decline is often associated with the aging process. Ashwagandha (Withania somnifera (L.) Dunal) has long been used in the traditional Ayurvedic system of medicine to enhance memory and improve cognition. This pilot study was designed to evaluate the efficacy and safety of ashwagandha (Withania somnifera (L.) Dunal) in improving memory and cognitive functioning in adults with mild cognitive impairment (MCI). A prospective, randomized, double-blind, placebo-controlled study was conducted in 50 adults. Subjects were treated with either ashwagandha-root extract (300 mg twice daily) or placebo for eight weeks. After eight weeks of study, the ashwagandha treatment group demonstrated significant improvements compared with the placebo group in both immediate and general memory, as evidenced by Wechsler Memory Scale III subtest scores for logical memory I (p = 0.007), verbal paired associates I (p = 0.042), faces I (p = 0.020), family pictures I (p = 0.006), logical memory II (p = 0.006), verbal paired associates II (p = 0.031), faces II (p = 0.014), and family pictures II (p = 0.006). The treatment group also demonstrated significantly greater improvement in executive function, sustained attention, and information-processing speed as indicated by scores on the Eriksen Flanker task (p = 0.002), Wisconsin Card Sort test (p = 0.014), Trail-Making test part A (p = 0.006), and the Mackworth Clock test (p = 0.009). Ashwagandha may be effective in enhancing both immediate and general memory in people with MCI as well as improving executive function, attention, and information processing speed.

  2. Space Research Data Management in the National Aeronautics and Space Administration

    NASA Technical Reports Server (NTRS)

    Ludwig, G. H.

    1986-01-01

    Space related scientific research has passed through a natural evolutionary process. The task of extracting the meaningful information from the raw data is highly involved and will require data processing capabilities that do not exist today. The results are presented of a three year examination of this subject, using an earlier report as a starting point. The general conclusion is that there are areas in which NASA's data management practices can be improved and recommends specific actions. These actions will enhance NASA's ability to extract more of the potential data and to capitalize on future opportunities.

  3. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts.

    PubMed

    Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo

    2017-12-01

    Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit features derived from alignments of the query protein against templates. These approaches have been shown to be successful for fold recognition at the family level, but usually fail at the superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at the superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculate the similarity between the query protein and templates, and then assign the query protein the fold type of the most similar template. DCNNs have shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNNs to fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that, even using the extracted fold-specific features alone, our approach achieved a success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at the family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at the fold level, 6% higher at the superfamily level and 1% higher at the family level. An independent assessment on the SCOP_TEST dataset showed consistent performance improvement, indicating the robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with the fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features, and that combining them could greatly facilitate fold recognition at the superfamily/fold levels and template-based prediction of protein structures. Source code of DeepFR is freely available through https://github.com/zhujianwei31415/deepfr, and a web server is available through http://protein.ict.ac.cn/deepfr. zheng@itp.ac.cn or dbu@ict.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  4. A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.

    PubMed

    Segura-Bedmar, Isabel; Martínez, Paloma; de Pablo-Sánchez, César

    2011-03-29

    A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up to date with all published studies on DDI. This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched against the generated sentences in order to extract DDIs. We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%) but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no integral approach to extracting DDIs from texts had previously been carried out. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDIs from biomedical texts.
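
    As a minimal, hedged sketch of the pattern-matching stage described above, the snippet below matches two invented lexical patterns against a simplified sentence and returns candidate drug pairs. The patterns are illustrative only; the paper's patterns were defined by a pharmacist and are not reproduced here, and the shallow parsing and sentence simplification steps are assumed to have been applied already.

        # Toy lexical-pattern matching for DDI candidates (patterns invented).
        import re

        PATTERNS = [
            re.compile(r"(?P<d1>\w+)\s+(?:increases|decreases|potentiates)\s+"
                       r"the\s+(?:level|activity|effect)s?\s+of\s+(?P<d2>\w+)", re.I),
            re.compile(r"(?P<d1>\w+)\s+should\s+not\s+be\s+(?:combined|co-administered)\s+"
                       r"with\s+(?P<d2>\w+)", re.I),
        ]

        def extract_ddis(sentence):
            # Return (drug1, drug2) pairs whose context matches a lexical pattern.
            pairs = []
            for pat in PATTERNS:
                for m in pat.finditer(sentence):
                    pairs.append((m.group("d1"), m.group("d2")))
            return pairs

        print(extract_ddis("Ketoconazole increases the levels of simvastatin."))
        # expected: [('Ketoconazole', 'simvastatin')]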

  5. Integration and Beyond

    PubMed Central

    Stead, William W.; Miller, Randolph A.; Musen, Mark A.; Hersh, William R.

    2000-01-01

    The vision of integrating information—from a variety of sources, into the way people work, to improve decisions and process—is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of “integration” projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction. PMID:10730596

  6. Information extraction from multi-institutional radiology reports.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
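
    For illustration only, the sketch below trains a per-token discriminative classifier over simple window features with scikit-learn. It is a simplified stand-in for the discriminative sequence classifiers used in the paper (it ignores label dependencies and the information model), and the tiny training sentences and the FINDING label are invented.

        # Simplified per-token entity tagging stand-in (not the authors' system).
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression

        def token_features(tokens, i):
            return {
                "word": tokens[i].lower(),
                "prev": tokens[i - 1].lower() if i > 0 else "<s>",
                "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
            }

        train = [
            (["No", "acute", "fracture", "is", "seen"], ["O", "O", "FINDING", "O", "O"]),
            (["Small", "pleural", "effusion", "noted"], ["O", "O", "FINDING", "O"]),
        ]
        X = [token_features(toks, i) for toks, _ in train for i in range(len(toks))]
        y = [lab for _, labs in train for lab in labs]

        vec = DictVectorizer()
        clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

        test = ["Moderate", "pleural", "effusion", "is", "seen"]
        feats = vec.transform([token_features(test, i) for i in range(len(test))])
        print(list(zip(test, clf.predict(feats))))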

  7. Computer sciences

    NASA Technical Reports Server (NTRS)

    Smith, Paul H.

    1988-01-01

    The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.

  8. Review of Extracting Information From the Social Web for Health Personalization

    PubMed Central

    Karlsen, Randi; Bonander, Jason

    2011-01-01

    In recent years the Web has come into its own as a social platform where health consumers are actively creating and consuming Web content. Moreover, as the Web matures, consumers are gaining access to personalized applications adapted to their health needs and interests. The creation of personalized Web applications relies on extracted information about the users and the content to personalize. The Social Web itself provides many sources of information that can be used to extract information for personalization apart from traditional Web forms and questionnaires. This paper provides a review of different approaches for extracting information from the Social Web for health personalization. We reviewed research literature across different fields addressing the disclosure of health information in the Social Web, techniques to extract that information, and examples of personalized health applications. In addition, the paper includes a discussion of technical and socioethical challenges related to the extraction of information for health personalization. PMID:21278049

  9. Photometry unlocks 3D information from 2D localization microscopy data.

    PubMed

    Franke, Christian; Sauer, Markus; van de Linde, Sebastian

    2017-01-01

    We developed a straightforward photometric method, temporal, radial-aperture-based intensity estimation (TRABI), that allows users to extract 3D information from existing 2D localization microscopy data. TRABI uses the accurate determination of photon numbers in different regions of the emission pattern of single emitters to generate a z-dependent photometric parameter. This method can determine fluorophore positions up to 600 nm from the focal plane and can be combined with biplane detection to further improve axial localization.
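
    The numerical sketch below conveys the basic photometric idea: compare photon counts of a single-emitter image inside a small and a larger circular aperture and use their ratio as a z-dependent parameter. The aperture radii and the Gaussian emitter model are illustrative assumptions, not the published TRABI calibration.

        # Toy aperture photometry on a synthetic emitter image.
        import numpy as np

        def aperture_sum(img, cx, cy, radius):
            yy, xx = np.indices(img.shape)
            mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
            return img[mask].sum()

        def photometric_ratio(img, cx, cy, r_inner=2.0, r_outer=6.0):
            # Photons in the inner aperture relative to the outer aperture.
            return aperture_sum(img, cx, cy, r_inner) / aperture_sum(img, cx, cy, r_outer)

        # Synthetic emitter: a 2-D Gaussian whose width mimics defocus.
        yy, xx = np.indices((21, 21))
        psf = np.exp(-((xx - 10) ** 2 + (yy - 10) ** 2) / (2 * 3.0 ** 2))
        print(photometric_ratio(psf, 10, 10))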

  10. Extraction of endoscopic images for biomedical figure classification

    NASA Astrophysics Data System (ADS)

    Xue, Zhiyun; You, Daekeun; Chachra, Suchet; Antani, Sameer; Long, L. R.; Demner-Fushman, Dina; Thoma, George R.

    2015-03-01

    Modality filtering is an important feature in biomedical image searching systems and may significantly improve the retrieval performance of the system. This paper presents a new method for extracting endoscopic image figures from photograph images in biomedical literature, which are found to have highly diverse content and large variability in appearance. Our proposed method consists of three main stages: tissue image extraction, endoscopic image candidate extraction, and ophthalmic image filtering. For tissue image extraction we use image patch level clustering and MRF relabeling to detect images containing skin/tissue regions. Next, we find candidate endoscopic images by exploiting the round shape characteristics that commonly appear in these images. However, this step needs to compensate for images where endoscopic regions are not entirely round. In the third step we filter out the ophthalmic images which have shape characteristics very similar to the endoscopic images. We do this by using text information, specifically, anatomy terms, extracted from the figure caption. We tested and evaluated our method on a dataset of 115,370 photograph figures, and achieved promising precision and recall rates of 87% and 84%, respectively.
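
    The round-shape cue mentioned above can be approximated with a circular Hough transform, as in the hedged OpenCV sketch below; the thresholds and radii are placeholders, and the full method additionally handles non-round endoscopic regions and the caption-based ophthalmic filtering.

        # Detect circular candidate regions in a grayscale figure (illustrative only).
        import cv2
        import numpy as np

        def find_circular_candidates(gray_uint8):
            blurred = cv2.GaussianBlur(gray_uint8, (9, 9), 2)
            circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, 1.2, 50,
                                       param1=100, param2=40, minRadius=30, maxRadius=200)
            return [] if circles is None else circles[0].tolist()  # [x, y, r] triples

        # Synthetic test image containing one bright disc.
        img = np.zeros((240, 240), dtype=np.uint8)
        cv2.circle(img, (120, 120), 60, 255, -1)
        print(find_circular_candidates(img))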

  11. Population Estimation in Singapore Based on Remote Sensing and Open Data

    NASA Astrophysics Data System (ADS)

    Guo, H.; Cao, K.; Wang, P.

    2017-09-01

    Population estimation statistics are widely used in the government, commercial and educational sectors for a variety of purposes. With a growing emphasis on real-time and detailed population information, data users nowadays have switched from traditional census data to more technology-based data sources such as LiDAR point clouds and high-resolution satellite imagery. Nevertheless, such data are costly and periodically unavailable. In this paper, the authors use West Coast District, Singapore as a case study to investigate the applicability and effectiveness of using satellite imagery from Google Earth for extraction of building footprints and population estimation. At the same time, volunteered geographic information (VGI) is utilized as ancillary data for building footprint extraction. Open data such as OpenStreetMap (OSM) can be employed to enhance the extraction process. In view of the challenges in building shadow extraction, this paper discusses several methods, including buffer, mask and shape index approaches, to improve accuracy. It also illustrates population estimation methods based on building height and number-of-floor estimates. The results show that the accuracy of the housing unit method for population estimation can reach 92.5%, which is remarkably accurate. This paper thus provides insights into techniques for building extraction and fine-scale population estimation, which will benefit users such as urban planners in policymaking and urban planning for Singapore.
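
    A back-of-envelope version of the housing unit method mentioned above is sketched below: floors are estimated from building height, dwelling units from floor area, and population from units times average household size. All constants (floor height, unit area, persons per household) are illustrative assumptions rather than the values used in the study.

        # Toy housing-unit population estimate from building footprints and heights.
        def estimate_population(footprint_area_m2, building_height_m,
                                floor_height_m=3.0, unit_area_m2=90.0,
                                persons_per_household=3.3):
            floors = max(1, round(building_height_m / floor_height_m))
            units = (footprint_area_m2 * floors) / unit_area_m2
            return units * persons_per_household

        buildings = [(600.0, 36.0), (450.0, 15.0)]  # (footprint m^2, height m)
        print(sum(estimate_population(a, h) for a, h in buildings))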

  12. Multi-layer cube sampling for liver boundary detection in PET-CT images.

    PubMed

    Liu, Xinxin; Yang, Jian; Song, Shuang; Song, Hong; Ai, Danni; Zhu, Jianjun; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Liver metabolic information is considered a crucial diagnostic marker for the diagnosis of fever of unknown origin, and liver recognition is the basis for automatic extraction of metabolic information. However, the poor quality of PET and CT images is a challenge for information extraction and target recognition in PET-CT images. Existing detection methods cannot meet the requirements of liver recognition in PET-CT images, which is the key problem in the big data analysis of PET-CT images. A novel texture feature descriptor called multi-layer cube sampling (MLCS) is developed for liver boundary detection in low-dose CT and PET images. The cube sampling feature is proposed for extracting more texture information, using a bi-centric voxel strategy. Neighbour voxels are divided into three regions by the centre voxel and the reference voxel in the histogram, and the voxel distribution information is statistically classified as the texture feature. Multi-layer texture features are also used to improve the ability and adaptability of target recognition in volume data. The proposed feature is tested on PET and CT images for liver boundary detection. For the liver in the volume data, the mean detection rate (DR) and mean error rate (ER) reached 95.15% and 7.81% in low-quality PET images, and 83.10% and 21.08% in low-contrast CT images. The experimental results demonstrate that the proposed method is effective and robust for liver boundary detection.

  13. Improved classification accuracy of powdery mildew infection levels of wine grapes by spatial-spectral analysis of hyperspectral images.

    PubMed

    Knauer, Uwe; Matros, Andrea; Petrovic, Tijana; Zanker, Timothy; Scott, Eileen S; Seiffert, Udo

    2017-01-01

    Hyperspectral imaging is an emerging means of assessing plant vitality, stress parameters, nutrition status, and diseases. Extraction of target values from the high-dimensional datasets either relies on pixel-wise processing of the full spectral information, appropriate selection of individual bands, or calculation of spectral indices. Limitations of such approaches are reduced classification accuracy, reduced robustness due to spatial variation of the spectral information across the surface of the objects measured as well as a loss of information intrinsic to band selection and use of spectral indices. In this paper we present an improved spatial-spectral segmentation approach for the analysis of hyperspectral imaging data and its application for the prediction of powdery mildew infection levels (disease severity) of intact Chardonnay grape bunches shortly before veraison. Instead of calculating texture features (spatial features) for the huge number of spectral bands independently, dimensionality reduction by means of Linear Discriminant Analysis (LDA) was applied first to derive a few descriptive image bands. Subsequent classification was based on modified Random Forest classifiers and selective extraction of texture parameters from the integral image representation of the image bands generated. Dimensionality reduction, integral images, and the selective feature extraction led to improved classification accuracies of up to [Formula: see text] for detached berries used as a reference sample (training dataset). Our approach was validated by predicting infection levels for a sample of 30 intact bunches. Classification accuracy improved with the number of decision trees of the Random Forest classifier. These results corresponded with qPCR results. An accuracy of 0.87 was achieved in classification of healthy, infected, and severely diseased bunches. However, discrimination between visually healthy and infected bunches proved to be challenging for a few samples, perhaps due to colonized berries or sparse mycelia hidden within the bunch or airborne conidia on the berries that were detected by qPCR. An advanced approach to hyperspectral image classification based on combined spatial and spectral image features, potentially applicable to many available hyperspectral sensor technologies, has been developed and validated to improve the detection of powdery mildew infection levels of Chardonnay grape bunches. The spatial-spectral approach improved especially the detection of light infection levels compared with pixel-wise spectral data analysis. This approach is expected to improve the speed and accuracy of disease detection once the thresholds for fungal biomass detected by hyperspectral imaging are established; it can also facilitate monitoring in plant phenotyping of grapevine and additional crops.
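
    A minimal sketch of the two classification stages named above (LDA for dimensionality reduction followed by a Random Forest) is given below using scikit-learn; it uses random arrays in place of grape-bunch spectra and omits the integral-image texture features and the authors' classifier modifications.

        # LDA band reduction + Random Forest classification on toy spectra.
        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(42)
        X = rng.normal(size=(300, 200))       # 300 pixels x 200 spectral bands
        y = rng.integers(0, 3, size=300)      # 3 infection-level classes

        lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
        X_reduced = lda.transform(X)          # a few descriptive "image bands"

        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_reduced, y)
        print(rf.score(X_reduced, y))         # training accuracy on toy data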

  14. The extraction of motion-onset VEP BCI features based on deep learning and compressed sensing.

    PubMed

    Ma, Teng; Li, Hui; Yang, Hao; Lv, Xulin; Li, Peiyang; Liu, Tiejun; Yao, Dezhong; Xu, Peng

    2017-01-01

    Motion-onset visual evoked potentials (mVEP) can provide a softer stimulus with reduced fatigue, and have potential applications for brain-computer interface (BCI) systems. However, the mVEP waveform is seriously masked by strong background EEG activity, and an effective approach is needed to extract the corresponding mVEP features to perform task recognition for BCI control. In the current study, we combine deep learning with compressed sensing to mine discriminative mVEP information to improve mVEP BCI performance. The deep learning and compressed sensing approach can generate multi-modality features which effectively improve BCI performance, with approximately a 3.5% accuracy increase over all 11 subjects, and is more effective for those subjects with relatively poor performance when using the conventional features. Compared with the conventional amplitude-based mVEP feature extraction approach, the deep learning and compressed sensing approach has a higher classification accuracy and is more effective for subjects with relatively poor performance. According to the results, the deep learning and compressed sensing approach is more effective for extracting mVEP features to construct the corresponding BCI system, and the proposed feature extraction framework is easy to extend to other types of BCIs, such as motor imagery (MI), steady-state visual evoked potential (SSVEP) and P300. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Extraction of drainage networks from large terrain datasets using high throughput computing

    NASA Astrophysics Data System (ADS)

    Gong, Jianya; Xie, Jibo

    2009-02-01

    Advanced digital photogrammetry and remote sensing technology produces large terrain datasets (LTD). How to process and use these LTD has become a big challenge for GIS users. Extracting drainage networks, which are basic for hydrological applications, from LTD is one of the typical applications of digital terrain analysis (DTA) in geographical information applications. Existing serial drainage algorithms cannot deal with large data volumes in a timely fashion, and few GIS platforms can process LTD beyond the GB size. High throughput computing (HTC), a distributed parallel computing mode, is proposed to improve the efficiency of drainage networks extraction from LTD. Drainage network extraction using HTC involves two key issues: (1) how to decompose the large DEM datasets into independent computing units and (2) how to merge the separate outputs into a final result. A new decomposition method is presented in which the large datasets are partitioned into independent computing units using natural watershed boundaries instead of using regular 1-dimensional (strip-wise) and 2-dimensional (block-wise) decomposition. Because the distribution of drainage networks is strongly related to watershed boundaries, the new decomposition method is more effective and natural. The method to extract natural watershed boundaries was improved by using multi-scale DEMs instead of single-scale DEMs. A HTC environment is employed to test the proposed methods with real datasets.
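
    The decompose/extract/merge pattern described above can be sketched as follows, with Python's multiprocessing standing in loosely for the HTC environment. The per-unit "extraction" is a trivial flow-accumulation threshold placeholder, and the watershed labels are fabricated; the real workflow partitions the DEM along multi-scale watershed boundaries.

        # Split-by-watershed, process-in-parallel, merge-results sketch.
        import numpy as np
        from multiprocessing import Pool

        def extract_unit(args):
            unit_id, flow_acc = args
            # Placeholder: cells with accumulation above a threshold form channels.
            return unit_id, np.argwhere(flow_acc > 50.0)

        def extract_drainage(flow_acc, watershed_labels):
            units = [(u, np.where(watershed_labels == u, flow_acc, 0.0))
                     for u in np.unique(watershed_labels)]
            with Pool() as pool:
                parts = pool.map(extract_unit, units)
            return {u: cells for u, cells in parts}   # merged result keyed by watershed

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            acc = rng.uniform(0, 100, size=(200, 200))
            labels = (np.indices((200, 200))[1] > 100).astype(int)  # two fake watersheds
            print({u: len(c) for u, c in extract_drainage(acc, labels).items()})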

  16. Improved NASTRAN plotting

    NASA Technical Reports Server (NTRS)

    Chan, Gordon C.

    1991-01-01

    The new 1991 COSMIC/NASTRAN version, compatible with the older versions, tries to remove some old constraints and make it easier to extract information from the plot file. It also includes some useful improvements and new enhancements. New features available in the 1991 version are described. They include a new PLT1 tape with simplified ASCII plot commands and short records, combined hidden and shrunk plots, an x-y-z coordinate system on all structural plots, element offset plots, improved character size control, improved FIND and NOFIND logic, a new NASPLOT post-processor to perform screen plotting or generate PostScript files, and a BASIC/NASTPLOT program for the PC.

  17. High-Resolution Remote Sensing Image Building Extraction Based on Markov Model

    NASA Astrophysics Data System (ADS)

    Zhao, W.; Yan, L.; Chang, Y.; Gong, L.

    2018-04-01

    With the increase of resolution, remote sensing images have the characteristics of increased information load, increased noise, and more complex feature geometry and texture information, which makes the extraction of building information more difficult. To solve this problem, this paper designs a high-resolution remote sensing image building extraction method based on a Markov model. This method introduces Contourlet-domain map clustering and a Markov model, captures and enhances the contour and texture information of high-resolution remote sensing image features in multiple directions, and further designs a spectral feature index that can characterize "pseudo-buildings" in the building area. Through multi-scale segmentation and extraction of image features, fine extraction from the building area down to individual buildings is realized. Experiments show that this method can suppress the noise of high-resolution remote sensing images, reduce the interference of non-target ground texture information, and remove shadow, vegetation and other pseudo-building information; compared with traditional pixel-level image information extraction, it performs better in building extraction precision, accuracy and completeness.

  18. Concept maps: A tool for knowledge management and synthesis in web-based conversational learning.

    PubMed

    Joshi, Ankur; Singh, Satendra; Jaswal, Shivani; Badyal, Dinesh Kumar; Singh, Tejinder

    2016-01-01

    Web-based conversational learning provides an opportunity for shared knowledge base creation through collaboration and collective wisdom extraction. Usually, the amount of information generated in such forums is huge and multidimensional (in alignment with the desirable preconditions for constructivist knowledge creation), and sometimes the nature of the expected new information cannot be anticipated in advance. Thus, concept maps (crafted from the constructed data) as "process summary" tools may be a solution for improving critical thinking and learning by making connections between the facts or knowledge shared by the participants during online discussion. This exploratory paper begins with a description of this innovation, tried on a web-based interaction platform (email list management software), FAIMER-Listserv, and generated qualitative evidence through peer feedback. This process description is further supported by a theoretical construct which shows how social constructivism (inclusive of autonomy and complexity) affects conversational learning. The paper rationalizes the use of concept maps as mid-summary tools for extracting information and making further sense of this apparent intricacy.

  19. Using Medical Text Extraction, Reasoning and Mapping System (MTERMS) to Process Medication Information in Outpatient Clinical Notes

    PubMed Central

    Zhou, Li; Plasek, Joseph M; Mahoney, Lisa M; Karipineni, Neelima; Chang, Frank; Yan, Xuemin; Chang, Fenny; Dimaggio, Dana; Goldman, Debora S.; Rocha, Roberto A.

    2011-01-01

    Clinical information is often coded using different terminologies and therefore is not interoperable. Our goal is to develop a general natural language processing (NLP) system, called the Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular, pipeline approach flowing from a preprocessor, semantic tagger, terminology mapper, context analyzer, and parser to structure inputted clinical notes. Evaluators manually reviewed 30 free-text and 10 structured outpatient clinical notes and compared them to MTERMS output. MTERMS achieved an overall F-measure of 90.6 and 94.0 for free-text and structured notes, respectively, for medication and temporal information. The local medication terminology had 83.0% coverage, compared to RxNorm’s 98.0% coverage, for free-text notes. 61.6% of mappings between the terminologies are exact matches. Capture of duration was significantly improved (91.7% vs. 52.5%) compared with systems in the third i2b2 challenge. PMID:22195230

  20. Early Prediction of Students' Grade Point Averages at Graduation: A Data Mining Approach

    ERIC Educational Resources Information Center

    Tekin, Ahmet

    2014-01-01

    Problem Statement: There has recently been interest in educational databases containing a variety of valuable but sometimes hidden data that can be used to help less successful students to improve their academic performance. The extraction of hidden information from these databases often implements aspects of the educational data mining (EDM)…

  1. Rural School Finance: A Critical Analysis of Current Practice in Illinois.

    ERIC Educational Resources Information Center

    Lows, Raymond L.

    School finance is basic to understanding and improving the condition of rural education. Information necessary for financial planning and policy development at the state and local levels is stored in large computer readable data bases and needs to be accessed. Pertinent data elements need to be extracted from the data base and relationships among…

  2. Is There a European View on Health Economic Evaluations? Results from a Synopsis of Methodological Guidelines Used in the EUnetHTA Partner Countries.

    PubMed

    Heintz, Emelie; Gerber-Grote, Andreas; Ghabri, Salah; Hamers, Francoise F; Rupel, Valentina Prevolnik; Slabe-Erker, Renata; Davidson, Thomas

    2016-01-01

    The objectives of this study were to review current methodological guidelines for economic evaluations of all types of technologies in the 33 countries with organizations involved in the European Network for Health Technology Assessment (EUnetHTA), and to provide a general framework for economic evaluation at a European level. Methodological guidelines for health economic evaluations used by EUnetHTA partners were collected through a survey. Information from each guideline was extracted using a pre-tested extraction template. On the basis of the extracted information, a summary describing the methods used by the EUnetHTA countries was written for each methodological item. General recommendations were formulated for methodological issues where the guidelines of the EUnetHTA partners were in agreement or where the usefulness of economic evaluations may be increased by presenting the results in a specific way. At least one contact person from all 33 EUnetHTA countries (100 %) responded to the survey. In total, the review included 51 guidelines, representing 25 countries (eight countries had no methodological guideline for health economic evaluations). On the basis of the results of the extracted information from all 51 guidelines, EUnetHTA issued ten main recommendations for health economic evaluations. The presented review of methodological guidelines for health economic evaluations and the consequent recommendations will hopefully improve the comparability, transferability and overall usefulness of economic evaluations performed within EUnetHTA. Nevertheless, there are still methodological issues that need to be investigated further.

  3. Structured data quality reports to improve EHR data quality.

    PubMed

    Taggart, Jane; Liaw, Siaw-Teng; Yu, Hairong

    2015-12-01

    To examine whether a structured data quality report (SDQR) and feedback sessions with practice principals and managers improve the quality of routinely collected data in EHRs. The intervention was conducted in four general practices participating in the Fairfield neighborhood electronic Practice Based Research Network (ePBRN). Data were extracted from their clinical information systems and summarised as an SDQR to guide feedback to practice principals and managers at 0, 4, 8 and 12 months. Data quality (DQ) metrics included completeness, correctness, consistency and duplication of patient records. Information on data recording practices, data quality improvement, and the utility of SDQRs was collected at the feedback sessions at the practices. The main outcome measure was change in the recording of clinical information and the level of meeting Royal Australian College of General Practice (RACGP) targets. Birth date was 100% and gender 99% complete at baseline, and these levels were maintained. DQ of all variables measured improved significantly (p<0.01) over 12 months, but was not sufficient to comply with RACGP standards. Improvement was greatest with allergies. There was no significant change in duplicate records. SDQRs and feedback sessions support general practitioners and practice managers to focus on improving the recording of patient information. However, improved practice DQ was not sufficient to meet RACGP targets. Randomised controlled studies are required to evaluate strategies to improve data quality and any associated improvements in safety and quality of care. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
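
    Two of the data quality metrics named above, completeness and duplication, can be computed in a few lines once records are extracted to a table, as the pandas sketch below illustrates; the field names and the duplicate definition are assumptions for illustration, not the SDQR specification.

        # Toy completeness and duplication metrics over an extracted patient table.
        import pandas as pd

        records = pd.DataFrame({
            "patient_id": [1, 2, 2, 3],
            "birth_date": ["1970-01-01", "1985-06-12", "1985-06-12", None],
            "gender":     ["F", "M", "M", None],
            "allergies":  [None, "penicillin", "penicillin", None],
        })

        completeness = records.notna().mean() * 100          # % non-missing per field
        duplication = records.duplicated(subset=["patient_id"]).mean() * 100

        print(completeness.round(1).to_dict())
        print(f"duplicate records: {duplication:.1f}%")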

  4. Enriching a document collection by integrating information extraction and PDF annotation

    NASA Astrophysics Data System (ADS)

    Powley, Brett; Dale, Robert; Anisimoff, Ilya

    2009-01-01

    Modern digital libraries offer all the hyperlinking possibilities of the World Wide Web: when a reader finds a citation of interest, in many cases she can now click on a link to be taken to the cited work. This paper presents work aimed at providing the same ease of navigation for legacy PDF document collections that were created before the possibility of integrating hyperlinks into documents was ever considered. To achieve our goal, we need to carry out two tasks: first, we need to identify and link citations and references in the text with high reliability; and second, we need the ability to determine physical PDF page locations for these elements. We demonstrate the use of a high-accuracy citation extraction algorithm which significantly improves on earlier reported techniques, and a technique for integrating PDF processing with a conventional text-stream based information extraction pipeline. We demonstrate these techniques in the context of a particular document collection, this being the ACL Anthology; but the same approach can be applied to other document sets.
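
    As a hedged illustration of the pattern-matching core of citation identification, the regular expression below captures in-text citations of the form "(Author, 2009)" or "(Author et al., 2009)". The published algorithm is considerably more involved (it also resolves the matching reference entries and their PDF page locations), so this is only a starting point.

        # Simple in-text citation matcher (illustrative pattern, not the paper's).
        import re

        CITATION = re.compile(
            r"\(\s*(?P<author>[A-Z][A-Za-z-]+(?:\s+(?:et al\.|and\s+[A-Z][A-Za-z-]+))?)"
            r",\s*(?P<year>(?:19|20)\d{2})\s*\)"
        )

        text = "Earlier work (Powley et al., 2007) improved on (Dale, 1999)."
        print([(m.group("author"), m.group("year")) for m in CITATION.finditer(text)])
        # expected: [('Powley et al.', '2007'), ('Dale', '1999')]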

  5. Photosynthetic Performance of the Imidazolinone Resistant Sunflower Exposed to Single and Combined Treatment by the Herbicide Imazamox and an Amino Acid Extract

    PubMed Central

    Balabanova, Dobrinka A.; Paunov, Momchil; Goltsev, Vasillij; Cuypers, Ann; Vangronsveld, Jaco; Vassilev, Andon

    2016-01-01

    The herbicide imazamox may provoke temporary yellowing and growth retardation in IMI-R sunflower hybrids, more often under stressful environmental conditions. Although photosynthetic processes are not the primary sites of imazamox action, they might be influenced; therefore, more information about the photosynthetic performance of herbicide-treated plants could be valuable for further improvement of the Clearfield technology. Plant biostimulants have been shown to ameliorate damage caused by different stress factors in plants, but very limited information exists about their effects on herbicide-stressed plants. In order to characterize the photosynthetic performance of imazamox-treated IMI-R sunflower plants, we carried out experiments including both single and combined treatments with imazamox and a plant biostimulant containing an amino acid extract. We found that imazamox application at a rate of 132 μg per plant (equivalent to 40 g active ingredient ha−1) induced negative effects on both light-dependent photosynthetic redox reactions and leaf gas exchange processes, which were much less pronounced after the combined application of imazamox and the amino acid extract. PMID:27826304

  6. Oil-Water Flow Investigations using Planar-Laser Induced Fluorescence and Particle Velocimetry

    NASA Astrophysics Data System (ADS)

    Ibarra, Roberto; Matar, Omar K.; Markides, Christos N.

    2017-11-01

    The study of the complex behaviour of immiscible liquid-liquid flow in pipes requires the implementation of advanced measurement techniques in order to extract detailed in situ information. Laser-based diagnostic techniques allow the extraction of high-resolution, space- and time-resolved phase and velocity information, which aims to improve the fundamental understanding of these flows and to validate closure relations for advanced multiphase flow models. This work presents a novel simultaneous planar laser-induced fluorescence and particle velocimetry technique applied to stratified oil-water flows, using two laser light sheets at two different wavelengths for fluids with different refractive indices, at horizontal and upward pipe inclinations (<5°) in stratified flow conditions (i.e., separated layers). Complex flow structures are extracted from 2-D instantaneous velocity fields, and these structures are strongly dependent on the pipe inclination at low velocities. The analysis of mean wall-normal velocity profiles and velocity fluctuations suggests the presence of single and counter-rotating vortices in the azimuthal direction, especially in the oil layer, which can be attributed to the influence of the interfacial waves. Funding from BP and the TMF Consortium is gratefully acknowledged.

  7. Iterative cross section sequence graph for handwritten character segmentation.

    PubMed

    Dawoud, Amer

    2007-08-01

    The iterative cross section sequence graph (ICSSG) is an algorithm for handwritten character segmentation. It expands the cross section sequence graph concept by applying it iteratively at equally spaced thresholds. The iterative thresholding reduces the effect of information loss associated with image binarization. ICSSG preserves the characters' skeletal structure by preventing the interference of pixels that causes flooding of adjacent characters' segments. Improving the structural quality of the characters' skeleton facilitates better feature extraction and classification, which improves the overall performance of optical character recognition (OCR). Experimental results showed significant improvements in OCR recognition rates compared to other well-established segmentation algorithms.

  8. A Procedure for the supercritical fluid extraction of coal samples, with subsequent analysis of extracted hydrocarbons

    USGS Publications Warehouse

    Kolak, Jonathan J.

    2006-01-01

    Introduction: This report provides a detailed, step-by-step procedure for conducting extractions with supercritical carbon dioxide (CO2) using the ISCO SFX220 supercritical fluid extraction system. Protocols for the subsequent separation and analysis of extracted hydrocarbons are also included in this report. These procedures were developed under the auspices of the project 'Assessment of Geologic Reservoirs for Carbon Dioxide Sequestration' (see http://pubs.usgs.gov/fs/fs026-03/fs026-03.pdf) to investigate possible environmental ramifications associated with CO2 storage (sequestration) in geologic reservoirs, such as deep (~1 km below land surface) coal beds. Supercritical CO2 has been used previously to extract contaminants from geologic matrices. Pressure-temperature conditions within deep coal beds may render CO2 supercritical. In this context, the ability of supercritical CO2 to extract contaminants from geologic materials may serve to mobilize noxious compounds from coal, possibly complicating storage efforts. There currently exists little information on the physicochemical interactions between supercritical CO2 and coal in this setting. The procedures described herein were developed to improve the understanding of these interactions and provide insight into the fate of CO2 and contaminants during simulated CO2 injections.

  9. Improving the performance of lesion-based computer-aided detection schemes of breast masses using a case-based adaptive cueing method

    NASA Astrophysics Data System (ADS)

    Tan, Maxine; Aghaei, Faranak; Wang, Yunzhi; Qian, Wei; Zheng, Bin

    2016-03-01

    Current commercialized CAD schemes have high false-positive (FP) detection rates and also have high correlations with radiologists in positive lesion detection. Thus, we recently investigated a new approach to improve the efficacy of applying CAD to assist radiologists in reading and interpreting screening mammograms. Namely, we developed a new global-feature-based CAD approach/scheme that can cue a warning sign on cases with a high risk of being positive. In this study, we investigate the possibility of fusing global-feature or case-based scores with the local or lesion-based CAD scores using an adaptive cueing method. We hypothesize that the information from the global feature extraction (features extracted from the whole breast regions) is different from, and can provide supplementary information to, the locally extracted features (computed from the segmented lesion regions only). On a large and diverse full-field digital mammography (FFDM) testing dataset with 785 cases (347 negative and 438 cancer cases with masses only), we ran our lesion-based and case-based CAD schemes "as is" on the whole dataset. To assess the supplementary information provided by the global features, we used an adaptive cueing method to adaptively adjust the original CAD-generated detection score (Sorg) of a detected suspicious mass region based on the computed case-based score (Scase) of the case associated with this detected region. Using the adaptive cueing method, better sensitivity results were obtained at lower FP rates (<= 1 FP per image). Namely, increases in sensitivity (in the FROC curves) of up to 6.7% and 8.2% were obtained for the ROI-based and case-based results, respectively.
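
    The abstract does not state the exact adaptive cueing rule, so the snippet below shows only one plausible form of the fusion it describes: a weighted adjustment in which a high case-based score pulls region scores upward and a low one pulls them down. The weight and score ranges are assumptions for illustration.

        # Hypothetical fusion of a lesion-based score with a case-based score.
        def adjusted_score(s_org, s_case, alpha=0.3):
            # s_org: lesion-based CAD score in [0, 1]; s_case: case-based risk in [0, 1].
            return (1.0 - alpha) * s_org + alpha * s_case

        for s_org, s_case in [(0.55, 0.9), (0.55, 0.1)]:
            print(s_org, s_case, round(adjusted_score(s_org, s_case), 3))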

  10. Deep Question Answering for protein annotation

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

    2015-01-01

    Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/ PMID:26384372

  11. Deep Question Answering for protein annotation.

    PubMed

    Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

    2015-01-01

    Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.

  12. Leveraging Semantic Knowledge in IRB Databases to Improve Translation Science

    PubMed Central

    Hurdle, John F.; Botkin, Jeffery; Rindflesch, Thomas C.

    2007-01-01

    We introduce the notion that research administrative databases (RADs), such as those increasingly used to manage information flow in the Institutional Review Board (IRB), offer a novel, useful, and mine-able data source overlooked by informaticists. As a proof of concept, using an IRB database we extracted all titles and abstracts from system startup through January 2007 (n=1,876); formatted these in a pseudo-MEDLINE format; and processed them through the SemRep semantic knowledge extraction system. Even though SemRep is tuned to find semantic relations in MEDLINE citations, we found that it performed comparably well on the IRB texts. When adjusted to eliminate non-healthcare IRB submissions (e.g., economic and education studies), SemRep extracted an average of 7.3 semantic relations per IRB abstract (compared to an average of 11.1 for MEDLINE citations) with a precision of 70% (compared to 78% for MEDLINE). We conclude that RADs, as represented by IRB data, are mine-able with existing tools, but that performance will improve as these tools are tuned for RAD structures. PMID:18693856

  13. MARS: bringing the automation of small-molecule bioanalytical sample preparations to a new frontier.

    PubMed

    Li, Ming; Chou, Judy; Jing, Jing; Xu, Hui; Costa, Aldo; Caputo, Robin; Mikkilineni, Rajesh; Flannelly-King, Shane; Rohde, Ellen; Gan, Lawrence; Klunk, Lewis; Yang, Liyu

    2012-06-01

    In recent years, there has been a growing interest in automating small-molecule bioanalytical sample preparations specifically using the Hamilton MicroLab(®) STAR liquid-handling platform. In the most extensive work reported thus far, multiple small-molecule sample preparation assay types (protein precipitation extraction, SPE and liquid-liquid extraction) have been integrated into a suite that is composed of graphical user interfaces and Hamilton scripts. Using that suite, bioanalytical scientists have been able to automate various sample preparation methods to a great extent. However, there are still areas that could benefit from further automation, specifically, the full integration of analytical standard and QC sample preparation with study sample extraction in one continuous run, real-time 2D barcode scanning on the Hamilton deck and direct Laboratory Information Management System database connectivity. We developed a new small-molecule sample-preparation automation system that improves in all of the aforementioned areas. The improved system presented herein further streamlines the bioanalytical workflow, simplifies batch run design, reduces analyst intervention and eliminates sample-handling error.

  14. Enzymatic treatment to improve the quality of black tea extracts.

    PubMed

    Chandini, S K; Rao, L Jaganmohan; Gowthaman, M K; Haware, D J; Subramanian, R

    2011-08-01

    Enzymatic extraction was investigated to improve the quality of black tea extracts with pretreatment of pectinase and tannase independently, successively and simultaneously. Pectinase improved the extractable-solids-yield (ESY) up to 11.5%, without much of an improvement in polyphenols recovery, while tannase pre-treatment showed a significant improvement in polyphenols recovery (14.3%) along with an 11.1% improvement in ESY. Among the four treatments, tannase-alone treatment showed the maximum improvement in tea quality, with higher polyphenols-in-extracted solids. Treatments involving tannase resulted in the significant release of gallic acid, due to its hydrolytic activity, leading to greater solubility besides favourably improving TF/TR ratio. The results suggested that employing a single enzyme, tannase, for the pre-treatment of black tea is desirable. Enzymatic extraction may be preferred over enzymatic clarification as it not only displayed reduction in tea cream and turbidity but also improved the recovery of polyphenols and ESY in the extract, as well as maintaining a good balance of tea quality. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. SHORT-TERM SOLAR FLARE PREDICTION USING MULTIRESOLUTION PREDICTORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu Daren; Huang Xin; Hu Qinghua

    2010-01-20

    Multiresolution predictors of solar flares are constructed by a wavelet transform and a sequential feature extraction method. Three predictors (the maximum horizontal gradient, the length of the neutral line, and the number of singular points) are extracted from Solar and Heliospheric Observatory/Michelson Doppler Imager longitudinal magnetograms. A maximal overlap discrete wavelet transform is used to decompose the sequence of predictors into four frequency bands. In each band, four sequential features (the maximum, the mean, the standard deviation, and the root mean square) are extracted. The multiresolution predictors in the low-frequency band reflect trends in the evolution of newly emerging fluxes. The multiresolution predictors in the high-frequency band reflect the changing rates in emerging flux regions. The variation of emerging fluxes is decoupled by the wavelet transform into different frequency bands. The information content of these multiresolution predictors is evaluated by the information gain ratio. It is found that the multiresolution predictors in the lowest and highest frequency bands contain the most information. Based on these predictors, a C4.5 decision tree algorithm is used to build the short-term solar flare prediction model. It is found that the performance of the short-term solar flare prediction model based on the multiresolution predictors is greatly improved.
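
    The sequential-feature step lends itself to a short sketch: for each wavelet sub-band of a predictor time series, compute the maximum, mean, standard deviation and root mean square. The band decomposition itself (a maximal overlap discrete wavelet transform in the paper) is assumed to have been done already, and the random data are placeholders.

        # Per-band sequential features (max, mean, std, RMS) on toy data.
        import numpy as np

        def sequential_features(band):
            band = np.asarray(band, dtype=float)
            return {
                "max": band.max(),
                "mean": band.mean(),
                "std": band.std(),
                "rms": np.sqrt(np.mean(band ** 2)),
            }

        bands = {f"band{i}": np.random.default_rng(i).normal(size=64) for i in range(4)}
        features = {name: sequential_features(b) for name, b in bands.items()}
        print(features["band0"])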

  16. v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text

    PubMed Central

    Divita, Guy; Carter, Marjorie E.; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H.; Gundlapalli, Adi V.

    2016-01-01

    Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and to provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While the v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records. PMID:27683667

  17. Using precise word timing information improves decoding accuracy in a multiband-accelerated multimodal reading experiment.

    PubMed

    Vu, An T; Phillips, Jeffrey S; Kay, Kendrick; Phillips, Matthew E; Johnson, Matthew R; Shinkareva, Svetlana V; Tubridy, Shannon; Millin, Rachel; Grossman, Murray; Gureckis, Todd; Bhattacharyya, Rajan; Yacoub, Essa

    2016-01-01

    The blood-oxygen-level-dependent (BOLD) signal measured in functional magnetic resonance imaging (fMRI) experiments is generally regarded as sluggish and poorly suited for probing neural function at the rapid timescales involved in sentence comprehension. However, recent studies have shown the value of acquiring data with very short repetition times (TRs), not merely in terms of improvements in contrast to noise ratio (CNR) through averaging, but also in terms of additional fine-grained temporal information. Using multiband-accelerated fMRI, we achieved whole-brain scans at 3-mm resolution with a TR of just 500 ms at both 3T and 7T field strengths. By taking advantage of word timing information, we found that word decoding accuracy across two separate sets of scan sessions improved significantly, with better overall performance at 7T than at 3T. The effect of TR was also investigated; we found that substantial word timing information can be extracted using fast TRs, with diminishing benefits beyond TRs of 1000 ms.

  18. Preliminary Study of Information Extraction of LANDSAT TM Data for a Suburban/regional Test Site

    NASA Technical Reports Server (NTRS)

    Toll, D. L.

    1985-01-01

    A substantial amount of spectral information is available from TM (as compared to MSS) data for a 14.25 square km area between Beltsville and Laurel, Maryland. Large buildings and street patterns were resolved in the TM imagery. While there was added information content in TM data for discriminating suburban/regional land cover, characteristics of MSS can improve land cover discrimination over TM when conventional classification procedures are used on digital data. The improved quantization of TM is likely valuable in situations where there are spectral similarities between classes. The spatial resolution in TM decreased land cover discrimination as a result of increased within-class variability. For many general digital evaluations, inclusion of four bands representing the four spectral regions can provide much useful land cover discrimination. Inclusion of TM 6 indicates an improvement in spectral class discrimination. Of primary spectral importance is the discrimination between water, vegetative surfaces, and impervious surfaces due to differences in thermal properties. Results from the principal component transformed data clearly indicate additional information content in TM over MSS.

  19. Can multilinguality improve Biomedical Word Sense Disambiguation?

    PubMed

    Duque, Andres; Martinez-Romo, Juan; Araujo, Lourdes

    2016-12-01

    Ambiguity in the biomedical domain represents a major issue when performing Natural Language Processing tasks over the huge amount of available information in the field. For this reason, Word Sense Disambiguation is critical for achieving accurate systems able to tackle complex tasks such as information extraction, summarization or document classification. In this work we explore whether multilinguality can help to solve the problem of ambiguity, and the conditions required for a system to improve the results obtained by monolingual approaches. Also, we analyze the best ways to generate those useful multilingual resources, and study different languages and sources of knowledge. The proposed system, based on co-occurrence graphs containing biomedical concepts and textual information, is evaluated on a test dataset frequently used in biomedicine. We can conclude that multilingual resources are able to provide a clear improvement of more than 7% compared to monolingual approaches, for graphs built from a small number of documents. Also, empirical results show that automatically translated resources are a useful source of information for this particular task. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

    ERIC Educational Resources Information Center

    Sun, Chong

    2012-01-01

    More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…

  1. Mutual information-based LPI optimisation for radar network

    NASA Astrophysics Data System (ADS)

    Shi, Chenguang; Zhou, Jianjiang; Wang, Fei; Chen, Jun

    2015-07-01

    A radar network can offer significant performance improvements for target detection and information extraction by employing spatial diversity. For a fixed number of radars, the achievable mutual information (MI) for estimating the target parameters may extend beyond a predefined threshold with full power transmission. In this paper, an effective low probability of intercept (LPI) optimisation algorithm is presented to improve the LPI performance of a radar network. Based on the radar network system model, we first adopt the Schleher intercept factor for the radar network as an optimisation metric for LPI performance. Then, a novel LPI optimisation algorithm is presented in which, for a predefined MI threshold, the Schleher intercept factor for the radar network is minimised by optimising the transmission power allocation among the radars in the network, such that enhanced LPI performance for the radar network can be achieved. A genetic algorithm based on nonlinear programming (GA-NP) is employed to solve the resulting nonconvex and nonlinear optimisation problem. Simulations demonstrate that the proposed algorithm is valuable and effective in improving the LPI performance of the radar network.

  2. Spatial-spectral preprocessing for endmember extraction on GPU's

    NASA Astrophysics Data System (ADS)

    Jimenez, Luis I.; Plaza, Javier; Plaza, Antonio; Li, Jun

    2016-10-01

    Spectral unmixing focuses on the identification of spectrally pure signatures, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. While endmember extraction techniques have mainly exploited the spectral information contained in hyperspectral images, they have recently incorporated spatial information to achieve more accurate results. Several algorithms have been developed for automatic or semi-automatic identification of endmembers using spatial and spectral information, including spectral-spatial endmember extraction (SSEE), in which both sources of information are extracted from the hyperspectral image within a preprocessing step and used on an equal footing for this purpose. Previous works have implemented the SSEE technique in four main steps: 1) local eigenvector calculation in each sub-region into which the original hyperspectral image is divided; 2) projection of all eigenvectors over the entire hyperspectral image and computation of the maxima and minima in order to obtain a set of candidate pixels; 3) expansion and averaging of the signatures of the candidate set; 4) ranking based on the spectral angle distance (SAD). The result of this method is a list of candidate signatures from which the endmembers can be extracted using various spectral-based techniques, such as orthogonal subspace projection (OSP), vertex component analysis (VCA) or N-FINDR. Considering the large volume of data and the complexity of the calculations, there is a need for efficient implementations. Latest-generation hardware accelerators such as commodity graphics processing units (GPUs) offer a good chance for improving the computational performance in this context. In this paper, we develop two different GPU implementations of the SSEE algorithm. Both are based on the eigenvector computation within each sub-region in the first step, one using the singular value decomposition (SVD) and the other using principal component analysis (PCA). Based on our experiments with hyperspectral data sets, high computational performance is observed in both cases.
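
    The first SSEE step (local eigenvector calculation per sub-region) can be sketched on the CPU as follows; the sub-region size, the number of retained vectors and the synthetic image cube are assumptions for illustration, and the GPU kernels developed in the paper are not reproduced here.

      import numpy as np

      def local_eigenvectors(cube, block=16, n_vec=3):
          """Leading eigenvectors (via SVD) of each block x block spatial sub-region
          of a hyperspectral cube with shape (rows, cols, bands)."""
          rows, cols, bands = cube.shape
          vecs = []
          for r in range(0, rows, block):
              for c in range(0, cols, block):
                  sub = cube[r:r + block, c:c + block, :].reshape(-1, bands)
                  sub = sub - sub.mean(axis=0)            # remove the local mean
                  _, _, vt = np.linalg.svd(sub, full_matrices=False)
                  vecs.append(vt[:n_vec])                 # rows of Vt span the local subspace
          return np.vstack(vecs)

      cube = np.random.default_rng(1).normal(size=(64, 64, 50))   # synthetic 50-band image
      print(local_eigenvectors(cube).shape)                       # (n_subregions * n_vec, bands)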

  3. Extracting Useful Semantic Information from Large Scale Corpora of Text

    ERIC Educational Resources Information Center

    Mendoza, Ray Padilla, Jr.

    2012-01-01

    Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…

  4. A systematic review of systematic reviews on interventions for caregivers of people with chronic conditions.

    PubMed

    Corry, Margarita; While, Alison; Neenan, Kathleen; Smith, Valerie

    2015-04-01

    To evaluate the effectiveness of interventions to support caregivers of people with selected chronic conditions. Informal caregivers provide millions of care hours each week contributing to significant healthcare savings. Despite much research evaluating a range of interventions for caregivers, their impact remains unclear. A systematic review of systematic reviews of interventions to support caregivers of people with selected chronic conditions. The electronic databases of PubMed, CINAHL, British Nursing Index, PsycINFO, Social Science Index (January 1990-May 2014) and The Cochrane Library (Issue 6, June 2014), were searched using Medical Subject Heading and index term combinations of the keywords caregiver, systematic review, intervention and named chronic conditions. Papers were included if they reported a systematic review of interventions for caregivers of people with chronic conditions. The methodological quality of the included reviews was independently assessed by two reviewers using R-AMSTAR. Data were independently extracted by two reviewers using a pre-designed data extraction form. Narrative synthesis of review findings was used to present the results. Eight systematic reviews were included. There was evidence that education and support programme interventions improved caregiver quality of life. Information-giving interventions improved caregiver knowledge for stroke caregivers. Education, support and information-giving interventions warrant further investigation across caregiver groups. A large-scale funded programme for caregiver research is required to ensure that studies are of high quality to inform service development across settings. © 2014 John Wiley & Sons Ltd.

  5. Object-Based Paddy Rice Mapping Using HJ-1A/B Data and Temporal Features Extracted from Time Series MODIS NDVI Data

    PubMed Central

    Singha, Mrinal; Wu, Bingfang; Zhang, Miao

    2016-01-01

    Accurate and timely mapping of paddy rice is vital for food security and environmental sustainability. This study evaluates the utility of temporal features extracted from coarse resolution data for object-based paddy rice classification of fine resolution data. The coarse resolution vegetation index data is first fused with the fine resolution data to generate the time series fine resolution data. Temporal features are extracted from the fused data and added with the multi-spectral data to improve the classification accuracy. Temporal features provided the crop growth information, while multi-spectral data provided the pattern variation of paddy rice. The achieved overall classification accuracy and kappa coefficient were 84.37% and 0.68, respectively. The results indicate that the use of temporal features improved the overall classification accuracy of a single-date multi-spectral image by 18.75% from 65.62% to 84.37%. The minimum sensitivity (MS) of the paddy rice classification has also been improved. The comparison showed that the mapped paddy area was analogous to the agricultural statistics at the district level. This work also highlighted the importance of feature selection to achieve higher classification accuracies. These results demonstrate the potential of the combined use of temporal and spectral features for accurate paddy rice classification. PMID:28025525

  6. Improvement of a headspace solid phase microextraction-gas chromatography/mass spectrometry method for the analysis of wheat bread volatile compounds.

    PubMed

    Raffo, Antonio; Carcea, Marina; Castagna, Claudia; Magrì, Andrea

    2015-08-07

    An improved method based on headspace solid phase microextraction combined with gas chromatography-mass spectrometry (HS-SPME/GC-MS) was proposed for the semi-quantitative determination of wheat bread volatile compounds isolated from both whole slice and crust samples. A DVB/CAR/PDMS fibre was used to extract volatiles from the headspace of a powdered bread sample dispersed in a sodium chloride (20%) aqueous solution and kept for 60 min at 50 °C under controlled stirring. Thirty-nine of the extracted volatiles were fully identified, whereas a tentative identification was proposed for 95 other volatiles, to give as complete a profile as possible of wheat bread volatile compounds. The use of an array of ten structurally and physicochemically similar internal standards made it possible to markedly improve method precision with respect to previous HS-SPME/GC-MS methods for bread volatiles. Good linearity of the method was verified for a selection of volatiles from several chemical groups by calibration with matrix-matched extraction solutions. This simple, rapid, precise and sensitive method could represent a valuable tool for obtaining semi-quantitative information when investigating the influence of technological factors on volatile formation in wheat bread and other bakery products. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Object-Based Paddy Rice Mapping Using HJ-1A/B Data and Temporal Features Extracted from Time Series MODIS NDVI Data.

    PubMed

    Singha, Mrinal; Wu, Bingfang; Zhang, Miao

    2016-12-22

    Accurate and timely mapping of paddy rice is vital for food security and environmental sustainability. This study evaluates the utility of temporal features extracted from coarse resolution data for object-based paddy rice classification of fine resolution data. The coarse resolution vegetation index data is first fused with the fine resolution data to generate the time series fine resolution data. Temporal features are extracted from the fused data and added with the multi-spectral data to improve the classification accuracy. Temporal features provided the crop growth information, while multi-spectral data provided the pattern variation of paddy rice. The achieved overall classification accuracy and kappa coefficient were 84.37% and 0.68, respectively. The results indicate that the use of temporal features improved the overall classification accuracy of a single-date multi-spectral image by 18.75% from 65.62% to 84.37%. The minimum sensitivity (MS) of the paddy rice classification has also been improved. The comparison showed that the mapped paddy area was analogous to the agricultural statistics at the district level. This work also highlighted the importance of feature selection to achieve higher classification accuracies. These results demonstrate the potential of the combined use of temporal and spectral features for accurate paddy rice classification.

  8. Automatic extraction of relations between medical concepts in clinical texts

    PubMed Central

    Harabagiu, Sanda; Roberts, Kirk

    2011-01-01

    Objective A supervised machine learning approach to discover relations between medical problems, treatments, and tests mentioned in electronic medical records. Materials and methods A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier. Results The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7. Discussion Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently. Joint classification of relations may further improve the quality of results. Also, joint learning of the discovery of concepts, assertions, and relations may also improve the results of automatic relation extraction. Conclusion Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. In addition, features based on similarity contribute to a decrease of 1.1% when they are not available. PMID:21846787

  9. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise, and the number of genes is overwhelming relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. The support vector machine (SVM) has been used successfully to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver a transparent decision process to the user, so explaining the computed solutions and presenting the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem, and a shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We also propose a novel rule extraction approach that uses the information provided by the separating hyperplane and the support vectors to improve the generalization capacity and comprehensibility of the rules and to reduce the computational complexity. Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
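
    The 1-norm shrinkage idea can be illustrated with an L1-penalized linear SVM, whose sparse weight vector acts as the feature (gene) selector; the scikit-learn call, the synthetic expression matrix and the value of C below are illustrative assumptions, not the authors' MK-SVM formulation.

      import numpy as np
      from sklearn.svm import LinearSVC

      # Synthetic "expression matrix": 60 samples x 2000 genes, two classes.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 2000))
      y = rng.integers(0, 2, size=60)

      # 1-norm regularization drives most gene weights to exactly zero.
      clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)
      selected = np.flatnonzero(clf.coef_)
      print(f"{selected.size} genes selected out of {X.shape[1]}")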

  10. High-efficient Extraction of Drainage Networks from Digital Elevation Model Data Constrained by Enhanced Flow Enforcement from Known River Map

    NASA Astrophysics Data System (ADS)

    Wu, T.; Li, T.; Li, J.; Wang, G.

    2017-12-01

    Improved drainage network extraction can be achieved by flow enforcement, whereby information from known river maps is imposed on the flow-path modeling process. However, the common elevation-based stream burning method can sometimes cause unintended topological errors and misinterpret the overall drainage pattern. We present an enhanced flow enforcement method to facilitate accurate and efficient extraction of drainage networks. Both the topology of the mapped hydrography and the initial landscape of the DEM are well preserved and fully utilized in the proposed method. An improved stream rasterization is achieved, yielding a continuous, unambiguous and stream-collision-free raster equivalent of the stream vectors for flow enforcement. By imposing priority-based enforcement with a complementary flow direction enhancement procedure, the drainage patterns of the mapped hydrography are fully represented in the derived results. The proposed method was tested over the Rogue River Basin, using DEMs at various resolutions. As indicated by the visual and statistical analyses, the proposed method has three major advantages: (1) it significantly reduces the occurrence of topological errors, yielding very accurate watershed partition and channel delineation; (2) it ensures scale-consistent performance across DEMs of various resolutions; and (3) the entire extraction process is well designed to achieve great computational efficiency.

  11. Sleep Promotes the Extraction of Grammatical Rules

    PubMed Central

    Nieuwenhuis, Ingrid L. C.; Folia, Vasiliki; Forkstam, Christian; Jensen, Ole; Petersson, Karl Magnus

    2013-01-01

    Grammar acquisition is a high level cognitive function that requires the extraction of complex rules. While it has been proposed that offline time might benefit this type of rule extraction, this remains to be tested. Here, we addressed this question using an artificial grammar learning paradigm. During a short-term memory cover task, eighty-one human participants were exposed to letter sequences generated according to an unknown artificial grammar. Following a time delay of 15 min, 12 h (wake or sleep) or 24 h, participants classified novel test sequences as Grammatical or Non-Grammatical. Previous behavioral and functional neuroimaging work has shown that classification can be guided by two distinct underlying processes: (1) the holistic abstraction of the underlying grammar rules and (2) the detection of sequence chunks that appear at varying frequencies during exposure. Here, we show that classification performance improved after sleep. Moreover, this improvement was due to an enhancement of rule abstraction, while the effect of chunk frequency was unaltered by sleep. These findings suggest that sleep plays a critical role in extracting complex structure from separate but related items during integrative memory processing. Our findings stress the importance of alternating periods of learning with sleep in settings in which complex information must be acquired. PMID:23755173

  12. Built-up Areas Extraction in High Resolution SAR Imagery based on the method of Multiple Feature Weighted Fusion

    NASA Astrophysics Data System (ADS)

    Liu, X.; Zhang, J. X.; Zhao, Z.; Ma, A. D.

    2015-06-01

    Synthetic aperture radar is being applied more and more widely in remote sensing because of its all-time and all-weather operation, and feature extraction from high resolution SAR images has become a topic of intense interest. In particular, with the continuous improvement of airborne SAR image resolution, image texture information has become more abundant, which is of great significance for classification and extraction. In this paper, a novel method for built-up area extraction using both statistical and structural features is proposed, based on the texture characteristics of built-up areas. First, statistical texture features and structural features are extracted by the classical gray level co-occurrence matrix method and the variogram function method, respectively, with direction information taken into account in this process. Next, feature weights are calculated according to the Bhattacharyya distance. Then all features are fused using these weights. Finally, the fused image is classified with the K-means method and built-up areas are extracted after a post-classification process. The proposed method was tested on domestic airborne P-band polarimetric SAR images; at the same time, two groups of experiments based on statistical texture alone and structural texture alone were carried out for comparison. In addition to qualitative analysis, quantitative analysis based on manually selected built-up areas was performed: in the relatively simple experimental area the detection rate is more than 90%, and in the relatively complex experimental area the detection rate is also higher than that of the other two methods. The results for the study area show that this method can effectively and accurately extract built-up areas from high resolution airborne SAR imagery.
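
    The Bhattacharyya-distance weighting step can be written out directly: for each texture feature, the distance between the built-up and non-built-up class samples (treated here as Gaussian, which is an assumption of this sketch) is converted into a fusion weight. The feature names and sample values are synthetic stand-ins for the GLCM and variogram features described above.

      import numpy as np

      def bhattacharyya_gauss(a, b):
          """Bhattacharyya distance between two 1-D samples under a Gaussian model."""
          m1, m2, v1, v2 = a.mean(), b.mean(), a.var(), b.var()
          v = (v1 + v2) / 2.0
          return 0.125 * (m1 - m2) ** 2 / v + 0.5 * np.log(v / np.sqrt(v1 * v2))

      rng = np.random.default_rng(0)
      builtup = {"glcm_contrast": rng.normal(5.0, 1.0, 500), "variogram": rng.normal(2.0, 0.5, 500)}
      other   = {"glcm_contrast": rng.normal(3.0, 1.0, 500), "variogram": rng.normal(1.9, 0.5, 500)}

      dist = {k: bhattacharyya_gauss(builtup[k], other[k]) for k in builtup}
      weights = {k: d / sum(dist.values()) for k, d in dist.items()}   # more separable -> larger weight
      print(weights)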

  13. Integrated feature extraction and selection for neuroimage classification

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Shen, Dinggang

    2009-02-01

    Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.

  14. Fast Reduction Method in Dominance-Based Information Systems

    NASA Astrophysics Data System (ADS)

    Li, Yan; Zhou, Qinghua; Wen, Yongchuan

    2018-01-01

    In real world applications there are often data with continuous values or preference-ordered values. Rough sets based on dominance relations can effectively deal with these kinds of data. Attribute reduction can be performed in the framework of the dominance-relation based approach to better extract decision rules. However, the computational cost of the dominance classes greatly affects the efficiency of attribute reduction and rule extraction. This paper presents an efficient method of computing dominance classes and compares it with the traditional method as the numbers of attributes and samples increase. Experiments on UCI data sets show that the proposed algorithm clearly improves the efficiency of the traditional method, especially for large-scale data.
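
    For reference, the dominance classes themselves can be computed with a direct (non-optimized) pass over the decision table, as sketched below: an object is dominated by every object that is at least as good on all preference-ordered attributes. The toy table is invented, and the paper's accelerated computation is not reproduced.

      import numpy as np

      def dominating_sets(table):
          """For each object i, indices of objects >= it on every attribute (O(n^2 m))."""
          return [np.flatnonzero((table >= row).all(axis=1)) for row in table]

      table = np.array([[3, 2, 1],
                        [2, 2, 1],
                        [3, 3, 2],
                        [1, 1, 1],
                        [2, 3, 2]])
      for i, dom in enumerate(dominating_sets(table)):
          print(f"objects dominating x{i}: {dom.tolist()}")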

  15. Ballistic missile precession frequency extraction based on the Viterbi & Kalman algorithm

    NASA Astrophysics Data System (ADS)

    Wu, Longlong; Xie, Yongjie; Xu, Daping; Ren, Li

    2015-12-01

    Radar micro-Doppler signatures have great potential for target detection, classification and recognition. In the mid-course phase, warheads flying outside the atmosphere are usually accompanied by precession. Precession may induce additional frequency modulations on the returned radar signal, which can be regarded as a unique signature and provide additional information that is complementary to existing target recognition methods. The main purpose of this paper is to establish a more realistic precession model of a conical ballistic missile warhead and to extract the precession parameters by utilizing a Viterbi & Kalman algorithm, which improves the precession frequency estimation accuracy considerably, especially at low SNR.

  16. An improvement of vehicle detection under shadow regions in satellite imagery

    NASA Astrophysics Data System (ADS)

    Karim, Shahid; Zhang, Ye; Ali, Saad; Asif, Muhammad Rizwan

    2018-04-01

    The processing of satellite imagery is dependent upon the quality of the imagery; because of low resolution, it is difficult to extract accurate information according to the requirements of applications. For the purpose of vehicle detection under shadow regions, we use HOG for feature extraction and SVM for classification; HOG has proved a worthwhile tool for complex environments. Shadow images were scrutinized and found to be very challenging for detection, with very low detection rates observed, so our focus is on enhancing the detection rate under shadow regions by implementing appropriate preprocessing. Vehicles are detected precisely in non-shadow regions, with a higher detection rate than in shadow regions.
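
    A minimal HOG-plus-linear-SVM pipeline of the kind described above can be sketched with scikit-image and scikit-learn; the window size, HOG parameters and the random training patches are placeholders rather than the authors' settings.

      import numpy as np
      from skimage.feature import hog
      from sklearn.svm import LinearSVC

      def hog_features(patch):
          # Gradient-orientation histogram descriptor for one grayscale window.
          return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

      rng = np.random.default_rng(0)
      patches = rng.random((200, 64, 64))              # synthetic 64x64 training windows
      labels = rng.integers(0, 2, size=200)            # 1 = vehicle, 0 = background

      clf = LinearSVC(max_iter=5000).fit(np.array([hog_features(p) for p in patches]), labels)

      candidate = rng.random((64, 64))                 # e.g. a window from a shadow-corrected image
      print("vehicle" if clf.predict([hog_features(candidate)])[0] == 1 else "background")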

  17. A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents

    PubMed Central

    2011-01-01

    Background A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up to date with all published studies on DDI. Methods This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched against the generated sentences in order to extract DDIs. Results We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%), but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Conclusions Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no approach has been carried out to extract DDI from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDI from biomedical texts. PMID:21489220
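
    The lexical-pattern matching stage can be illustrated with ordinary regular expressions over simplified sentences; the two patterns and the short drug list below are hypothetical examples, not the pharmacist-defined pattern set used in the study.

      import re

      DRUGS = ["warfarin", "aspirin", "ibuprofen"]
      DRUG = r"(?:%s)" % "|".join(DRUGS)
      PATTERNS = [
          r"(?P<d1>DRUG)\s+(?:increases|decreases|alters)\s+the\s+(?:level|activity)\s+of\s+(?P<d2>DRUG)",
          r"(?P<d1>DRUG)\s+should\s+not\s+be\s+(?:combined|co-administered)\s+with\s+(?P<d2>DRUG)",
      ]

      def extract_ddis(sentence):
          hits = []
          for pat in PATTERNS:
              for m in re.finditer(pat.replace("DRUG", DRUG), sentence, re.IGNORECASE):
                  hits.append((m.group("d1"), m.group("d2")))
          return hits

      print(extract_ddis("Aspirin increases the activity of warfarin."))   # [('Aspirin', 'warfarin')]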

  18. A dental vision system for accurate 3D tooth modeling.

    PubMed

    Zhang, Li; Alemzadeh, K

    2006-01-01

    This paper describes an active vision system based reverse engineering approach to extract the three-dimensional (3D) geometric information from dental teeth and transfer this information into Computer-Aided Design/Computer-Aided Manufacture (CAD/CAM) systems to improve the accuracy of 3D teeth models and at the same time improve the quality of the construction units to help patient care. The vision system involves the development of a dental vision rig, edge detection, boundary tracing and fast & accurate 3D modeling from a sequence of sliced silhouettes of physical models. The rig is designed using engineering design methods such as a concept selection matrix and weighted objectives evaluation chart. Reconstruction results and accuracy evaluation are presented on digitizing different teeth models.

  19. Utilization of ontology look-up services in information retrieval for biomedical literature.

    PubMed

    Vishnyakova, Dina; Pasche, Emilie; Lovis, Christian; Ruch, Patrick

    2013-01-01

    With the vast amount of biomedical data available, we face the necessity of improving information retrieval processes in the biomedical domain. The use of biomedical ontologies has facilitated the combination of various data sources (e.g. scientific literature, clinical data repositories) by increasing the quality of information retrieval and reducing maintenance efforts. In this context, we developed Ontology Look-up Services (OLS), based on the NEWT and MeSH vocabularies. Our services were involved in information retrieval tasks such as gene/disease normalization. The implementation of the OLS services significantly accelerated the extraction of particular biomedical facts by structuring and enriching the data context. Precision in the normalization tasks was boosted by about 20%.

  20. Variable-rate optical communication through the turbulent atmosphere. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Levitt, B. K.

    1971-01-01

    It was demonstrated that the data transmitter can extract real-time channel state information by processing the field received when a pilot tone is sent from the data receiver to the data transmitter. Based on these channel measurements, optimal variable-rate techniques were derived and significant improvements in system performance were obtained, particularly at low bit error rates.

  1. A Knowledge-Based Approach to Automatic Detection of Equipment Alarm Sounds in a Neonatal Intensive Care Unit Environment.

    PubMed

    Raboshchuk, Ganna; Nadeu, Climent; Jancovic, Peter; Lilja, Alex Peiro; Kokuer, Munevver; Munoz Mahamud, Blanca; Riverola De Veciana, Ana

    2018-01-01

    A large number of alarm sounds triggered by biomedical equipment occur frequently in the noisy environment of a neonatal intensive care unit (NICU) and play a key role in providing healthcare. In this paper, our work on the development of an automatic system for detection of acoustic alarms in that difficult environment is presented. Such an automatic detection system is needed for investigating how a preterm infant reacts to auditory stimuli of the NICU environment and for improved real-time patient monitoring. The approach presented in this paper consists of using the available knowledge about each alarm class in the design of the detection system. The information about the frequency structure is used in the feature extraction stage, and the time structure knowledge is incorporated at the post-processing stage. Several alternative methods are compared for feature extraction, modeling, and post-processing. The detection performance is evaluated with real data recorded in the NICU of the hospital, using both frame-level and period-level metrics. The experimental results show that the inclusion of both spectral and temporal information improves the baseline detection performance by more than 60%.

  2. A Knowledge-Based Approach to Automatic Detection of Equipment Alarm Sounds in a Neonatal Intensive Care Unit Environment

    PubMed Central

    Nadeu, Climent; Jančovič, Peter; Lilja, Alex Peiró; Köküer, Münevver; Muñoz Mahamud, Blanca; Riverola De Veciana, Ana

    2018-01-01

    A large number of alarm sounds triggered by biomedical equipment occur frequently in the noisy environment of a neonatal intensive care unit (NICU) and play a key role in providing healthcare. In this paper, our work on the development of an automatic system for detection of acoustic alarms in that difficult environment is presented. Such an automatic detection system is needed for investigating how a preterm infant reacts to auditory stimuli of the NICU environment and for improved real-time patient monitoring. The approach presented in this paper consists of using the available knowledge about each alarm class in the design of the detection system. The information about the frequency structure is used in the feature extraction stage, and the time structure knowledge is incorporated at the post-processing stage. Several alternative methods are compared for feature extraction, modeling, and post-processing. The detection performance is evaluated with real data recorded in the NICU of the hospital, using both frame-level and period-level metrics. The experimental results show that the inclusion of both spectral and temporal information improves the baseline detection performance by more than 60%. PMID:29404227

  3. Mining chemical information from open patents

    PubMed Central

    2011-01-01

    Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents comprising 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured. PMID:21999425

  4. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports

    PubMed Central

    Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.

    2010-01-01

    Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy, with each method scoring 96.2%. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2% and for the family history classification, ConText scored highest with 97.6%; both methods were comparable to the human benchmark of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text clinical records for a variety of research or healthcare delivery organizations. PMID:21031012

  5. ICT Solutions for Highly-Customized Water Demand Management Strategies

    NASA Astrophysics Data System (ADS)

    Giuliani, M.; Cominola, A.; Castelletti, A.; Fraternali, P.; Guardiola, J.; Barba, J.; Pulido-Velazquez, M.; Rizzoli, A. E.

    2016-12-01

    The recent deployment of smart metering networks is opening new opportunities for advancing the design of residential water demand management strategies (WDMS) that rely on an improved understanding of water consumers' behaviors. Recent applications have shown that retrieving information on users' consumption behaviors, along with their explanatory and/or causal factors, is key to spotting potential areas where water saving efforts should be targeted and to designing user-tailored WDMS. In this study, we explore the potential of ICT-based solutions in supporting the design and implementation of highly customized WDMS. On one side, the collection of consumption data at high spatial and temporal resolutions requires big data analytics and machine learning techniques to extract typical consumption features from the metered population of water users. On the other side, ICT solutions and gamification can be used as effective means for facilitating both users' engagement and the collection of socio-psychographic user information. The latter allows interpreting and improving the extracted profiles, ultimately supporting the customization of WDMS, such as awareness campaigns or personalized recommendations. Our approach is implemented in the SmartH2O platform and demonstrated in a pilot application in Valencia, Spain. Results show how the analysis of the smart metered consumption data, combined with the information retrieved from an ICT gamified web user portal, successfully identifies the typical consumption profiles of the metered users and supports the design of alternative WDMS targeting the different user profiles.

  6. Multiscale deep features learning for land-use scene recognition

    NASA Astrophysics Data System (ADS)

    Yuan, Baohua; Li, Shijin; Li, Ning

    2018-01-01

    The features extracted from deep convolutional neural networks (CNNs) have shown their promise as generic descriptors for land-use scene recognition. However, most of the work directly adopts the deep features for the classification of remote sensing images, and does not encode the deep features for improving their discriminative power, which can affect the performance of deep feature representations. To address this issue, we propose an effective framework, LASC-CNN, obtained by locality-constrained affine subspace coding (LASC) pooling of a CNN filter bank. LASC-CNN obtains more discriminative deep features than directly extracted from CNNs. Furthermore, LASC-CNN builds on the top convolutional layers of CNNs, which can incorporate multiscale information and regions of arbitrary resolution and sizes. Our experiments have been conducted using two widely used remote sensing image databases, and the results show that the proposed method significantly improves the performance when compared to other state-of-the-art methods.

  7. Occupational Exposures in the Oil and Gas Extraction Industry: State of the Science and Research Recommendations

    PubMed Central

    Witter, Roxana Z.; Tenney, Liliana; Clark, Suzanne; Newman, Lee S.

    2015-01-01

    The oil and gas extraction industry is rapidly growing due to horizontal drilling and high volume hydraulic fracturing (HVHF). This growth has provided new jobs and economic stimulus. The industry occupational fatality rate is 2.5 times higher than the construction industry and 7 times higher than general industry; however injury rates are lower than the construction industry, suggesting injuries are not being reported. Some workers are exposed to crystalline silica at hazardous levels, above occupational health standards. Other hazards (particulate, benzene, noise, radiation) exist. In this article, we review occupational fatality and injury rate data; discuss research looking at root causes of fatal injuries and hazardous exposures; review interventions aimed at improving occupational health and safety; and discuss information gaps and areas of needed research. We also describe Wyoming efforts to improve occupational safety in this industry, as a case example. PMID:24634090

  8. Autism, Context/Noncontext Information Processing, and Atypical Development

    PubMed Central

    Skoyles, John R.

    2011-01-01

    Autism has been attributed to a deficit in contextual information processing. Attempts to understand autism in terms of such a defect, however, do not include more recent computational work upon context. This work has identified that context information processing depends upon the extraction and use of the information hidden in higher-order (or indirect) associations. Higher-order associations underlie the cognition of context rather than that of situations. This paper starts by examining the differences between higher-order and first-order (or direct) associations. Higher-order associations link entities not directly (as with first-order ones) but indirectly through all the connections they have via other entities. Extracting this information requires the processing of past episodes as a totality. As a result, this extraction depends upon specialised extraction processes separate from cognition. This information is then consolidated. Due to this difference, the extraction/consolidation of higher-order information can be impaired whilst cognition remains intact. Although not directly impaired, cognition will be indirectly impaired by knock on effects such as cognition compensating for absent higher-order information with information extracted from first-order associations. This paper discusses the implications of this for the inflexible, literal/immediate, and inappropriate information processing of autistic individuals. PMID:22937255

  9. Assessing the Agreement Between Eo-Based Semi-Automated Landslide Maps with Fuzzy Manual Landslide Delineation

    NASA Astrophysics Data System (ADS)

    Albrecht, F.; Hölbling, D.; Friedl, B.

    2017-09-01

    Landslide mapping benefits from the ever increasing availability of Earth Observation (EO) data resulting from programmes like the Copernicus Sentinel missions and improved infrastructure for data access. However, there arises the need for improved automated landslide information extraction processes from EO data while the dominant method is still manual delineation. Object-based image analysis (OBIA) provides the means for the fast and efficient extraction of landslide information. To prove its quality, automated results are often compared to manually delineated landslide maps. Although there is awareness of the uncertainties inherent in manual delineations, there is a lack of understanding how they affect the levels of agreement in a direct comparison of OBIA-derived landslide maps and manually derived landslide maps. In order to provide an improved reference, we present a fuzzy approach for the manual delineation of landslides on optical satellite images, thereby making the inherent uncertainties of the delineation explicit. The fuzzy manual delineation and the OBIA classification are compared by accuracy metrics accepted in the remote sensing community. We have tested this approach for high resolution (HR) satellite images of three large landslides in Austria and Italy. We were able to show that the deviation of the OBIA result from the manual delineation can mainly be attributed to the uncertainty inherent in the manual delineation process, a relevant issue for the design of validation processes for OBIA-derived landslide maps.

  10. Structured reporting of MRI of the shoulder - improvement of report quality?

    PubMed

    Gassenmaier, Sebastian; Armbruster, Marco; Haasters, Florian; Helfen, Tobias; Henzler, Thomas; Alibek, Sedat; Pförringer, Dominik; Sommer, Wieland H; Sommer, Nora N

    2017-10-01

    To evaluate the effect of structured reports (SRs) in comparison to non-structured narrative free text (NRs) shoulder MRI reports and potential effects of both types of reporting on completeness, readability, linguistic quality and referring surgeons' satisfaction. Thirty patients after trauma or with suspected degenerative changes of the shoulder were included in this study (2012-2015). All patients underwent shoulder MRI for further assessment and possible surgical planning. NRs were generated during clinical routine. Corresponding SRs were created using a dedicated template. All 60 reports were evaluated by two experienced orthopaedic shoulder surgeons using a questionnaire that included eight questions. Eighty per cent of the SRs were fully complete without any missing key features whereas only 45% of the NRs were fully complete (p < 0.001). The extraction of information was regarded to be easy in 92% of the SRs and 63% of the NRs. The overall quality of the SRs was rated better than that of the NRs (p < 0.001). Structured reporting of shoulder MRI improves the readability as well as the linguistic quality of radiological reports, and potentially leads to a higher satisfaction of referring physicians. • Structured MRI reports of the shoulder improve readability. • Structured reporting facilitates information extraction. • Referring physicians prefer structured reports to narrative free text reports. • Structured MRI reports of the shoulder can reduce radiologist re-consultations.

  11. Two-stage Framework for a Topology-Based Projection and Visualization of Classified Document Collections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oesterling, Patrick; Scheuermann, Gerik; Teresniak, Sven

    During the last decades, electronic textual information has become the world's largest and most important information source available. People have added a variety of daily newspapers, books, scientific and governmental publications, blogs and private messages to this wellspring of endless information and knowledge. Since neither the existing nor the new information can be read in its entirety, computers are used to extract and visualize meaningful or interesting topics and documents from this huge information clutter. In this paper, we extend, improve and combine existing individual approaches into an overall framework that supports topological analysis of high dimensional document point clouds given by the well-known tf-idf document-term weighting method. We show that traditional distance-based approaches fail in very high dimensional spaces, and we describe an improved two-stage method for topology-based projections from the original high dimensional information space to both two dimensional (2-D) and three dimensional (3-D) visualizations. To show the accuracy and usability of this framework, we compare it to methods introduced recently and apply it to complex document and patent collections.
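
    The tf-idf document-term weighting that generates the high dimensional point cloud can be written out in a few lines of NumPy; the three-document toy corpus is only for illustration, and the topological projection stages of the framework are not shown here.

      import numpy as np

      docs = ["information extraction from text",
              "topology based projection of documents",
              "document term weighting for text visualization"]

      vocab = sorted({w for d in docs for w in d.split()})
      tf = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

      # idf = log(N / df): rare terms are emphasized, ubiquitous terms damped.
      idf = np.log(len(docs) / (tf > 0).sum(axis=0))
      tfidf = tf * idf
      tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)   # compare directions, not lengths
      print(tfidf.shape)                                      # one point per document in term space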

  12. Study on the extraction method of tidal flat area in northern Jiangsu Province based on remote sensing waterlines

    NASA Astrophysics Data System (ADS)

    Zhang, Yuanyuan; Gao, Zhiqiang; Liu, Xiangyang; Xu, Ning; Liu, Chaoshun; Gao, Wei

    2016-09-01

    Reclamation has caused significant dynamic change in the coastal zone. The tidal flat is an unstable reserve land resource, and studying it is of considerable importance. In order to realize efficient extraction of tidal flat area information, this paper takes Rudong County in Jiangsu Province as the research area, using HJ1A/1B images as the data source. On the basis of previous research experience and a literature review, object-oriented classification is chosen as a semi-automatic extraction method to generate waterlines. The waterlines are then analyzed with the DSAS software to obtain tide points, and the outer boundary points are extracted automatically with Python to determine the extent of the tidal flats of Rudong County in 2014; the extracted area was 55182 hm2. A confusion matrix is used to verify the accuracy, and the result shows that the kappa coefficient is 0.945. The method addresses deficiencies of previous studies, and the free availability of the tools used on the Internet makes it easy to generalize.
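
    The kappa coefficient used in the accuracy assessment follows directly from the confusion matrix; a generic computation is sketched below with an illustrative 2x2 matrix of sample counts (not the study's actual figures).

      import numpy as np

      def cohens_kappa(cm):
          """Cohen's kappa from a confusion matrix (rows = reference, columns = classified)."""
          cm = np.asarray(cm, dtype=float)
          total = cm.sum()
          p_observed = np.trace(cm) / total
          p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
          return (p_observed - p_expected) / (1 - p_expected)

      cm = [[480, 20],    # tidal flat / non-tidal-flat reference samples (illustrative)
            [ 15, 485]]
      print(round(cohens_kappa(cm), 3))   # 0.93 for this toy matrix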

  13. Predicting extractives content of Eucalyptus bosistoana F. Muell. Heartwood from stem cores by near infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Li, Yanjie; Altaner, Clemens

    2018-06-01

    Time and resources are the limiting factors for wider use of chemical information about wood in tree breeding programs. NIR offers an advantage over wet-chemical analysis in these respects and is starting to be used for tree breeding. This work describes the development of a NIR-based assessment of extractive content in heartwood of E. bosistoana which does not require milling and conditioning of the samples. This was achieved by applying signal processing algorithms (external parameter orthogonalisation (EPO) and significance multivariate correlation (sMC)) to spectra obtained from solid wood cores, which were able to correct for moisture content, grain direction and sample form. The accuracy of extractive content predictions was further improved by variable selection, resulting in a root mean square error of 1.27%. Considering the range of extractive content in E. bosistoana heartwood of 1.3 to 15.0%, the developed NIR calibration has the potential to be used in an E. bosistoana breeding program or to assess the spatial variation in extractive content throughout a stem.

  14. Multivariate analysis of the volatile components in tobacco based on infrared-assisted extraction coupled to headspace solid-phase microextraction and gas chromatography-mass spectrometry.

    PubMed

    Yang, Yanqin; Pan, Yuanjiang; Zhou, Guojun; Chu, Guohai; Jiang, Jian; Yuan, Kailong; Xia, Qian; Cheng, Changhe

    2016-11-01

    A novel infrared-assisted extraction coupled to headspace solid-phase microextraction followed by gas chromatography with mass spectrometry method has been developed for the rapid determination of the volatile components in tobacco. The optimal extraction conditions for maximizing the extraction efficiency were as follows: 65 μm polydimethylsiloxane-divinylbenzene fiber, extraction time of 20 min, infrared power of 175 W, and distance between the infrared lamp and the headspace vial of 2 cm. Under the optimum conditions, 50 components were found to exist in all ten tobacco samples from different geographical origins. Compared with conventional water-bath heating and nonheating extraction methods, the extraction efficiency of infrared-assisted extraction was greatly improved. Furthermore, multivariate analysis including principal component analysis, hierarchical cluster analysis, and similarity analysis were performed to evaluate the chemical information of these samples and divided them into three classifications, including rich, moderate, and fresh flavors. The above-mentioned classification results were consistent with the sensory evaluation, which was pivotal and meaningful for tobacco discrimination. As a simple, fast, cost-effective, and highly efficient method, the infrared-assisted extraction coupled to headspace solid-phase microextraction technique is powerful and promising for distinguishing the geographical origins of the tobacco samples coupled to suitable chemometrics. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

    ERIC Educational Resources Information Center

    North, Jamie S.; Williams, A. Mark

    2008-01-01

    The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…

  16. Wavelet phase extracting demodulation algorithm based on scale factor for optical fiber Fabry-Perot sensing.

    PubMed

    Zhang, Baolin; Tong, Xinglin; Hu, Pan; Guo, Qian; Zheng, Zhiyuan; Zhou, Chaoran

    2016-12-26

    Optical fiber Fabry-Perot (F-P) sensors have been used in various on-line monitoring applications for physical parameters such as acoustics, temperature and pressure. In this paper, a wavelet phase extracting demodulation algorithm for optical fiber F-P sensing is proposed for the first time. In this demodulation algorithm, the search range of the scale factor is determined by an estimated cavity length obtained with a fast Fourier transform (FFT). The phase information of each point on the optical interference spectrum can be directly extracted through the continuous complex wavelet transform without de-noising, and the cavity length of the optical fiber F-P sensor is calculated from the slope of the fitted phase curve. Theoretical analysis and experimental results show that this algorithm can greatly reduce the amount of computation and improve demodulation speed and accuracy.
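
    The underlying relationship (the interference phase varies linearly with wavenumber, with a slope proportional to the cavity length) can be illustrated with a simplified NumPy sketch; it uses an FFT for the coarse estimate and the analytic-signal phase instead of the paper's complex wavelet transform, and all numerical values are synthetic.

      import numpy as np
      from scipy.signal import hilbert

      L_true = 75e-6                                  # cavity length (m), ground truth
      k = np.linspace(7.0e6, 7.6e6, 4000)             # wavenumber axis (rad/m)
      spectrum = 0.5 + 0.4 * np.cos(2 * L_true * k)   # idealized F-P interference spectrum

      ac = spectrum - spectrum.mean()

      # Coarse cavity length from the dominant fringe frequency: f = L / pi (cycles per rad/m).
      freqs = np.fft.rfftfreq(k.size, d=k[1] - k[0])
      peak = freqs[np.argmax(np.abs(np.fft.rfft(ac)))]
      print("FFT estimate (m):", np.pi * peak)

      # Refined estimate from the unwrapped phase slope: d(phase)/dk = 2 * L.
      phase = np.unwrap(np.angle(hilbert(ac)))
      print("phase-slope estimate (m):", np.polyfit(k, phase, 1)[0] / 2)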

  17. Can we replace curation with information extraction software?

    PubMed

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.

  18. A pilot study of a heuristic algorithm for novel template identification from VA electronic medical record text.

    PubMed

    Redd, Andrew M; Gundlapalli, Adi V; Divita, Guy; Carter, Marjorie E; Tran, Le-Thuy; Samore, Matthew H

    2017-07-01

    Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates and the templates within those documents can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and performs a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes. The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median number of templates per group was 5.9 and 5, respectively. The grouping method was successful in finding like documents containing templates. Of the groups of documents containing templates, the LCS module was successful in deciphering text belonging to the template and text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. We demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records in this pilot study and propose methods to improve performance and scaling up. Published by Elsevier Inc.
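
    The longest common subsequence (LCS) step that recovers a shared template from a group of similar notes can be sketched with the classic dynamic-programming algorithm over word tokens; the two toy note fragments are invented, and a production version would run over the documents grouped by the clustering module.

      def lcs(a, b):
          """Longest common subsequence of two token lists (dynamic programming)."""
          m, n = len(a), len(b)
          dp = [[0] * (n + 1) for _ in range(m + 1)]
          for i in range(m):
              for j in range(n):
                  dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
          out, i, j = [], m, n
          while i and j:                       # backtrack to recover the common tokens
              if a[i - 1] == b[j - 1]:
                  out.append(a[i - 1]); i -= 1; j -= 1
              elif dp[i - 1][j] >= dp[i][j - 1]:
                  i -= 1
              else:
                  j -= 1
          return out[::-1]

      note1 = "CHIEF COMPLAINT : cough HISTORY : 55 year old male PLAN : follow up".split()
      note2 = "CHIEF COMPLAINT : fever HISTORY : 43 year old female PLAN : follow up".split()
      print(" ".join(lcs(note1, note2)))       # the shared template tokens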

  19. Text Mining for Protein Docking

    PubMed Central

    Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.

    2015-01-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
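
    The bag-of-words filtering of retrieved abstracts can be sketched with scikit-learn; the tiny labeled corpus below is invented for illustration, whereas the study trained and validated its SVM models on a manually analyzed subset of real abstracts.

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.svm import SVC

      # Hypothetical abstracts labeled 1 (relevant to binding residues) or 0 (irrelevant).
      abstracts = [
          "mutation of residue R45 abolished binding to the partner protein",
          "the crystal structure reveals contacts formed by interface loop residues",
          "expression of the gene was elevated in tumor tissue samples",
          "the promoter region was methylated in most cell lines tested",
      ]
      labels = [1, 1, 0, 0]

      vectorizer = CountVectorizer()                       # bag-of-words features
      clf = SVC(kernel="linear").fit(vectorizer.fit_transform(abstracts), labels)

      new = ["alanine substitution at the interface residue reduced complex formation"]
      print(clf.predict(vectorizer.transform(new)))        # filter decision for a new abstract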

  20. Fundamental finite key limits for one-way information reconciliation in quantum key distribution

    NASA Astrophysics Data System (ADS)

    Tomamichel, Marco; Martinez-Mateo, Jesus; Pacher, Christoph; Elkouss, David

    2017-11-01

    The security of quantum key distribution protocols is guaranteed by the laws of quantum mechanics. However, a precise analysis of the security properties requires tools from both classical cryptography and information theory. Here, we employ recent results in non-asymptotic classical information theory to show that one-way information reconciliation imposes fundamental limitations on the amount of secret key that can be extracted in the finite key regime. In particular, we find that an often used approximation for the information leakage during information reconciliation is not generally valid. We propose an improved approximation that takes into account finite key effects and numerically test it against codes for two probability distributions, that we call binary-binary and binary-Gaussian, that typically appear in quantum key distribution protocols.

  1. Temporally coherent 4D video segmentation for teleconferencing

    NASA Astrophysics Data System (ADS)

    Ehmann, Jana; Guleryuz, Onur G.

    2013-09-01

    We develop an algorithm for 4-D (RGB+Depth) video segmentation targeting immersive teleconferencing applications on emerging mobile devices. Our algorithm extracts users from their environments and places them onto virtual backgrounds similar to green-screening. The virtual backgrounds increase immersion and interactivity, relieving the users of the system from distractions caused by disparate environments. Commodity depth sensors, while providing useful information for segmentation, result in noisy depth maps with a large number of missing depth values. By combining depth and RGB information, our work significantly improves the otherwise very coarse segmentation. Further imposing temporal coherence yields compositions where the foregrounds seamlessly blend with the virtual backgrounds with minimal flicker and other artifacts. We achieve said improvements by correcting the missing information in depth maps before fast RGB-based segmentation, which operates in conjunction with temporal coherence. Simulation results indicate the efficacy of the proposed system in video conferencing scenarios.

  2. The Knowledge Program: an Innovative, Comprehensive Electronic Data Capture System and Warehouse

    PubMed Central

    Katzan, Irene; Speck, Micheal; Dopler, Chris; Urchek, John; Bielawski, Kay; Dunphy, Cheryl; Jehi, Lara; Bae, Charles; Parchman, Alandra

    2011-01-01

    Data contained in the electronic health record (EHR) present a tremendous opportunity to improve quality of care and enhance research capabilities. However, the EHR is not structured to provide data for such purposes: most clinical information is entered as free text and content varies substantially between providers. Discrete information on patients’ functional status is typically not collected. Data extraction tools are often unavailable. We have developed the Knowledge Program (KP), a comprehensive initiative to improve the collection of discrete clinical information into the EHR and the retrievability of data for use in research, quality, and patient care. A distinct feature of the KP is the systematic collection of patient-reported outcomes, which are captured discretely, allowing more refined analyses of care outcomes. The KP capitalizes on features of the Epic EHR and utilizes an external IT infrastructure distinct from Epic for enhanced functionality. Here, we describe the development and implementation of the KP. PMID:22195124

  3. Augmenting Satellite Precipitation Estimation with Lightning Information

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mahrooghy, Majid; Anantharaj, Valentine G; Younan, Nicolas H.

    2013-01-01

    We have used lightning information to augment the Precipitation Estimation from Remotely Sensed Imagery using an Artificial Neural Network - Cloud Classification System (PERSIANN-CCS). Co-located lightning data are used to segregate cloud patches, segmented from GOES-12 infrared data, into either electrified (EL) or non-electrified (NEL) patches. A set of features is extracted separately for the EL and NEL cloud patches. The features for the EL cloud patches include new features based on the lightning information. The cloud patches are classified and clustered using self-organizing maps (SOM). Then brightness temperature and rain rate (T-R) relationships are derived for the different clusters. Rain rates are estimated for the cloud patches based on their representative T-R relationship. The Equitable Threat Score (ETS) for daily precipitation estimates is improved by almost 12% for the winter season. In the summer, no significant improvements in ETS are noted.

  4. A semi-automatic method for extracting thin line structures in images as rooted tree network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brazzini, Jacopo; Dillard, Scott; Soille, Pierre

    2010-01-01

    This paper addresses the problem of semi-automatic extraction of line networks in digital images - e.g., road or hydrographic networks in satellite images, or blood vessels in medical images. For that purpose, we improve a generic method derived from morphological and hydrological concepts and consisting in minimum cost path estimation and flow simulation. While this approach fully exploits the local contrast and shape of the network, as well as its arborescent nature, we further incorporate local directional information about the structures in the image. Namely, an appropriate anisotropic metric is designed by using both the characteristic features of the target network and the eigen-decomposition of the gradient structure tensor of the image. Subsequently, the geodesic propagation from a given seed with this metric is combined with hydrological operators for overland flow simulation to extract the line network. The algorithm is demonstrated for the extraction of blood vessels in a retina image and of a river network in a satellite image.

  5. Efficacy Evaluation of Different Wavelet Feature Extraction Methods on Brain MRI Tumor Detection

    NASA Astrophysics Data System (ADS)

    Nabizadeh, Nooshin; John, Nigel; Kubat, Miroslav

    2014-03-01

    Automated Magnetic Resonance Imaging brain tumor detection and segmentation is a challenging task. Among the available methods, feature-based methods are dominant. While many feature extraction techniques have been employed, it is still not clear which feature extraction methods should be preferred. To help improve the situation, we present the results of a study in which we evaluate the efficiency of different wavelet transform feature extraction methods in brain MRI abnormality detection. Using T1-weighted brain images, Discrete Wavelet Transform (DWT), Discrete Wavelet Packet Transform (DWPT), Dual Tree Complex Wavelet Transform (DTCWT), and Complex Morlet Wavelet Transform (CMWT) methods are applied to construct the feature pool. Three classifiers, Support Vector Machine (SVM), K-Nearest Neighbor, and a Sparse Representation-Based Classifier, are applied and compared for classifying the selected features. The results show that DTCWT and CMWT features classified with SVM yield the highest classification accuracy, demonstrating the capability of wavelet transform features to be informative in this application.
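
    A minimal sketch of one such wavelet feature pipeline is given below (DWT subband statistics fed to an SVM); the wavelet, the statistics and the synthetic data are assumptions chosen for illustration and do not reproduce the study's exact configuration.

      # DWT subband statistics as features for a linear SVM (toy example).
      import numpy as np
      import pywt
      from sklearn.svm import SVC

      def dwt_features(image, wavelet="db2", level=2):
          """Flatten simple subband statistics into a feature vector."""
          coeffs = pywt.wavedec2(image, wavelet, level=level)
          bands = [coeffs[0]] + [b for detail in coeffs[1:] for b in detail]
          feats = []
          for band in bands:
              band = np.asarray(band)
              feats += [band.mean(), band.std(), np.mean(band ** 2)]   # mean, spread, energy
          return np.array(feats)

      rng = np.random.default_rng(0)
      X = np.array([dwt_features(rng.normal(scale=s, size=(64, 64)))
                    for s in (1.0, 3.0) for _ in range(10)])
      y = np.array([0] * 10 + [1] * 10)          # two synthetic "tissue" classes
      print(SVC(kernel="linear").fit(X, y).score(X, y))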

  6. Ensemble methods with simple features for document zone classification

    NASA Astrophysics Data System (ADS)

    Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing

    2012-01-01

    Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
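
    For orientation, the snippet below contrasts a bagging ensemble with a boosting ensemble on generic block-feature vectors, in the spirit of the comparison above; the synthetic features stand in for the paper's zone features and are purely illustrative.

      # Bagging vs. boosting on synthetic five-class block features.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=500, n_features=20, n_classes=5,
                                 n_informative=10, random_state=0)
      for clf in (BaggingClassifier(random_state=0), GradientBoostingClassifier(random_state=0)):
          score = cross_val_score(clf, X, y, cv=5).mean()
          print(type(clf).__name__, round(score, 3))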

  7. Ship Detection Based on Multiple Features in Random Forest Model for Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Li, N.; Ding, L.; Zhao, H.; Shi, J.; Wang, D.; Gong, X.

    2018-04-01

    A novel method for detecting ships which aims to make full use of both the spatial and spectral information from hyperspectral images is proposed. Firstly, a band with a high signal-to-noise ratio in the near-infrared or short-wave infrared range is used to segment land and sea with the Otsu threshold segmentation method. Secondly, multiple features, including spectral and texture features, are extracted from the hyperspectral images. Principal Components Analysis (PCA) is used to extract spectral features, and the Grey Level Co-occurrence Matrix (GLCM) is used to extract texture features. Finally, a Random Forest (RF) model is introduced to detect ships based on the extracted features. To illustrate the effectiveness of the method, we carry out experiments on EO-1 data, comparing single features with different combinations of multiple features. Compared with the traditional single-feature method and a Support Vector Machine (SVM) model, the proposed method can stably detect ships against complex backgrounds and can effectively improve detection accuracy.
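
    A simplified sketch of the multi-feature idea follows: PCA-compressed spectra plus GLCM texture statistics per patch, classified with a Random Forest. The patch shapes, band count and parameters are assumptions for illustration only.

      # Spectral (PCA) + texture (GLCM) features with a Random Forest (toy data).
      import numpy as np
      from skimage.feature import graycomatrix, graycoprops
      from sklearn.decomposition import PCA
      from sklearn.ensemble import RandomForestClassifier

      def patch_features(cube, pca):
          """cube: (rows, cols, bands) hyperspectral patch -> feature vector."""
          spectral = pca.transform(cube.reshape(-1, cube.shape[-1])).mean(axis=0)
          gray = (cube.mean(axis=-1) * 255 / cube.max()).astype(np.uint8)
          glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                              symmetric=True, normed=True)
          texture = [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity", "energy")]
          return np.concatenate([spectral, texture])

      rng = np.random.default_rng(1)
      patches = rng.random((40, 16, 16, 50))      # 40 toy patches, 50 bands
      labels = rng.integers(0, 2, size=40)        # 1 = ship, 0 = sea background
      pca = PCA(n_components=5).fit(patches.reshape(-1, 50))
      X = np.array([patch_features(p, pca) for p in patches])
      print(RandomForestClassifier(random_state=0).fit(X, labels).score(X, labels))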

  8. Study on Karst Information Identification of Qiandongnan Prefecture Based on RS and GIS Technology

    NASA Astrophysics Data System (ADS)

    Yao, M.; Zhou, G.; Wang, W.; Wu, Z.; Huang, Y.; Huang, X.

    2018-04-01

    Karst areas are natural resource bases, but their special geological environment also brings alternating droughts and floods, frequent karst collapse, rocky desertification and other resource and environmental problems, which seriously restrict sustainable economic and social development. This paper therefore identifies and studies karst and clarifies its distribution, providing basic data for the rational development of resources in karst regions and for the control of desertification. Because of the uniqueness of the karst landscape, it cannot be directly recognized and extracted by computer from remote sensing images. This paper therefore adopts an "RS + DEM" approach: based on Landsat-5 TM imagery from 2010 and DEM data, karst information is identified using slope vector maps, vegetation distribution maps, karst rocky desertification distribution maps and other auxiliary data, combined with interpretation signs, through human-computer interactive interpretation to identify and extract peak forest, peak cluster and isolated peaks, and to further extract karst depressions. Experiments show that this method realizes the "RS + DEM" mode through a reasonable combination of remote sensing images and DEM data. It not only effectively extracts karst areas covered with vegetation, but also quickly and accurately delimits the karst area and greatly improves the efficiency and precision of visual interpretation. The accurate interpretation rate of karst information in the study area is 86.73 %.

  9. Extracting Rayleigh wave dispersion from ambient noise across the Indian Ocean

    NASA Astrophysics Data System (ADS)

    Ma, Z.; Dalton, C. A.

    2016-12-01

    Rayleigh wave dispersion extracted from ambient seismic noise has been widely used to image crustal and uppermost mantle structure. Applications of this approach in continental settings are abundant, but there have been relatively few studies within ocean basins. In this presentation, we will first demonstrate the feasibility of extracting high quality Rayleigh wave dispersion information from ambient noise across the entire Indian Ocean basin. Phase arrival times measured from ambient noise are largely consistent with the ones predicted from 2-D phase velocity maps that were determined from earthquake data alone. Secondly, we show that adding dispersion information extracted from ambient noise to existing earthquake data can indeed improve the resolution of phase velocity maps by about 20% in the western Indian Ocean region where the station distribution is the densest. High quality Rayleigh wave dispersion information can be obtained from stacking waveforms over less than two years at land stations and less than four years at island stations. After removing the age dependent average velocities, the 2-D phase velocity maps show slow anomalies associated with the Seychelles-Mascarene plateau. Forward modeling suggests that the crust is about 15-25 km thick in this area, which agrees with previous estimates obtained from gravity data. We also observe that the slow anomaly related to the Central Indian Ridge is asymmetric. The center of this slow anomaly lies to the west side of ridge, which is opposite to the ridge migration direction. This asymmetry probably reflects the interactions between the ridge and nearby hotspots.

  10. Improved Spatial Differencing Scheme for 2-D DOA Estimation of Coherent Signals with Uniform Rectangular Arrays.

    PubMed

    Shi, Junpeng; Hu, Guoping; Sun, Fenggang; Zong, Binfeng; Wang, Xin

    2017-08-24

    This paper proposes an improved spatial differencing (ISD) scheme for two-dimensional direction of arrival (2-D DOA) estimation of coherent signals with uniform rectangular arrays (URAs). We first divide the URA into a number of row rectangular subarrays. Then, by extracting all the data information of each subarray, we only perform difference-operation on the auto-correlations, while the cross-correlations are kept unchanged. Using the reconstructed submatrices, both the forward only ISD (FO-ISD) and forward backward ISD (FB-ISD) methods are developed under the proposed scheme. Compared with the existing spatial smoothing techniques, the proposed scheme can use more data information of the sample covariance matrix and also suppress the effect of additive noise more effectively. Simulation results show that both FO-ISD and FB-ISD can improve the estimation performance largely as compared to the others, in white or colored noise conditions.

  11. Improved Spatial Differencing Scheme for 2-D DOA Estimation of Coherent Signals with Uniform Rectangular Arrays

    PubMed Central

    Hu, Guoping; Zong, Binfeng; Wang, Xin

    2017-01-01

    This paper proposes an improved spatial differencing (ISD) scheme for two-dimensional direction of arrival (2-D DOA) estimation of coherent signals with uniform rectangular arrays (URAs). We first divide the URA into a number of row rectangular subarrays. Then, by extracting all the data information of each subarray, we only perform difference-operation on the auto-correlations, while the cross-correlations are kept unchanged. Using the reconstructed submatrices, both the forward only ISD (FO-ISD) and forward backward ISD (FB-ISD) methods are developed under the proposed scheme. Compared with the existing spatial smoothing techniques, the proposed scheme can use more data information of the sample covariance matrix and also suppress the effect of additive noise more effectively. Simulation results show that both FO-ISD and FB-ISD can improve the estimation performance largely as compared to the others, in white or colored noise conditions. PMID:28837115

  12. Logarithmic profile mapping multi-scale Retinex for restoration of low illumination images

    NASA Astrophysics Data System (ADS)

    Shi, Haiyan; Kwok, Ngaiming; Wu, Hongkun; Li, Ruowei; Liu, Shilong; Lin, Ching-Feng; Wong, Chin Yeow

    2018-04-01

    Images are valuable information sources for many scientific and engineering applications. However, images captured in poor illumination conditions have large dark regions that can heavily degrade image quality. In order to improve the quality of such images, a restoration algorithm is developed here that transforms the low input brightness to a higher value using a modified Multi-Scale Retinex approach. The algorithm is further improved by an entropy-based weighting of the input and the processed results to refine the necessary amplification in regions of low brightness. Moreover, fine details in the image are preserved by applying the Retinex principles to extract and then re-insert object edges to obtain an enhanced image. Results from experiments using low and normal illumination images show satisfactory performance with regard to the improvement in information content and the mitigation of viewing artifacts.
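
    To make the Retinex idea concrete, a bare-bones multi-scale Retinex in the log domain is sketched below; the logarithmic profile mapping and entropy-based weighting of the paper are not reproduced, and the scales are illustrative assumptions.

      # Plain multi-scale Retinex: subtract a smoothed illumination estimate in log space.
      import numpy as np
      from scipy.ndimage import gaussian_filter

      def multi_scale_retinex(img, sigmas=(15, 80, 250)):
          """img: float image in (0, 1]; returns the MSR output rescaled to [0, 1]."""
          img = np.clip(img, 1e-6, None)
          msr = np.zeros_like(img)
          for sigma in sigmas:
              illumination = gaussian_filter(img, sigma)      # smooth estimate of the lighting
              msr += np.log(img) - np.log(np.clip(illumination, 1e-6, None))
          msr /= len(sigmas)
          return (msr - msr.min()) / (msr.max() - msr.min() + 1e-12)

      dark = np.random.default_rng(0).random((120, 160)) * 0.2   # simulated low-light frame
      print(multi_scale_retinex(dark).mean())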

  13. An Improved Text Localization Method for Natural Scene Images

    NASA Astrophysics Data System (ADS)

    Jiang, Mengdi; Cheng, Jianghua; Chen, Minghui; Ku, Xishu

    2018-01-01

    In order to extract text information effectively from natural scene images with complex backgrounds, multi-orientation perspective and multiple languages, we present a new method based on an improved Stroke Width Transform (SWT). Firstly, the Maximally Stable Extremal Region (MSER) method is used to detect candidate text regions. Secondly, the SWT algorithm is applied in the candidate regions, which improves edge detection compared with the traditional SWT method. Finally, Frequency-tuned (FT) visual saliency is introduced to remove non-text candidate regions. The experimental results show that the method achieves good robustness for complex backgrounds with multi-orientation perspective and with various characters and font sizes.
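
    The first stage (MSER candidate detection) can be sketched with OpenCV as below; the SWT refinement and FT saliency filtering are omitted, the input path is a placeholder, and the aspect-ratio filter is an illustrative assumption.

      # MSER candidate text regions with a crude geometric filter (first stage only).
      import cv2

      img = cv2.imread("scene.jpg")                      # placeholder path
      gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
      mser = cv2.MSER_create()
      regions, bboxes = mser.detectRegions(gray)
      for x, y, w, h in bboxes:
          if 0.1 < w / float(h) < 10:                    # drop extreme aspect ratios
              cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
      cv2.imwrite("candidates.jpg", img)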

  14. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    PubMed

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract from these files information from the structured elements in the DICOM metadata relevant to exposure. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
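
    An analogous sketch in Python with pydicom (rather than the study's Matlab tooling) reads exposure-related attributes directly from DICOM headers; the file path is a placeholder, and a full dose-report parser would also walk the nested structured-report sequences.

      # Read exposure-related DICOM attributes from a file header.
      from pydicom import dcmread

      ds = dcmread("ct_dose_report.dcm")                 # placeholder path
      for keyword in ("StudyDate", "KVP", "XRayTubeCurrent", "ExposureTime", "CTDIvol"):
          print(keyword, ds.get(keyword, "not present")) # default returned when the tag is absent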

  15. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information

    NASA Astrophysics Data System (ADS)

    Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia

    2018-05-01

    Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.

  16. Ginkgo biloba Extract for Patients with Early Diabetic Nephropathy: A Systematic Review

    PubMed Central

    Zhang, Lei; Mao, Wei; Guo, Xinfeng; Wu, Yifan; Li, Chuang; Lu, Zhaoyu; Su, Guobin; Li, Xiaoyan; Liu, Zhuangzhu; Guo, Rong; Jie, Xina; Wen, Zehuai; Liu, Xusheng

    2013-01-01

    Objectives. To evaluate the effectiveness and safety of a Ginkgo biloba extract for patients with early diabetic nephropathy. Methods. Randomised controlled trials (RCTs) conducted on adults with early diabetic nephropathy which used Ginkgo biloba extract were included. The major databases were searched, and manufacturers of Ginkgo biloba products were contacted for information on any published or unpublished studies. Two authors independently extracted the data from the included studies. Data analysis was conducted using Review Manager 5.0 software. Results. Sixteen RCTs were included. Ginkgo biloba extract decreased the urinary albumin excretion rate (UAER), fasting blood glucose (FBG), serum creatinine (SCR), and blood urea nitrogen (BUN). The extract also improved hemorheology. The methodological quality of the included studies was low. The explicit generation of the allocation sequence was described in only 6 trials. None of the included trials were confirmed to use blinding. Three studies observed adverse events. One study using an angiotensin-converting enzyme inhibitor (ACEi) reported mild cough in both groups. No serious adverse effects were reported. Conclusions. Ginkgo biloba extract shows promise in treating early diabetic nephropathy, especially in patients with a high baseline UAER. Its safety in early diabetic nephropathy remains uncertain. Long-term, double-blinded RCTs with large sample sizes are still needed to provide stronger evidence. PMID:23533513

  17. Hippophae rhamnoides oil-in-water (O/W) emulsion improves barrier function in healthy human subjects.

    PubMed

    Khan, Barkat Ali; Akhtar, Naveed

    2014-11-01

    This study aimed to investigate changes in skin barrier function in human subjects following long-term topical application of a Hippophae rhamnoides oil-in-water (O/W) emulsion, with effects measured using non-invasive probes (tewameter and corneometer). For this purpose, two stable oil-in-water (O/W) emulsions were formulated, one with 5% Hippophae rhamnoides extract and the other without extract. Thirteen healthy male subjects with a mean age of 27 ± 4.8 years were enrolled after giving informed consent. The subjects were instructed to apply either the active formulation or the base formulation over 84 days while remaining blinded to the contents of the formulations. Biometrological measurements of skin hydration and transepidermal water loss (TEWL) were performed on both sides of the face in each volunteer at baseline and on days 7, 14, 21, 28, 42, 56, 70 and 84. The statistical analysis revealed that the formulation with 5% plant extract was superior to the placebo (base formulation), showing highly significant improvements in skin hydration (p=0.0003) and TEWL (p=0.0087) throughout the treatment course. Moreover, a significant (p<0.05) correlation between the active formulation and the improvement of skin barrier function was observed. The active formulation was found to be superior to the placebo. Future studies are necessary to clinically evaluate the active formulation; nevertheless, it can be proposed that a Hippophae rhamnoides emulsion could be an alternative pharmacological tool for treating barrier-compromised skin conditions.

  18. The Agent of extracting Internet Information with Lead Order

    NASA Astrophysics Data System (ADS)

    Mo, Zan; Huang, Chuliang; Liu, Aijun

    In order to carry out e-commerce better, advanced technologies for accessing business information are urgently needed. An agent is described to deal with the problems of extracting internet information caused by the non-standard and inconsistent structure of Chinese websites. The agent comprises three modules, each responsible for one stage of the extraction process. An HTTP-tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to structure the extracted natural-language information is also discussed.

  19. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension.

    PubMed

    Yu, Hong; Agarwal, Shashank; Johnston, Mark; Cohen, Aaron

    2009-01-06

    Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension. Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of text summaries with the text-summarization evaluation method the ROUGE score. Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39-68% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30% of the information. When the full-text article is available, figure comprehension increased to 86-97%; this indicates that researchers felt that only 3-14% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles. We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond figure legend. Our work provides important guidance to the figure mining systems that extract information only from figure and figure legend.

  20. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension

    PubMed Central

    2009-01-01

    Background Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension. Methods Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of text summaries with the text-summarization evaluation method the ROUGE score. Results Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39–68% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30% of the information. When the full-text article is available, figure comprehension increased to 86–97%; this indicates that researchers felt that only 3–14% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles. Conclusion We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond figure legend. Our work provides important guidance to the figure mining systems that extract information only from figure and figure legend. PMID:19126221

  1. Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection

    PubMed Central

    Lossio-Ventura, Juan Antonio; Hogan, William; Modave, François; Hicks, Amanda; Hanna, Josh; Guo, Yi; He, Zhe; Bian, Jiang

    2017-01-01

    Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves their health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, which surpasses state-of-the-art results. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993. PMID:28503356

  2. Fast Localization in Large-Scale Environments Using Supervised Indexing of Binary Features.

    PubMed

    Youji Feng; Lixin Fan; Yihong Wu

    2016-01-01

    The essence of image-based localization lies in matching 2D key points in the query image and 3D points in the database. State-of-the-art methods mostly employ sophisticated key point detectors and feature descriptors, e.g., Difference of Gaussian (DoG) and Scale Invariant Feature Transform (SIFT), to ensure robust matching. While a high registration rate is attained, the registration speed is impeded by the expensive key point detection and the descriptor extraction. In this paper, we propose to use efficient key point detectors along with binary feature descriptors, since the extraction of such binary features is extremely fast. The naive usage of binary features, however, does not lend itself to significant speedup of localization, since existing indexing approaches, such as hierarchical clustering trees and locality sensitive hashing, are not efficient enough in indexing binary features, and matching binary features turns out to be much slower than matching SIFT features. To overcome this, we propose a much more efficient indexing approach for approximate nearest neighbor search of binary features. This approach resorts to randomized trees that are constructed in a supervised training process by exploiting the label information derived from the fact that multiple features correspond to a common 3D point. In the tree construction process, node tests are selected in a way such that trees have uniform leaf sizes and low error rates, which are two desired properties for efficient approximate nearest neighbor search. To further improve the search efficiency, a probabilistic priority search strategy is adopted. Apart from the label information, this strategy also uses non-binary pixel intensity differences available in descriptor extraction. By using the proposed indexing approach, matching binary features is no longer much slower but slightly faster than matching SIFT features. Consequently, the overall localization speed is significantly improved due to the much faster key point detection and descriptor extraction. It is empirically demonstrated that the localization speed is improved by an order of magnitude as compared with state-of-the-art methods, while comparable registration rate and localization accuracy are still maintained.

  3. Development of the major trauma case review tool.

    PubMed

    Curtis, Kate; Mitchell, Rebecca; McCarthy, Amy; Wilson, Kellie; Van, Connie; Kennedy, Belinda; Tall, Gary; Holland, Andrew; Foster, Kim; Dickinson, Stuart; Stelfox, Henry T

    2017-02-28

    As many as half of all patients with major traumatic injuries do not receive the recommended care, with variance in preventable mortality reported across the globe. This variance highlights the need for a comprehensive process for monitoring and reviewing patient care, central to which is a consistent peer-review process that includes trauma system safety and human factors. There is no published, evidence-informed standardised tool that considers these factors for use in adult or paediatric trauma case peer-review. The aim of this research was to develop and validate a trauma case review tool to facilitate clinical review of paediatric trauma patient care in extracting information to facilitate monitoring, inform change and enable loop closure. Development of the trauma case review tool was multi-faceted, beginning with a review of the trauma audit tool literature. Data were extracted from the literature to inform iterative tool development using a consensus approach. Inter-rater agreement was assessed for both the pilot and finalised versions of the tool. The final trauma case review tool contained ten sections, including patient factors (such as pre-existing conditions), presenting problem, a timeline of events, factors contributing to the care delivery problem (including equipment, work environment, staff action, organizational factors), positive aspects of care and the outcome of panel discussion. After refinement, the inter-rater reliability of the human factors and outcome components of the tool improved with an average 86% agreement between raters. This research developed an evidence-informed tool for use in paediatric trauma case review that considers both system safety and human factors to facilitate clinical review of trauma patient care. This tool can be used to identify opportunities for improvement in trauma care and guide quality assurance activities. Validation is required in the adult population.

  4. New techniques for positron emission tomography in the study of human neurological disorders

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuhl, D.E.

    1993-01-01

    This progress report describes accomplishments of four programs. The four programs are entitled (1) Faster, simpler processing of positron-computing precursors: New physicochemical approaches, (2) Novel solid phase reagents and methods to improve radiosynthesis and isotope production, (3) Quantitative evaluation of the extraction of information from PET images, and (4) Optimization of tracer kinetic methods for radioligand studies in PET.

  5. Image Analysis and Modeling

    DTIC Science & Technology

    1976-03-01

    This report summarizes the results of the research program on Image Analysis and Modeling supported by the Defense Advanced Research Projects Agency...The objective is to achieve a better understanding of image structure and to use this knowledge to develop improved image models for use in image ... analysis and processing tasks such as information extraction, image enhancement and restoration, and coding. The ultimate objective of this research is

  6. Outcome-Focused Market Intelligence: Extracting Better Value and Effectiveness from Strategic Sourcing

    DTIC Science & Technology

    2013-04-01

    disseminating information are not systematically taught or developed in the government’s acquisition workforce. However, a study of 30 large firms ...to keep themselves abreast of changes in the marketplace, such as technological advances, process improvements, and available sources of supply. The...and performance measurement (Monczka & Petersen, 2008). Firms that develop supply management strategic plans typically set three-to-five year

  7. Extracting DEM from airborne X-band data based on PolInSAR

    NASA Astrophysics Data System (ADS)

    Hou, X. X.; Huang, G. M.; Zhao, Z.

    2015-06-01

    Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) is a new trend in SAR remote sensing technology that combines polarimetric multichannel information with interferometric information. It is of great significance for DEM extraction in regions where DEM precision is low, such as vegetation-covered areas and densely built-up areas. In this paper we describe our experiments with high-resolution X-band fully polarimetric SAR data acquired by a dual-baseline interferometric airborne SAR system over an area of Danling in southern China. The Pauli algorithm is used to generate the dual-polarimetric interferometric data, and Singular Value Decomposition (SVD), Numerical Radius (NR) and Phase Diversity (PD) methods are used to generate the fully polarimetric interferometric data. The polarimetric interferometric information is then used to extract the DEM through pre-filtering, image registration, image resampling, coherence optimization, multilook processing, flat-earth removal, interferogram filtering, phase unwrapping, parameter calibration, height derivation and geo-coding. The processing system, named SARPlore, has been developed in VC++ under the leadership of the Chinese Academy of Surveying and Mapping. Finally, comparing the optimized results with single-polarimetric interferometry, it has been observed that the optimization methods can reduce interferometric noise and phase unwrapping residuals and improve the precision of the DEM. The result of fully polarimetric interferometry is better than that of dual-polarimetric interferometry, and the degree of improvement varies with terrain.

  8. Coherent Wave Measurement Buoy Arrays to Support Wave Energy Extraction

    NASA Astrophysics Data System (ADS)

    Spada, F.; Chang, G.; Jones, C.; Janssen, T. T.; Barney, P.; Roberts, J.

    2016-02-01

    Wave energy is the most abundant form of hydrokinetic energy in the United States and wave energy converters (WECs) are being developed to extract the maximum possible power from the prevailing wave climate. However, maximum wave energy capture is currently limited by the narrow banded frequency response of WECs as well as extended protective shutdown requirements during periods of large waves. These limitations must be overcome in order to maximize energy extraction, thus significantly decreasing the cost of wave energy and making it a viable energy source. Techno-economic studies of several WEC devices have shown significant potential to improve wave energy capture efficiency through operational control strategies that incorporate real-time information about local surface wave motions. Integral Consulting Inc., with ARPA-E support, is partnering with Sandia National Laboratories and Spoondrift LLC to develop a coherent array of wave-measuring devices to relay and enable the prediction of wave-resolved surface dynamics at a WEC location ahead of real time. This capability will provide necessary information to optimize power production of WECs through control strategies, thereby allowing for a single WEC design to perform more effectively across a wide range of wave environments. The information, data, or work presented herein was funded in part by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award Number DE-AR0000514.

  9. Investigating the feasibility of using partial least squares as a method of extracting salient information for the evaluation of digital breast tomosynthesis

    NASA Astrophysics Data System (ADS)

    Zhang, George Z.; Myers, Kyle J.; Park, Subok

    2013-03-01

    Digital breast tomosynthesis (DBT) has shown promise for improving the detection of breast cancer, but it has not yet been fully optimized due to a large space of system parameters to explore. A task-based statistical approach [1] is a rigorous method for evaluating and optimizing this promising imaging technique with the use of optimal observers such as the Hotelling observer (HO). However, the high data dimensionality found in DBT has been the bottleneck for the use of a task-based approach in DBT evaluation. To reduce data dimensionality while extracting salient information for performing a given task, efficient channels have to be used for the HO. In the past few years, 2D Laguerre-Gauss (LG) channels, which are a complete basis for stationary backgrounds and rotationally symmetric signals, have been utilized for DBT evaluation [2, 3]. But since background and signal statistics from DBT data are neither stationary nor rotationally symmetric, LG channels may not be efficient in providing reliable performance trends as a function of system parameters. Recently, partial least squares (PLS) has been shown to generate efficient channels for the Hotelling observer in detection tasks involving random backgrounds and signals [4]. In this study, we investigate the use of PLS as a method for extracting salient information from DBT in order to better evaluate such systems.
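
    As a rough sketch of the channelization idea, the snippet below fits partial least squares to toy image data and class labels and keeps a handful of PLS components as channels; the dimensions and data are assumptions for illustration, not DBT data.

      # PLS components used as data-reducing channels (toy signal-detection data).
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(0)
      n, pixels = 200, 32 * 32
      signal = rng.normal(size=pixels) * 0.5
      y = rng.integers(0, 2, size=n)                     # 1 = signal present
      X = rng.normal(size=(n, pixels)) + np.outer(y, signal)

      pls = PLSRegression(n_components=10).fit(X, y.astype(float))
      channel_outputs = pls.transform(X)                 # n x 10 channelized data
      print(channel_outputs.shape)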

  10. Assessment of the Reconstructed Aerodynamics of the Mars Science Laboratory Entry Vehicle

    NASA Technical Reports Server (NTRS)

    Schoenenberger, Mark; Van Norman, John W.; Dyakonov, Artem A.; Karlgaard, Christopher D.; Way, David W.; Kutty, Prasad

    2013-01-01

    On August 5, 2012, the Mars Science Laboratory entry vehicle successfully entered Mars atmosphere, flying a guided entry until parachute deploy. The Curiosity rover landed safely in Gale crater upon completion of the Entry Descent and Landing sequence. This paper compares the aerodynamics of the entry capsule extracted from onboard flight data, including Inertial Measurement Unit (IMU) accelerometer and rate gyro information, and heatshield surface pressure measurements. From the onboard data, static force and moment data has been extracted. This data is compared to preflight predictions. The information collected by MSL represents the most complete set of information collected during Mars entry to date. It allows the separation of aerodynamic performance from atmospheric conditions. The comparisons show the MSL aerodynamic characteristics have been identified and resolved to an accuracy better than the aerodynamic database uncertainties used in preflight simulations. A number of small anomalies have been identified and are discussed. This data will help revise aerodynamic databases for future missions and will guide computational fluid dynamics (CFD) development to improved prediction codes.

  11. Medical knowledge discovery and management.

    PubMed

    Prior, Fred

    2009-05-01

    Although the volume of medical information is growing rapidly, the ability to rapidly convert this data into "actionable insights" and new medical knowledge is lagging far behind. The first step in the knowledge discovery process is data management and integration, which logically can be accomplished through the application of data warehouse technologies. A key insight that arises from efforts in biosurveillance and the global scope of military medicine is that information must be integrated over both time (longitudinal health records) and space (spatial localization of health-related events). Once data are compiled and integrated it is essential to encode the semantics and relationships among data elements through the use of ontologies and semantic web technologies to convert data into knowledge. Medical images form a special class of health-related information. Traditionally knowledge has been extracted from images by human observation and encoded via controlled terminologies. This approach is rapidly being replaced by quantitative analyses that more reliably support knowledge extraction. The goals of knowledge discovery are the improvement of both the timeliness and accuracy of medical decision making and the identification of new procedures and therapies.

  12. Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer.

    PubMed

    Hoogendoorn, Mark; Szolovits, Peter; Moons, Leon M G; Numans, Mattijs E

    2016-05-01

    Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from uncoded consultation notes and study whether they can help to improve predictive performance. We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag-of-words-based approach and topic modeling. In addition, we develop a dedicated technique to match the uncoded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy. Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882), although the difference is not significant. The extracted features from the notes are shown to be equally predictive (i.e. there is no significant difference in performance) compared to the coded data of the consultations. It is possible to extract useful predictors from uncoded consultation notes that improve predictive performance. Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset. Copyright © 2016 Elsevier B.V. All rights reserved.
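
    The general idea of combining coded predictors with bag-of-words features from free text can be sketched as below; the column names, notes and labels are invented placeholders, not the study's EMR data or its ontology-based method.

      # Coded variables + bag-of-words note features in one predictive model.
      import pandas as pd
      from sklearn.compose import ColumnTransformer
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import Pipeline

      df = pd.DataFrame({
          "age": [64, 52, 71, 45],
          "sex": [1, 0, 1, 0],
          "note": ["rectal bleeding and weight loss", "routine check-up",
                   "altered bowel habit, anaemia", "knee pain after fall"],
          "crc": [1, 0, 1, 0],
      })
      features = ColumnTransformer([
          ("coded", "passthrough", ["age", "sex"]),
          ("text", CountVectorizer(), "note"),
      ])
      model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
      model.fit(df[["age", "sex", "note"]], df["crc"])
      print(model.predict_proba(df[["age", "sex", "note"]])[:, 1])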

  13. Protocol of a controlled before-after evaluation of a national health information technology-based program to improve healthcare coordination and access to information.

    PubMed

    Saillour-Glénisson, Florence; Duhamel, Sylvie; Fourneyron, Emmanuelle; Huiart, Laetitia; Joseph, Jean Philippe; Langlois, Emmanuel; Pincemail, Stephane; Ramel, Viviane; Renaud, Thomas; Roberts, Tamara; Sibé, Matthieu; Thiessard, Frantz; Wittwer, Jerome; Salmi, Louis Rachid

    2017-04-21

    Improvement of coordination of all health and social care actors in the patient pathways is an important issue in many countries. Health Information (HI) technology has been considered as a potentially effective answer to this issue. The French Health Ministry first funded the development of five TSN ("Territoire de Soins Numérique"/Digital health territories) projects, aiming at improving healthcare coordination and access to information for healthcare providers, patients and the population, and at improving healthcare professionals work organization. The French Health Ministry then launched a call for grant to fund one research project consisting in evaluating the TSN projects implementation and impact and in developing a model for HI technology evaluation. EvaTSN is mainly based on a controlled before-after study design. Data collection covers three periods: before TSN program implementation, during early TSN program implementation and at late TSN program implementation, in the five TSN projects' territories and in five comparison territories. Three populations will be considered: "TSN-targeted people" (healthcare system users and people having characteristics targeted by the TSN projects), "TSN patient users" (people included in TSN experimentations or using particular services) and "TSN professional users" (healthcare professionals involved in TSN projects). Several samples will be made in each population depending on the objective, axis and stage of the study. Four types of data sources are considered: 1) extractions from the French National Heath Insurance Database (SNIIRAM) and the French Autonomy Personalized Allowance database, 2) Ad hoc surveys collecting information on knowledge of TSN projects, TSN program use, ease of use, satisfaction and understanding, TSN pathway experience and appropriateness of hospital admissions, 3) qualitative analyses using semi-directive interviews and focus groups and document analyses and 4) extractions of TSN implementation indicators from TSN program database. EvaTSN is a challenging French national project for the production of evidenced-based information on HI technologies impact and on the context and conditions of their effectiveness and efficiency. We will be able to support health care management in order to implement HI technologies. We will also be able to produce an evaluation toolkit for HI technology evaluation. ClinicalTrials.gov ID: NCT02837406 , 08/18/2016.

  14. Optimal Information Extraction of Laser Scanning Dataset by Scale-Adaptive Reduction

    NASA Astrophysics Data System (ADS)

    Zang, Y.; Yang, B.

    2018-04-01

    3D laser scanning technology is widely used to collect surface information about objects. For various applications, we need to extract a point cloud of good perceptual quality from the scanned points. To solve this problem, most existing methods extract important points based on a fixed scale. However, the geometric features of a 3D object arise at various geometric scales. We propose a multi-scale construction method based on radial basis functions. For each scale, important points are extracted from the point cloud based on their importance. We apply the Just-Noticeable-Difference perception metric to measure the degradation of each geometric scale. Finally, scale-adaptive optimal information extraction is realized. Experiments are undertaken to evaluate the effectiveness of the proposed method, suggesting a reliable solution for optimal information extraction from objects.

  15. Individual Learning Route as a Way of Highly Qualified Specialists Training for Extraction of Solid Commercial Minerals Enterprises

    NASA Astrophysics Data System (ADS)

    Oschepkova, Elena; Vasinskaya, Irina; Sockoluck, Irina

    2017-11-01

    In view of the changing educational paradigm (the adoption of the two-tier system of higher education, with undergraduate and graduate programs), there is a need to use modern learning and information and communication technologies and to put learner-centered approaches into practice when training highly qualified specialists for enterprises that extract and process solid commercial minerals. Facing unstable market demand and a changing institutional environment on the one hand, and the need to balance work, supply conditions and product quality as mining and geological parameters change on the other, mining enterprises have to introduce and develop integrated management of product, information and logistic flows under a unified management system. One of the main limitations holding back this development at Russian mining enterprises is a lack of competence in logistics management at all levels. Under present-day conditions, enterprises that extract and process solid commercial minerals need highly qualified specialists who can conduct self-directed research and who can develop new, and improve existing, technologies for organizing, planning and managing the technical operation and commercial exploitation of transport, transportation and processing facilities based on logistics. The learner-centered approach and the individualization of the learning process necessitate the design of an individual learning route (ILR), which can help students realize their professional potential according to the requirements placed on specialists for enterprises that extract and process solid commercial minerals.

  16. LiDAR DTMs and anthropogenic feature extraction: testing the feasibility of geomorphometric parameters in floodplains

    NASA Astrophysics Data System (ADS)

    Sofia, G.; Tarolli, P.; Dalla Fontana, G.

    2012-04-01

    In floodplains, massive investments in land reclamation have always played an important role in the past for flood protection. In these contexts, human alteration is reflected by artificial features ('Anthropogenic features'), such as banks, levees or road scarps, that constantly increase and change, in response to the rapid growth of human populations. For these areas, various existing and emerging applications require up-to-date, accurate and sufficiently attributed digital data, but such information is usually lacking, especially when dealing with large-scale applications. More recently, National or Local Mapping Agencies, in Europe, are moving towards the generation of digital topographic information that conforms to reality and are highly reliable and up to date. LiDAR Digital Terrain Models (DTMs) covering large areas are readily available for public authorities, and there is a greater and more widespread interest in the application of such information by agencies responsible for land management for the development of automated methods aimed at solving geomorphological and hydrological problems. Automatic feature recognition based upon DTMs can offer, for large-scale applications, a quick and accurate method that can help in improving topographic databases, and that can overcome some of the problems associated with traditional, field-based, geomorphological mapping, such as restrictions on access, and constraints of time or costs. Although anthropogenic features as levees and road scarps are artificial structures that actually do not belong to what is usually defined as the bare ground surface, they are implicitly embedded in digital terrain models (DTMs). Automatic feature recognition based upon DTMs, therefore, can offer a quick and accurate method that does not require additional data, and that can help in improving flood defense asset information, flood modeling or other applications. In natural contexts, morphological indicators derived from high resolution topography have been proven to be reliable for feasible applications. The use of statistical operators as thresholds for these geomorphic parameters, furthermore, showed a high reliability for feature extraction in mountainous environments. The goal of this research is to test if these morphological indicators and objective thresholds can be feasible also in floodplains, where features assume different characteristics and other artificial disturbances might be present. In the work, three different geomorphic parameters are tested and applied at different scales on a LiDAR DTM of typical alluvial plain's area in the North East of Italy. The box-plot is applied to identify the threshold for feature extraction, and a filtering procedure is proposed, to improve the quality of the final results. The effectiveness of the different geomorphic parameters is analyzed, comparing automatically derived features with the surveyed ones. The results highlight the capability of high resolution topography, geomorphic indicators and statistical thresholds for anthropogenic features extraction and characterization in a floodplains context.
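
    The box-plot thresholding idea can be condensed to a few lines: flag cells whose geomorphic index exceeds the upper whisker (Q3 + 1.5 * IQR). The index grid below is synthetic; in practice it would be a curvature map or a similar DTM derivative.

      # Upper-whisker (box-plot) threshold on a synthetic geomorphic index grid.
      import numpy as np

      rng = np.random.default_rng(0)
      index_grid = rng.normal(size=(500, 500))
      index_grid[240:260, :] += 4.0                      # synthetic levee-like anomaly

      q1, q3 = np.percentile(index_grid, [25, 75])
      threshold = q3 + 1.5 * (q3 - q1)                   # upper whisker of the box-plot
      feature_mask = index_grid > threshold
      print(feature_mask.sum(), "cells flagged as candidate anthropogenic features")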

  17. Development of an economic model to assess the cost-effectiveness of hawthorn extract as an adjunct treatment for heart failure in Australia

    PubMed Central

    Ford, Emily; Adams, Jon; Graves, Nicholas

    2012-01-01

    Objective An economic model was developed to evaluate the cost-effectiveness of hawthorn extract as an adjunctive treatment for heart failure in Australia. Methods A Markov model of chronic heart failure was developed to compare the costs and outcomes of standard treatment and standard treatment with hawthorn extract. Health states were defined by the New York Heart Association (NYHA) classification system and death. For any given cycle, patients could remain in the same NYHA class, experience an improvement or deterioration in NYHA class, be hospitalised or die. Model inputs were derived from the published medical literature, and the output was quality-adjusted life years (QALYs). Probabilistic sensitivity analysis was conducted. The expected value of perfect information (EVPI) and the expected value of partial perfect information (EVPPI) were calculated to establish the value of further research and the ideal target for such research. Results Hawthorn extract increased costs by $1866.78 and resulted in a gain of 0.02 QALYs. The incremental cost-effectiveness ratio was $85 160.33 per QALY. The cost-effectiveness acceptability curve indicated that at a threshold of $40 000 the new treatment had a 0.29 probability of being cost-effective. The average incremental net monetary benefit (NMB) was −$1791.64, the average NMB for the standard treatment was $92 067.49, and for hawthorn extract $90 275.84. Additional research is potentially cost-effective if research is not proposed to cost more than $325 million. Utilities form the most important target parameter group for further research. Conclusions Hawthorn extract is not currently considered to be cost-effective as an adjunctive treatment for heart failure in Australia. Further research in the area of utilities is warranted. PMID:22942231
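
    To make the mechanics of such a Markov cohort evaluation concrete, the sketch below steps a cohort through hypothetical NYHA-based health states, accumulates discounted costs and QALYs for standard care and for standard care plus an adjunct, and computes the incremental cost-effectiveness ratio. Every transition probability, cost, utility and cycle count is an invented placeholder, not an input from this study.

```python
import numpy as np

# States: NYHA I, II, III, IV, Dead. Every number below is an illustrative placeholder.
P_standard = np.array([
    [0.80, 0.15, 0.03, 0.01, 0.01],
    [0.10, 0.70, 0.13, 0.04, 0.03],
    [0.02, 0.12, 0.66, 0.12, 0.08],
    [0.01, 0.04, 0.15, 0.60, 0.20],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])
P_adjunct = P_standard.copy()              # assume the adjunct slightly slows deterioration
P_adjunct[1, 2] -= 0.01
P_adjunct[1, 1] += 0.01

utilities  = np.array([0.85, 0.75, 0.60, 0.45, 0.0])    # QALY weight per cycle and state
state_cost = np.array([500., 900., 1800., 3500., 0.])   # care cost per cycle and state
adjunct_cost_per_cycle = 120.                            # hypothetical drug cost

def run_model(P, drug_cost=0.0, cycles=40, discount=0.05):
    cohort = np.array([0.3, 0.4, 0.2, 0.1, 0.0])         # starting NYHA distribution
    qalys = costs = 0.0
    for t in range(cycles):
        d = 1.0 / (1.0 + discount) ** t                  # discount factor for this cycle
        qalys += d * cohort @ utilities
        costs += d * (cohort @ state_cost + drug_cost * cohort[:4].sum())
        cohort = cohort @ P                              # advance the cohort one cycle
    return qalys, costs

q_std, c_std = run_model(P_standard)
q_adj, c_adj = run_model(P_adjunct, drug_cost=adjunct_cost_per_cycle)
icer = (c_adj - c_std) / (q_adj - q_std)
print(f"Incremental cost {c_adj - c_std:.2f}, incremental QALYs {q_adj - q_std:.4f}, ICER {icer:.0f}")
```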

  18. Development of an economic model to assess the cost-effectiveness of hawthorn extract as an adjunct treatment for heart failure in Australia.

    PubMed

    Ford, Emily; Adams, Jon; Graves, Nicholas

    2012-01-01

    An economic model was developed to evaluate the cost-effectiveness of hawthorn extract as an adjunctive treatment for heart failure in Australia. A Markov model of chronic heart failure was developed to compare the costs and outcomes of standard treatment and standard treatment with hawthorn extract. Health states were defined by the New York Heart Association (NYHA) classification system and death. For any given cycle, patients could remain in the same NYHA class, experience an improvement or deterioration in NYHA class, be hospitalised or die. Model inputs were derived from the published medical literature, and the output was quality-adjusted life years (QALYs). Probabilistic sensitivity analysis was conducted. The expected value of perfect information (EVPI) and the expected value of partial perfect information (EVPPI) were calculated to establish the value of further research and the ideal target for such research. Hawthorn extract increased costs by $1866.78 and resulted in a gain of 0.02 QALYs. The incremental cost-effectiveness ratio was $85 160.33 per QALY. The cost-effectiveness acceptability curve indicated that at a threshold of $40 000 the new treatment had a 0.29 probability of being cost-effective. The average incremental net monetary benefit (NMB) was -$1791.64, the average NMB for the standard treatment was $92 067.49, and for hawthorn extract $90 275.84. Additional research is potentially cost-effective if research is not proposed to cost more than $325 million. Utilities form the most important target parameter group for further research. Hawthorn extract is not currently considered to be cost-effective as an adjunctive treatment for heart failure in Australia. Further research in the area of utilities is warranted.

  19. Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding

    PubMed Central

    Wang, Xiang; Zheng, Yuan; Zhao, Zhenzhou; Wang, Jinping

    2015-01-01

    Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples usually lie on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so implementing feature extraction and dimensionality reduction while improving recognition performance is a crucial task. In this paper a novel machinery fault diagnosis approach is proposed, based on a statistical locally linear embedding (S-LLE) algorithm, an extension of LLE that exploits fault class label information. The approach first extracts intrinsic manifold features from high-dimensional feature vectors obtained from vibration signals through time-domain, frequency-domain and empirical mode decomposition (EMD) feature extraction, and then maps the complex mode space into a salient low-dimensional feature space using the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally, pattern classification and fault diagnosis are carried out easily and rapidly by a classifier in the reduced feature space. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach clearly improves the classification performance of fault pattern recognition and outperforms traditional approaches. PMID:26153771
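
    The sketch below illustrates only the pipeline shape (high-dimensional features, manifold embedding, classification) using scikit-learn's plain, unsupervised LocallyLinearEmbedding; the supervised S-LLE variant described above is not part of scikit-learn, and the feature matrix and labels are synthetic stand-ins for vibration-signal features.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical high-dimensional feature vectors (e.g. time-domain, frequency-domain and
# EMD statistics per vibration segment) together with fault-class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 60))
y = rng.integers(0, 4, size=300)          # four fault classes

# Plain LLE; S-LLE would additionally pull samples of the same class together.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=5)
X_low = lle.fit_transform(X)

# Classification in the reduced feature space.
clf = KNeighborsClassifier(n_neighbors=5)
print("cross-validated accuracy:", cross_val_score(clf, X_low, y, cv=5).mean())
```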

  20. Machine Reading for Extraction of Bacteria and Habitat Taxonomies

    PubMed Central

    Kordjamshidi, Parisa; Massa, Wouter; Provoost, Thomas; Moens, Marie-Francine

    2015-01-01

    There is a vast amount of scientific literature available from various resources such as the internet. Automating the extraction of knowledge from these resources is very helpful for biologists to access this information easily. This paper presents a system to extract bacteria and their habitats, as well as the relations between them. We investigate to what extent current techniques are suited to this task and test a variety of models in this regard. We detect entities in a biological text and map the habitats into a given taxonomy. Our model uses a linear chain Conditional Random Field (CRF). For the prediction of relations between the entities, a model based on logistic regression is built. Designing a system upon these techniques, we explore several improvements for both the generation and selection of good candidates. One contribution lies in the extended flexibility of our ontology mapper, which uses advanced boundary detection and assigns taxonomy elements to the detected habitats. Furthermore, we find value in combining several distinct candidate generation rules. Using these techniques, we show results that significantly improve upon the state of the art for the BioNLP Bacteria Biotopes task. PMID:27077141
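
    A minimal sketch of the linear-chain CRF step is given below, using the sklearn-crfsuite package (a tooling assumption; the paper's own CRF implementation and feature set are not reproduced). The token features, the single training sentence and the BIO tag set are illustrative only.

```python
import sklearn_crfsuite

def token_features(sentence, i):
    """Very small per-token feature dictionary for a linear-chain CRF."""
    word = sentence[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
        "prev": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }

# Tiny illustrative training data (BIO tags for Bacteria / Habitat mentions).
sents = [["Borrelia", "burgdorferi", "lives", "in", "ticks", "."]]
tags  = [["B-Bacteria", "I-Bacteria", "O", "O", "B-Habitat", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))
```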

  1. A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems.

    PubMed

    Peng, Yifan; Torii, Manabu; Wu, Cathy H; Vijay-Shanker, K

    2014-08-23

    Text mining is increasingly used in the biomedical domain because of its ability to gather information automatically from large numbers of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. A relation extraction system achieving high performance is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging syntactic variations possible in a language and automatically generating extraction patterns in a systematic manner, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task. A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively). We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that, without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations. In this paper, we present a novel framework for fast development of relation extraction systems. The framework requires only a list of triggers as input and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help with the design of hand-crafted patterns. We demonstrate how our framework is used to develop a system which achieves state-of-the-art performance on a public benchmark corpus.
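
    To show the flavor of trigger-driven, pattern-based extraction, the sketch below applies crude surface patterns of the form "<ENTITY> <trigger> <ENTITY>" to an (already simplified) sentence. It is only a regular-expression approximation under invented trigger words and entity conventions; the framework itself generates its patterns from syntactic structures rather than surface strings.

```python
import re

triggers = ["binds", "phosphorylates", "activates"]   # illustrative trigger list

# Rough surface patterns of the form "<ENTITY> <trigger> <ENTITY>".
entity = r"(?P<{0}>[A-Z][A-Za-z0-9-]+)"
patterns = [
    re.compile(entity.format("arg1") + r"\s+" + trig + r"\s+" + entity.format("arg2"))
    for trig in triggers
]

sentence = "STAT3 binds IL6R in the simplified sentence."
for pat in patterns:
    m = pat.search(sentence)
    if m:
        print("relation:", m.group("arg1"), "->", m.group("arg2"))
```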

  2. A novel Gravity-FREAK feature extraction and Gravity-KLT tracking registration algorithm based on iPhone MEMS mobile sensor in mobile environment

    PubMed Central

    Lin, Fan; Xiao, Bin

    2017-01-01

    Based on the traditional Fast Retina Keypoint (FREAK) feature description algorithm, this paper proposes a Gravity-FREAK feature description algorithm that uses the Micro-Electromechanical Systems (MEMS) sensors of the device to cope with the limited computing performance and memory of mobile devices and to further improve the augmented reality interaction experience, in which digital information is added to the real world. The algorithm takes the gravity projection vector corresponding to each feature point as its feature orientation, which removes the need to calculate the neighborhood gray gradient of each feature point, reduces the computational cost and improves the accuracy of feature extraction. For registration by matching and tracking natural features, adaptive and generic corner detection based on the Gravity-FREAK matching purification algorithm is used to eliminate abnormal matches, and a Gravity Kanade-Lucas Tracking (KLT) algorithm based on the MEMS sensor is used for tracking registration of targets and for improving the robustness of the tracking registration algorithm in a mobile environment. PMID:29088228

  3. A novel Gravity-FREAK feature extraction and Gravity-KLT tracking registration algorithm based on iPhone MEMS mobile sensor in mobile environment.

    PubMed

    Hong, Zhiling; Lin, Fan; Xiao, Bin

    2017-01-01

    Based on the traditional Fast Retina Keypoint (FREAK) feature description algorithm, this paper proposes a Gravity-FREAK feature description algorithm that uses the Micro-Electromechanical Systems (MEMS) sensors of the device to cope with the limited computing performance and memory of mobile devices and to further improve the augmented reality interaction experience, in which digital information is added to the real world. The algorithm takes the gravity projection vector corresponding to each feature point as its feature orientation, which removes the need to calculate the neighborhood gray gradient of each feature point, reduces the computational cost and improves the accuracy of feature extraction. For registration by matching and tracking natural features, adaptive and generic corner detection based on the Gravity-FREAK matching purification algorithm is used to eliminate abnormal matches, and a Gravity Kanade-Lucas Tracking (KLT) algorithm based on the MEMS sensor is used for tracking registration of targets and for improving the robustness of the tracking registration algorithm in a mobile environment.

  4. Extracting information in spike time patterns with wavelets and information theory.

    PubMed

    Lopes-dos-Santos, Vítor; Panzeri, Stefano; Kayser, Christoph; Diamond, Mathew E; Quian Quiroga, Rodrigo

    2015-02-01

    We present a new method to assess the information carried by temporal patterns in spike trains. The method first performs a wavelet decomposition of the spike trains, then uses Shannon information to select a subset of coefficients carrying information, and finally assesses timing information in terms of decoding performance: the ability to identify the presented stimuli from spike train patterns. We show that the method allows: 1) a robust assessment of the information carried by spike time patterns even when this is distributed across multiple time scales and time points; 2) an effective denoising of the raster plots that improves the estimate of stimulus tuning of spike trains; and 3) an assessment of the information carried by temporally coordinated spikes across neurons. Using simulated data, we demonstrate that the Wavelet-Information (WI) method performs better and is more robust to spike time-jitter, background noise, and sample size than well-established approaches, such as principal component analysis, direct estimates of information from digitized spike trains, or a metric-based method. Furthermore, when applied to real spike trains from monkey auditory cortex and from rat barrel cortex, the WI method allows extracting larger amounts of spike timing information. Importantly, the fact that the WI method incorporates multiple time scales makes it robust to the choice of partly arbitrary parameters such as temporal resolution, response window length, number of response features considered, and the number of available trials. These results highlight the potential of the proposed method for accurate and objective assessments of how spike timing encodes information. Copyright © 2015 the American Physiological Society.
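
    The sketch below mirrors the overall shape of this pipeline: wavelet-decompose each spike train, keep the coefficients that score highest on an information criterion, and measure decoding performance. It uses PyWavelets and scikit-learn, approximates the paper's Shannon-information selection with a mutual-information score, and runs on synthetic spike trains, so it is an illustration of the idea rather than the WI method itself.

```python
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n_trials, n_bins = 200, 128
stimuli = rng.integers(0, 2, size=n_trials)

# Synthetic spike rasters: stimulus 1 adds a small, precisely timed burst around bin 40.
rates = np.full((n_trials, n_bins), 0.05)
rates[stimuli == 1, 38:44] = 0.5
spikes = (rng.random((n_trials, n_bins)) < rates).astype(float)

# Wavelet decomposition of each trial; coefficients from all scales are concatenated.
coeffs = np.array([np.concatenate(pywt.wavedec(trial, "haar", level=4)) for trial in spikes])

# Keep the coefficients that carry the most (estimated) information, then decode the stimulus.
decoder = make_pipeline(SelectKBest(mutual_info_classif, k=10), LinearDiscriminantAnalysis())
print("decoding accuracy:", cross_val_score(decoder, coeffs, stimuli, cv=5).mean())
```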

  5. Optical character recognition of handwritten Arabic using hidden Markov models

    NASA Astrophysics Data System (ADS)

    Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.; Olama, Mohammed M.

    2011-04-01

    The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.
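
    For readers unfamiliar with the decoding step, the sketch below is a generic Viterbi decoder over log-probabilities: given start, transition and emission matrices it returns the most probable hidden-state sequence for an observation sequence. The two-state, three-symbol matrices are toy placeholders, not the corpus-derived HMM parameters described above.

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Most probable hidden-state path for an observation sequence.

    log_start: (S,), log_trans: (S, S) prev->next, log_emit: (S, V); obs: list of symbol ids.
    """
    S, T = log_start.shape[0], len(obs)
    delta = np.full((T, S), -np.inf)                 # best log-score ending in each state
    backptr = np.zeros((T, S), dtype=int)
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (previous state, current state)
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                    # trace the best path backwards
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example: two hidden "character" states, three observable feature symbols.
log_start = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
log_emit  = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], log_start, log_trans, log_emit))
```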

  6. Optical character recognition of handwritten Arabic using hidden Markov models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.

    2011-01-01

    The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.

  7. Multi-clues image retrieval based on improved color invariants

    NASA Astrophysics Data System (ADS)

    Liu, Liu; Li, Jian-Xun

    2012-05-01

    At present, image retrieval has made great progress in indexing efficiency and memory usage, which mainly benefits from the utilization of text retrieval technology, such as the bag-of-features (BOF) model and the inverted-file structure. Meanwhile, because robust local feature invariants are selected to establish the BOF, its retrieval precision is enhanced, especially when it is applied to a large-scale database. However, these local feature invariants mainly consider the geometric variance of the objects in the images, so the color information of the objects fails to be exploited. With the development of information technology and the Internet, the majority of retrieval objects are color images. Therefore, retrieval performance can be further improved through proper utilization of the color information. We propose an improved method through analyzing the flaw of the shadow-shading quasi-invariant. The response and performance of the shadow-shading quasi-invariant at object edges under varying lighting are enhanced. The color descriptors of the invariant regions are extracted and integrated into the BOF based on the local feature. The robustness of the algorithm and the improvement of the performance are verified in the final experiments.

  8. Information extraction during simultaneous motion processing.

    PubMed

    Rideaux, Reuben; Edwards, Mark

    2014-02-01

    When confronted with multiple moving objects the visual system can process them in two stages: an initial stage in which a limited number of signals are processed in parallel (i.e. simultaneously) followed by a sequential stage. We previously demonstrated that during the simultaneous stage, observers could discriminate between presentations containing up to 5 vs. 6 spatially localized motion signals (Edwards & Rideaux, 2013). Here we investigate what information is actually extracted during the simultaneous stage and whether the simultaneous limit varies with the detail of information extracted. This was achieved by measuring the ability of observers to extract varied information from low detail, i.e. the number of signals presented, to high detail, i.e. the actual directions present and the direction of a specific element, during the simultaneous stage. The results indicate that the resolution of simultaneous processing varies as a function of the information which is extracted, i.e. as the information extraction becomes more detailed, from the number of moving elements to the direction of a specific element, the capacity to process multiple signals is reduced. Thus, when assigning a capacity to simultaneous motion processing, this must be qualified by designating the degree of information extraction. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  9. The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement.

    PubMed

    Horvath, Monica M; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-04-01

    In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction--the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a Guided Query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must provide sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. Copyright © 2010 Elsevier Inc. All rights reserved.

  10. Landsat 4 results and their implications for agricultural surveys

    NASA Technical Reports Server (NTRS)

    Erickson, J. D.; Bizzell, R. M.; Pitts, D. E.; Thompson, D. R.

    1983-01-01

    Progress on defining the minimum Landsat-4 data characteristics needed for agricultural information in the U.S. and assessing the value-added capability of current technology to extract that level of information is reported. Emphasis is laid on the thematic mapper (TM) data and the ground processing facilities. TM data from all 7 bands for a rural Arkansas scene were examined in terms of radiometric, spatial, and geometric fidelity characteristics. Another scene sensed over Iowa was analyzed using three two-channel data sets. Although the TM data were an improvement over MSS data, no value differential was perceived. However, the development of further analysis techniques is still necessary to determine the actual worth of the improved sensor capabilities available with the TM, which actually has an MSS within itself.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crain, Steven P.; Yang, Shuang-Hong; Zha, Hongyuan

    Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer-oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.

  12. Textural-Contextual Labeling and Metadata Generation for Remote Sensing Applications

    NASA Technical Reports Server (NTRS)

    Kiang, Richard K.

    1999-01-01

    Despite extensive research and the advent of several new information technologies in the last three decades, machine labeling of ground categories using remotely sensed data has not become a routine process. A considerable amount of human intervention is needed to achieve an acceptable level of labeling accuracy. A number of fundamental reasons may explain why machine labeling has not become automatic, and there may also be shortcomings in the methodology for labeling ground categories. The spatial information of a pixel, whether textural or contextual, relates a pixel to its surroundings; this information should be utilized to improve the performance of machine labeling of ground categories. Landsat-4 Thematic Mapper (TM) data taken in July 1982 over an area in the vicinity of Washington, D.C. are used in this study. On-line texture extraction by neural networks may not be the most efficient way to incorporate textural information into the labeling process. Instead, texture features are pre-computed from co-occurrence matrices and then combined with a pixel's spectral and contextual information as the input to a neural network. The improvement in labeling accuracy with spatial information included is significant. The prospect of automatic generation of metadata consisting of ground categories, textural and contextual information is discussed.
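
    The sketch below shows the pre-computation step in that spirit: gray-level co-occurrence texture measures are extracted per pixel window with scikit-image (graycomatrix/graycoprops; older releases spell these greycomatrix/greycoprops) and would then be stacked with the pixel's spectral values as classifier input. The band, window size and chosen texture properties are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.util import view_as_windows

rng = np.random.default_rng(0)
band = rng.integers(0, 64, size=(32, 32), dtype=np.uint8)   # one quantized spectral band

win = 9
patches = view_as_windows(band, (win, win))                 # one window per interior pixel

def glcm_features(patch):
    """Contrast, homogeneity and energy from a per-window co-occurrence matrix."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=64, symmetric=True, normed=True)
    return [graycoprops(glcm, prop).mean() for prop in ("contrast", "homogeneity", "energy")]

# Texture features per window center; stacked with the pixel's spectral (and contextual)
# values, these become the input vector of the neural-network labeler.
texture = np.array([[glcm_features(patches[i, j]) for j in range(patches.shape[1])]
                    for i in range(patches.shape[0])])
print(texture.shape)   # (24, 24, 3)
```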

  13. A Framework of Temporal-Spatial Descriptors-Based Feature Extraction for Improved Myoelectric Pattern Recognition.

    PubMed

    Khushaba, Rami N; Al-Timemy, Ali H; Al-Ani, Ahmed; Al-Jumaily, Adel

    2017-10-01

    The extraction of accurate and efficient descriptors of muscular activity plays an important role in tackling the challenging problem of myoelectric control of powered prostheses. In this paper, we present a new feature extraction framework that aims to give an enhanced representation of muscular activity by increasing the amount of information that can be extracted from individual and combined electromyogram (EMG) channels. We propose to use time-domain descriptors (TDDs) in estimating the EMG signal power spectrum characteristics, a step that preserves the computational power required for the construction of spectral features. Subsequently, TDD is used in a process that involves: 1) representing the temporal evolution of the EMG signals by progressively tracking the correlation between the TDD extracted from each analysis time window and a nonlinearly mapped version of it across the same EMG channel and 2) representing the spatial coherence between the different EMG channels, which is achieved by calculating the correlation between the TDD extracted from the differences of all possible combinations of pairs of channels and their nonlinearly mapped versions. The proposed temporal-spatial descriptors (TSDs) are validated on multiple sparse and high-density (HD) EMG data sets collected from a number of intact-limbed subjects and amputees performing a large number of hand and finger movements. Classification results showed significant reductions in the achieved error rates in comparison to other methods, with an improvement of at least 8% on average across all subjects. Additionally, the proposed TSDs performed well on HD-EMG problems, with average classification errors of <5% across all subjects using window lengths of only 50 ms.

  14. NPIDB: Nucleic acid-Protein Interaction DataBase.

    PubMed

    Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

    2013-01-01

    The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.

  15. Milk thistle and its derivative compounds: a review of opportunities for treatment of liver disease.

    PubMed

    Hackett, E S; Twedt, D C; Gustafson, D L

    2013-01-01

    Milk thistle extracts have been used as a "liver tonic" for centuries. In recent years, silibinin, the active ingredient in milk thistle extracts, has been studied both in vitro and in vivo to evaluate the beneficial effects in hepatic disease. Silibinin increases antioxidant concentrations and improves outcomes in hepatic diseases resulting from oxidant injury. Silibinin treatment has been associated with protection against hepatic toxins, and also has resulted in decreased hepatic inflammation and fibrosis. Limited information currently is available regarding silibinin use in veterinary medicine. Future study is justified to evaluate dose, kinetics, and treatment effects in domestic animals. Copyright © 2012 by the American College of Veterinary Internal Medicine.

  16. Selective aqueous extraction of organics coupled with trapping by membrane separation

    DOEpatents

    van Eikeren, Paul; Brose, Daniel J.; Ray, Roderick J.

    1991-01-01

    An improvement to processes for the selective extraction of organic solutes from organic solvents by water-based extractants is disclosed, the improvement comprising coupling various membrane separation processes with the organic extraction process, the membrane separation process being utilized to continuously recycle the water-based extractant and at the same time selectively remove or concentrate organic solute from the water-based extractant.

  17. Pattern recognition applied to seismic signals of Llaima volcano (Chile): An evaluation of station-dependent classifiers

    NASA Astrophysics Data System (ADS)

    Curilem, Millaray; Huenupan, Fernando; Beltrán, Daniel; San Martin, Cesar; Fuentealba, Gustavo; Franco, Luis; Cardona, Carlos; Acuña, Gonzalo; Chacón, Max; Khan, M. Salman; Becerra Yoma, Nestor

    2016-04-01

    Automatic pattern recognition applied to seismic signals from volcanoes may assist seismic monitoring by reducing the workload of analysts, allowing them to focus on more challenging activities, such as producing reports, implementing models, and understanding volcanic behaviour. In a previous work, we proposed a structure for automatic classification of seismic events at Llaima volcano, one of the most active volcanoes in the Southern Andes, located in the Araucanía Region of Chile. A database of events taken from three monitoring stations on the volcano was used to create a classification structure that was independent of which station provided the signal. The database included three types of volcanic events: tremor, long period, and volcano-tectonic, plus a contrast group containing other types of seismic signals. In the present work, we maintain the same classification scheme, but we consider the station information separately in order to assess whether the complementary information provided by different stations improves the performance of the classifier in recognising seismic patterns. This paper proposes two strategies for combining the information from the stations: i) combining the features extracted from the signals of each station and ii) combining the classifiers of each station. In the first case, the features extracted from the signals of each station are combined to form the input for a single classification structure. In the second, a decision stage combines the results of the classifiers for each station to give a unique output. The results confirm that the station-dependent strategies, which combine the features or the classifiers from several stations, improve the classification performance, and that the combination of features provides the best performance. The results show an average improvement of 9% in classification accuracy when compared with the station-independent method.
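
    The sketch below contrasts the two combination strategies in scikit-learn terms: strategy i) concatenates the per-station feature blocks and trains one classifier, while strategy ii) trains one classifier per station and soft-votes their outputs. The three 20-feature blocks and the SVC base classifier are invented stand-ins, not the features or classifiers used in the study.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 4, size=n)                     # tremor, long period, volcano-tectonic, other
# Hypothetical per-station feature blocks, 20 features each, weakly class-dependent.
blocks = [rng.normal(size=(n, 20)) + 0.3 * y[:, None] for _ in range(3)]
X = np.hstack(blocks)
cols = [slice(0, 20), slice(20, 40), slice(40, 60)]

# Strategy i): combine the features -> a single classifier sees all stations at once.
print("feature combination   :", cross_val_score(SVC(), X, y, cv=5).mean())

# Strategy ii): combine the classifiers -> one SVC per station, soft-voting decision stage.
def station_clf(c):
    pick = FunctionTransformer(lambda X, c=c: X[:, c])   # select this station's columns
    return make_pipeline(pick, SVC(probability=True))

voter = VotingClassifier([(f"station{i}", station_clf(c)) for i, c in enumerate(cols)],
                         voting="soft")
print("classifier combination:", cross_val_score(voter, X, y, cv=5).mean())
```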

  18. Highway extraction from high resolution aerial photography using a geometric active contour model

    NASA Astrophysics Data System (ADS)

    Niu, Xutong

    Highway extraction and vehicle detection are two of the most important steps in traffic-flow analysis from multi-frame aerial photographs. The traditional method of deriving traffic flow trajectories relies on manual vehicle counting from a sequence of aerial photographs, which is tedious and time-consuming. This research presents a new framework for semi-automatic highway extraction. The basis of the new framework is an improved geometric active contour (GAC) model. This novel model seeks to minimize an objective function that transforms a problem of propagation of regular curves into an optimization problem. The implementation of curve propagation is based on level set theory. By using an implicit representation of a two-dimensional curve, a level set approach can be used to deal with topological changes naturally, and the output is unaffected by different initial positions of the curve. However, the original GAC model, on which the new model is based, only incorporates boundary information into the curve propagation process. An error-producing phenomenon called leakage is inevitable wherever there is an uncertain weak edge. In this research, region-based information is added as a constraint into the original GAC model, thereby, giving this proposed method the ability of integrating both boundary and region-based information during the curve propagation. Adding the region-based constraint eliminates the leakage problem. This dissertation applies the proposed augmented GAC model to the problem of highway extraction from high-resolution aerial photography. First, an optimized stopping criterion is designed and used in the implementation of the GAC model. It effectively saves processing time and computations. Second, a seed point propagation framework is designed and implemented. This framework incorporates highway extraction, tracking, and linking into one procedure. A seed point is usually placed at an end node of highway segments close to the boundary of the image or at a position where possible blocking may occur, such as at an overpass bridge or near vehicle crowds. These seed points can be automatically propagated throughout the entire highway network. During the process, road center points are also extracted, which introduces a search direction for solving possible blocking problems. This new framework has been successfully applied to highway network extraction from a large orthophoto mosaic. In the process, vehicles on the highway extracted from mosaic were detected with an 83% success rate.
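
    As a rough illustration of seeded curve propagation with a level-set representation, the sketch below runs scikit-image's morphological geodesic active contour from a small seed placed on a synthetic bright "road" band. It uses only the boundary (edge-stopping) term, so it does not reproduce the region-based constraint that this dissertation adds; the image, seed and parameter values are all illustrative assumptions.

```python
import numpy as np
from skimage.segmentation import (inverse_gaussian_gradient,
                                  morphological_geodesic_active_contour)

# Synthetic grayscale "aerial" tile with a bright horizontal road band plus noise.
rng = np.random.default_rng(0)
img = 0.1 * rng.normal(size=(200, 200))
img[90:110, :] += 1.0

# Edge-stopping function: values drop towards zero near strong intensity edges.
gimg = inverse_gaussian_gradient(img)

# Seed level set: a small patch placed on the road (analogous to a seed point).
init = np.zeros(img.shape, dtype=np.int8)
init[94:106, 10:30] = 1

# Propagate the contour; the positive balloon force pushes it outwards along the road,
# while the edge term stops it at the road borders.
road = morphological_geodesic_active_contour(gimg, 300, init,
                                             smoothing=1, balloon=1, threshold=0.7)
print(int(road.sum()), "pixels labeled as road")
```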

  19. Exploiting domain information for Word Sense Disambiguation of medical documents.

    PubMed

    Stevenson, Mark; Agirre, Eneko; Soroa, Aitor

    2012-01-01

    Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears. The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context. A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset. The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained using key terms obtained by relevance feedback and weighted by inverse document frequency.

  20. Exploiting domain information for Word Sense Disambiguation of medical documents

    PubMed Central

    Agirre, Eneko; Soroa, Aitor

    2011-01-01

    Objective Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears. Design The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context. Measurements A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset. Results and discussion The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained using key terms obtained by relevance feedback and weighted by inverse document frequency. PMID:21900701

  1. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the model's implementation performance as a result of using semantic relationships is encouraging.

  2. Building Facade Reconstruction by Fusing Terrestrial Laser Points and Images

    PubMed Central

    Pu, Shi; Vosselman, George

    2009-01-01

    Laser data and optical data have a complementary nature for three dimensional feature extraction. Efficient integration of the two data sources will lead to a more reliable and automated extraction of three dimensional features. This paper presents a semiautomatic building facade reconstruction approach, which efficiently combines information from terrestrial laser point clouds and close range images. A building facade's general structure is discovered and established using the planar features from laser data. Then strong lines in images are extracted using Canny extractor and Hough transformation, and compared with current model edges for necessary improvement. Finally, textures with optimal visibility are selected and applied according to accurate image orientations. Solutions to several challenge problems throughout the collaborated reconstruction, such as referencing between laser points and multiple images and automated texturing, are described. The limitations and remaining works of this approach are also discussed. PMID:22408539
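
    The line-extraction step described here can be sketched with OpenCV: Canny edges followed by a probabilistic Hough transform, whose segments would then be compared against the current facade model edges. The input file name, thresholds and the synthetic fallback image are assumptions for illustration only.

```python
import cv2
import numpy as np

# Hypothetical close-range facade photograph.
img = cv2.imread("facade.jpg", cv2.IMREAD_GRAYSCALE)
if img is None:                      # fall back to a synthetic image so the sketch runs
    img = np.zeros((400, 600), dtype=np.uint8)
    cv2.rectangle(img, (100, 100), (500, 300), 255, 2)

edges = cv2.Canny(img, 50, 150, apertureSize=3)

# Strong straight lines; these would be compared against the facade model's edges.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=60, maxLineGap=5)
for x1, y1, x2, y2 in (lines[:, 0] if lines is not None else []):
    print("line:", (x1, y1), "->", (x2, y2))
```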

  3. A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.

    PubMed Central

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-01-01

    OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performances as a result of using semantic relationships is encouraging. PMID:18693792

  4. A randomized control trial comparing the visual and verbal communication methods for reducing fear and anxiety during tooth extraction.

    PubMed

    Gazal, Giath; Tola, Ahmed W; Fareed, Wamiq M; Alnazzawi, Ahmad A; Zafar, Muhammad S

    2016-04-01

    To evaluate the value of using visual information for reducing dental fear and anxiety in patients undergoing tooth extraction under local anesthesia (LA). A total of 64 patients were randomly allocated to one of the study groups after reading the information sheet and signing the formal consent. Patients in the control group received only verbal information and routine warnings. Patients in the study group were shown a tooth extraction video. The level of dental fear and anxiety was reported by the patients on standard 100 mm visual analog scales (VAS), ranging from "no dental fear and anxiety" (0 mm) to "severe dental fear and anxiety" (100 mm). Dental fear and anxiety were assessed pre-operatively, after the visual/verbal information, and post-extraction. There was a significant difference between the mean dental fear and anxiety scores of the two groups post-extraction (p-value < 0.05). Patients in the tooth extraction video group were more comfortable after dental extraction than those in the verbal information and routine warning group. For the tooth extraction video group there were significant decreases in dental fear and anxiety scores between the pre-operative scores and either the post-video-information scores or the postoperative scores (p-values < 0.05). Younger patients recorded higher dental fear and anxiety scores than older ones (P < 0.05). Dental fear and anxiety associated with dental extractions under local anesthesia can be reduced by showing a tooth extraction video to the patients preoperatively.

  5. Information Extraction from Unstructured Text for the Biodefense Knowledge Center

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samatova, N F; Park, B; Krishnamurthy, R

    2005-04-29

    The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow an early detection of emerging biological threats to homeland security. It requires highly structured information extracted from a variety of data sources. However, the quantity of new and vital information available from everyday sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are much anticipated. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas the two key phrase extraction technologies developed during the course of the project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. Also we are enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. In order to address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.

  6. An Effective Approach to Biomedical Information Extraction with Limited Training Data

    ERIC Educational Resources Information Center

    Jonnalagadda, Siddhartha

    2011-01-01

    In the current millennium, extensive use of computers and the internet caused an exponential increase in information. Few research areas are as important as information extraction, which primarily involves extracting concepts and the relations between them from free text. Limitations in the size of training data, lack of lexicons and lack of…

  7. Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

    ERIC Educational Resources Information Center

    Finch, Dezon Kile

    2012-01-01

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…

  8. Rare tradition of the folk medicinal use of Aconitum spp. is kept alive in Solčavsko, Slovenia.

    PubMed

    Povšnar, Marija; Koželj, Gordana; Kreft, Samo; Lumpert, Mateja

    2017-08-08

    Aconitum species are poisonous plants that have been used in Western medicine for centuries. In the nineteenth century, these plants were part of official and folk medicine in the Slovenian territory. According to current ethnobotanical studies, folk use of Aconitum species is rarely reported in Europe. The purpose of this study was to research the folk medicinal use of Aconitum species in Solčavsko, Slovenia; to collect recipes for the preparation of Aconitum spp., indications for use, and dosing; and to investigate whether the folk use of aconite was connected to poisoning incidents. In Solčavsko, a remote alpine area in northern Slovenia, we performed semi-structured interviews with 19 informants in Solčavsko, 3 informants in Luče, and two retired physicians who worked in that area. Three samples of homemade ethanolic extracts were obtained from informants, and the concentration of aconitine was measured. In addition, four extracts were prepared according to reported recipes. All 22 informants knew of Aconitum spp. and their therapeutic use, and 5 of them provided a detailed description of the preparation and use of "voukuc", an ethanolic extract made from aconite roots. Seven informants were unable to describe the preparation in detail, since they knew of the extract only from the narration of others or they remembered it from childhood. Most likely, the roots of Aconitum tauricum and Aconitum napellus were used for the preparation of the extract, and the solvent was homemade spirits. Four informants kept the extract at home; two extracts were prepared recently (1998 and 2015). Three extracts were analyzed, and 2 contained aconitine. Informants reported many indications for the use of the extract; it was used internally and, in some cases, externally as well. The extract was also used in animals. The extract was measured in drops, but the number of drops differed among the informants. The informants reported nine poisonings with Aconitum spp., but none of them occurred as a result of medicinal use of the extract. In this study, we determined that folk knowledge of the medicinal use of Aconitum spp. is still present in Solčavsko, but Aconitum preparations are used only infrequently.

  9. New Method for Knowledge Management Focused on Communication Pattern in Product Development

    NASA Astrophysics Data System (ADS)

    Noguchi, Takashi; Shiba, Hajime

    In the field of manufacturing, the importance of utilizing knowledge and know-how has been growing. Against this background, new methods are needed to efficiently accumulate and extract effective knowledge and know-how. To facilitate the extraction of the knowledge and know-how needed by engineers, we first defined business process information, which includes schedule/progress information, document data, information about communication among the parties concerned, and the correspondences among these three types of information. Based on these definitions, we proposed an IT system (FlexPIM: Flexible and collaborative Process Information Management) to register and accumulate business process information with the least effort. In order to efficiently extract effective information from huge volumes of accumulated business process information, we propose a new extraction method that focuses attention on "actions" and communication patterns. The validity of this method has been verified for several communication patterns.

  10. Smart Shop Assistant - Using Semantic Technologies to Improve Online Shopping

    NASA Astrophysics Data System (ADS)

    Niemann, Magnus; Mochol, Malgorzata; Tolksdorf, Robert

    Internet commerce faces rising complexity: not only do more and more products become available online, but the amount of information available on a single product has also been increasing constantly. Thanks to Web 2.0 developments it is, in the meantime, quite common to involve customers in the creation of product descriptions and the extraction of additional product information by offering them feedback forms and product review sites, user weblogs and other social web services. To face this situation, one of the main tasks in a future internet will be to aggregate, sort and evaluate this huge amount of information to aid customers in choosing the "perfect" product for their needs.

  11. Real-time Social Internet Data to Guide Forecasting Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Del Valle, Sara Y.

    Our goal is to improve decision support by monitoring and forecasting events using social media, mathematical models, and quantifying model uncertainty. Our approach is real-time, data-driven forecasts with quantified uncertainty: Not just for weather anymore. Information flow from human observations of events through an Internet system and classification algorithms is used to produce quantitatively uncertain forecast. In summary, we want to develop new tools to extract useful information from Internet data streams, develop new approaches to assimilate real-time information into predictive models, validate approaches by forecasting events, and our ultimate goal is to develop an event forecasting system using mathematical approaches and heterogeneous data streams.

  12. Data warehousing as a tool for quality management in oncology.

    PubMed

    Hölzer, S; Tafazzoli, A G; Altmann, U; Wächter, W; Dudeck, J

    1999-01-01

    At present, physicians are constrained by their limited skills to integrate and understand the growing amount of electronic medical information. To handle, extract, integrate, analyse and take advantage of the information gathered regarding the quality of patient care, the concept of a data warehouse seems especially interesting in medicine. Medical data warehousing allows physicians to take advantage of all the operational data they have been collecting over the years. Our purpose is to build a data warehouse in order to use all available information about cancer patients. We think that, with sensible use of this tool, there are economic benefits for society and an improvement in the quality of medical care for patients.

  13. Identifying key hospital service quality factors in online health communities.

    PubMed

    Jung, Yuchul; Hur, Cinyoung; Jung, Dain; Kim, Minki

    2015-04-07

    The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and overtime trends automatically from online health communities, especially hospital-related questions and answers. We defined social media-based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea's two biggest online portals were used to test the effectiveness of detection of social media-based key quality factors for hospitals. To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media-based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies.

  14. Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement.

    PubMed

    Grundmeier, Robert W; Masino, Aaron J; Casper, T Charles; Dean, Jonathan M; Bell, Jamie; Enriquez, Rene; Deakyne, Sara; Chamberlain, James M; Alpern, Elizabeth R

    2016-11-09

    Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed. To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement. Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English "stop words" and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures. There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927). NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
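
    A minimal sketch of this kind of report classifier with scikit-learn is shown below: unigram and bigram features with English stop words removed, an RBF-kernel support vector machine, and 10-fold cross-validation over cost and gamma (the study's best values were cost=4 and gamma=0.005). The toy reports and labels are placeholders, TF-IDF weighting stands in for raw word-stem counts, and no stemmer is applied, so this is an illustration rather than the validated model itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder radiology reports and fracture labels; real data would come from the EHR.
reports = ["acute transverse fracture of the femoral shaft",
           "no acute osseous abnormality of the tibia or fibula",
           "spiral fracture of the distal radius with mild displacement",
           "soft tissue swelling, no fracture identified"] * 50
labels = [1, 0, 1, 0] * 50

# Unigram + bigram features with English stop words removed; rare terms dropped.
pipe = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english", min_df=2),
    SVC(kernel="rbf"))

# 10-fold cross-validation over candidate cost/gamma values, echoing the study design.
grid = GridSearchCV(pipe, {"svc__C": [1, 4, 16], "svc__gamma": [0.005, 0.05]},
                    cv=10, scoring="f1")
grid.fit(reports, labels)
print(grid.best_params_, grid.best_score_)
```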

  15. EXIMS: an improved data analysis pipeline based on a new peak picking method for EXploring Imaging Mass Spectrometry data.

    PubMed

    Wijetunge, Chalini D; Saeed, Isaam; Boughton, Berin A; Spraggins, Jeffrey M; Caprioli, Richard M; Bacic, Antony; Roessner, Ute; Halgamuge, Saman K

    2015-10-01

    Matrix Assisted Laser Desorption Ionization-Imaging Mass Spectrometry (MALDI-IMS) in 'omics' data acquisition generates detailed information about the spatial distribution of molecules in a given biological sample. Various data processing methods have been developed for exploring the resultant high volume data. However, most of these methods process data in the spectral domain and do not make the most of the important spatial information available through this technology. Therefore, we propose a novel streamlined data analysis pipeline specifically developed for MALDI-IMS data utilizing significant spatial information for identifying hidden significant molecular distribution patterns in these complex datasets. The proposed unsupervised algorithm uses Sliding Window Normalization (SWN) and a new spatial distribution based peak picking method developed based on Gray level Co-Occurrence (GCO) matrices followed by clustering of biomolecules. We also use gist descriptors and an improved version of GCO matrices to extract features from molecular images and minimum medoid distance to automatically estimate the number of possible groups. We evaluated our algorithm using a new MALDI-IMS metabolomics dataset of a plant (Eucalypt) leaf. The algorithm revealed hidden significant molecular distribution patterns in the dataset, which the current Component Analysis and Segmentation Map based approaches failed to extract. We further demonstrate the performance of our peak picking method over other traditional approaches by using a publicly available MALDI-IMS proteomics dataset of a rat brain. Although SWN did not show any significant improvement as compared with using no normalization, the visual assessment showed an improvement as compared to using the median normalization. The source code and sample data are freely available at http://exims.sourceforge.net/. awgcdw@student.unimelb.edu.au or chalini_w@live.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Imaging and Analytics: The changing face of Medical Imaging

    NASA Astrophysics Data System (ADS)

    Foo, Thomas

    There have been significant technological advances in imaging capability over the past 40 years. Medical imaging capabilities have developed rapidly, along with technology development in computational processing speed and miniaturization. Moving to all-digital, the number of images that are acquired in a routine clinical examination has increased dramatically from under 50 images in the early days of CT and MRI to more than 500-1000 images today. The staggering number of images that are routinely acquired poses significant challenges for clinicians to interpret the data and to correctly identify the clinical problem. Although the time provided to render a clinical finding has not substantially changed, the amount of data available for interpretation has grown exponentially. In addition, the image quality (spatial resolution) and information content (physiologically-dependent image contrast) has also increased significantly with advances in medical imaging technology. On its current trajectory, medical imaging in the traditional sense is unsustainable. To assist in filtering and extracting the most relevant data elements from medical imaging, image analytics will have a much larger role. Automated image segmentation, generation of parametric image maps, and clinical decision support tools will be needed and developed apace to allow the clinician to manage, extract and utilize only the information that will help improve diagnostic accuracy and sensitivity. As medical imaging devices continue to improve in spatial resolution, functional and anatomical information content, image/data analytics will be more ubiquitous and integral to medical imaging capability.

  17. Assessment of spatial information for hyperspectral imaging of lesion

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Li, Gang; Lin, Ling

    2016-10-01

    Multiple diseases such as breast tumors pose a great threat to women's health and lives, while traditional detection methods are complex, costly and unsuitable for frequent self-examination; therefore, an inexpensive, convenient and efficient method for tumor self-inspection is urgently needed, and lesion localization is an important step. This paper proposes a self-examination method for positioning a lesion. The method adopts transillumination to acquire hyperspectral images and to assess the spatial information of the lesion. Firstly, multi-wavelength sources are modulated with frequency division, which makes it easier to separate the images of different wavelengths; meanwhile, the sources serve as fill light for each other to improve sensitivity in low-light-level imaging. Secondly, the signal-to-noise ratio of the demodulated transmitted images is improved by frame accumulation. Next, the gray distributions of the transmitted images are analyzed. Gray-level differences are formed between the actual transmitted images and fitted transmitted images of tissue without a lesion, which serves to rule out individual differences. Owing to the scattering effect, there are transition zones between tissue and lesion, and these zones change with wavelength, which helps to identify the structural details of the lesion. Finally, image segmentation is adopted to extract the lesion and the transition zones, and the spatial features of the lesion are confirmed according to the transition zones and the differences in the transmitted light intensity distributions. An experiment using flat-shaped tissue as an example shows that the proposed method can extract the spatial information of a lesion.

  18. Using uncertainty to link and rank evidence from biomedical literature for model curation

    PubMed Central

    Zerva, Chrysoula; Batista-Navarro, Riza; Day, Philip; Ananiadou, Sophia

    2017-01-01

    Abstract Motivation In recent years, there has been great progress in the field of automated curation of biomedical networks and models, aided by text mining methods that provide evidence from literature. Such methods must not only extract snippets of text that relate to model interactions, but also be able to contextualize the evidence and provide additional confidence scores for the interaction in question. Although various approaches calculating confidence scores have focused primarily on the quality of the extracted information, there has been little work on exploring the textual uncertainty conveyed by the author. Despite textual uncertainty being acknowledged in biomedical text mining as an attribute of text mined interactions (events), it is significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models. Results We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. Variations of this hybrid approach are then discussed, alongside their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 based on the BioNLP-ST and Genia-MK corpora, respectively, making considerable improvements over previously published work. Moreover, we evaluate our proposed system on pathways related to two different areas, namely leukemia and melanoma cancer research. Availability and implementation The leukemia pathway model used is available in Pathway Studio while the Ras model is available via PathwayCommons. Online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test. The related code is available on https://github.com/c-zrv/uncertainty_components.git. Details on the above are available in the Supplementary Material. Contact sophia.ananiadou@manchester.ac.uk Supplementary information Supplementary data are available at Bioinformatics online. PMID:29036627

  19. A two-level cache for distributed information retrieval in search engines.

    PubMed

    Zhang, Weizhe; He, Hui; Ye, Jianwei

    2013-01-01

    To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache.
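
    A minimal sketch of the two-level structure described above is given below: a static cache preloaded with the most popular queries from a user log, backed by a dynamic LRU cache for the remaining queries. The data structures, sizes and fetch function are illustrative assumptions, not the authors' distribution strategy.

        from collections import Counter, OrderedDict

        class TwoLevelCache:
            """Static cache of the most popular logged queries plus a dynamic LRU cache."""

            def __init__(self, query_log, fetch, static_size=2, dynamic_size=3):
                self.fetch = fetch                      # backend retrieval function
                self.dynamic = OrderedDict()            # dynamic level (kept in LRU order)
                self.dynamic_size = dynamic_size
                # Static level: precompute results for the highest-frequency queries in the log.
                popular = [q for q, _ in Counter(query_log).most_common(static_size)]
                self.static = {q: fetch(q) for q in popular}

            def get(self, query):
                if query in self.static:                # hit in the static level
                    return self.static[query]
                if query in self.dynamic:               # hit in the dynamic level; refresh LRU order
                    self.dynamic.move_to_end(query)
                    return self.dynamic[query]
                result = self.fetch(query)              # miss: go to the backend
                self.dynamic[query] = result
                if len(self.dynamic) > self.dynamic_size:
                    self.dynamic.popitem(last=False)    # evict the least recently used entry
                return result

        log = ["weather", "news", "weather", "sports", "weather", "news"]
        cache = TwoLevelCache(log, fetch=lambda q: f"results for {q}")
        print(cache.get("weather"), cache.get("movies"))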

  20. A Two-Level Cache for Distributed Information Retrieval in Search Engines

    PubMed Central

    Zhang, Weizhe; He, Hui; Ye, Jianwei

    2013-01-01

    To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache. PMID:24363621

  1. Information Extraction Using Controlled English to Support Knowledge-Sharing and Decision-Making

    DTIC Science & Technology

    2012-06-01

    ... terminology or language variants. CE-based information extraction will greatly facilitate the processes in the cognitive and social domains that enable forces ... processor is run to turn the atomic CE into a more "stylistically felicitous" CE, using techniques such as aggregating all information about an entity ...

  2. Real-Time Information Extraction from Big Data

    DTIC Science & Technology

    2015-10-01

    Institute for Defense Analyses. Real-Time Information Extraction from Big Data. Jagdeep Shah, Robert M. Rolfe, Francisco L. Loaiza-Lemos. October 7, 2015. Abstract: We are drowning under the 3 Vs (volume, velocity and variety) of big data. Real-time information extraction from big ...

  3. New techniques for positron emission tomography in the study of human neurological disorders. Progress report, June 1990--June 1993

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuhl, D.E.

    1993-06-01

    This progress report describes accomplishments of four programs. The four programs are entitled (1) Faster, simpler processing of positron-computing precursors: New physicochemical approaches, (2) Novel solid phase reagents and methods to improve radiosynthesis and isotope production, (3) Quantitative evaluation of the extraction of information from PET images, and (4) Optimization of tracer kinetic methods for radioligand studies in PET.

  4. Joint Force Quarterly. Number 15, Spring 1997

    DTIC Science & Technology

    1997-06-01

    ... headquarters to extract information from sensors on the vehicle without bothering crew members with extraneous reports. Position location devices on ... change in how they do business. Air Force lean logistics and Army velocity management programs are literal springboards for quantum improvements in ... Victory smiles upon those who anticipate the changes in the character of war, not upon those who wait to adapt themselves after the changes ...

  5. UWB channel estimation using new generating TR transceivers

    DOEpatents

    Nekoogar, Faranak [San Ramon, CA; Dowla, Farid U [Castro Valley, CA; Spiridon, Alex [Palo Alto, CA; Haugen, Peter C [Livermore, CA; Benzel, Dave M [Livermore, CA

    2011-06-28

    The present invention presents a simple and novel channel estimation scheme for UWB communication systems. As disclosed herein, the present invention maximizes the extraction of information by incorporating a new generation of transmitted-reference (TR) transceivers that utilize a single reference pulse (or a preamble of reference pulses) to provide improved channel estimation while offering higher Bit Error Rate (BER) performance and data rates without diluting the transmitter power.

  6. Fourier transform profilometry (FTP) using an innovative band-pass filter for accurate 3-D surface reconstruction

    NASA Astrophysics Data System (ADS)

    Chen, Liang-Chia; Ho, Hsuan-Wei; Nguyen, Xuan-Loc

    2010-02-01

    This article presents a novel band-pass filter for Fourier transform profilometry (FTP) for accurate 3-D surface reconstruction. FTP can obtain 3-D surface profiles from one-shot images to achieve high-speed measurement. However, its measurement accuracy is significantly influenced by the spectrum filtering process required to extract the phase information representing the various surface heights. Using the commonly applied 2-D Hanning filter, the measurement errors can be up to 5-10% of the overall measuring height, which is unacceptable for many industrial applications. To resolve this issue, the article proposes an elliptical band-pass filter for extracting the spectral region possessing the essential phase information for reconstructing accurate 3-D surface profiles. The elliptical band-pass filter was developed and optimized to reconstruct 3-D surface models with improved measurement accuracy. Experimental results verify that the accuracy can be effectively enhanced by using the elliptical filter: accuracy improvements of 44.1% and 30.4% are achieved in 3-D and sphericity measurement, respectively, when the elliptical filter replaces the traditional filter as the band-pass filtering method. With the developed method, the maximum measurement error can be kept within 3.3% of the overall measuring range.
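
    A simplified numpy sketch of the filtering step is given below: an elliptical pass-band mask centred on an assumed carrier frequency is applied to the image spectrum before the inverse transform, whose phase carries the height information in FTP. The synthetic fringe pattern, carrier location and ellipse radii are placeholder values, not those used by the authors.

        import numpy as np

        def elliptical_bandpass(image, center, radii):
            """Keep only the spectral region inside an ellipse centred on the carrier frequency."""
            rows, cols = image.shape
            spectrum = np.fft.fftshift(np.fft.fft2(image))
            v, u = np.ogrid[:rows, :cols]
            cv, cu = center                      # carrier position in the shifted spectrum
            rv, ru = radii                       # ellipse semi-axes (vertical, horizontal)
            mask = ((v - cv) / rv) ** 2 + ((u - cu) / ru) ** 2 <= 1.0
            filtered = spectrum * mask
            # Inverse transform; the phase of the result carries the height information in FTP.
            return np.fft.ifft2(np.fft.ifftshift(filtered))

        # Synthetic fringe pattern with a horizontal carrier of ~16 cycles across the image.
        x = np.arange(256)
        fringes = 1 + 0.5 * np.cos(2 * np.pi * 16 * x / 256)
        image = np.tile(fringes, (256, 1))
        analytic = elliptical_bandpass(image, center=(128, 128 + 16), radii=(10, 6))
        phase = np.angle(analytic)
        print(phase.shape)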

  7. [Cardiac Synchronization Function Estimation Based on ASM Level Set Segmentation Method].

    PubMed

    Zhang, Yaonan; Gao, Yuan; Tang, Liang; He, Ying; Zhang, Huie

    At present, there are no accurate and quantitative methods for the determination of cardiac mechanical synchronism, and quantitative determination of the synchronization function of the four cardiac cavities from medical images has great clinical value. This paper uses whole-heart ultrasound image sequences and segments the left and right atria and the left and right ventricles in each frame. After the segmentation, the number of pixels in each cavity in each frame is recorded, and the areas of the four cavities across the image sequence are thereby obtained. The area change curves of the four cavities are then extracted, and the synchronization information of the four cavities is obtained. Because of the low SNR of ultrasound images, the boundary lines of the cardiac cavities are vague, so the extraction of cardiac contours is still a challenging problem. Therefore, ASM model information is added to the traditional level set method to constrain the curve evolution process. According to the experimental results, the improved method increases the accuracy of the segmentation. Furthermore, based on the ventricular segmentation, the right and left ventricular systolic functions are evaluated, mainly according to the area changes. The synchronization of the four cavities of the heart is estimated based on the area changes and the volume changes.

  8. Identification of research hypotheses and new knowledge from scientific literature.

    PubMed

    Shardlow, Matthew; Batista-Navarro, Riza; Thompson, Paul; Nawaz, Raheel; McNaught, John; Ananiadou, Sophia

    2018-06-25

    Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions. We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated. We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state of the art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-score in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR 0.802) and New Knowledge (GENIA: 0.829, EU-ADR 0.836). We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.
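
    The following toy sketch illustrates the general idea of feeding simpler Meta-Knowledge dimensions together with linguistic features into a random forest; the feature columns, their encodings and the labels are invented for illustration and do not reproduce the authors' feature set.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        # Each row: [negation, speculation, certainty_level, knowledge_type_id,
        #            has_hypothesis_cue, tense_is_future]
        # These columns stand in for the simpler MK dimensions and linguistic features above.
        X = np.array([
            [0, 1, 1, 2, 1, 1],
            [0, 0, 3, 1, 0, 0],
            [1, 1, 1, 2, 1, 1],
            [0, 0, 3, 0, 0, 0],
            [0, 1, 2, 2, 1, 0],
            [0, 0, 3, 1, 0, 0],
        ])
        # 1 = event expresses a Research Hypothesis, 0 = it does not (toy labels).
        y = np.array([1, 0, 1, 0, 1, 0])

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        new_event = np.array([[0, 1, 1, 2, 1, 1]])   # speculated, hypothesis cue, future tense
        print(clf.predict(new_event), clf.feature_importances_.round(2))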

  9. Using latent semantic analysis and the predication algorithm to improve extraction of meanings from a diagnostic corpus.

    PubMed

    Jorge-Botana, Guillermo; Olmos, Ricardo; León, José Antonio

    2009-11-01

    There is currently widespread interest in indexing and extracting taxonomic information from large text collections. An example is the automatic categorization of informally written medical or psychological diagnoses, followed by the extraction of epidemiological information or even of terms and structures needed to formulate guiding questions as a heuristic tool for helping doctors. Vector space models have been successfully used to this end (Lee, Cimino, Zhu, Sable, Shanker, Ely & Yu, 2006; Pakhomov, Buntrock & Chute, 2006). In this study we use a computational model known as Latent Semantic Analysis (LSA) on a diagnostic corpus with the aim of retrieving definitions (in the form of lists of semantic neighbors) of common structures it contains (e.g. "storm phobia", "dog phobia") or of less common structures that might be formed by logical combinations of categories and diagnostic symptoms (e.g. "gun personality" or "germ personality"). In the quest to bring definitions into line with the meaning of structures and make them in some way representative, various problems commonly arise while recovering content using vector space models. We propose some approaches that bypass these problems, such as Kintsch's (2001) predication algorithm and some corrections to the way lists of neighbors are obtained, which have already been tested on semantic spaces in a non-specific domain (Jorge-Botana, León, Olmos & Hassan-Montero, under review). The results support the idea that the predication algorithm may also be useful for extracting more precise meanings of certain structures from scientific corpora, and that the introduction of some corrections based on vector length may increase its efficiency on non-representative terms.
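
    A compact scikit-learn sketch of the underlying LSA step (TF-IDF followed by truncated SVD, with semantic neighbours retrieved by cosine similarity in the latent space) is shown below. The tiny English corpus is invented; the study worked on a much larger diagnostic corpus and applied Kintsch's predication algorithm and vector-length corrections on top of this basic machinery.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.metrics.pairwise import cosine_similarity

        docs = [
            "patient reports intense fear of dogs and avoids parks",
            "storm phobia with panic during thunder and lightning",
            "obsessive fear of germs and repeated hand washing",
            "fear of flying and avoidance of airports",
        ]

        # Term-document matrix, then a low-rank latent semantic space (2 dims for this toy corpus).
        tfidf = TfidfVectorizer()
        X = tfidf.fit_transform(docs)
        svd = TruncatedSVD(n_components=2, random_state=0)
        doc_vectors = svd.fit_transform(X)
        term_vectors = svd.components_.T        # terms projected into the same latent space

        terms = list(tfidf.get_feature_names_out())
        query = term_vectors[terms.index("phobia")]
        # Semantic neighbours of "phobia" = terms with the highest cosine similarity in the space.
        sims = cosine_similarity(query.reshape(1, -1), term_vectors)[0]
        neighbours = [terms[i] for i in np.argsort(sims)[::-1][:5]]
        print(neighbours)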

  10. ALE: automated label extraction from GEO metadata.

    PubMed

    Giles, Cory B; Brown, Chase A; Ripperger, Michael; Dennis, Zane; Roopnarinesingh, Xiavan; Porter, Hunter; Perz, Aleksandra; Wren, Jonathan D

    2017-12-28

    NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because it ensures standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many of the data available do not contain explicit text informing the researcher about the age and gender of the subjects with the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. Our analysis shows only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels, based upon gene expression of the samples and compare this to the text-based method. Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach as well as machine learning. We show the two methods together improve accuracy of label assignment to GEO samples.
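
    A hedged illustration of the heuristic, text-based half of such a labeller is given below: regular expressions applied to free-text sample metadata to pull out age and gender labels. The patterns and example strings are assumptions for illustration, not ALE's actual rules.

        import re

        AGE_PATTERN = re.compile(r"age[:\s]+(\d+(?:\.\d+)?)\s*(?:years?|yrs?|y)?", re.IGNORECASE)
        GENDER_PATTERN = re.compile(r"\b(male|female|man|woman)\b", re.IGNORECASE)

        def extract_labels(metadata_text):
            """Pull age and gender labels out of free-text GEO sample metadata, if present."""
            labels = {}
            age = AGE_PATTERN.search(metadata_text)
            if age:
                labels["age"] = float(age.group(1))
            gender = GENDER_PATTERN.search(metadata_text)
            if gender:
                word = gender.group(1).lower()
                labels["gender"] = "female" if word in ("female", "woman") else "male"
            return labels

        print(extract_labels("whole blood, female donor, age: 54 years, non-smoker"))
        print(extract_labels("HEK293 cell line, passage 12"))   # no age/gender information found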

  11. Automatic Detection and Recognition of Craters Based on the Spectral Features of Lunar Rocks and Minerals

    NASA Astrophysics Data System (ADS)

    Ye, L.; Xu, X.; Luan, D.; Jiang, W.; Kang, Z.

    2017-07-01

    Crater-detection approaches can be divided into four categories: manual recognition, shape-profile fitting algorithms, machine-learning methods and geological information-based analysis using terrain and spectral data. The mainstream approach is shape-profile fitting: many scholars use illumination gradient information to fit standard circles by the least squares method. Although this method has achieved good results, it is difficult to identify craters with poor "visibility" or with complex structure and composition, and the recognition accuracy is hard to improve because of multiple solutions and noise interference. To address this problem, we propose a method for the automatic extraction of impact craters based on the spectral characteristics of lunar rocks and minerals: 1) Under sunlit conditions, impact craters are extracted from MI (Multiband Imager) data by condition matching, and the positions and diameters of the craters are obtained. 2) Regolith is ejected when the lunar surface is impacted, and one of the elements of lunar regolith is iron; therefore, incorrectly extracted impact craters can be removed by judging whether a candidate crater is "non-iron", i.e., contains no iron. 3) Correctly extracted craters are divided into two types, simple and complex, according to their diameters. 4) The titanium information is obtained, the titanium distribution of each complex crater is matched against a normal distribution curve, the goodness of fit is calculated, and a threshold is set; the complex craters are thus divided into those whose titanium distribution follows a normal curve and those whose distribution does not. We validated the proposed method with MI data acquired by SELENE. Experimental results demonstrate that the proposed method performs well in the test area.

  12. Research on pre-processing of QR Code

    NASA Astrophysics Data System (ADS)

    Sun, Haixing; Xia, Haojie; Dong, Ning

    2013-10-01

    QR codes encode many kinds of information thanks to their advantages: large storage capacity, high reliability, ultra-high-speed reading over a full range of orientations, small printing size and highly efficient representation of Chinese characters. In order to obtain a clearer binarized image from a complex background and improve the recognition rate of QR codes, this paper investigates pre-processing methods for QR codes (Quick Response codes) and presents algorithms and results of image pre-processing for QR code recognition. The conventional method is improved by modifying Sauvola's adaptive thresholding method for text recognition. Additionally, a QR code extraction step that adapts to different image sizes and a flexible image correction approach are introduced, improving the efficiency and accuracy of QR code image processing.
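
    The abstract refers to Sauvola's adaptive thresholding; a minimal scikit-image sketch of that binarization step is shown below. The window size, k value and synthetic test image are generic choices, not parameters taken from the paper.

        import numpy as np
        from skimage.filters import threshold_sauvola

        def binarize_qr(gray_image, window_size=25, k=0.2):
            """Adaptive (Sauvola) binarization: robust to uneven illumination in the background."""
            thresh = threshold_sauvola(gray_image, window_size=window_size, k=k)
            return gray_image > thresh            # True = background, False = dark QR modules

        # Synthetic grey image with an illumination gradient and a dark square (a stand-in module).
        img = np.tile(np.linspace(0.3, 0.9, 128), (128, 1))
        img[40:80, 40:80] = 0.05
        binary = binarize_qr(img)
        print(binary.shape, binary.dtype)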

  13. Critical analysis of 3-D organoid in vitro cell culture models for high-throughput drug candidate toxicity assessments.

    PubMed

    Astashkina, Anna; Grainger, David W

    2014-04-01

    Drug failure due to toxicity indicators remains among the primary reasons for staggering drug attrition rates during clinical studies and post-marketing surveillance. Broader validation and use of next-generation 3-D improved cell culture models are expected to improve predictive power and effectiveness of drug toxicological predictions. However, after decades of promising research significant gaps remain in our collective ability to extract quality human toxicity information from in vitro data using 3-D cell and tissue models. Issues, challenges and future directions for the field to improve drug assay predictive power and reliability of 3-D models are reviewed. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Extraction of Data from a Hospital Information System to Perform Process Mining.

    PubMed

    Neira, Ricardo Alfredo Quintano; de Vries, Gert-Jan; Caffarel, Jennifer; Stretton, Erin

    2017-01-01

    The aim of this work is to share our experience in relevant data extraction from a hospital information system in preparation for a research study using process mining techniques. The steps performed were: research definition, mapping the normative processes, identification of tables and fields names of the database, and extraction of data. We then offer lessons learned during data extraction phase. Any errors made in the extraction phase will propagate and have implications on subsequent analyses. Thus, it is essential to take the time needed and devote sufficient attention to detail to perform all activities with the goal of ensuring high quality of the extracted data. We hope this work will be informative for other researchers to plan and execute extraction of data for process mining research studies.

  15. A study on building data warehouse of hospital information system.

    PubMed

    Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo

    2011-08-01

    Existing hospital information systems with simple statistical functions cannot meet current management needs. It is well known that hospital resources are distributed with private property rights among hospitals, such as in the case of the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system. The method can also be employed for a distributed-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: datacenter layer, system-function layer, and user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information from the establishment of the data theme to the design of a data model to the establishment of a data warehouse. Online analytical processing tools assist user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and summarizes from clinical and hospital information for decision making. This paper proposes the use of a data warehouse for a hospital information system, specifically a data warehouse for the theme of hospital information to determine latitude, modeling and so on. The processing of patient information is given as an example that demonstrates the usefulness of this method in the case of hospital information management. Data warehouse technology is an evolving technology, and more and more decision support information extracted by data mining and with decision-making technology is required for further research.

  16. Improved solvent-free microwave extraction of essential oil from dried Cuminum cyminum L. and Zanthoxylum bungeanum Maxim.

    PubMed

    Wang, Ziming; Ding, Lan; Li, Tiechun; Zhou, Xin; Wang, Lu; Zhang, Hanqi; Liu, Li; Li, Ying; Liu, Zhihong; Wang, Hongju; Zeng, Hong; He, Hui

    2006-01-13

    Solvent-free microwave extraction (SFME) is a recently developed green technique performed under atmospheric conditions without adding any solvent or water. SFME has already been applied to the extraction of essential oil from fresh plant materials or from dried materials moistened beforehand; the essential oil is evaporated by the in situ water in the plant materials. In this paper, it was observed that an improved SFME, in which a microwave-absorbing solid medium such as carbonyl iron powder (CIP) is added and mixed with the sample, can be applied to the extraction of essential oil from dried plant materials without any pretreatment. Because the microwave absorption capacity of CIP is much better than that of water, the extraction time with the improved SFME is no more than 30 min at a microwave power of 85 W. Compared with conventional SFME, the advantages of the improved SFME are a faster extraction rate and no need for pretreatment. The improved SFME has been compared with conventional SFME, microwave-assisted hydrodistillation (MAHD) and conventional hydrodistillation (HD) for the extraction of essential oil from dried Cuminum cyminum L. and Zanthoxylum bungeanum Maxim. The compositions of the essential oils obtained by the four extraction methods were identified by GC-MS; there was no obvious difference in the quality of the essential oils obtained by the four methods.

  17. Influence of orthodontic treatment with first premolar extraction on the angulation of the mandibular third molar.

    PubMed

    Al Kuwari, Huda M; Talakey, Arwa A; Al-Sahli, Reem M; Albadr, Anisa H

    2013-06-01

    To evaluate the influence of orthodontic treatment involving first premolar extraction on the angulation of the developing mandibular third molars, and whether this results in an improvement in their path of eruption during tooth development. A cross-sectional radiographic study was conducted using 80 panoramic radiographs of 40 orthodontic patients previously treated at the College of Dentistry, King Saud University, Riyadh, Kingdom of Saudi Arabia. The sample consisted of 2 groups, extraction and non-extraction orthodontic therapy, with an equal number of patients in each group. The orthodontic treatment of the extraction group involved the extraction of first premolars, whilst the non-extraction group had received orthodontic therapy without tooth extraction. The angulation of the right and left mandibular third molars was measured in each patient separately, and the data were analyzed using the non-parametric Mann-Whitney test. The data showed a significant improvement in third molar angulation in the extraction orthodontic therapy group compared to the non-extraction group. Although this finding was significant in both genders, females tended to show a better response in the improvement of third molar angulation to extraction therapy than males (p=0.001, p=0.006). Orthodontic treatment with first premolar extraction improved the third molar angulation during the course of eruption and consequently supports the decision for an orthodontic extraction therapy approach in borderline cases.

  18. Place in Perspective: Extracting Online Information about Points of Interest

    NASA Astrophysics Data System (ADS)

    Alves, Ana O.; Pereira, Francisco C.; Rodrigues, Filipe; Oliveirinha, João

    During the last few years, the amount of online descriptive information about places has reached reasonable dimensions for many cities in the world. Since such information is mostly natural language text, Information Extraction techniques are needed to obtain the meaning of places that underlies these massive amounts of commonsense and user-made sources. In this article, we show how we automatically label places using Information Extraction techniques applied to online resources such as Wikipedia, Yellow Pages and Yahoo!.

  19. Comparison of reduced sugar high quality chocolates sweetened with stevioside and crude stevia 'green' extract.

    PubMed

    Torri, Luisa; Frati, Alessandra; Ninfali, Paolino; Mantegna, Stefano; Cravotto, Giancarlo; Morini, Gabriella

    2017-06-01

    The demand for zero and reduced-sugar food products containing cocoa is expanding continuously. The present study was designed to evaluate the feasibility of producing high-quality chocolate sweetened with a crude extract of Stevia rebaudiana (Bertoni) prepared by a green microwave-assisted water-steam extraction procedure. Seven approximately isosweet chocolate formulations were developed, mixing cocoa paste, sucrose, commercial stevioside, crude green extract and maltitol in different proportions. All samples were analyzed for the determination of polyphenol and flavonoid content, antioxidant activity, and sensory acceptability. The use of a crude stevia extract allowed low-sugar, high-quality chocolates to be obtained that were also acceptable to consumers and had a significantly increased antioxidant activity. Moreover, consumers' segmentation revealed a cluster of consumers showing the same overall liking for the sample with 50% sucrose replaced by the stevia crude extract as that obtained with the commercial stevioside and the control sample (without sucrose replacement). The results provide information that can contribute to promoting the development of sweet food products, with advantages in terms of an improved nutritional value (reduced sugar content and increased antioxidant activity) and a reduced impact of the production process on the environment. © 2016 Society of Chemical Industry.

  20. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature

    PubMed Central

    Jimeno Yepes, Antonio; Verspoor, Karin

    2014-01-01

    As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature. PMID:25285203

  1. Extending the spectrum of DNA sequences retrieved from ancient bones and teeth

    PubMed Central

    Glocke, Isabelle; Meyer, Matthias

    2017-01-01

    The number of DNA fragments surviving in ancient bones and teeth is known to decrease with fragment length. Recent genetic analyses of Middle Pleistocene remains have shown that the recovery of extremely short fragments can prove critical for successful retrieval of sequence information from particularly degraded ancient biological material. Current sample preparation techniques, however, are not optimized to recover DNA sequences from fragments shorter than ∼35 base pairs (bp). Here, we show that much shorter DNA fragments are present in ancient skeletal remains but lost during DNA extraction. We present a refined silica-based DNA extraction method that not only enables efficient recovery of molecules as short as 25 bp but also doubles the yield of sequences from longer fragments due to improved recovery of molecules with single-strand breaks. Furthermore, we present strategies for monitoring inefficiencies in library preparation that may result from co-extraction of inhibitory substances during DNA extraction. The combination of DNA extraction and library preparation techniques described here substantially increases the yield of DNA sequences from ancient remains and provides access to a yet unexploited source of highly degraded DNA fragments. Our work may thus open the door for genetic analyses on even older material. PMID:28408382

  2. Simultaneous Determination of Oleanolic Acid and Ursolic Acid by in Vivo Microdialysis via UHPLC-MS/MS Using Magnetic Dispersive Solid Phase Extraction Coupling with Microwave-Assisted Derivatization and Its Application to a Pharmacokinetic Study of Arctium lappa L. Root Extract in Rats.

    PubMed

    Zheng, Zhenjia; Zhao, Xian-En; Zhu, Shuyun; Dang, Jun; Qiao, Xuguang; Qiu, Zhichang; Tao, Yanduo

    2018-04-18

    Simultaneous detection of oleanolic acid and ursolic acid in rat blood by in vivo microdialysis can provide important pharmacokinetic information. Microwave-assisted derivatization coupled with magnetic dispersive solid phase extraction was established for the determination of oleanolic acid and ursolic acid by liquid chromatography tandem mass spectrometry. 2'-Carbonyl-piperazine rhodamine B was first designed and synthesized as the derivatization reagent, which was easily adsorbed onto the surface of Fe3O4/graphene oxide. Simultaneous derivatization and extraction of oleanolic acid and ursolic acid were performed on Fe3O4/graphene oxide. The permanent positive charge of the derivatization reagent significantly improved the ionization efficiencies. The limits of detection were 0.025 and 0.020 ng/mL for oleanolic acid and ursolic acid, respectively. The validated method was shown to be promising for sensitive, accurate, and simultaneous determination of oleanolic acid and ursolic acid. It was used for their pharmacokinetic study in rat blood after oral administration of Arctium lappa L. root extract.

  3. Applying different independent component analysis algorithms and support vector regression for IT chain store sales forecasting.

    PubMed

    Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
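
    A compact scikit-learn sketch of the ICA-plus-SVR idea (independent components extracted from multi-branch sales series and then used as SVR inputs) is shown below. The synthetic data and the plain FastICA stand in for the spatial, temporal and spatiotemporal ICA variants compared in the paper; none of this is the authors' code.

        import numpy as np
        from sklearn.decomposition import FastICA
        from sklearn.svm import SVR

        rng = np.random.RandomState(0)
        weeks = np.arange(120)
        # Synthetic weekly sales of 5 branches: shared trend + seasonality + branch-specific noise.
        trend = 0.5 * weeks
        season = 10 * np.sin(2 * np.pi * weeks / 52)
        branches = np.column_stack(
            [trend + season + rng.normal(0, 3, weeks.size) for _ in range(5)]
        )

        # Extract independent components from the branch series and use them as features.
        ica = FastICA(n_components=3, random_state=0)
        features = ica.fit_transform(branches)          # shape (n_weeks, 3)

        # Predict next week's total sales from this week's independent components.
        target = branches.sum(axis=1)
        X_train, y_train = features[:-1], target[1:]
        model = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
        print("forecast for the following week:", model.predict(features[-1:]).round(1))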

  4. Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

    PubMed Central

    Dai, Wensheng

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740

  5. Residual and Destroyed Accessible Information after Measurements

    NASA Astrophysics Data System (ADS)

    Han, Rui; Leuchs, Gerd; Grassl, Markus

    2018-04-01

    When quantum states are used to send classical information, the receiver performs a measurement on the signal states. The amount of information extracted is often not optimal due to the receiver's measurement scheme and experimental apparatus. For quantum nondemolition measurements, there is potentially some residual information in the postmeasurement state, while part of the information has been extracted and the rest is destroyed. Here, we propose a framework to characterize a quantum measurement by how much information it extracts and destroys, and how much information it leaves in the residual postmeasurement state. The concept is illustrated for several receivers discriminating coherent states.

  6. Question analysis for Indonesian comparative question

    NASA Astrophysics Data System (ADS)

    Saelan, A.; Purwarianti, A.; Widyantoro, D. H.

    2017-01-01

    Information seeking is one of today's basic human needs. Comparing things using a search engine surely takes more time than searching for a single thing. In this paper, we analyze comparative questions for a comparative question answering system. A comparative question is a question that compares two or more entities. We grouped comparative questions into 5 types: selection between mentioned entities, selection between unmentioned entities, selection between any entities, comparison, and yes-or-no questions. We then extracted 4 types of information from comparative questions: entity, aspect, comparison, and constraint. We built classifiers for the classification task and the information extraction task. The features used for the classification task are bag of words, whereas for information extraction we used the lexical form of the current word, the lexical forms of the two previous and two following words, and the previous label as features. We tried 2 scenarios: classification first and extraction first. For classification first, we used the classification result as a feature for extraction; for extraction first, we used the extraction results as features for classification. We found that the results are better when extraction is done before classification. For the extraction task, classification using SMO gave the best result (88.78%), while for the classification task it is better to use naïve Bayes (82.35%).

  7. Semantic Preview Benefit in English: Individual Differences in the Extraction and Use of Parafoveal Semantic Information

    ERIC Educational Resources Information Center

    Veldre, Aaron; Andrews, Sally

    2016-01-01

    Although there is robust evidence that skilled readers of English extract and use orthographic and phonological information from the parafovea to facilitate word identification, semantic preview benefits have been elusive. We sought to establish whether individual differences in the extraction and/or use of parafoveal semantic information could…

  8. LiveWire interactive boundary extraction algorithm based on Haar wavelet transform and control point set direction search

    NASA Astrophysics Data System (ADS)

    Cheng, Jun; Zhang, Jun; Tian, Jinwen

    2015-12-01

    Based on a deep analysis of the LiveWire interactive boundary extraction algorithm, this paper proposes a new algorithm focused on improving its speed. Firstly, a Haar wavelet transform is applied to the input image, and the boundary is extracted on the low-resolution image obtained from this transform. Secondly, the LiveWire shortest path is computed using a direction search over the control point set, utilizing the spatial relationship between the two control points the user provides in real time. Thirdly, the search order of the points adjacent to the starting node is set in advance, and an ordinary queue instead of a priority queue is used as the storage pool for points whose shortest-path values are being optimized, reducing the complexity of the algorithm from O(n^2) to O(n). Finally, a region-iterative backward projection method based on neighborhood pixel polling is used to convert the dual-pixel boundary of the reconstructed image into a single-pixel boundary after the inverse Haar wavelet transform. The proposed algorithm combines the advantages of the Haar wavelet transform, which offers fast image decomposition and reconstruction and is consistent with the texture features of the image, with those of the optimal path search based on the control point set direction search, which reduces the time complexity of the original algorithm. As a result, the algorithm improves the speed of interactive boundary extraction while reflecting the boundary information of the image more comprehensively, and the methods above play a significant role in improving the execution efficiency and robustness of the algorithm.
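
    A small numpy sketch of the first step is shown below: one level of 2-D Haar decomposition producing the low-resolution approximation on which the boundary search would run. This is a generic Haar transform written for illustration, not the authors' implementation of the full LiveWire pipeline.

        import numpy as np

        def haar_decompose(image):
            """One level of 2-D Haar wavelet decomposition: approximation (LL) and detail subbands."""
            img = image[: image.shape[0] // 2 * 2, : image.shape[1] // 2 * 2].astype(float)
            a, b = img[0::2, 0::2], img[0::2, 1::2]     # 2x2 neighbourhoods
            c, d = img[1::2, 0::2], img[1::2, 1::2]
            ll = (a + b + c + d) / 4.0                  # low-resolution approximation for path search
            lh = (a + b - c - d) / 4.0                  # horizontal detail
            hl = (a - b + c - d) / 4.0                  # vertical detail
            hh = (a - b - c + d) / 4.0                  # diagonal detail
            return ll, (lh, hl, hh)

        image = np.random.rand(256, 256)
        ll, details = haar_decompose(image)
        print(ll.shape)   # (128, 128): boundary extraction runs on this smaller image, then upsampled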

  9. A sentence sliding window approach to extract protein annotations from biomedical articles

    PubMed Central

    Krallinger, Martin; Padron, Maria; Valencia, Alfonso

    2005-01-01

    Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need for comparative assessment of the performance of the proposed methods and for the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts, including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
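
    A bare-bones sketch of a sentence sliding window is given below: every window of consecutive sentences is scored by keyword matches and the best fragments are returned. The keyword-count scorer, window size and example sentences are placeholders, not the statistical estimator used in the paper.

        def sliding_windows(sentences, size=3):
            """Yield every window of `size` consecutive sentences with its start index."""
            for i in range(len(sentences) - size + 1):
                yield i, sentences[i:i + size]

        def best_fragments(sentences, keywords, size=3, top_n=2):
            """Rank windows by how many keyword mentions they contain (stand-in for a real scorer)."""
            scored = []
            for start, window in sliding_windows(sentences, size):
                text = " ".join(window).lower()
                score = sum(text.count(k.lower()) for k in keywords)
                scored.append((score, start, window))
            scored.sort(reverse=True)
            return [(start, window) for score, start, window in scored[:top_n] if score > 0]

        sentences = [
            "The protein BRCA1 was purified from cell extracts.",
            "BRCA1 participates in DNA repair.",
            "This activity localises to the nucleus.",
            "Unrelated buffer conditions are described next.",
        ]
        print(best_fragments(sentences, keywords=["BRCA1", "DNA repair"]))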

  10. Extraction of liver volumetry based on blood vessel from the portal phase CT dataset

    NASA Astrophysics Data System (ADS)

    Maklad, Ahmed S.; Matsuhiro, Mikio; Suzuki, Hidenobu; Kawata, Yoshiki; Niki, Noboru; Utsunomiya, Tohru; Shimada, Mitsuo

    2012-02-01

    At the liver surgery planning stage, liver volumetry is essential for surgeons. The main problem in liver extraction is the wide variability of livers in shape and size. Since the hepatic blood vessel structure varies from person to person and covers the liver region, the present method uses that information for extraction of the liver in two stages. The first stage is to extract abdominal blood vessels in the form of hepatic and non-hepatic blood vessels. In the second stage, the extracted vessels are used to control extraction of the liver region automatically. Contrast-enhanced CT datasets at only the portal phase of 50 cases are used; those data include 30 abnormal livers. A reference for all cases was produced by comparing the labeling results of two experts and correcting their inter-reader variability. Results of the proposed method agree with the reference at an average rate of 97.8%. Applying the metrics mentioned at the MICCAI workshop for liver segmentation, the volume overlap error is 4.4%, the volume difference is 0.3%, the average symmetric distance is 0.7 mm, the root mean square symmetric distance is 0.8 mm, and the maximum distance is 15.8 mm. These results represent the average over all data and show improved accuracy compared to current liver segmentation methods. It appears to be a promising method for extraction of liver volumetry for livers of various shapes and sizes.

  11. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes

    PubMed Central

    Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D.; Day, Michele E.; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan

    2016-01-01

    Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort. PMID:28269947
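
    A minimal sketch of one of the simpler ensemble strategies, majority voting over the concept spans returned by several NLP tools, is shown below. The tool outputs are mocked and the voting threshold is an assumption; the actual pipeline evaluated seven ensemble methods.

        from collections import Counter

        def majority_vote(tool_outputs, min_votes=2):
            """Keep a (concept, span) candidate if at least `min_votes` tools extracted it."""
            counts = Counter(candidate for output in tool_outputs for candidate in set(output))
            return sorted(c for c, n in counts.items() if n >= min_votes)

        # Mocked outputs of three NLP tools on the same clinical note: (concept, character span).
        tool_a = [("ejection fraction", (10, 27)), ("atrial fibrillation", (40, 59))]
        tool_b = [("ejection fraction", (10, 27))]
        tool_c = [("ejection fraction", (10, 27)), ("hypertension", (80, 92))]

        print(majority_vote([tool_a, tool_b, tool_c]))
        # -> [('ejection fraction', (10, 27))]: only the concept found by at least two tools survives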

  12. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.

    PubMed

    Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D; Day, Michele E; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan

    2016-01-01

    Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.

  13. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  14. [Extraction of buildings three-dimensional information from high-resolution satellite imagery based on Barista software].

    PubMed

    Zhang, Pei-feng; Hu, Yuan-man; He, Hong-shi

    2010-05-01

    The demand for accurate and up-to-date spatial information on urban buildings is becoming increasingly important for urban planning, environmental protection, and other applications. Today's commercial high-resolution satellite imagery offers the potential to extract the three-dimensional information of urban buildings. This paper extracted the three-dimensional information of urban buildings from QuickBird imagery and validated the precision of the extraction based on Barista software. It was shown that extracting three-dimensional building information from high-resolution satellite imagery with Barista software has the advantages of a low demand for professional expertise, broad applicability, simple operation, and high precision. Point positioning and height determination accuracy at the one-pixel level could be achieved if the digital elevation model (DEM) and sensor orientation model were of sufficiently high precision and the off-nadir view angle was favorable.

  15. Multi-Temporal Multi-Sensor Analysis of Urbanization and Environmental/Climate Impact in China for Sustainable Urban Development

    NASA Astrophysics Data System (ADS)

    Ban, Yifang; Gong, Peng; Gamba, Paolo; Taubenbock, Hannes; Du, Peijun

    2016-08-01

    The overall objective of this research is to investigate multi-temporal, multi-scale, multi-sensor satellite data for the analysis of urbanization and environmental/climate impact in China to support sustainable planning. Multi-temporal, multi-scale SAR and optical data have been evaluated for urban information extraction using innovative methods and algorithms, including the KTH-Pavia Urban Extractor, Pavia UEXT, an "exclusion-inclusion" framework for urban extent extraction, and KTH-SEG, a novel object-based classification method for detailed urban land cover mapping. Various pixel-based and object-based change detection algorithms were also developed to extract urban changes. Several Chinese cities, including Beijing, Shanghai and Guangzhou, were selected as study areas. Spatio-temporal urbanization patterns and environmental impact at the regional, metropolitan and city-core levels were evaluated through ecosystem services, landscape metrics, spatial indices, and/or their combinations. The relationship between land surface temperature and land-cover classes was also analyzed. The urban extraction results showed that urban areas and small towns could be well extracted from multitemporal SAR data with the KTH-Pavia Urban Extractor and UEXT. The fusion of SAR data at multiple scales from multiple sensors was shown to improve urban extraction. For urban land cover mapping, the results show that the fusion of multitemporal SAR and optical data could produce detailed land cover maps with higher accuracy than SAR or optical data alone. The pixel-based and object-based change detection algorithms developed within the project were effective in extracting urban changes. Comparing the urban land cover results from multitemporal multisensor data, the environmental impact analysis indicates major losses in food supply, noise reduction, runoff mitigation, waste treatment and global climate regulation services through landscape structural changes, in terms of decreases in service area, edge contamination and fragmentation. In terms of climate impact, the results indicate that land surface temperature can be related to land use/land cover classes.

  16. Maximal use of kinematic information for the extraction of the mass of the top quark in single-lepton tt¯ events at DØ

    NASA Astrophysics Data System (ADS)

    Estrada Vigil, Juan Cruz

    The mass of the top (t) quark has been measured in the lepton+jets channel of tt¯ final states studied by the DØ and CDF experiments at Fermilab using data from Run I of the Tevatron pp¯ collider. The result published by DØ is 173.3 +/- 5.6(stat) +/- 5.5(syst) GeV. We present a different method to perform this measurement using the existing data. The new technique uses all available kinematic information in an event, and provides a significantly smaller statistical uncertainty than achieved in previous analyses. The preliminary results presented in this thesis indicate a statistical uncertainty for the extracted mass of the top quark of 3.5 GeV, which represents a significant improvement over the previous value of 5.6 GeV. The method of analysis is very general, and may be particularly useful in situations where there is a small signal and a large background.

  17. Pattern recognition of satellite cloud imagery for improved weather prediction

    NASA Technical Reports Server (NTRS)

    Gautier, Catherine; Somerville, Richard C. J.; Volfson, Leonid B.

    1986-01-01

    The major accomplishment was the successful development of a method for extracting time derivative information from geostationary meteorological satellite imagery. This research is a proof-of-concept study which demonstrates the feasibility of using pattern recognition techniques and a statistical cloud classification method to estimate time rate of change of large-scale meteorological fields from remote sensing data. The cloud classification methodology is based on typical shape function analysis of parameter sets characterizing the cloud fields. The three specific technical objectives, all of which were successfully achieved, are as follows: develop and test a cloud classification technique based on pattern recognition methods, suitable for the analysis of visible and infrared geostationary satellite VISSR imagery; develop and test a methodology for intercomparing successive images using the cloud classification technique, so as to obtain estimates of the time rate of change of meteorological fields; and implement this technique in a testbed system incorporating an interactive graphics terminal to determine the feasibility of extracting time derivative information suitable for comparison with numerical weather prediction products.

  18. Human fatigue expression recognition through image-based dynamic multi-information and bimodal deep learning

    NASA Astrophysics Data System (ADS)

    Zhao, Lei; Wang, Zengcai; Wang, Xiaojin; Qi, Yazhou; Liu, Qing; Zhang, Guoxin

    2016-09-01

    Human fatigue is an important cause of traffic accidents. To improve the safety of transportation, we propose, in this paper, a framework for fatigue expression recognition using image-based facial dynamic multi-information and a bimodal deep neural network. First, the landmarks of the face region and the texture of the eye region, which complement each other in fatigue expression recognition, are extracted from facial image sequences captured by a single camera. Then, two stacked autoencoder neural networks are trained for landmarks and texture, respectively. Finally, the two trained neural networks are combined by learning a joint layer on top of them to construct a bimodal deep neural network. The model can be used to extract a unified representation that fuses the landmark and texture modalities together and to classify fatigue expressions accurately. The proposed system is tested on a human fatigue dataset obtained from an actual driving environment. The experimental results demonstrate that the proposed method performs stably and robustly, achieving an average accuracy of 96.2%.
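
    A minimal sketch of the fusion architecture described above, written in PyTorch: two modality-specific branches (facial landmarks and eye-region texture) are encoded separately and joined by a shared layer that feeds a classifier. The layer sizes and input dimensions are assumptions, and the sketch omits the unsupervised stacked-autoencoder pre-training used in the paper.

```python
import torch
import torch.nn as nn

class BimodalFatigueNet(nn.Module):
    """Two-branch network: landmark and texture encoders fused by a joint layer."""
    def __init__(self, landmark_dim=136, texture_dim=256,
                 hidden=64, joint=32, n_classes=2):
        super().__init__()
        self.landmark_enc = nn.Sequential(
            nn.Linear(landmark_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, joint), nn.ReLU())
        self.texture_enc = nn.Sequential(
            nn.Linear(texture_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, joint), nn.ReLU())
        self.joint = nn.Sequential(
            nn.Linear(2 * joint, joint), nn.ReLU(),
            nn.Linear(joint, n_classes))

    def forward(self, landmarks, texture):
        # Concatenate the two modality representations before the joint layer
        z = torch.cat([self.landmark_enc(landmarks),
                       self.texture_enc(texture)], dim=1)
        return self.joint(z)

model = BimodalFatigueNet()
logits = model(torch.randn(8, 136), torch.randn(8, 256))  # batch of 8 frames
print(logits.shape)  # torch.Size([8, 2])
```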

  19. A hybrid sales forecasting scheme by combining independent component analysis with K-means clustering and support vector regression.

    PubMed

    Lu, Chi-Jie; Chang, Chi-Chang

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue in operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses ICA to extract hidden information from the observed sales data. The extracted features are then applied to the K-means algorithm for clustering the sales data into several disjoint clusters. Finally, SVR forecasting models are applied to each cluster to generate the final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting.

  20. A Hybrid Sales Forecasting Scheme by Combining Independent Component Analysis with K-Means Clustering and Support Vector Regression

    PubMed Central

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue in operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses ICA to extract hidden information from the observed sales data. The extracted features are then applied to the K-means algorithm for clustering the sales data into several disjoint clusters. Finally, SVR forecasting models are applied to each cluster to generate the final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738
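
    A compact sketch of the three-stage scheme described in the two records above, using scikit-learn: ICA extracts hidden components from the observed data, K-means groups samples into disjoint clusters in that space, and one SVR model per cluster produces the forecasts. The toy data, component and cluster counts are assumptions, and predictions are made in-sample purely for brevity; a real evaluation would hold out data.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                       # toy sales features
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

# 1) ICA extracts hidden components from the observed sales data
features = FastICA(n_components=3, random_state=0).fit_transform(X)

# 2) K-means groups the samples into disjoint clusters in ICA space
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# 3) One SVR model per cluster produces the final forecasts
predictions = np.empty_like(y)
for c in np.unique(labels):
    idx = labels == c
    predictions[idx] = SVR(kernel="rbf").fit(features[idx], y[idx]).predict(features[idx])

print("correlation with actuals:", np.corrcoef(y, predictions)[0, 1])
```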

  1. Electronic health information quality challenges and interventions to improve public health surveillance data and practice.

    PubMed

    Dixon, Brian E; Siegel, Jason A; Oemig, Tanya V; Grannis, Shaun J

    2013-01-01

    We examined completeness, an attribute of data quality, in the context of electronic laboratory reporting (ELR) of notifiable disease information to public health agencies. We extracted more than seven million ELR messages from multiple clinical information systems in two states. We calculated and compared the completeness of various data fields within the messages that were identified to be important to public health reporting processes. We compared unaltered, original messages from source systems with similar messages from another state as well as messages enriched by a health information exchange (HIE). Our analysis focused on calculating completeness (i.e., the number of nonmissing values) for fields deemed important for inclusion in notifiable disease case reports. The completeness of data fields for laboratory transactions varied across clinical information systems and jurisdictions. Fields identifying the patient and test results were usually complete (97%-100%). Fields containing patient demographics, patient contact information, and provider contact information were suboptimal (6%-89%). Transactions enhanced by the HIE were found to be more complete (increases ranged from 2% to 25%) than the original messages. ELR data from clinical information systems can be of suboptimal quality. Public health monitoring of data sources and augmentation of ELR message content using HIE services can improve data quality.
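
    Completeness as used above is simply the share of non-missing values per field. A minimal sketch of that calculation with pandas, on a handful of invented ELR-style records (the field names and values are assumptions, not the study's actual message schema):

```python
import pandas as pd

# Toy ELR-style messages; real messages would be parsed HL7 fields.
messages = pd.DataFrame({
    "patient_name":   ["DOE, JANE", "ROE, RICHARD", None, "POE, EDGAR"],
    "test_result":    ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"],
    "patient_phone":  [None, None, "555-0100", None],
    "provider_phone": [None, "555-0199", None, None],
})

# Completeness of a field = percentage of non-missing values
completeness = messages.notna().mean().mul(100).round(1)
print(completeness.sort_values(ascending=False))
```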

  2. a Statistical Texture Feature for Building Collapse Information Extraction of SAR Image

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, H.; Chen, Q.; Liu, X.

    2018-04-01

    Synthetic Aperture Radar (SAR) has become one of the most important ways to extract post-disaster collapsed building information, due to its extreme versatility, almost all-weather and day-and-night working capability, etc. In view of the fact that the inherent statistical distribution of speckle in SAR images has not been used to extract collapsed building information, this paper proposes a novel texture feature based on statistical models of SAR images to extract collapsed buildings. In the proposed feature, the texture parameter of the G0 distribution from SAR images is used to reflect the uniformity of the target and thereby extract collapsed buildings. This feature not only considers the statistical distribution of SAR images, providing a more accurate description of the object texture, but can also be applied to extract collapsed building information from single-, dual- or full-polarization SAR data. RADARSAT-2 data of the Yushu earthquake, acquired on April 21, 2010, are used to present and analyze the performance of the proposed method. In addition, the applicability of this feature to SAR data with different polarizations is also analysed, which provides decision support for data selection in collapsed building information extraction.

  3. Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD.

    PubMed

    Bullinaria, John A; Levy, Joseph P

    2012-09-01

    In a previous article, we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of pointwise mutual information values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This article extends that study by investigating the use of three further factors--namely, the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD)--that have been used to provide improved performance elsewhere. It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations (that provide new state-of-the-art performance on a standard TOEFL task) and the identification and discussion of problems and misleading results that can arise without a full systematic study.
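
    A small sketch of the core construction discussed above: positive pointwise mutual information vectors built from a word-by-context co-occurrence matrix, compared with cosine similarity. The toy counts are invented, and the SVD, stop-list and stemming variations examined in the article are omitted.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information from a word-by-context count matrix."""
    total = counts.sum()
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2((counts / total) / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0          # zero out log(0) and 0/0 cells
    return np.maximum(pmi, 0.0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

counts = np.array([[10., 0., 2.],   # "cat"
                   [ 8., 1., 3.],   # "dog"
                   [ 0., 9., 1.]])  # "car"
vectors = ppmi(counts)
print(cosine(vectors[0], vectors[1]))   # cat vs dog: relatively high
print(cosine(vectors[0], vectors[2]))   # cat vs car: near zero
```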

  4. An enhanced narrow-band imaging method for the microvessel detection

    NASA Astrophysics Data System (ADS)

    Yu, Feng; Song, Enmin; Liu, Hong; Wan, Youming; Zhu, Jun; Hung, Chih-Cheng

    2018-02-01

    A medical endoscope system combined with narrow-band imaging (NBI) has been shown to be a superior diagnostic tool for early cancer detection. NBI can reveal the morphologic changes of microvessels in superficial cancers. To improve the conspicuousness of microvessel texture, we propose an enhanced NBI method for endoscopic images. To obtain more conspicuous narrow-band images, we use an edge operator to extract the edge information of the narrow-band blue and green images and give a weight to the extracted edges. Then, the weighted edges are fused with the narrow-band blue and green images. Finally, the displayed endoscopic images are reconstructed from the enhanced narrow-band images. In addition, we evaluate the performance of the enhanced narrow-band images with different edge operators. Experimental results indicate that the Sobel and Canny operators achieve the best performance of all. Compared with the traditional NBI method from Olympus, our proposed method produces more conspicuous microvessel texture.
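
    A rough sketch of the enhancement step described above, using OpenCV: Sobel edge maps are computed from the narrow-band blue and green images, weighted, fused back into those images, and the result is recombined into a displayable color image. The edge weight and the channel-to-display mapping are assumptions, not the exact convention of the Olympus or proposed systems.

```python
import cv2
import numpy as np

def enhance_nbi(blue, green, weight=0.3):
    """Fuse weighted Sobel edge maps back into narrow-band blue/green images.

    blue, green: uint8 single-channel narrow-band images.
    weight:      contribution of the edge map (an assumed value).
    """
    def weighted_edges(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        mag = cv2.magnitude(gx, gy)
        return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    blue_e = cv2.addWeighted(blue, 1.0, weighted_edges(blue), weight, 0)
    green_e = cv2.addWeighted(green, 1.0, weighted_edges(green), weight, 0)
    # Recombine into a displayable BGR image; this channel mapping is an
    # assumption, not the exact display convention of any NBI system.
    return cv2.merge([blue_e, green_e, green_e])

blue = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
green = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
print(enhance_nbi(blue, green).shape)   # (240, 320, 3)
```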

  5. A research framework for pharmacovigilance in health social media: Identification and evaluation of patient adverse drug event reports.

    PubMed

    Liu, Xiao; Chen, Hsinchun

    2015-12-01

    Social media offer insights into patients' medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve the current practice of pharmacovigilance. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated and high-performance patient-reported adverse drug event extraction. The framework consists of medical entity extraction for recognizing patient discussions of drugs and events, adverse drug event extraction with a shortest dependency path kernel based statistical learning method and semantic filtering with information from medical knowledge bases, and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments were conducted on a test bed encompassing postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness. Our framework significantly outperforms prior work. Published by Elsevier Inc.

  6. Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing

    PubMed Central

    Wen, Tailai; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi

    2018-01-01

    The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detection applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not satisfactory because the raw features extracted from the sensors' responses were fed to a classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose's classification accuracy, in this paper a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, is presented to reprocess the original feature matrix. In addition, we have also compared the proposed method with several previously existing ones, including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results show that QWKFDA is an effective feature extraction method for the E-nose in predicting the types of wound infection and inflammable gases, achieving much higher classification accuracy than the comparison methods. PMID:29382146

  7. Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing.

    PubMed

    Wen, Tailai; Yan, Jia; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi

    2018-01-29

    The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detection applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not satisfactory because the raw features extracted from the sensors' responses were fed to a classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose's classification accuracy, in this paper a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, is presented to reprocess the original feature matrix. In addition, we have also compared the proposed method with several previously existing ones, including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results show that QWKFDA is an effective feature extraction method for the E-nose in predicting the types of wound infection and inflammable gases, achieving much higher classification accuracy than the comparison methods.

  8. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

    PubMed Central

    Hunter, Lawrence; Lu, Zhiyong; Firby, James; Baumgartner, William A; Johnson, Helen L; Ogren, Philip V; Cohen, K Bretonnel

    2008-01-01

    Background Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at PMID:18237434

  9. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information.

    PubMed

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Yan, Bin; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely, motor imagery and emotion recognition.

  10. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information

    PubMed Central

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely, motor imagery and emotion recognition. PMID:26380294
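
    The sketch below illustrates only the "a priori artifact information" step of the method described in the two records above: independent components are matched against a previously acquired artifact template and zeroed before reconstruction. For brevity it uses plain FastICA and omits the discrete wavelet transform stage of wavelet-ICA; the simulated EEG, template and correlation threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n_channels, n_samples = 8, 2000
eeg = rng.normal(size=(n_channels, n_samples))
blink = np.zeros(n_samples)
blink[500:550] = 5.0                                    # simulated eye-blink
eeg += np.outer(rng.uniform(0.5, 1.5, n_channels), blink)

# A priori artifact information: an artifact waveform template assumed to
# have been acquired in advance (e.g. from a calibration session).
template = blink

ica = FastICA(n_components=n_channels, random_state=0)
sources = ica.fit_transform(eeg.T).T                    # (components, samples)

# Flag components that correlate strongly with the a priori template
corr = np.array([abs(np.corrcoef(s, template)[0, 1]) for s in sources])
artifact = corr > 0.6                                   # threshold is an assumption
sources[artifact] = 0.0

eeg_clean = ica.inverse_transform(sources.T).T          # reconstruct without artifacts
print("removed components:", np.where(artifact)[0])
```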

  11. Inverse scattering approach to improving pattern recognition

    NASA Astrophysics Data System (ADS)

    Chapline, George; Fu, Chi-Yung

    2005-05-01

    The Helmholtz machine provides what may be the best existing model for how the mammalian brain recognizes patterns. Based on the observation that the "wake-sleep" algorithm for training a Helmholtz machine is similar to the problem of finding the potential for a multi-channel Schrodinger equation, we propose that the construction of a Schrodinger potential using inverse scattering methods can serve as a model for how the mammalian brain learns to extract essential information from sensory data. In particular, inverse scattering theory provides a conceptual framework for imagining how one might use EEG and MEG observations of brain-waves together with sensory feedback to improve human learning and pattern recognition. Longer term, implementation of inverse scattering algorithms on a digital or optical computer could be a step towards mimicking the seamless information fusion of the mammalian brain.

  12. -Omic and Electronic Health Record Big Data Analytics for Precision Medicine.

    PubMed

    Wu, Po-Yen; Cheng, Chih-Wen; Kaddi, Chanchala D; Venugopalan, Janani; Hoffman, Ryan; Wang, May D

    2017-02-01

    Rapid advances of high-throughput technologies and wide adoption of electronic health records (EHRs) have led to fast accumulation of -omic and EHR data. These voluminous complex data contain abundant information for precision medicine, and big data analytics can extract such knowledge to improve the quality of healthcare. In this paper, we present -omic and EHR data characteristics, associated challenges, and data analytics including data preprocessing, mining, and modeling. To demonstrate how big data analytics enables precision medicine, we provide two case studies, including identifying disease biomarkers from multi-omic data and incorporating -omic information into EHR. Big data analytics is able to address -omic and EHR data challenges for paradigm shift toward precision medicine. Big data analytics makes sense of -omic and EHR data to improve healthcare outcome. It has long lasting societal impact.

  13. Inverse Scattering Approach to Improving Pattern Recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapline, G; Fu, C

    2005-02-15

    The Helmholtz machine provides what may be the best existing model for how the mammalian brain recognizes patterns. Based on the observation that the "wake-sleep" algorithm for training a Helmholtz machine is similar to the problem of finding the potential for a multi-channel Schrodinger equation, we propose that the construction of a Schrodinger potential using inverse scattering methods can serve as a model for how the mammalian brain learns to extract essential information from sensory data. In particular, inverse scattering theory provides a conceptual framework for imagining how one might use EEG and MEG observations of brain-waves together with sensory feedback to improve human learning and pattern recognition. Longer term, implementation of inverse scattering algorithms on a digital or optical computer could be a step towards mimicking the seamless information fusion of the mammalian brain.

  14. Extracting remaining information from an inconclusive result in optimal unambiguous state discrimination

    NASA Astrophysics Data System (ADS)

    Zhang, Gang; Yu, Long-Bao; Zhang, Wen-Hai; Cao, Zhuo-Liang

    2014-12-01

    In unambiguous state discrimination, the measurement results consist of the error-free results and an inconclusive result, and the inconclusive result is conventionally regarded as a useless remainder from which no information about the initial states can be extracted. In this paper, we investigate the problem of extracting the remaining information from an inconclusive result, provided that the optimal total success probability is attained. We present three simple examples. Partial information can be extracted from an inconclusive answer in the first two examples, but not in the third. The initial states in the third example are defined as the highly symmetric states.

  15. Construction of Green Tide Monitoring System and Research on its Key Techniques

    NASA Astrophysics Data System (ADS)

    Xing, B.; Li, J.; Zhu, H.; Wei, P.; Zhao, Y.

    2018-04-01

    As a marine natural disaster, the green tide has appeared every year along the Qingdao coast since the large-scale bloom in 2008, bringing great losses to the region. It is therefore of great value to obtain real-time dynamic information about the green tide distribution. In this study, optical and microwave remote sensing methods are employed for green tide monitoring. A specific remote sensing data processing flow and a green tide information extraction algorithm are designed according to the different characteristics of optical and microwave data. For the extraction of green tide spatial distribution information, an automatic extraction algorithm for green tide distribution boundaries is designed based on the principle of mathematical morphology dilation/erosion. Key issues in the information extraction, including the division of green tide regions, the derivation of basic distributions, the delimitation of distribution boundaries, and the elimination of islands, have been addressed. The automatic generation of green tide distribution boundaries from the results of remote sensing information extraction is realized. Finally, a green tide monitoring system is built based on IDL/GIS secondary development in an integrated RS and GIS environment, achieving the integration of RS monitoring and information extraction.
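
    A minimal sketch of the boundary-extraction idea described above: morphological closing (dilation followed by erosion) merges detected green-tide patches, small isolated islands are removed, and the boundary is taken as the region minus its erosion. The structuring parameters, minimum-size threshold and toy mask are assumptions, not the operational algorithm.

```python
import numpy as np
from scipy import ndimage

def tide_boundaries(mask, close_iter=2, min_pixels=20):
    """Derive distribution boundaries from a binary green-tide detection mask."""
    # Closing (dilation followed by erosion) merges nearby detected patches
    closed = ndimage.binary_closing(mask, iterations=close_iter)
    # Remove small isolated "islands"
    labels, n = ndimage.label(closed)
    sizes = ndimage.sum(closed, labels, range(1, n + 1))
    keep = np.isin(labels, np.where(sizes >= min_pixels)[0] + 1)
    # Boundary = region minus its erosion
    return keep & ~ndimage.binary_erosion(keep)

mask = np.zeros((100, 100), dtype=bool)
mask[20:40, 30:60] = True            # a large green-tide patch
mask[70:72, 10:12] = True            # a small island that will be dropped
print(tide_boundaries(mask).sum(), "boundary pixels")
```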

  16. Automatic information extraction from unstructured mammography reports using distributed semantics.

    PubMed

    Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L

    2018-02-01

    To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities from narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines a dependency-based parse tree with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. Improving the signal subtle feature extraction performance based on dual improved fractal box dimension eigenvectors

    NASA Astrophysics Data System (ADS)

    Chen, Xiang; Li, Jingchao; Han, Hui; Ying, Yulong

    2018-05-01

    Because of the limitations of the traditional fractal box-counting dimension algorithm in extracting subtle features of radiation source signals, a dual improved generalized fractal box-counting dimension eigenvector algorithm is proposed. First, the radiation source signal was preprocessed, and a Hilbert transform was performed to obtain the instantaneous amplitude of the signal. Then, the improved fractal box-counting dimension of the signal's instantaneous amplitude was extracted as the first eigenvector. At the same time, the improved fractal box-counting dimension of the signal without the Hilbert transform was extracted as the second eigenvector. Finally, the dual improved fractal box-counting dimension eigenvectors formed the multi-dimensional eigenvectors used as subtle signal features, which were fed to a grey relation algorithm for radiation source signal recognition. The experimental results show that, compared with the traditional fractal box-counting dimension algorithm and the single improved fractal box-counting dimension algorithm, the proposed dual improved algorithm can better extract the subtle distribution characteristics of the signal under different reconstruction phase spaces and has a better recognition effect with good real-time performance.
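
    The sketch below shows the two ingredients named above in their classic (non-improved) form: the instantaneous amplitude obtained via the Hilbert transform, and a simple box-counting dimension estimate of a 1-D signal's graph used as a feature. The scale set and test signal are assumptions; the paper's improved fractal box-counting dimension and grey relation classifier are not reproduced here.

```python
import numpy as np
from scipy.signal import hilbert

def box_counting_dimension(signal, scales=(2, 4, 8, 16, 32, 64)):
    """Estimate the box-counting dimension of a 1-D signal's graph."""
    x = (signal - signal.min()) / (np.ptp(signal) + 1e-12)   # normalise to [0, 1]
    counts = []
    for k in scales:
        step = len(x) // k                    # k columns along the time axis
        n_boxes = 0
        for i in range(0, step * k, step):
            seg = x[i:i + step]
            # boxes of height 1/k needed to cover this column
            n_boxes += int(np.ceil((seg.max() - seg.min()) * k)) + 1
        counts.append(n_boxes)
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

t = np.linspace(0, 1, 4096)
sig = np.sin(2 * np.pi * 50 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)
amplitude = np.abs(hilbert(sig))              # instantaneous amplitude
print(box_counting_dimension(amplitude))      # first feature
print(box_counting_dimension(sig))            # second feature
```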

  18. Weak characteristic information extraction from early fault of wind turbine generator gearbox

    NASA Astrophysics Data System (ADS)

    Xu, Xiaoli; Liu, Xiuli

    2017-09-01

    Given the weak degradation characteristic information during early fault evolution in the gearbox of a wind turbine generator, traditional singular value decomposition (SVD)-based denoising may result in the loss of useful information. A weak characteristic information extraction method based on μ-SVD and local mean decomposition (LMD) is developed to address this problem. The basic principle of the method is as follows: determine the denoising order based on the cumulative contribution rate, perform signal reconstruction, subject the noisy part of the signal to LMD and μ-SVD denoising, and obtain the denoised signal through superposition. Experimental results show that this method can significantly reduce signal noise, effectively extract the weak characteristic information of early faults, and facilitate early fault warning and dynamic predictive maintenance.

  19. An information extraction framework for cohort identification using electronic health records.

    PubMed

    Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G

    2013-01-01

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.

  20. Recognition of chemical entities: combining dictionary-based and grammar-based approaches.

    PubMed

    Akhondi, Saber A; Hettne, Kristina M; van der Horst, Eelke; van Mulligen, Erik M; Kors, Jan A

    2015-01-01

    The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types is likely to further improve system performance.

  1. Recognition of chemical entities: combining dictionary-based and grammar-based approaches

    PubMed Central

    2015-01-01

    Background The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. Results The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. Conclusions We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types is likely to further improve system performance. PMID:25810767
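
    One plausible reading of the term confidence score mentioned in the two records above is a normalized ratio of a term's relative frequencies in chemical versus non-chemical journals; the sketch below implements that reading on invented counts. The exact normalization used in the paper may differ.

```python
def term_confidence(term, chem_counts, other_counts, chem_total, other_total):
    """Normalized ratio of a term's relative frequency in chemical vs.
    non-chemical journals (one plausible reading of the score)."""
    f_chem = chem_counts.get(term, 0) / chem_total
    f_other = other_counts.get(term, 0) / other_total
    if f_chem + f_other == 0:
        return 0.0
    return f_chem / (f_chem + f_other)

# Invented corpus counts for illustration
chem_counts = {"benzene": 120, "patient": 3}
other_counts = {"benzene": 4, "patient": 500}
print(term_confidence("benzene", chem_counts, other_counts, 10_000, 50_000))
print(term_confidence("patient", chem_counts, other_counts, 10_000, 50_000))
```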

  2. Geoparsing text for characterizing urban operational environments through machine learning techniques

    NASA Astrophysics Data System (ADS)

    Garfinkle, Noah W.; Selig, Lucas; Perkins, Timothy K.; Calfas, George W.

    2017-05-01

    Increasing worldwide internet connectivity and access to sources of print and open social media have increased the near real-time availability of textual information. Capabilities to structure and integrate textual data streams can contribute to more meaningful representations of operational environment factors (i.e., Political, Military, Economic, Social, Infrastructure, Information, Physical Environment, and Time [PMESII-PT]) and tactical civil considerations (i.e., Areas, Structures, Capabilities, Organizations, People and Events [ASCOPE]). However, relying upon human analysts to encode this information as it arrives quickly proves intractable. While human analysts possess an ability to comprehend context in unstructured text far beyond that of computers, automated geoparsing (the extraction of locations from unstructured text) can empower analysts to automate sifting through datasets for areas of interest. This research evaluates existing approaches to geoprocessing and initiates the research and development of locally improved methods of tagging parts of text as possible locations, resolving possible locations into coordinates, and interfacing such results with human analysts. The objective of this ongoing research is to develop a more contextually complete picture of an area of interest (AOI), including human-geographic context for events. In particular, our research is working to improve geoparsing (i.e., the extraction of spatial context from documents), which requires the development, integration, and validation of named-entity recognition (NER) tools, gazetteers, and entity attribution. This paper provides an overview of NER models and methodologies as applied to geoparsing, explores several challenges encountered, presents preliminary results from the creation of a flexible geoparsing research pipeline, and introduces ongoing and future work intended to contribute to the efficient geocoding of information containing valuable insights into human activities in space.

  3. Mapping ephemeral stream networks in desert environments using very-high-spatial-resolution multispectral remote sensing

    DOE PAGES

    Hamada, Yuki; O'Connor, Ben L.; Orr, Andrew B.; ...

    2016-03-26

    Understanding the spatial patterns of ephemeral streams is crucial for understanding how hydrologic processes influence the abundance and distribution of wildlife habitats in desert regions. Available methods for mapping ephemeral streams at the watershed scale typically underestimate the size of channel networks. Although remote sensing is an effective means of collecting data and obtaining information on large, inaccessible areas, conventional techniques for extracting channel features are not sufficient in regions that have small topographic gradients and subtle target-background spectral contrast. Using very high resolution multispectral imagery, we developed a new algorithm that applies landscape information to map ephemeral channels in desert regions of the Southwestern United States where utility-scale solar energy development is occurring. Knowledge about landscape features and structures was integrated into the algorithm through a series of spectral transformations and spatial statistical operations. The algorithm extracted ephemeral stream channels at a local scale, with the result that approximately 900% more ephemeral streams were identified than by using the U.S. Geological Survey’s National Hydrography Dataset. The accuracy of the algorithm in detecting channel areas was as high as 92%, and its accuracy in delineating channel center lines was 91% when compared to a subset of channel networks digitized from the very high resolution imagery. Although the algorithm captured stream channels in desert landscapes across various channel sizes and forms, it often underestimated stream headwaters and channels obscured by bright soils and sparse vegetation. While further improvement is warranted, the algorithm provides an effective means of obtaining detailed information about ephemeral streams, and it could make a significant contribution toward improving the hydrological modelling of desert environments.

  4. Mapping ephemeral stream networks in desert environments using very-high-spatial-resolution multispectral remote sensing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamada, Yuki; O'Connor, Ben L.; Orr, Andrew B.

    Understanding the spatial patterns of ephemeral streams is crucial for understanding how hydrologic processes influence the abundance and distribution of wildlife habitats in desert regions. Available methods for mapping ephemeral streams at the watershed scale typically underestimate the size of channel networks. Although remote sensing is an effective means of collecting data and obtaining information on large, inaccessible areas, conventional techniques for extracting channel features are not sufficient in regions that have small topographic gradients and subtle target-background spectral contrast. Using very high resolution multispectral imagery, we developed a new algorithm that applies landscape information to map ephemeral channels in desert regions of the Southwestern United States where utility-scale solar energy development is occurring. Knowledge about landscape features and structures was integrated into the algorithm through a series of spectral transformations and spatial statistical operations. The algorithm extracted ephemeral stream channels at a local scale, with the result that approximately 900% more ephemeral streams were identified than by using the U.S. Geological Survey’s National Hydrography Dataset. The accuracy of the algorithm in detecting channel areas was as high as 92%, and its accuracy in delineating channel center lines was 91% when compared to a subset of channel networks digitized from the very high resolution imagery. Although the algorithm captured stream channels in desert landscapes across various channel sizes and forms, it often underestimated stream headwaters and channels obscured by bright soils and sparse vegetation. While further improvement is warranted, the algorithm provides an effective means of obtaining detailed information about ephemeral streams, and it could make a significant contribution toward improving the hydrological modelling of desert environments.

  5. Ultrasonic guided wave tomography of pipes: A development of new techniques for the nondestructive evaluation of cylindrical geometries and guided wave multi-mode analysis

    NASA Astrophysics Data System (ADS)

    Leonard, Kevin Raymond

    This dissertation concentrates on the development of two new tomographic techniques that enable wide-area inspection of pipe-like structures. By envisioning a pipe as a plate wrapped around upon itself, the previous Lamb Wave Tomography (LWT) techniques are adapted to cylindrical structures. Helical Ultrasound Tomography (HUT) uses Lamb-like guided wave modes transmitted and received by two circumferential arrays in a single crosshole geometry. Meridional Ultrasound Tomography (MUT) creates the same crosshole geometry with a linear array of transducers along the axis of the cylinder. However, even though these new scanning geometries are similar to plates, additional complexities arise because they are cylindrical structures. First, because it is a single crosshole geometry, the wave vector coverage is poorer than in the full LWT system. Second, since waves can travel in both directions around the circumference of the pipe, modes can also constructively and destructively interfere with each other. These complexities necessitate improved signal processing algorithms to produce accurate and unambiguous tomographic reconstructions. Consequently, this work also describes a new algorithm for improving the extraction of multi-mode arrivals from guided wave signals. Previous work has relied solely on the first arriving mode for the time-of-flight measurements. In order to improve the LWT, HUT and MUT systems reconstructions, improved signal processing methods are needed to extract information about the arrival times of the later arriving modes. Because each mode has different through-thickness displacement values, they are sensitive to different types of flaws, and the information gained from the multi-mode analysis improves understanding of the structural integrity of the inspected material. Both tomographic frequency compounding and mode sorting algorithms are introduced. It is also shown that each of these methods improve the reconstructed images both qualitatively and quantitatively.

  6. Interactive access to LP DAAC satellite data archives through a combination of open-source and custom middleware web services

    USGS Publications Warehouse

    Davis, Brian N.; Werpy, Jason; Friesz, Aaron M.; Impecoven, Kevin; Quenzer, Robert; Maiersperger, Tom; Meyer, David J.

    2015-01-01

    Current methods of searching for and retrieving data from satellite land remote sensing archives do not allow for interactive information extraction. Instead, Earth science data users are required to download files over low-bandwidth networks to local workstations and process data before science questions can be addressed. New methods of extracting information from data archives need to become more interactive to meet user demands for deriving increasingly complex information from rapidly expanding archives. Moving the tools required for processing data to computer systems of data providers, and away from systems of the data consumer, can improve turnaround times for data processing workflows. The implementation of middleware services was used to provide interactive access to archive data. The goal of this middleware services development is to enable Earth science data users to access remote sensing archives for immediate answers to science questions instead of links to large volumes of data to download and process. Exposing data and metadata to web-based services enables machine-driven queries and data interaction. Also, product quality information can be integrated to enable additional filtering and sub-setting. Only the reduced content required to complete an analysis is then transferred to the user.

  7. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from protein-RNA complexes present in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.

  8. Building change detection via a combination of CNNs using only RGB aerial imageries

    NASA Astrophysics Data System (ADS)

    Nemoto, Keisuke; Hamaguchi, Ryuhei; Sato, Masakazu; Fujita, Aito; Imaizumi, Tomoyuki; Hikosaka, Shuhei

    2017-10-01

    Building change information extracted from remote sensing imagery is important for various applications such as urban management and marketing planning. The goal of this work is to develop a methodology for automatically capturing building changes from remote sensing imagery. Recent studies have addressed this goal by exploiting 3-D information as a proxy for building height. In contrast, because in practice it is expensive or impossible to prepare 3-D information, we do not rely on 3-D data but focus on using only RGB aerial imagery. Instead, we employ deep convolutional neural networks (CNNs) to extract effective features and improve change detection accuracy in RGB remote sensing imagery. We consider two aspects of building change detection: building detection and subsequent change detection. Our proposed methodology was tested on several areas, which differ in aspects such as dominant building characteristics and brightness values. Over all of the tested areas, the proposed method provides good results for changed objects, with recall values over 75% under a strict overlap requirement of over 50% intersection-over-union (IoU). When the IoU threshold was relaxed to over 10%, the resulting recall values were over 81%. We conclude that the use of CNNs enables accurate detection of building changes without employing 3-D information.

  9. Extractions of High Quality RNA from the Seeds of Jerusalem Artichoke and Other Plant Species with High Levels of Starch and Lipid.

    PubMed

    Mornkham, Tanupat; Wangsomnuk, Preeya Puangsomlee; Fu, Yong-Bi; Wangsomnuk, Pinich; Jogloy, Sanun; Patanothai, Aran

    2013-04-29

    Jerusalem artichoke (Helianthus tuberosus L.) is an important tuber crop. However, Jerusalem artichoke seeds contain high levels of starch and lipid, making the extraction of high-quality RNA extremely difficult and gene expression analysis challenging. This study aimed to improve existing methods for extracting total RNA from Jerusalem artichoke dry seeds and to assess the applicability of the improved method to other plant species. Five RNA extraction methods were evaluated on Jerusalem artichoke seeds and two were modified. The modified method showing significant improvement was applied to seeds of diverse Jerusalem artichoke accessions, sunflower, rice, maize, peanut and marigold. The effectiveness of the improved method in extracting total RNA from seeds was assessed using qPCR analysis of four selected genes. The improved method of Ma and Yang (2011) yielded maximum RNA solubility and removed most interfering substances. The improved protocol generated 29 to 41 µg RNA/30 mg fresh weight. An A260/A280 ratio of 1.79 to 2.22 indicated the purity of the RNA. The extracted RNA was effective for downstream applications such as first-strand cDNA synthesis, cDNA cloning and qPCR. The improved method was also effective in extracting total RNA from seeds of sunflower, rice, maize and peanut, which are rich in polyphenols, lipids and polysaccharides.

  10. Conceptualizing and assessing improvement capability: a review

    PubMed Central

    Boaden, Ruth; Walshe, Kieran

    2017-01-01

    Purpose: The literature is reviewed to examine how ‘improvement capability’ is conceptualized and assessed and to identify future areas for research. Data sources: An iterative and systematic search of the literature was carried out across all sectors including healthcare. The search was limited to literature written in English. Data extraction: The study identifies and analyses 70 instruments and frameworks for assessing or measuring improvement capability. Information about the source of the instruments, the sectors in which they were developed or used, the measurement constructs or domains they employ, and how they were tested was extracted. Results of data synthesis: The instruments and framework constructs are very heterogeneous, demonstrating the ambiguity of improvement capability as a concept, and the difficulties involved in its operationalisation. Two-thirds of the instruments and frameworks have been subject to tests of reliability and half to tests of validity. Many instruments have little apparent theoretical basis and do not seem to have been used widely. Conclusion: The assessment and development of improvement capability needs clearer and more consistent conceptual and terminological definition, used consistently across disciplines and sectors. There is scope to learn from existing instruments and frameworks, and this study proposes a synthetic framework of eight dimensions of improvement capability. Future instruments need robust testing for reliability and validity. This study contributes to practice and research by presenting the first review of the literature on improvement capability across all sectors including healthcare. PMID:28992146

  11. Contribution of the New WORLDVIEW-2 Spectral Bands for Urban Mapping in Coastal Areas: Case Study SÃO LUÍS (MARANHÃO State, Brazil)

    NASA Astrophysics Data System (ADS)

    Souza, U. D. V.; kux, H. J. H.

    2012-07-01

    The objective of this study is to verify the contribution of the spectral bands of the new WorldView-2 satellite for the extraction of urban targets, aiming at a detailed mapping of the city of São Luís, in the coastal zone of Maranhão State, Brazil. This satellite system has 3 bands in the visible portion of the spectrum and also the following 4 new bands: Coastal (400-450 nm), Yellow (585-625 nm), Red Edge (705-745 nm), and Near Infrared 2 (860-1040 nm). Regarding the methodology, a fusion was first performed between the panchromatic and the multispectral bands, combining the spectral information of the multispectral bands with the geometric information of the panchromatic band. Next, ortho-rectification of the dataset was carried out using ground control points (GCPs) obtained during a field survey. The classification reached high Kappa index values. The use of the new Red Edge and Near Infrared 2 bands improved the discrimination of tidal flats, mangroves and other vegetation types. The Yellow band improved the discrimination of bare soils - very important information for urban planning - and ceramic roofs. The Coastal band allowed mapping of the tidal channels that cross the urban area of São Luís, a typical feature of this coastal area. The functionalities of the GEODMA software allowed an efficient attribute selection, which improved the land cover classification of the test sites. The new WorldView-2 bands permit the identification and extraction of the features mentioned because they are positioned in important parts of the electromagnetic spectrum, such as the Red Edge band, which strongly improves the discrimination of vegetation conditions. Combining higher spatial and spectral resolutions, WorldView-2 data allow improved discrimination of the physical characteristics of the targets of interest, thus permitting higher-precision land use/land cover maps and contributing to urban planning. The test sites of this study represent the main problem areas of the city of São Luís and the entire Maranhão Island region.

  12. Crowdsourcing the Measurement of Interstate Conflict

    PubMed Central

    2016-01-01

    Much of the data used to measure conflict is extracted from news reports. This is typically accomplished using either expert coders to quantify the relevant information or machine coders to automatically extract data from documents. Although expert coding is costly, it produces quality data. Machine coding is fast and inexpensive, but the data are noisy. To diminish the severity of this tradeoff, we introduce a method for analyzing news documents that uses crowdsourcing, supplemented with computational approaches. The new method is tested on documents about Militarized Interstate Disputes, and its accuracy ranges between about 68 and 76 percent. This is shown to be a considerable improvement over automated coding, and to cost less and be much faster than expert coding. PMID:27310427

  13. [Scientific bases for the development of functional meat products with combined biological activity].

    PubMed

    Palanca, V; Rodríguez, E; Señoráns, J; Reglero, G

    2006-01-01

    The scientific evidence on the relationship between food and health has given rise to a new, rapidly growing food market in recent years: the market for functional food. Although the interest in maintaining or improving health through the consumption of traditional food with added bioactive ingredients is undoubtedly high, the Spanish population, increasingly educated and informed, is unwilling to consume functional food until it rests on a rigorous scientific basis. This article presents a review of the scientific bases that support the development of functional meat products with a balanced omega-6/omega-3 ratio and a combination of synergistic antioxidants, among them a rosemary extract obtained by supercritical CO2 extraction.

  14. Depth extraction method with high accuracy in integral imaging based on moving array lenslet technique

    NASA Astrophysics Data System (ADS)

    Wang, Yao-yao; Zhang, Juan; Zhao, Xue-wei; Song, Li-pei; Zhang, Bo; Zhao, Xing

    2018-03-01

    In order to improve depth extraction accuracy, a method using the moving array lenslet technique (MALT) in the pickup stage is proposed, which can decrease the depth interval caused by pixelation. In this method, the lenslet array is moved along the horizontal and vertical directions simultaneously N times within one pitch to obtain N sets of elemental images. A computational integral imaging reconstruction method for MALT is used to obtain slice images of the 3D scene, and the sum modulus difference (SMD) blur metric is applied to these slice images to obtain the depth information of the 3D scene. Simulation and optical experiments are carried out to verify the feasibility of this method.
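
    The abstract does not spell out the exact blur metric, so the following is a minimal sketch, assuming an SMD-style focus measure in which the sharpest reconstructed slice indicates the object depth; the slice and depth variables are illustrative.

    ```python
    import numpy as np

    def smd(slice_img: np.ndarray) -> float:
        """Sum-modulus-difference focus measure of one reconstructed slice."""
        img = slice_img.astype(float)
        return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

    def estimate_depth(slices, depths):
        """Pick the reconstruction depth whose slice scores highest under SMD."""
        scores = [smd(s) for s in slices]
        return depths[int(np.argmax(scores))]
    ```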

  15. Structured prediction models for RNN based sequence labeling in clinical text.

    PubMed

    Jagannatha, Abhyuday N; Yu, Hong

    2016-11-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
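
    As a rough illustration of how pairwise potentials enter exact decoding in a linear-chain CRF over RNN emission scores, a minimal Viterbi sketch is given below; it is not the authors' implementation and does not show the skip-chain variant or the training objective.

    ```python
    import numpy as np

    def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list:
        """Most likely label sequence given per-token emission scores (T x K),
        e.g. produced by an LSTM, and pairwise transition scores (K x K)."""
        T, K = emissions.shape
        score = emissions[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            # cand[i, j] = best score ending in label i at t-1, then moving to j.
            cand = score[:, None] + transitions + emissions[t][None, :]
            back[t] = cand.argmax(axis=0)
            score = cand.max(axis=0)
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]
    ```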

  16. Structured prediction models for RNN based sequence labeling in clinical text

    PubMed Central

    Jagannatha, Abhyuday N; Yu, Hong

    2016-01-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. PMID:28004040

  17. The Extraction of Post-Earthquake Building Damage Information Based on Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Chen, M.; Wang, X.; Dou, A.; Wu, X.

    2018-04-01

    The seismic damage information of buildings extracted from remote sensing (RS) imagery is meaningful for supporting relief efforts and effectively reducing the losses caused by an earthquake. Both traditional pixel-based and object-oriented methods have shortcomings in extracting object information. Pixel-based methods cannot make full use of the contextual information of objects. Object-oriented methods face the problems that image segmentation is often not ideal and that the choice of feature space is difficult. In this paper, a new strategy is proposed that combines a Convolutional Neural Network (CNN) with image segmentation to extract building damage information from remote sensing imagery. The key idea of this method includes two steps: first, the CNN is used to predict the probability of each pixel; then the probabilities are integrated within each segmentation spot. The method is tested by extracting collapsed and uncollapsed buildings from an aerial image acquired over Longtoushan Town after the Ms 6.5 Ludian County earthquake in Yunnan Province. The results show the effectiveness of the proposed method in extracting building damage information after an earthquake.
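
    The integration step is only described at a high level; the sketch below assumes the per-segment probabilities are simply averaged and thresholded (the averaging rule and the 0.5 cut-off are assumptions, not the paper's stated choices).

    ```python
    import numpy as np

    def integrate_probabilities(prob_map: np.ndarray, segments: np.ndarray,
                                threshold: float = 0.5) -> dict:
        """Average the CNN per-pixel collapse probabilities within each
        segmentation spot and label the spot by thresholding the mean."""
        labels = {}
        for seg_id in np.unique(segments):
            mean_p = prob_map[segments == seg_id].mean()
            labels[int(seg_id)] = "collapsed" if mean_p >= threshold else "uncollapsed"
        return labels
    ```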

  18. Uncovering the information core in recommender systems

    NASA Astrophysics Data System (ADS)

    Zeng, Wei; Zeng, An; Liu, Hao; Shang, Ming-Sheng; Zhou, Tao

    2014-08-01

    With the rapid growth of the Internet and the overwhelming amount of information that people are confronted with, recommender systems have been developed to effectively support users' decision-making process in online systems. So far, much attention has been paid to designing new recommendation algorithms and improving existing ones. However, few works have considered the different contributions of different users to the performance of a recommender system. Such studies can help us improve recommendation efficiency by excluding irrelevant users. In this paper, we argue that in each online system there exists a group of core users who carry most of the information for recommendation. With them alone, the recommender system can already generate satisfactory recommendations. Our core user extraction method enables the recommender system to achieve 90% of the accuracy of the top-L recommendation by taking only 20% of the users into account. A detailed investigation reveals that these core users are not necessarily the large-degree users. Moreover, they tend to select high-quality objects and their selections are well diversified.
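
    A minimal sketch of the core-user idea, assuming a scoring function user_information_score is available (the abstract does not specify how core users are identified, only that they are not simply the large-degree users and that roughly 20% of users suffice):

    ```python
    def extract_core_users(user_items: dict, user_information_score, fraction=0.2):
        """Keep the top fraction of users by an (assumed) information score and
        return only their interaction histories for training the recommender."""
        ranked = sorted(user_items, key=user_information_score, reverse=True)
        n_core = max(1, int(len(ranked) * fraction))
        return {u: user_items[u] for u in ranked[:n_core]}
    ```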

  19. Testing a Flexible Method to Reduce False Monsoon Onsets

    PubMed Central

    Stiller-Reeve, Mathew Alexander; Spengler, Thomas; Chu, Pao-Shin

    2014-01-01

    To generate information about the monsoon onset and withdrawal we have to choose a monsoon definition and apply it to data. One problem that arises is that false monsoon onsets can hamper our analysis, which is often alleviated by smoothing the data in time or space. Another problem is that local communities or stakeholder groups may define the monsoon differently. We therefore aim to develop a technique that reduces false onsets for high-resolution gridded data, while also being flexible for different requirements that can be tailored to particular end-users. In this study, we explain how we developed our technique and demonstrate how it successfully reduces false onsets and withdrawals. The presented results yield improved information about the monsoon length and its interannual variability. Due to this improvement, we are able to extract information from higher resolution data sets. This implies that we can potentially get a more detailed picture of local climate variations that can be used in more local climate application projects such as community-based adaptations. PMID:25105900

  20. U-EXTRACTION--IMPROVEMENTS IN ELIMINATION OF Mo BY USE OF FERRIC ION

    DOEpatents

    Clark, H.M.; Duffey, D.

    1958-06-10

    An improved solvent extraction process is described whereby U may be extracted by a water immiscible organic solvent from an aqueous solution of uranyl nitrate. It has been found that Mo in the presence of phosphate ions appears to form a complex with the phosphate which extracts along with the U. This extraction of Mo may be suppressed by providing ferric ion in the solution prior to the extraction step. The ferric ion is preferably provided in the form of ferric nitrate.

  1. Reproducible Tissue Homogenization and Protein Extraction for Quantitative Proteomics Using MicroPestle-Assisted Pressure-Cycling Technology.

    PubMed

    Shao, Shiying; Guo, Tiannan; Gross, Vera; Lazarev, Alexander; Koh, Ching Chiek; Gillessen, Silke; Joerger, Markus; Jochum, Wolfram; Aebersold, Ruedi

    2016-06-03

    The reproducible and efficient extraction of proteins from biopsy samples for quantitative analysis is a critical step in biomarker and translational research. Recently, we described a method consisting of pressure-cycling technology (PCT) and sequential windowed acquisition of all theoretical fragment ions-mass spectrometry (SWATH-MS) for the rapid quantification of thousands of proteins from biopsy-size tissue samples. As an improvement of the method, we have incorporated the PCT-MicroPestle into the PCT-SWATH workflow. The PCT-MicroPestle is a novel, miniaturized, disposable mechanical tissue homogenizer that fits directly into the microTube sample container. We optimized the pressure-cycling conditions for tissue lysis with the PCT-MicroPestle and benchmarked the performance of the system against the conventional PCT-MicroCap method using mouse liver, heart, brain, and human kidney tissues as test samples. The data indicate that the digestion of the PCT-MicroPestle-extracted proteins yielded 20-40% more MS-ready peptide mass from all tissues tested with a comparable reproducibility when compared to the conventional PCT method. Subsequent SWATH-MS analysis identified a higher number of biologically informative proteins from a given sample. In conclusion, we have developed a new device that can be seamlessly integrated into the PCT-SWATH workflow, leading to increased sample throughput and improved reproducibility at both the protein extraction and proteomic analysis levels when applied to the quantitative proteomic analysis of biopsy-level samples.

  2. How the variance of some extraction variables may affect the quality of espresso coffees served in coffee shops.

    PubMed

    Severini, Carla; Derossi, Antonio; Fiore, Anna G; De Pilli, Teresa; Alessandrino, Ofelia; Del Mastro, Arcangela

    2016-07-01

    To improve the quality of espresso coffee, the variables under the control of the barista, such as grinding grade, coffee quantity and pressure applied to the coffee cake, as well as their variance, are of great importance. Nonlinear mixed-effects modeling was used to obtain information on the changes in the chemical attributes of espresso coffee (EC) as a function of the variability of extraction conditions. During extraction, the changes in volume were well described by a logistic model, whereas the chemical attributes were better fit by first-order kinetics. The major source of information was contained in the grinding grade, which accounted for 87-96% of the variance of the experimental data. The variability of the grinding produced changes in caffeine content in the range of 80.03 mg to 130.36 mg at a constant grinding grade of 6.5. The variability in volume and chemical attributes of EC is large. Grinding had the most important effect, as the variability in particle size distribution observed for each grinding level had a profound effect on the quality of EC. Standardization of grinding would be of crucial importance for consistently obtaining high-quality espresso coffees. © 2015 Society of Chemical Industry.
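
    The functional forms below follow the abstract's description (a logistic model for volume, first-order kinetics for chemical attributes), but the parameterization and the data points are purely illustrative, not the study's measurements.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, v_max, k, t0):
        """Logistic growth of extracted volume over brewing time."""
        return v_max / (1.0 + np.exp(-k * (t - t0)))

    def first_order(t, c_max, k):
        """First-order kinetics for a chemical attribute such as caffeine."""
        return c_max * (1.0 - np.exp(-k * t))

    # Illustrative data only: extraction time (s) vs caffeine extracted (mg).
    t = np.array([5, 10, 15, 20, 25, 30], dtype=float)
    caffeine = np.array([40, 70, 90, 105, 115, 120], dtype=float)
    (c_max_fit, k_fit), _ = curve_fit(first_order, t, caffeine, p0=[130.0, 0.1])
    ```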

  3. Content Analysis of Student Essays after Attending a Problem-Based Learning Course: Facilitating the Development of Critical Thinking and Communication Skills in Japanese Nursing Students.

    PubMed

    Itatani, Tomoya; Nagata, Kyoko; Yanagihara, Kiyoko; Tabuchi, Noriko

    2017-08-22

    The importance of active learning has continued to increase in Japan. The authors conducted classes for first-year nursing students using the problem-based learning (PBL) method, a form of active learning. Students discussed social topics in these classes. The purposes of this study were to analyze the post-class essays and to describe students' logical and critical thinking after attending a PBL course. The authors used Mayring's methodology for qualitative content analysis and text mining. In the descriptions of the skills required to resolve social issues, seven categories were extracted: (recognition of diverse social issues), (attitudes about resolving social issues), (discerning the root cause), (multi-lateral information processing skills), (making a path to resolve issues), (processivity in dealing with issues), and (reflecting). In the descriptions about communication, five categories were extracted: (simple statement), (robust theories), (respecting the opponent), (communication skills), and (attractive presentations). As a result of text mining, the words extracted more than 100 times included "issue," "society," "resolve," "myself," "ability," "opinion," and "information." Education using PBL could be an effective means of improving the skills that students described, as well as communication in general. Some students felt difficulties in communication arising from characteristics of the Japanese language.

  4. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

    PubMed

    Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

    2015-09-01

    Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. The authors applied the method to develop algorithms that identify patients with rheumatoid arthritis (RA), and coronary artery disease (CAD) cases among those with RA, from a large multi-institutional EHR. The areas under the receiver operating characteristic curve (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
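
    A hedged sketch of the final classification step, an L1-penalized logistic regression over automatically selected NLP and codified features; the feature matrix, labels and regularization strength below are placeholders rather than the study's data or settings.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Placeholder data: rows are patients, columns are NLP concept counts
    # plus codified features; y is the chart-reviewed phenotype label.
    X, y = np.random.rand(500, 200), np.random.randint(0, 2, 500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    ```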

  5. [Determination of soluble solids content in Nanfeng Mandarin by Vis/NIR spectroscopy and UVE-ICA-LS-SVM].

    PubMed

    Sun, Tong; Xu, Wen-Li; Hu, Tian; Liu, Mu-Hua

    2013-12-01

    The objective of the present research was to assess the soluble solids content (SSC) of Nanfeng mandarin by visible/near infrared (Vis/NIR) spectroscopy combined with a new variable selection method, to simplify the prediction model and to improve its performance. A total of 300 Nanfeng mandarin samples were used; the numbers of samples in the calibration, validation and prediction sets were 150, 75 and 75, respectively. Vis/NIR spectra of the samples were acquired by a QualitySpec spectrometer in the wavelength range of 350-1000 nm. Uninformative variable elimination (UVE) was used to remove wavelength variables that carried little information about SSC, and independent component analysis (ICA) was then used to extract independent components (ICs) from the spectra after the uninformative wavelength variables had been eliminated. Finally, least squares support vector machine (LS-SVM) was used to develop calibration models for SSC of Nanfeng mandarin using the extracted ICs, and the 75 prediction samples that had not been used for model development were used to evaluate the performance of the SSC model. The results indicate that Vis/NIR spectroscopy combined with UVE-ICA-LS-SVM is suitable for assessing the SSC of Nanfeng mandarin, with high prediction precision. UVE-ICA is an effective method to eliminate uninformative wavelength variables, extract important spectral information, simplify the prediction model and improve its performance. The SSC model developed by UVE-ICA-LS-SVM is superior to those developed by PLS, PCA-LS-SVM or ICA-LS-SVM; the coefficients of determination and root mean square errors in the calibration, validation and prediction sets were 0.978, 0.230%, 0.965, 0.301% and 0.967, 0.292%, respectively.
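
    Neither UVE nor LS-SVM is available in scikit-learn, so the sketch below substitutes a simple correlation filter for UVE and an RBF-kernel SVR for LS-SVM; the data, component count and cut-off are illustrative only, and the pipeline is meant to show the UVE-then-ICA-then-regression structure rather than reproduce the study.

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVR

    # Placeholder spectra (samples x wavelengths) and reference SSC values.
    X, y = np.random.rand(150, 651), np.random.rand(150)

    # Stand-in for UVE: keep wavelengths whose correlation with SSC exceeds a cut-off.
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    informative = corr > 0.1

    # ICA feature extraction followed by an RBF SVR (stand-in for LS-SVM).
    model = make_pipeline(FastICA(n_components=10, random_state=0), SVR(kernel="rbf"))
    model.fit(X[:, informative], y)
    ```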

  6. Requirement of scientific documentation for the development of Naturopathy.

    PubMed

    Rastogi, Rajiv

    2006-01-01

    The past few decades have witnessed an explosion of knowledge in almost every field. This has resulted not only in the advancement of the subjects concerned but has also influenced the growth of various allied subjects. The present paper discusses the advancement of science through efforts made in specific areas and also through discoveries in different allied fields that have an indirect influence upon the subject proper. In Naturopathy, it seems that although nothing in particular has been added to the basic thoughts or fundamental principles of the subject, the entire understanding of treatment has been revolutionised under the influence of the scientific discoveries of the past few decades. The advent of information technology has further added to the boom of knowledge, and it often seems impossible to utilize this information for the good of human beings because it is not logically arranged in our minds. Against this background, the author tries to define documentation, stating that we have today an ocean of information and knowledge about various things - living or dead, plants, animals or human beings, geographical conditions, or the changing weather and environment. What needs to be done is to extract the relevant knowledge and information required to enrich the subject. The author compares documentation with the churning of milk to extract butter. Documentation, in fact, is the churning of the ocean of information to extract the specific, most appropriate, relevant and well-defined information and knowledge related to the particular subject. Besides discussing the definition of documentation, the paper highlights the areas of Naturopathy in urgent need of proper documentation. The paper also discusses the present status of Naturopathy in India, proposes short-term and long-term goals to be achieved, and outlines strategies for achieving them. The most important aspect of the paper is a due understanding of the limitations of Naturopathy combined with a constant effort to improve it in line with the growth made in the various disciplines of science so far.

  7. The research of road and vehicle information extraction algorithm based on high resolution remote sensing image

    NASA Astrophysics Data System (ADS)

    Zhou, Tingting; Gu, Lingjia; Ren, Ruizhi; Cao, Qiong

    2016-09-01

    With the rapid development of remote sensing technology, the spatial and temporal resolution of satellite imagery has also increased greatly. Meanwhile, high-spatial-resolution images are becoming increasingly popular for commercial applications. Remote sensing image technology has broad application prospects in intelligent transportation. Compared with traditional traffic information collection methods, vehicle information extraction using high-resolution remote sensing images has the advantages of high resolution and wide coverage. This has great guiding significance for urban planning, transportation management, travel route choice and so on. Firstly, this paper preprocessed the acquired high-resolution multi-spectral and panchromatic remote sensing images. After that, on the one hand, histogram equalization and linear enhancement were applied to the preprocessing results in order to obtain the optimal threshold for image segmentation. On the other hand, considering the distribution characteristics of roads, the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI) were used to suppress water and vegetation information in the preprocessing results. Then, the above two processing results were combined. Finally, geometric characteristics were used to complete road information extraction. The extracted road vector was used to limit the target vehicle area. Target vehicle extraction was divided into bright vehicle extraction and dark vehicle extraction. Eventually, the extraction results of the two kinds of vehicles were combined to obtain the final results. The experimental results demonstrated that the proposed algorithm achieves high precision in vehicle information extraction for different high-resolution remote sensing images. Among these results, the average false detection rate was about 5.36%, the average residual rate was about 13.60% and the average accuracy was approximately 91.26%.
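
    A minimal sketch of the vegetation and water suppression step; the band arrays, index thresholds and masking strategy are assumptions, since the abstract only states that NDVI and NDWI were used to suppress vegetation and water before road extraction.

    ```python
    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        """Normalized difference vegetation index."""
        return (nir - red) / (nir + red + 1e-9)

    def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
        """Normalized difference water index (McFeeters form)."""
        return (green - nir) / (green + nir + 1e-9)

    def suppress_vegetation_and_water(image, nir, red, green,
                                      ndvi_thr=0.3, ndwi_thr=0.2):
        """Zero out pixels flagged as vegetation or water before road extraction;
        the thresholds here are illustrative defaults."""
        keep = (ndvi(nir, red) < ndvi_thr) & (ndwi(green, nir) < ndwi_thr)
        return np.where(keep, image, 0)
    ```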

  8. Application of MLS Data to the Assessment of Safety-Related Features in the Surrounding Area of Automatically Detected Pedestrian Crossings

    NASA Astrophysics Data System (ADS)

    Soilán, M.; Riveiro, B.; Sánchez-Rodríguez, A.; González-deSantos, L. M.

    2018-05-01

    During the last few years, there has been a huge methodological development regarding the automatic processing of 3D point cloud data acquired by both terrestrial and aerial mobile mapping systems, motivated by the improvement of surveying technologies and hardware performance. This paper presents a methodology that first extracts geometric and semantic information regarding the road markings within the surveyed area from Mobile Laser Scanning (MLS) data, and then employs it to isolate street areas where pedestrian crossings are found and, therefore, pedestrians are more likely to cross the road. Then, different safety-related features can be extracted in order to offer information about the adequacy of the pedestrian crossing regarding its safety, which can be displayed in a Geographical Information System (GIS) layer. These features are defined in four different processing modules: accessibility analysis, traffic light classification, traffic sign classification, and visibility analysis. The validation of the proposed methodology has been carried out in two different cities in the northwest of Spain, obtaining both quantitative and qualitative results for pedestrian crossing classification and for each processing module of the safety assessment of pedestrian crossing environments.

  9. Infrared Spectroscopic Imaging: The Next Generation

    PubMed Central

    Bhargava, Rohit

    2013-01-01

    Infrared (IR) spectroscopic imaging seemingly matured as a technology in the mid-2000s, with commercially successful instrumentation and reports in numerous applications. Recent developments, however, have transformed our understanding of the recorded data, provided capability for new instrumentation, and greatly enhanced the ability to extract more useful information in less time. These developments are summarized here in three broad areas— data recording, interpretation of recorded data, and information extraction—and their critical review is employed to project emerging trends. Overall, the convergence of selected components from hardware, theory, algorithms, and applications is one trend. Instead of similar, general-purpose instrumentation, another trend is likely to be diverse and application-targeted designs of instrumentation driven by emerging component technologies. The recent renaissance in both fundamental science and instrumentation will likely spur investigations at the confluence of conventional spectroscopic analyses and optical physics for improved data interpretation. While chemometrics has dominated data processing, a trend will likely lie in the development of signal processing algorithms to optimally extract spectral and spatial information prior to conventional chemometric analyses. Finally, the sum of these recent advances is likely to provide unprecedented capability in measurement and scientific insight, which will present new opportunities for the applied spectroscopist. PMID:23031693

  10. Multiplexed Sequence Encoding: A Framework for DNA Communication

    PubMed Central

    Zakeri, Bijan; Carr, Peter A.; Lu, Timothy K.

    2016-01-01

    Synthetic DNA has great propensity for efficiently and stably storing non-biological information. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication—data encoding, data transfer & data extraction—and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA, while reducing the occurrence of DNA homopolymers to improve synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system—Multiplexed Sequence Encoding (MuSE)—that conceals messages between multiple distinct DNA molecules, requiring a combination key to reveal messages. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA. PMID:27050646

  11. A comparison of machine learning techniques for detection of drug target articles.

    PubMed

    Danger, Roxana; Segura-Bedmar, Isabel; Martínez, Paloma; Rosso, Paolo

    2010-12-01

    Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal activity, associated with a disease, can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to give priority to their identification and validation in order to reduce the long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action and drug targets has rapidly increased. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. The identification of drug target articles, a crucial first step toward the automatic extraction of information from texts, constitutes the aim of this paper. A comparison of several machine learning techniques has been performed in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result has been achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area measure of 98%. Copyright © 2010 Elsevier Inc. All rights reserved.

  12. Improved collagen extraction from jellyfish (Acromitus hardenbergi) with increased physical-induced solubilization processes.

    PubMed

    Khong, Nicholas M H; Yusoff, Fatimah Md; Jamilah, B; Basri, Mahiran; Maznah, I; Chan, Kim Wei; Armania, Nurdin; Nishikawa, Jun

    2018-06-15

    The efficiency and effectiveness of the collagen extraction process have a huge impact on the quality, supply and cost of the collagen produced. Jellyfish is a potential sustainable source of collagen whose applications are not limited by religious constraints or by threats of transmissible diseases. The present study compared the extraction yield, physico-chemical properties and in vitro toxicology of collagens obtained by conventional acid-assisted and pepsin-assisted extraction with those obtained by an improved, physically assisted extraction process. By increasing physical intervention, the production yield increased significantly compared to the conventional extraction processes (p < .05). Collagen extracted using the improved process was found to possess proximate and amino acid compositions similar to those of collagen extracted using pepsin (p > .05), while retaining high molecular weight distributions and polypeptide profiles similar to those of collagen extracted using only acid. Moreover, it exhibited better appearance and instrumental colour, and was found to be non-toxic in vitro and free of heavy metal contamination. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Study on Landslide Disaster Extraction Method Based on Spaceborne SAR Remote Sensing Images - Take Alos Palsar for AN Example

    NASA Astrophysics Data System (ADS)

    Xue, D.; Yu, X.; Jia, S.; Chen, F.; Li, X.

    2018-04-01

    In this paper, sequential L-band ALOS PALSAR data and airborne SAR data acquired from June 5, 2008 to September 8, 2015 are used. Based on research into SAR data preprocessing and core algorithms such as geocoding, registration, filtering, unwrapping and baseline estimation, the improved Goldstein filtering algorithm and the branch-cut path tracking algorithm are used to unwrap the phase. The DEM and surface deformation information of the experimental area were extracted. Combining SAR-specific imaging geometry and differential interferometry, and on the basis of composite analysis of multi-source images, a landslide disaster detection method incorporating SAR image coherence is developed, which compensates for the limited acquisition capability of SAR or optical remote sensing alone. Especially in areas with bad weather or abnormal climate, the speed of disaster emergency response and the accuracy of extraction are improved. It is found that the deformation in this area is greatly affected by faults: there is a tendency of uplift in the southeastern plain and the western mountainous area, while the southwestern part of the mountainous area tends to subside. This result provides a basis for decision-making in local disaster prevention and control.

  14. Research on Crowdsourcing Emergency Information Extraction Based on Events' Frame

    NASA Astrophysics Data System (ADS)

    Yang, Bo; Wang, Jizhou; Ma, Weijun; Mao, Xi

    2018-01-01

    At present, common information extraction methods cannot accurately extract structured emergency event information, general information retrieval tools cannot completely identify emergency geographic information, and neither approach provides an accurate assessment of the extraction results. This paper therefore proposes an emergency information collection technology based on an event framework, designed to solve the problem of emergency information extraction. It mainly includes an emergency information extraction model (EIEM), a complete address recognition method (CARM) and an accuracy evaluation model of emergency information (AEMEI). EIEM extracts emergency information in a structured way and compensates for the lack of network data acquisition in emergency mapping. CARM uses a hierarchical model and a shortest-path algorithm to join toponym pieces into a full address. AEMEI analyzes the results for an emergency event and summarizes the advantages and disadvantages of the event framework. Experiments show that the event frame technology can solve the problem of emergency information extraction and provides reference cases for other applications. When an emergency disaster is about to occur, the relevant departments can query data on emergencies that have occurred in the past and make arrangements ahead of time for defense and disaster reduction. The technology decreases the number of casualties and the property damage in the country and the world, which is of great significance to the state and society.

  15. Exploring perceptions of policymakers about main strategies to enhance fertility rate: A qualitative study in Iran

    PubMed Central

    Haghdoost, Ali Akbar; Safari-Faramani, Roya; Baneshi, Mohammad Reza; Dehnavieh, Reza; Dehghan, Mahlegha

    2017-01-01

    Background The total fertility rate in Iran has declined unprecedentedly over the past thirty years. However, the proper strategies to increase fertility are still a matter of debate among experts. Objective To explain the main strategies to increase fertility from the viewpoints of policy makers. Methods This is a qualitative study using content analysis. A purposeful sampling approach was used to gather data. The data were collected via semi-structured interviews. Eight experts participated in the study; the main criteria were executive experience related to public health, scientific publication in these areas and availability, as well as their own interest. Content analysis was used to extract the codes. Results The main theme extracted was improving the infrastructures. Almost all participants agreed on interventions around removing marriage obstacles, improving working conditions for women, improving the quality of the educational system, training and consultation, research, and improving services to increase the fertility rate. Conclusions The government should formulate long-term instead of short-term policies, and note that improving economic conditions, promoting social welfare, and enabling women to balance work and family are highly influential in childbearing decision-making, as they ensure a better future for the next generation. In addition, people should be made aware of the potential risk of future fertility reduction, so it is suggested to inform the public through open discussions. PMID:29238499

  16. Managing interoperability and complexity in health systems.

    PubMed

    Bouamrane, M-M; Tao, C; Sarkar, I N

    2015-01-01

    In recent years, we have witnessed substantial progress in the use of clinical informatics systems to support clinicians during episodes of care, manage specialised domain knowledge, perform complex clinical data analysis and improve the management of health organisations' resources. However, the vision of fully integrated health information eco-systems, which provide relevant information and useful knowledge at the point-of-care, remains elusive. This journal Focus Theme reviews some of the enduring challenges of interoperability and complexity in clinical informatics systems. Furthermore, a range of approaches are proposed in order to address, harness and resolve some of the many remaining issues towards a greater integration of health information systems and extraction of useful or new knowledge from heterogeneous electronic data repositories.

  17. Blood perfusion construction for infrared face recognition based on bio-heat transfer.

    PubMed

    Xie, Zhihua; Liu, Guodong

    2014-01-01

    To improve the performance of infrared face recognition for time-lapse data, a new blood perfusion construction is proposed based on bio-heat transfer. Firstly, by quantifying the blood perfusion based on the Pennes equation, the thermal information is converted into the blood perfusion rate, which is a stable biological feature of the face image. Then, the separability discriminant criterion in the Discrete Cosine Transform (DCT) domain is applied to extract the discriminative features of the blood perfusion information. Experimental results demonstrate that the blood perfusion features are more concentrated and discriminative for recognition than the raw thermal information. Infrared face recognition based on the proposed blood perfusion construction is robust and achieves better recognition performance than other state-of-the-art approaches.
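
    The abstract does not define the separability discriminant criterion; the sketch below uses a Fisher-style between/within-class variance ratio as a stand-in to rank 2-D DCT coefficients of blood-perfusion images, and the variable names and number of kept coefficients are illustrative.

    ```python
    import numpy as np
    from scipy.fft import dctn

    def fisher_ratio(values: np.ndarray, labels: np.ndarray) -> float:
        """Between-class over within-class variance of one DCT coefficient."""
        classes = np.unique(labels)
        means = np.array([values[labels == c].mean() for c in classes])
        within = sum(values[labels == c].var() for c in classes)
        return means.var() / (within + 1e-9)

    def select_dct_features(perfusion_images, labels, n_features=100):
        """Rank flattened 2-D DCT coefficients by separability and keep the best."""
        coeffs = np.array([dctn(img, norm="ortho").ravel() for img in perfusion_images])
        scores = np.array([fisher_ratio(coeffs[:, j], labels)
                           for j in range(coeffs.shape[1])])
        return np.argsort(scores)[::-1][:n_features]
    ```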

  18. Learning to research in first grade: Bridging the transition from narrative to expository texts and tasks

    NASA Astrophysics Data System (ADS)

    Weise, Richard

    Decades of research indicate that students at all academic grade and performance levels perform poorly with informational texts and tasks and particularly with locating assignment-relevant information in expository texts. Students have little understanding of the individual tasks required, the arc of the activity, the hierarchical structure of the information they seek, or how to reconstitute and interpret the information they extract. Poor performance begins with the introduction of textbooks and research assignments in fourth grade and continues into adulthood. However, to date, neither educators nor researchers have substantially addressed this problem. In this quasi-experimental study, we ask if first-grade children can perform essential tasks in identifying, extracting, and integrating assignment-relevant information and if instruction improves their performance. To answer this question, we conducted a 15-week, teacher-led, intervention in two first-grade classrooms in an inner-city Nashville elementary school. We created a computer learning environment (NoteTaker) to facilitate children's creation of a mental model of the research process and a narrative/expository bridge curriculum to support the children's transition from all narrative to all expository texts and tasks. We also created a new scaffolding taxonomy and a reading-to-research model to focus our research. Teachers participated in weekly professional development workshops. The results of this quasi-experimental study indicate that at-risk, first-grade children are able to (a) identify relevant information in an expository text, (b) categorize the information they identify, and (c) justify their choice of category. Children's performance in the first and last tasks significantly improved with instruction, and low-performing readers showed the greatest benefits from instruction. We find that the children's performance in categorizing information depended upon content-specific knowledge that was not taught, as well as on the process knowledge that was taught. We also find that children's narrative reading performance predicted their initial-performance for each assessment measure. We argue that first-grade children are developmentally ready to read expository texts and to learn reading-to-research tasks and that primary-school literacy instruction should not be limited to reading and writing stories.

  19. An Information Extraction Framework for Cohort Identification Using Electronic Health Records

    PubMed Central

    Liu, Hongfang; Bielinski, Suzette J.; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B.; Jonnalagadda, Siddhartha R.; Ravikumar, K.E.; Wu, Stephen T.; Kullo, Iftikhar J.; Chute, Christopher G

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework. PMID:24303255

  20. Improving the two-step remediation process for CCA-treated wood. Part I, Evaluating oxalic acid extraction

    Treesearch

    Carol Clausen

    2004-01-01

    In this study, three possible improvements to a remediation process for chromated-copper-arsenate (CCA) treated wood were evaluated. The process involves two steps: oxalic acid extraction of wood fiber followed by bacterial culture with Bacillus licheniformis CC01. The three potential improvements to the oxalic acid extraction step were (1) reusing oxalic acid for...

  1. Table Extraction from Web Pages Using Conditional Random Fields to Extract Toponym Related Data

    NASA Astrophysics Data System (ADS)

    Luthfi Hanifah, Hayyu'; Akbar, Saiful

    2017-01-01

    Tables are one of the ways to present information on web pages. The abundance of web pages composing the World Wide Web has motivated research in information extraction and information retrieval, including research on table extraction. Besides, there is a need for a system designed specifically to handle location-related information. Based on this background, this research is conducted to provide a way to extract location-related data from web tables so that it can be used in the development of a Geographic Information Retrieval (GIR) system. The location-related data are identified by the toponym (location name). In this research, a rule-based approach with a gazetteer is used to recognize toponyms in web tables. Meanwhile, to extract data from a table, a combination of rule-based and statistical approaches is used. In the statistical approach, a Conditional Random Fields (CRF) model is used to understand the schema of the table. The result of table extraction is presented in JSON format. If a web table contains toponyms, a field is added to the JSON document to store the toponym values. This field can be used to index the table data according to the toponym, which can then be used in the development of the GIR system.
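
    A hypothetical example of the JSON output for one extracted web table; the field names and values are illustrative, since the abstract only states that a toponym field is added when location names are recognized in the table.

    ```python
    import json

    table_record = {
        "source_url": "http://example.org/page.html",   # illustrative URL
        "header": ["City", "Population", "Year"],
        "rows": [["Bandung", "2500000", "2017"],
                 ["Surabaya", "2870000", "2017"]],
        "toponym": ["Bandung", "Surabaya"],  # location names found via the gazetteer
    }
    print(json.dumps(table_record, indent=2))
    ```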

  2. FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

    PubMed

    Bachman, John A; Gyori, Benjamin M; Sorger, Peter K

    2018-06-28

    For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. In a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g."NF- κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.

  3. Sortal anaphora resolution to enhance relation extraction from biomedical literature.

    PubMed

    Kilicoglu, Halil; Rosemblat, Graciela; Fiszman, Marcelo; Rindflesch, Thomas C

    2016-04-14

    Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved a F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50 % of the changes involving uninformative relations being replaced by more specific and informative ones, while 35 % of the changes had no effect, and only 15 % were negative. We estimate that anaphora resolution results in changes in about 1.5 % of approximately 82 million semantic relations extracted from the entire PubMed. Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.

  4. Evaluation of petroleum heterogeneities in the Eldfisk Chalk reservoir using thermal extraction and pyrolysis GC and thermal extraction GC-MS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hall, P.B.; Bjory, M.; Stoddart, D.

    1993-09-01

    Petroleum geochemical research is being applied more and more to production-oriented problems such as improving recovery of oil from existing fields and identifying satellite fields. Recent studies by ourselves and others have suggested that compositional heterogeneities within the petroleum column of reservoirs can provide information relevant to these objectives. Some of the best data for distinguishing petroleum "populations" in reservoirs come from conventional geochemical techniques, some of which are very time consuming (e.g., solvent extraction, EOM fractionation, and GC and GC-MS of saturated and aromatic hydrocarbon fractions). Furthermore, a large number of samples need to be analyzed in order to detect the often slight heterogeneities present in a reservoir. The techniques employed in this study (particularly thermal extraction and pyrolysis GC and thermal extraction GC-MS) can be used to generate large amounts of relevant data quickly. In this study, data were acquired on core samples (approximately 750) from the Paleocene/Cretaceous Eldfisk Chalk reservoir (2/7 block, Norwegian North Sea). The acquired data reveal distinct maturity gradients and also small, but still significant, variations in the source of the oil within the different Eldfisk oil pools (Alpha, Bravo, and East). This indicates incomplete mixing of the hydrocarbons. These variations have been used to delineate lateral and vertical barriers to petroleum mixing. These observations, when integrated with geological data, can lead to an improved understanding of possible field filling directions and intrareservoir compartmentalization. The conclusions from this may also aid in exploration for satellite fields as well as influencing the production plan for the Eldfisk field.

  5. Comparison of the application of B-mode and strain elastography ultrasound in the estimation of lymph node metastasis of papillary thyroid carcinoma based on a radiomics approach.

    PubMed

    Liu, Tongtong; Ge, Xifeng; Yu, Jinhua; Guo, Yi; Wang, Yuanyuan; Wang, Wenping; Cui, Ligang

    2018-06-21

    B-mode ultrasound (B-US) and strain elastography ultrasound (SE-US) images have the potential to distinguish thyroid tumors with different lymph node (LN) status. The purpose of our study is to investigate whether the use of multi-modality images, including B-US and SE-US, can improve the discriminability of thyroid tumors with LN metastasis based on a radiomics approach. Ultrasound (US) images, including B-US and SE-US images, of 75 papillary thyroid carcinoma (PTC) cases were retrospectively collected. A radiomics approach was developed in this study to estimate the LN status of PTC patients. The approach includes image segmentation, quantitative feature extraction, feature selection and classification. Three feature sets were extracted, from B-US, from SE-US, and from the multi-modality combination of B-US and SE-US, and were used to evaluate the contribution of the different modalities. A total of 684 radiomics features were extracted in our study. We used a sparse representation coefficient-based feature selection method with 10 bootstrap repetitions to reduce the dimensionality of the feature sets. A support vector machine with leave-one-out cross-validation was used to build the model for estimating LN status. Using features extracted from both B-US and SE-US, the radiomics-based model produced an area under the receiver operating characteristic curve (AUC) of 0.90, accuracy (ACC) of 0.85, sensitivity (SENS) of 0.77 and specificity (SPEC) of 0.88, which was better than using features extracted from B-US or SE-US separately. Multi-modality images provide more information in a radiomics study. The combined use of B-US and SE-US could improve the accuracy of LN metastasis estimation for PTC patients.
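
    A minimal sketch of the classification and validation step (support vector machine with leave-one-out cross-validation); the feature matrix, labels and kernel settings are placeholders rather than the study's selected radiomics features.

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneOut, cross_val_predict
    from sklearn.svm import SVC

    # Placeholder data: rows are PTC cases, columns are selected radiomics
    # features from B-US and SE-US; y is the LN metastasis label.
    X, y = np.random.rand(75, 20), np.random.randint(0, 2, 75)

    clf = SVC(kernel="rbf", probability=True)
    proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")
    auc = roc_auc_score(y, proba[:, 1])
    ```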

  6. [A customized method for information extraction from unstructured text data in the electronic medical records].

    PubMed

    Bao, X Y; Huang, W J; Zhang, K; Jin, M; Li, Y; Niu, C Z

    2018-04-18

    There is a huge amount of diagnostic and treatment information in electronic medical records (EMRs), which is a concrete manifestation of clinicians' actual diagnosis and treatment details. Many episodes in EMRs, such as chief complaints, present illness, past history, differential diagnosis, diagnostic imaging and surgical records, which reflect the details of the clinical process, are written as Chinese natural-language narrative. How to extract effective information from these Chinese narrative text data and organize it into tabular form for medical research and for the practical utilization of real-world clinical data is a difficult problem in Chinese medical data processing. Based on the narrative EMR text data of a tertiary hospital in China, a customized rule-learning and rule-based information extraction method is proposed. The overall method consists of three steps. (1) Step 1: a random sample of 600 records (including history of present illness, past history, personal history, family history, etc.) was extracted as the raw corpus. Using our Chinese clinical narrative text annotation platform, trained clinicians and nurses marked the tokens and phrases in the corpus to be extracted (taking a history of diabetes as an example). (2) Step 2: based on the annotated clinical text corpus, extraction templates were first summarized and induced. These templates were then rewritten as regular expressions in the Perl programming language to serve as extraction rules. Using these extraction rules as the basic knowledge base, we developed Perl extraction packages for extracting data from the EMR text. Finally, the extracted data items were organized in tabular format for later use in clinical research or hospital surveillance. (3) As the final step, the proposed method was evaluated and validated in the National Clinical Service Data Integration Platform, and the extraction results were checked by combining manual and automated verification, which proved the effectiveness of the method. For all patients diagnosed with diabetes in the Department of Endocrinology of the hospital and discharged in 2015 (1,436 patients in total), the extraction of diabetes history from the medical history episodes achieved a recall of 87.6%, an accuracy of 99.5% and an F-score of 0.93. For the 10% sample of patients with diabetes (1,223 patients in total) discharged by August 2017 in the same department, the extraction of diabetes history achieved a recall of 89.2%, an accuracy of 99.2% and an F-score of 0.94. This study mainly adopts a combination of natural language processing and rule-based information extraction, and designs and implements an algorithm for extracting customized information from unstructured Chinese electronic medical record text data. It achieves better results than existing work.
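
    A small Python analogue of the kind of rule the study encodes as Perl regular expressions; the pattern, the negation handling and the example sentences below are hypothetical and far simpler than the actual rule base.

    ```python
    import re

    # Flag a history of diabetes ("糖尿病") unless the mention is negated ("否认" = denies).
    DIABETES_HISTORY = re.compile(r"(否认)?(既往)?.{0,6}糖尿病(病史|史)?")

    def has_diabetes_history(past_history_text: str) -> bool:
        for m in DIABETES_HISTORY.finditer(past_history_text):
            if not m.group(1):          # no leading negation token
                return True
        return False

    print(has_diabetes_history("既往有2型糖尿病史10年"))    # True
    print(has_diabetes_history("否认糖尿病、高血压病史"))    # False
    ```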

  7. Building a diabetes screening population data repository using electronic medical records.

    PubMed

    Tuan, Wen-Jan; Sheehy, Ann M; Smith, Maureen A

    2011-05-01

    There has been a rapid advancement of information technology in the area of clinical and population health data management since 2000. However, with the fast growth of electronic medical records (EMRs) and the increasing complexity of information systems, it has become challenging for researchers to effectively access, locate, extract, and analyze information critical to their research. This article introduces an outpatient encounter data framework designed to construct an EMR-based population data repository for diabetes screening research. The outpatient encounter data framework is developed on a hybrid data structure of entity-attribute-value models, dimensional models, and relational models. This design preserves a small number of subject-specific tables essential to key clinical constructs in the data repository. It enables atomic information to be maintained in a way that is transparent and meaningful to researchers and health care practitioners who need to access data, while still achieving the same performance level as conventional data warehouse models. A six-layer information processing strategy is developed to extract and transform EMRs into the research data repository. The data structure also complies with both Health Insurance Portability and Accountability Act regulations and the institutional review board's requirements. Although developed for diabetes screening research, the design of the outpatient encounter data framework is suitable for other types of health service research. It may also provide organizations with a tool to improve health care quality and efficiency, consistent with the "meaningful use" objectives of the Health Information Technology for Economic and Clinical Health Act. © 2011 Diabetes Technology Society.
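
    A minimal sketch of the entity-attribute-value part of such a hybrid structure, using an in-memory SQLite database; the table and column names are hypothetical and only illustrate how atomic observations can be stored generically and pivoted back into a research-friendly wide row, not the authors' schema.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Small subject-specific (relational/dimensional) table for a core clinical construct.
    cur.execute("""CREATE TABLE encounter (
        encounter_id INTEGER PRIMARY KEY,
        patient_id   INTEGER NOT NULL,
        visit_date   TEXT    NOT NULL)""")

    # Generic EAV table keeps atomic observations without schema changes.
    cur.execute("""CREATE TABLE observation (
        encounter_id INTEGER NOT NULL REFERENCES encounter(encounter_id),
        attribute    TEXT    NOT NULL,   -- e.g. 'hba1c', 'bmi'
        value        TEXT    NOT NULL)""")

    cur.execute("INSERT INTO encounter VALUES (1, 42, '2010-03-15')")
    cur.executemany("INSERT INTO observation VALUES (?, ?, ?)",
                    [(1, "hba1c", "7.2"), (1, "bmi", "31.4")])

    # Pivot the EAV rows back into one wide row per encounter for analysis.
    cur.execute("""SELECT e.patient_id, e.visit_date,
                          MAX(CASE WHEN o.attribute='hba1c' THEN o.value END) AS hba1c,
                          MAX(CASE WHEN o.attribute='bmi'   THEN o.value END) AS bmi
                   FROM encounter e JOIN observation o USING (encounter_id)
                   GROUP BY e.encounter_id""")
    print(cur.fetchone())   # (42, '2010-03-15', '7.2', '31.4')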

  8. [The application of spectral geological profile in the alteration mapping].

    PubMed

    Li, Qing-Ting; Lin, Qi-Zhong; Zhang, Bing; Lu, Lin-Lin

    2012-07-01

    A geological section can help validate and interpret the alteration information extracted from remote sensing images. In this paper, the concept of the spectral geological profile is introduced, based on the principle of the geological section and methods of spectral information extraction. A spectral profile stores and visualizes spectra along the geological profile, but the spectral geological profile contains more information than the spectral profile alone. The main objective of the spectral geological profile is to obtain the distribution of alteration types and mineral content along the profile, extracted from spectra measured by a field spectrometer, with particular attention to the spatial distribution and pattern of alteration associations. The technical method and workflow of alteration information extraction were studied for the spectral geological profile. The spectral geological profile was built from ground reflectance spectra, and the alteration information was extracted from the remote sensing image with the help of typical spectral geological profiles. Finally, the significance and effect of the spectral geological profile are discussed.

  9. The Patient-Reported Information Multidimensional Exploration (PRIME) Framework for Investigating Emotions and Other Factors of Prostate Cancer Patients with Low Intermediate Risk Based on Online Cancer Support Group Discussions.

    PubMed

    Bandaragoda, Tharindu; Ranasinghe, Weranja; Adikari, Achini; de Silva, Daswin; Lawrentschuk, Nathan; Alahakoon, Damminda; Persad, Raj; Bolton, Damien

    2018-06-01

    This study aimed to use the Patient Reported Information Multidimensional Exploration (PRIME) framework, a novel ensemble of machine-learning and deep-learning algorithms, to extract, analyze, and correlate self-reported information from Online Cancer Support Groups (OCSG) by patients (and partners of patients) with low intermediate-risk prostate cancer (PCa) undergoing radical prostatectomy (RP), external beam radiotherapy (EBRT), and active surveillance (AS), and to investigate its efficacy in quality-of-life (QoL) and emotion measures. From patient-reported information on 10 OCSG, the PRIME framework automatically filtered and extracted conversations on low intermediate-risk PCa with active user participation. Side effects as well as emotional and QoL outcomes for 6084 patients were analyzed. Side-effect profiles differed between the methods analyzed, with men after RP having more urinary and sexual side effects and men after EBRT having more bowel symptoms. Key findings from the analysis of emotional expressions showed that PCa patients younger than 40 years expressed significantly higher positive and negative emotions compared with other age groups, that partners of patients expressed more negative emotions than the patients, and that selected cohorts (< 40 years, > 70 years, partners of patients) frequently used the same terms to express their emotions, which is indicative of QoL issues specific to those cohorts. Despite recent advances in patient-centered care, patient emotions are largely overlooked, especially in younger men with a diagnosis of PCa and their partners. The authors present a novel approach, the PRIME framework, to extract, analyze, and correlate key patient factors. This framework improves understanding of QoL and identifies low intermediate-risk PCa patients who require additional support.

  10. Feasibility of extracting data from electronic medical records for research: an international comparative study.

    PubMed

    van Velthoven, Michelle Helena; Mastellos, Nikolaos; Majeed, Azeem; O'Donoghue, John; Car, Josip

    2016-07-13

    Electronic medical records (EMR) offer major potential for secondary use of data in research, which can improve the safety, quality and efficiency of healthcare. They also enable the measurement of disease burden at the population level. However, the extent to which this is feasible in different countries is not well known. This study aimed to: 1) assess information governance procedures for extracting data from EMR in 16 countries; and 2) explore the extent of EMR adoption and the quality and consistency of EMR data in 7 countries, using the management of patients with type 2 diabetes as an exemplar. We included 16 countries across Australia, Asia, the Middle East, Europe, and the Americas. We undertook a multi-method approach including both an online literature review and structured interviews with 59 stakeholders, including 25 physicians, 23 academics, 7 EMR providers, and 4 information commissioners. Data were analysed and synthesised thematically considering the most relevant issues. We found that procedures for information governance, levels of adoption and data quality varied across the countries studied. The time required and the ease of obtaining approval also vary widely. While some countries seem ready for secondary uses of data from EMR, in other countries several barriers were found, including limited experience with using EMR data for research, lack of standard policies and procedures, bureaucracy, confidentiality, data security concerns, technical issues and costs. This is the first international comparative study to shed light on the feasibility of extracting EMR data across a number of countries. The study will inform future discussions and development of policies that aim to accelerate the adoption of EMR systems in high and middle income countries and seize the rich potential for secondary use of data arising from the use of EMR solutions.

  11. Analysis of crude oil markets with improved multiscale weighted permutation entropy

    NASA Astrophysics Data System (ADS)

    Niu, Hongli; Wang, Jun; Liu, Cheng

    2018-03-01

    Entropy measures have recently been used extensively to study the complexity of nonlinear systems. Weighted permutation entropy (WPE) overcomes permutation entropy's (PE's) neglect of the amplitude information of a time series and shows a distinctive ability to extract complexity information from data with abrupt changes in magnitude. The improved (sometimes called composite) multi-scale (MS) method has the advantage of reducing errors and improving accuracy when evaluating multiscale entropy values of time series that are not sufficiently long. In this paper, we combine the merits of WPE and the improved MS method to propose the improved multiscale weighted permutation entropy (IMWPE) method for investigating the complexity of a time series. The method is validated on artificial data, namely white noise and 1/f noise, and on real market data of Brent and Daqing crude oil. The complexity properties of the crude oil markets are then explored for the return series, for volatility series with multiple exponents, and for EEMD-produced intrinsic mode functions (IMFs), which represent different frequency components of the return series. Moreover, the instantaneous amplitude and frequency of Brent and Daqing crude oil are analyzed by applying the Hilbert transform to each IMF.
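
    For reference, the sketch below is a generic NumPy implementation of weighted permutation entropy, not the authors' IMWPE code; the improved multiscale step would coarse-grain the series over all starting offsets at each scale and average the resulting WPE values on top of this function.

    import numpy as np
    from math import factorial

    def weighted_permutation_entropy(x, m=3, tau=1, normalize=True):
        """Generic WPE of a 1-D series x (embedding dimension m, delay tau)."""
        x = np.asarray(x, dtype=float)
        n = len(x) - (m - 1) * tau
        if n <= 0:
            raise ValueError("series too short for the chosen m and tau")
        idx = np.arange(n)[:, None] + tau * np.arange(m)[None, :]
        vectors = x[idx]                          # each row is one delay vector
        patterns = np.argsort(vectors, axis=1)    # ordinal pattern of each vector
        codes = (patterns * m ** np.arange(m)).sum(axis=1)   # unique integer per pattern
        weights = vectors.var(axis=1)             # amplitude information: per-vector variance
        total = weights.sum()
        if total == 0:                            # constant series: a single pattern, zero entropy
            return 0.0
        probs = np.array([weights[codes == c].sum() for c in np.unique(codes)]) / total
        probs = probs[probs > 0]
        h = -(probs * np.log(probs)).sum()
        return h / np.log(factorial(m)) if normalize else h

    rng = np.random.default_rng(0)
    print(weighted_permutation_entropy(rng.standard_normal(5000)))          # typically near 1 for white noise
    print(weighted_permutation_entropy(np.sin(np.linspace(0, 50, 5000))))   # much lower for a smooth signal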

  12. Nutraceuticals to promote neuronal plasticity in response to corticosterone-induced stress in human neuroblastoma cells.

    PubMed

    Gite, Snehal; Ross, R Paul; Kirke, Dara; Guihéneuf, Freddy; Aussant, Justine; Stengel, Dagmar B; Dinan, Timothy G; Cryan, John F; Stanton, Catherine

    2018-01-29

    To search for novel compounds that protect neuronal cells under stressed conditions and may help to restore neuronal plasticity. A model of corticosterone (CORT)-induced stress in human neuroblastoma cells (SH-SY5Y) was used to compare the efficacy of 6 crude extracts and 10 pure compounds (6 polyphenols, 2 carotenoids, 1 amino acid analogue, and 1 known antidepressant drug) to increase neuronal plasticity and to decrease cytotoxicity. Astaxanthin (among pure compounds) and phlorotannin extract of Fucus vesiculosus (among crude extracts) showed the greatest increase in cell viability in the presence of excess CORT. BDNF-VI mRNA expression in SH-SY5Y cells was significantly improved by pretreatment with quercetin, astaxanthin, curcumin, fisetin, and resveratrol. Among crude extracts, xanthohumol, phlorotannin extract (Ecklonia cava), petroleum ether extract (Nannochloropsis oculata), and phlorotannin extract (F. vesiculosus) showed a significant increase in BDNF-VI mRNA expression. CREB1 mRNA expression was significantly improved by astaxanthin, β-carotene, curcumin, and fluoxetine, whereas none of the crude extracts caused significant improvement. As an adjunct to fluoxetine, phlorotannin extract (F. vesiculosus), β-carotene, and xanthohumol resulted in significant improvement in BDNF-VI mRNA expression, and CREB1 mRNA expression was significantly improved by phlorotannin extract (F. vesiculosus). Significant improvement in mature BDNF protein expression by phlorotannin extract (F. vesiculosus) and β-carotene as an adjunct to fluoxetine confirms their potential to promote neuronal plasticity against CORT-induced stress. The carotenoids, the flavonoids (namely quercetin and curcumin), and the low-molecular-weight phlorotannin-enriched extract of F. vesiculosus may serve as potential neuroprotective agents promoting neuronal plasticity in vitro. Graphical abstract: Cascade of events associated with disturbed homeostatic balance of glucocorticoids and impact of phlorotannin extract (F. vesiculosus) and β-carotene in restoring neuronal plasticity. Abbreviations: TrKB, tropomyosin receptor kinase B; P-ERK, phosphorylated extracellular signal-related kinase; PI3K, phosphatidylinositol 3-kinase; Akt, protein kinase B; Ca++/CaMK, calcium/calmodulin-dependent protein kinase; pCREB, phosphorylated cAMP response element-binding protein; CRE, cAMP response elements; CORT, corticosterone; and BDNF, brain-derived neurotrophic factor.

  13. Considering context: reliable entity networks through contextual relationship extraction

    NASA Astrophysics Data System (ADS)

    David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.

    2016-05-01

    Existing information extraction techniques can only partially address the problem of exploiting unreadably large amounts of text. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on what is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables Warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) More precise event and relationship interpretation, 2) More detailed information about extracted events and relationships, and 3) More reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.
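
    The paper's attribute labeler combines new machine-learning methods, novel linguistic features and crowd-sourced labels; as a heavily simplified stand-in, the sketch below only shows what labeling event attributes (generic, planned, recurring, potential) looks like with a TF-IDF and logistic-regression baseline. The training sentences and labels are invented.

    # Toy baseline, not the system described in the paper.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_sentences = [
        "Protests typically erupt after fuel price increases.",   # generic
        "The group plans to attack the depot next week.",         # planned
        "The militia raids the border village every winter.",     # recurring
        "Flooding could displace thousands if the dam fails.",    # potential
    ]
    train_labels = ["generic", "planned", "recurring", "potential"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
    clf.fit(train_sentences, train_labels)

    print(clf.predict(["Rebels intend to seize the airport on Friday."]))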

  14. Evaluation of multiband photography for rock discrimination

    NASA Technical Reports Server (NTRS)

    Raines, G. L.

    1974-01-01

    An evaluation is presented of the multiband photography concept that tonal differences between rock formations on aerial photography can be improved through the selection of the appropriate bands. The concept involves: (1) acquiring band reference data for the rocks being considered; (2) selecting the best combination of bands to discriminate the rocks using these reference data; (3) acquiring aerial photography using these selected bands; and (4) extracting the desired geologic information in an optimum manner. The test site geology and rock reflectance are discussed in detail. The evaluation found that the differences in contrast ratios are not statistically significant, and the spectral information in different bands is not advantageous.

  15. Development of a Mobile Application for Disaster Information and Response

    NASA Astrophysics Data System (ADS)

    Stollberg, B.

    2012-04-01

    The Joint Research Centre (JRC) of the European Commission (EC) started exploring current technology and internet trends in order to answer the question of whether post-disaster situation awareness can be improved by community involvement. An exploratory research project revolves around the development of an iPhone App to provide users with real-time information about disasters and to let them send back information in the form of a geo-located image and/or text. Targeted users include professional emergency responders of the Global Disaster Alert and Coordination System (GDACS), as well as general users affected by disasters. GDACS provides global multi-hazard disaster monitoring and alerting for earthquakes, tsunamis, tropical cyclones, floods and volcanoes. It serves to consolidate and improve the dissemination of disaster-related information, in order to improve the coordination of international relief efforts. The goal of the exploratory research project is to extract and feed back useful information from reports shared by the community, improving situation awareness and providing ground truth for rapid satellite-based mapping. From a technological point of view, JRC is focusing on the interoperability of field reporting software and is working with several organizations to develop standards and reference implementations of an interoperable mobile information platform. The iPhone App developed by JRC provides, on the one hand, information about GDACS alerts and, on the other hand, the possibility for users to send reports about a selected disaster back to JRC. iPhones are equipped with a camera and (apart from the very first model) a GPS receiver, which makes it possible to transmit a picture and the location with every report sent. A test has shown that the accuracy of the location can be expected to be in the range of 50 meters (iPhone 3GS) and 5 meters (iPhone 4), respectively. For this reason, pictures sent by the newer iPhone generation can be geo-located very well. Sent reports are automatically integrated into the Spatial Data Infrastructure (SDI) at JRC. The data are stored in a PostGIS database and shared through GeoServer. GeoServer allows users to view and edit geospatial data using open standards such as Web Map Service (WMS), Web Feature Service (WFS), GeoRSS and KML. For the visualization of the submitted data, the KML format is used and displayed in the JRC Web Map Viewer. Since GeoServer has integrated filtering capabilities, the reports can easily be filtered by event, date or user. So far, the App has been used internally during an international emergency field exercise (Carpathex 2011, Poland). Based on the feedback provided by the participants, the App was further improved, especially with respect to usability. The public launch of the App is planned for the beginning of 2012. The next important step is to develop the application for other platforms such as Android. The aftermath of the next disaster will then show whether users send back useful information. Further work will address the processing of information to extract added-value information, including spatio-temporal clustering, moderation and sense-making algorithms.

  16. Automated Information Extraction on Treatment and Prognosis for Non-Small Cell Lung Cancer Radiotherapy Patients: Clinical Study.

    PubMed

    Zheng, Shuai; Jabbour, Salma K; O'Reilly, Shannon E; Lu, James J; Dong, Lihua; Ding, Lijuan; Xiao, Ying; Yue, Ning; Wang, Fusheng; Zou, Wei

    2018-02-01

    In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis to develop evidence-based radiation therapy paradigms. These data are available mainly in forms of narrative texts or table formats with heterogeneous vocabularies. Manual extraction of the related information from these data can be time consuming and labor intensive, which is not ideal for large studies. The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non-small cell lung cancer (NSCLC). We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. The adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data, so as to generate structured output. In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall more than 93% and 83%, respectively. For toxicity information extractions, which are important to study patient posttreatment side effects and quality of life, the precision and recall achieved 95.7% and 94.5% respectively. The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies. ©Shuai Zheng, Salma K Jabbour, Shannon E O'Reilly, James J Lu, Lihua Dong, Lijuan Ding, Ying Xiao, Ning Yue, Fusheng Wang, Wei Zou. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.02.2018.

  17. Optimizing Crawler4j using MapReduce Programming Model

    NASA Astrophysics Data System (ADS)

    Siddesh, G. M.; Suresh, Kavya; Madhuri, K. Y.; Nijagal, Madhushree; Rakshitha, B. R.; Srinivasa, K. G.

    2017-06-01

    The World Wide Web is a decentralized system that consists of a repository of information in the form of web pages. These web pages act as a source of information or data in the present analytics world. Web crawlers are used to extract useful information from web pages for different purposes. Firstly, they are used in web search engines, where web pages are indexed to form a corpus of information and users can query the pages. Secondly, they are used for web archiving, where web pages are stored for later analysis. Thirdly, they can be used for web mining, where web pages are monitored for copyright purposes. The amount of information processed by a web crawler needs to be increased by using the capabilities of modern parallel processing technologies. To address the parallelism and throughput of crawling, this work proposes to optimize Crawler4j using the Hadoop MapReduce programming model by parallelizing the processing of large input data. Crawler4j is a web crawler that retrieves useful information about the pages it visits. Crawler4j, coupled with the data and computational parallelism of the Hadoop MapReduce programming model, improves the throughput and accuracy of web crawling. The experimental results demonstrate that the proposed solution achieves significant improvements with respect to performance and throughput. Hence the proposed approach carves out a new methodology for optimizing web crawling by achieving significant performance gains.
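
    A toy, dependency-free simulation of the MapReduce pattern applied to crawl output: the mapper emits (domain, 1) for each outgoing link found on a fetched page and the reducer sums the counts. The crawl records below are invented; a real deployment would run Crawler4j's fetch phase and Hadoop's shuffle and reduce instead of these in-process loops.

    from collections import defaultdict
    from urllib.parse import urlparse

    # Pretend these (url, outgoing_links) pairs came from parallel crawler workers.
    crawl_records = [
        ("http://example.com/a", ["http://example.com/b", "http://news.example.org/x"]),
        ("http://news.example.org/x", ["http://example.com/a"]),
    ]

    def mapper(url, links):
        for link in links:
            yield urlparse(link).netloc, 1        # key = target domain

    def shuffle(pairs):
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped.items()

    def reducer(key, values):
        return key, sum(values)                   # in-links per domain

    mapped = [kv for url, links in crawl_records for kv in mapper(url, links)]
    print([reducer(k, vs) for k, vs in shuffle(mapped)])
    # [('example.com', 2), ('news.example.org', 1)]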

  18. Clinical exome sequencing reports: current informatics practice and future opportunities.

    PubMed

    Swaminathan, Rajeswari; Huang, Yungui; Astbury, Caroline; Fitzgerald-Butt, Sara; Miller, Katherine; Cole, Justin; Bartlett, Christopher; Lin, Simon

    2017-11-01

    The increased adoption of clinical whole exome sequencing (WES) has improved the diagnostic yield for patients with complex genetic conditions. However, the informatics practice for handling information contained in whole exome reports is still in its infancy, as evidenced by the lack of a common vocabulary within clinical sequencing reports generated across genetic laboratories. Genetic testing results are mostly transmitted using portable document format, which can make secondary analysis and data extraction challenging. This paper reviews a sample of clinical exome reports generated by Clinical Laboratory Improvement Amendments-certified genetic testing laboratories at tertiary-care facilities to assess and identify common data elements. Like structured radiology reports, which enable faster information retrieval and reuse, structuring genetic information within clinical WES reports would facilitate the integration of genetic information into electronic health records and enable retrospective research on the clinical utility of WES. We identify elements that are listed as mandatory according to practice guidelines but are currently missing from some of the clinical reports, which might help to organize the data when stored within structured databases. We also highlight elements, such as patient consent, that, although they do not appear within any of the current reports, may help in interpreting some of the information within the reports. Integrating genetic and clinical information would assist the adoption of personalized medicine for improved patient care and outcomes. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus

    PubMed Central

    2015-01-01

    Background Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. Methods To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Results Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. Conclusions PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus. PMID:26099853
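
    As an illustration of the CRF-plus-rich-features approach that performed best in these experiments (not the authors' configuration), the sketch below trains a BIO phenotype tagger with the sklearn-crfsuite package; the two training sentences, the feature template and the label set are invented.

    # Illustrative only: invented corpus, minimal feature set.
    import sklearn_crfsuite

    def token_features(sent, i):
        word = sent[i]
        return {
            "lower": word.lower(),
            "is_title": word.istitle(),
            "is_digit": word.isdigit(),
            "suffix3": word[-3:].lower(),
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
        }

    sentences = [["Patient", "denies", "shortness", "of", "breath", "."],
                 ["He", "has", "lower", "extremity", "edema", "."]]
    labels = [["O", "O", "B-PHEN", "I-PHEN", "I-PHEN", "O"],
              ["O", "O", "B-PHEN", "I-PHEN", "I-PHEN", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, labels)

    test = ["She", "reports", "shortness", "of", "breath", "today"]
    print(crf.predict([[token_features(test, i) for i in range(len(test))]]))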

  20. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    PubMed

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.
