Sample records for extracting source information

  1. Information Extraction from Unstructured Text for the Biodefense Knowledge Center

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samatova, N F; Park, B; Krishnamurthy, R

    2005-04-29

    The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow early detection of emerging biological threats to homeland security. It requires highly structured information extracted from a variety of data sources. However, the quantity of new and vital information available from everyday sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are needed. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas the two key phrase extraction technologies developed during the course of the project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. We are also enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free-text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. To address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.
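
    To make the key phrase extraction idea above concrete, here is a minimal, hypothetical sketch (not the BKC system) that ranks terms in a candidate document by their frequency relative to a background corpus; the function names and the scoring formula are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a stand-in for real linguistic preprocessing."""
    return re.findall(r"[a-z]+", text.lower())

def key_phrases(document, background, top_n=5):
    """Rank document terms by frequency relative to a background corpus.

    A toy frequency-ratio scorer, not the BKC key phrase extractor; it only
    illustrates surfacing terms that are unusually common in a candidate text.
    """
    doc_counts = Counter(tokenize(document))
    bg_counts = Counter(tokenize(background))
    doc_total = sum(doc_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1
    scores = {
        term: (count / doc_total) / ((bg_counts[term] + 1) / bg_total)
        for term, count in doc_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

if __name__ == "__main__":
    doc = "Suspected anthrax outbreak reported; anthrax samples sent for testing."
    bg = "Weather was mild today. Markets were stable. Traffic reported as light."
    print(key_phrases(doc, bg))
```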

  2. Information Extraction for System-Software Safety Analysis: Calendar Year 2008 Year-End Report

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.

    2009-01-01

    This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
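
    The graph analysis step described above (task 3) can be illustrated with a small sketch: a plain depth-first search that enumerates simple paths from a hazard source to a vulnerable entity in a directed component graph. The component names and graph structure are invented for illustration and are not taken from the Orion models.

```python
def find_paths(graph, source, target, path=None):
    """Enumerate simple paths from a hazard source to a vulnerable entity.

    `graph` is a dict mapping a component to the components it can affect.
    This is a generic depth-first search, not the tool described in the report.
    """
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for nxt in graph.get(source, []):
        if nxt not in path:  # avoid cycles
            paths.extend(find_paths(graph, nxt, target, path))
    return paths

if __name__ == "__main__":
    # Hypothetical architecture: a leak can propagate to a pump controller
    # either directly or through a shared power bus.
    architecture = {
        "coolant_leak": ["power_bus", "pump_controller"],
        "power_bus": ["pump_controller", "avionics"],
    }
    for p in find_paths(architecture, "coolant_leak", "pump_controller"):
        print(" -> ".join(p))
```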

  3. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

    PubMed Central

    Hunter, Lawrence; Lu, Zhiyong; Firby, James; Baumgartner, William A; Johnson, Helen L; Ogren, Philip V; Cohen, K Bretonnel

    2008-01-01

    Background Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at PMID:18237434

  4. The role of cognitive switching in head-up displays. [to determine pilot ability to accurately extract information from either of two sources]

    NASA Technical Reports Server (NTRS)

    Fischer, E.

    1979-01-01

    The pilot's ability to accurately extract information from either one or both of two superimposed sources of information was determined. Static, aerial, color 35 mm slides of external runway environments and slides of corresponding static head-up display (HUD) symbology were used as the sources. A three channel tachistoscope was utilized to show either the HUD alone, the scene alone, or the two slides superimposed. Cognitive performance of the pilots was assessed by determining the percentage of correct answers given to two HUD related questions, two scene related questions, or one HUD and one scene related question.

  5. Place in Perspective: Extracting Online Information about Points of Interest

    NASA Astrophysics Data System (ADS)

    Alves, Ana O.; Pereira, Francisco C.; Rodrigues, Filipe; Oliveirinha, João

    During the last few years, the amount of online descriptive information about places has reached reasonable dimensions for many cities in the world. Since such information is mostly natural language text, Information Extraction techniques are needed to obtain the meaning of places that underlies these massive amounts of commonsense and user-generated sources. In this article, we show how we automatically label places using Information Extraction techniques applied to online resources such as Wikipedia, Yellow Pages and Yahoo!.

  6. Comparison of Three Information Sources for Smoking Information in Electronic Health Records

    PubMed Central

    Wang, Liwei; Ruan, Xiaoyang; Yang, Ping; Liu, Hongfang

    2016-01-01

    OBJECTIVE The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage. PMID:27980387
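
    As a sketch of how multiple smoking-information sources might be combined, the snippet below applies a simple precedence heuristic (NLP first, then PPI, then ICD-9). The precedence order and record format are illustrative assumptions, not the study's actual combination rules.

```python
def combine_smoking_status(nlp=None, ppi=None, icd9=None):
    """Resolve a patient's smoking status from several sources.

    Precedence (NLP first, then PPI, then ICD-9) is an illustrative heuristic
    only. Each argument is 'current', 'former', 'never', or None (no coverage).
    """
    for value in (nlp, ppi, icd9):
        if value is not None:
            return value
    return "unknown"

patients = [
    {"id": 1, "nlp": "current", "ppi": None, "icd9": None},
    {"id": 2, "nlp": None, "ppi": "never", "icd9": None},
    {"id": 3, "nlp": None, "ppi": None, "icd9": None},
]
for p in patients:
    print(p["id"], combine_smoking_status(p["nlp"], p["ppi"], p["icd9"]))
```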

  7. Information extraction from full text scientific articles: where are the keywords?

    PubMed

    Shah, Parantu K; Perez-Iratxeta, Carolina; Bork, Peer; Andrade, Miguel A

    2003-05-29

    To date, many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full text articles in electronic form, which offer larger sources of data, are now available. Several questions arise as to whether the effort of scanning full text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant. In this work we addressed those questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous. Although the abstract contains the best ratio of keywords per total of words, other sections of the article may be a better source of biologically relevant data.

  8. Review of Extracting Information From the Social Web for Health Personalization

    PubMed Central

    Karlsen, Randi; Bonander, Jason

    2011-01-01

    In recent years the Web has come into its own as a social platform where health consumers are actively creating and consuming Web content. Moreover, as the Web matures, consumers are gaining access to personalized applications adapted to their health needs and interests. The creation of personalized Web applications relies on extracted information about the users and the content to personalize. The Social Web itself provides many sources of information that can be used to extract information for personalization apart from traditional Web forms and questionnaires. This paper provides a review of different approaches for extracting information from the Social Web for health personalization. We reviewed research literature across different fields addressing the disclosure of health information in the Social Web, techniques to extract that information, and examples of personalized health applications. In addition, the paper includes a discussion of technical and socioethical challenges related to the extraction of information for health personalization. PMID:21278049

  9. Automation for System Safety Analysis

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Fleming, Land; Throop, David; Thronesbery, Carroll; Flores, Joshua; Bennett, Ted; Wennberg, Paul

    2009-01-01

    This presentation describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.

  10. Feasibility of approaches combining sensor and source features in brain-computer interface.

    PubMed

    Ahn, Minkyu; Hong, Jun Hee; Jun, Sung Chan

    2012-02-15

    Brain-computer interface (BCI) provides a new channel for communication between the brain and computers through brain signals. Cost-effective EEG provides good temporal resolution, but its spatial resolution is poor and sensor information is blurred by inherent noise. To overcome these issues, spatial filtering and feature extraction techniques have been developed. Source imaging, the transformation of sensor signals into the source space through a source localizer, has gained attention as a new approach for BCI. It has been reported that source imaging yields some improvement in BCI performance. However, there exists no thorough investigation of how source imaging information overlaps with, and is complementary to, sensor information. We hypothesize that information from the source space may overlap with, as well as be exclusive to, information from the sensor space. If this hypothesis holds, more information can be extracted from the sensor and source spaces together, contributing to more accurate BCI systems. In this work, features from each space (sensor or source), and two strategies combining sensor and source features, are assessed. The information distribution among the sensor, source, and combined spaces is discussed through a Venn diagram for 18 motor imagery datasets. An additional 5 motor imagery datasets from the BCI Competition III site were examined. The results showed that the addition of source information yielded about 3.8% classification improvement for the 18 motor imagery datasets and an average accuracy of 75.56% for the BCI Competition data. Our proposed approach is promising, and improved performance may be possible with a better head model. Copyright © 2011 Elsevier B.V. All rights reserved.
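
    One of the combining strategies mentioned above can be sketched as simple feature concatenation followed by a classifier. The feature dimensions, the random data, and the nearest-mean classifier below are placeholders, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: 10 trials x 4 sensor-space features and
# 10 trials x 3 source-space features, two motor imagery classes.
sensor = rng.normal(size=(10, 4))
source = rng.normal(size=(10, 3))
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

combined = np.hstack([sensor, source])  # one combining strategy: concatenation

def nearest_mean_predict(train_x, train_y, test_x):
    """Classify by distance to each class mean (a stand-in classifier)."""
    classes = np.unique(train_y)
    means = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - means[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

print(nearest_mean_predict(combined, labels, combined))
```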

  11. Typicality and Misinformation: Two Sources of Distortion

    ERIC Educational Resources Information Center

    Luna, Karlos; Migueles, Malen

    2008-01-01

    This study examined the effect of two sources of memory error: exposure to post-event information and extracting typical contents from schemata. Participants were shown a video of a bank robbery and presented with high- and low-typicality misinformation extracted from two normative studies. The misleading suggestions consisted of either changes in…

  12. Use of Information--LMC Connection

    ERIC Educational Resources Information Center

    Darrow, Rob

    2005-01-01

    Note taking plays an important part in correctly extracting information from reference sources. The "Cornell Note Taking Method," initially developed as a method of taking notes during a lecture, is well suited for taking notes from print sources and is one of the best "Use of Information" methods.

  13. Information extraction from multi-institutional radiology reports.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
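
    The sequence-labeling idea behind the system (discriminative classifiers over per-token features) can be sketched as follows; the feature set is a simplified, hypothetical one rather than the features used in the paper.

```python
def token_features(tokens, i):
    """Features for one token, in the style used as input to discriminative
    sequence classifiers for named-entity recognition. A simplified set."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

report = "No acute intracranial hemorrhage is identified".split()
for i in range(len(report)):
    print(report[i], token_features(report, i))
```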

  14. Extracting and standardizing medication information in clinical text - the MedEx-UIMA system.

    PubMed

    Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C; Xu, Hua

    2014-01-01

    Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents with both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5%, respectively, for encoding drug names to the corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1%, respectively, for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.
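
    A toy illustration of the kind of normalization described: parse a simple drug signature string and map its frequency term to a times-per-day value. The regular expression and frequency table are illustrative assumptions, not MedEx's grammar or its RxNorm/ISO encoding.

```python
import re

# Hypothetical frequency normalization table; the MedEx-UIMA ISO mapping
# is far richer than this.
FREQ_PER_DAY = {"qd": 1, "bid": 2, "tid": 3, "qid": 4}

def parse_sig(text):
    """Parse a simple medication string like 'lisinopril 10 mg bid'."""
    m = re.match(
        r"(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+)\s*(?P<unit>mg|mcg)\s+(?P<freq>qd|bid|tid|qid)",
        text.strip(), re.IGNORECASE)
    if not m:
        return None
    return {
        "drug": m.group("drug").lower(),
        "dose": int(m.group("dose")),
        "unit": m.group("unit").lower(),
        "times_per_day": FREQ_PER_DAY[m.group("freq").lower()],
    }

print(parse_sig("Lisinopril 10 mg bid"))
```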

  15. A New Self-Constrained Inversion Method of Potential Fields Based on Probability Tomography

    NASA Astrophysics Data System (ADS)

    Sun, S.; Chen, C.; WANG, H.; Wang, Q.

    2014-12-01

    The self-constrained inversion method of potential fields uses a priori information self-extracted from potential field data. Unlike external a priori information, the self-extracted information consists of parameters derived exclusively from the analysis of the gravity and magnetic data (Paoletti et al., 2013). Here we develop a new self-constrained inversion method based on probability tomography. Probability tomography requires neither a priori information nor large inversion matrix operations. Moreover, its result can describe the sources entirely and clearly, especially when their distribution is complex and irregular. Therefore, we attempt to use the a priori information extracted from the probability tomography results to constrain the inversion for physical properties. Magnetic anomaly data are taken as an example in this work. The probability tomography result of the magnetic total field anomaly (ΔT) shows a smoother distribution than the anomalous source and cannot display the source edges exactly. However, the gradients of ΔT have higher resolution than ΔT along their own directions, and this characteristic is also present in their probability tomography results. We therefore use a set of rules to combine the probability tomography results of ∂ΔT/∂x, ∂ΔT/∂y and ∂ΔT/∂z into a new result from which a priori information is extracted, and then incorporate that information into the model objective function as spatial weighting functions to invert for the final magnetic susceptibility. Synthetic magnetic examples with and without a priori information extracted from the probability tomography results were compared, and the results show that the former are more concentrated and resolve the source body edges with higher resolution. The method is finally applied to field-measured ΔT data from an iron mine in China and performs well. References: Paoletti, V., Ialongo, S., Florio, G., Fedi, M. & Cella, F., 2013. Self-constrained inversion of potential fields, Geophys J Int. This research is supported by the Fundamental Research Funds for Institute for Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences (Grant Nos. WHS201210 and WHS201211).
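
    A sketch of how probability-tomography-derived weights could enter a standard model objective function; the notation below is ours and may differ from the authors' formulation.

```latex
% Sketch of a weighted inversion objective; W_s denotes a spatial weighting
% built from the probability tomography results (our notation, not
% necessarily the authors'). G is the forward operator, m the model,
% d the observed data, and lambda the regularization trade-off.
\Phi(\mathbf{m}) \;=\; \left\lVert \mathbf{W}_d\,(\mathbf{G}\mathbf{m} - \mathbf{d}) \right\rVert^{2}
\;+\; \lambda \left\lVert \mathbf{W}_s\,\mathbf{W}_m\,\mathbf{m} \right\rVert^{2}
```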

  16. Knowledge Acquisition of Generic Queries for Information Retrieval

    PubMed Central

    Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.

    2002-01-01

    Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.

  17. Integration of forward-looking infrared (FLIR) and traffic information for moving obstacle detection with integrity

    NASA Astrophysics Data System (ADS)

    Zhu, Zhen; Vana, Sudha; Bhattacharya, Sumit; Uijt de Haag, Maarten

    2009-05-01

    This paper discusses the integration of Forward-looking Infrared (FLIR) and traffic information from, for example, the Automatic Dependent Surveillance - Broadcast (ADS-B) or the Traffic Information Service-Broadcast (TIS-B). The goal of this integration method is to obtain an improved state estimate of a moving obstacle within the Field-of-View of the FLIR with added integrity. The focus of the paper will be on the approach phase of the flight. The paper will address methods to extract moving objects from the FLIR imagery and geo-reference these objects using outputs of both the onboard Global Positioning System (GPS) and the Inertial Navigation System (INS). The proposed extraction method uses a priori airport information and terrain databases. Furthermore, state information from the traffic information sources will be extracted and integrated with the state estimates from the FLIR. Finally, a method will be addressed that performs a consistency check between both sources of traffic information. The methods discussed in this paper will be evaluated using flight test data collected with a Gulfstream V in Reno, NV (GVSITE) and simulated ADS-B.

  18. Extracting and standardizing medication information in clinical text – the MedEx-UIMA system

    PubMed Central

    Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C.; Xu, Hua

    2014-01-01

    Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents with both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5%, respectively, for encoding drug names to the corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1%, respectively, for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/. PMID:25954575

  19. Bioactive phytochemicals in wheat: Extraction, analysis, processing, and functional properties

    USDA-ARS?s Scientific Manuscript database

    Whole wheat provides a rich source of bioactive phytochemicals namely, phenolic acids, carotenoids, tocopherols, alkylresorcinols, arabinoxylans, benzoxazinoids, phytosterols, and lignans. This review provides information on the distribution, extractability, analysis, and nutraceutical properties of...

  20. Road and Roadside Feature Extraction Using Imagery and LIDAR Data for Transportation Operation

    NASA Astrophysics Data System (ADS)

    Ural, S.; Shan, J.; Romero, M. A.; Tarko, A.

    2015-03-01

    Transportation agencies require up-to-date, reliable, and feasibly acquired information on road geometry and features within proximity to the roads as input for evaluating and prioritizing new or improvement road projects. The information needed for a robust evaluation of road projects includes road centerline, width, and extent together with the average grade, cross-sections, and obstructions near the travelled way. Remote sensing offers a large collection of data and well-established tools for acquiring this information and extracting the aforementioned road features at various levels and scopes. Even with many remote sensing data and methods available for road extraction, transportation operation requires more than the centerlines. Acquiring information that is spatially coherent at the operational level for the entire road system is challenging and needs multiple data sources to be integrated. In the presented study, we established a framework that used data from multiple sources, including one-foot resolution color infrared orthophotos, airborne LiDAR point clouds, and existing spatially non-accurate ancillary road networks. We were able to extract 90.25% of a total of 23.6 miles of road networks together with estimated road width, average grade along the road, and cross-sections at specified intervals. We also extracted buildings and vegetation within a predetermined proximity to the extracted road extent; 90.6% of 107 existing buildings were correctly identified, with a 31% false detection rate.
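
    The average-grade computation mentioned above can be sketched as follows, using hypothetical centerline points sampled from a LiDAR-derived surface at fixed intervals; this is an illustration, not the paper's framework.

```python
import math

def average_grade(points):
    """Average absolute grade (%) along a centerline.

    `points` is a list of (x, y, z) coordinates in metres, e.g. sampled
    from a LiDAR-derived surface at fixed intervals. Illustrative only.
    """
    grades = []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        run = math.hypot(x1 - x0, y1 - y0)
        if run > 0:
            grades.append(abs(z1 - z0) / run * 100.0)
    return sum(grades) / len(grades) if grades else 0.0

centerline = [(0, 0, 100.0), (30, 0, 100.6), (60, 0, 101.5), (90, 0, 101.9)]
print(f"average grade: {average_grade(centerline):.2f}%")
```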

  1. Ontology-Based Information Extraction for Business Intelligence

    NASA Astrophysics Data System (ADS)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  2. A Tale of Two Paradigms: Disambiguating Extracted Entities with Applications to a Digital Library and the Web

    ERIC Educational Resources Information Center

    Huang, Jian

    2010-01-01

    With the increasing wealth of information on the Web, information integration is ubiquitous as the same real-world entity may appear in a variety of forms extracted from different sources. This dissertation proposes supervised and unsupervised algorithms that are naturally integrated in a scalable framework to solve the entity resolution problem,…

  3. Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease

    PubMed Central

    Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.

    1998-01-01

    The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this mark-up process is time consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.

  4. aCGH-MAS: Analysis of aCGH by means of Multiagent System

    PubMed Central

    Benito, Rocío; Bajo, Javier; Rodríguez, Ana Eugenia; Abáigar, María

    2015-01-01

    There are currently different techniques, such as CGH arrays, to study genetic variations in patients. CGH arrays analyze gains and losses in different regions of the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information corresponding to mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases, and it would be of interest to incorporate information from different sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations, visualization techniques for the interpretation of the final results, and a CBR system to extract relevant information from different sources of information. PMID:25874203

  5. Exploring the Power of Heterogeneous Information Sources

    DTIC Science & Technology

    2011-01-01

    Individual movies are classified as being of one or more of 18 genres, such as Comedy and Thriller, which can be treated as binary vectors. 2) User... genres, from different sources, in different formats, and with different types of representation. Many interesting patterns cannot be extracted from a... provide better web services or help film distributors in decision making, we need to conduct integrative analysis of all the information sources. For

  6. A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

    PubMed Central

    Ortega-Martorell, Sandra; Ruiz, Héctor; Vellido, Alfredo; Olier, Iván; Romero, Enrique; Julià-Sapé, Margarida; Martín, José D.; Jarman, Ian H.; Arús, Carles; Lisboa, Paulo J. G.

    2013-01-01

    Background The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyzes single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Principal Findings Non-negative matrix factorization techniques have recently shown their potential for the identification of meaningful sources from brain tissue spectroscopy data. In this study, we use a convex variant of these methods that is capable of handling negatively-valued data and generating sources that can be interpreted as tumor class prototypes. A novel approach to convex non-negative matrix factorization is proposed, in which prior knowledge about class information is utilized in model optimization. Class-specific information is integrated into this semi-supervised process by setting the metric of a latent variable space where the matrix factorization is carried out. The reported experimental study comprises 196 cases from different tumor types drawn from two international, multi-center databases. The results indicate that the proposed approach outperforms a purely unsupervised process by achieving near perfect correlation of the extracted sources with the mean spectra of the tumor types. It also improves tissue type classification. Conclusions/Significance We show that source extraction by unsupervised matrix factorization benefits from the integration of the available class information, so operating in a semi-supervised learning manner, for discriminative source identification and brain tumor labeling from single-voxel spectroscopy data. We are confident that the proposed methodology has wider applicability for biomedical signal processing. PMID:24376744
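
    For reference, a commonly used formulation of convex non-negative matrix factorization is sketched below; the notation is generic, and the paper's semi-supervised, metric-based variant is not shown.

```latex
% Convex NMF in a common formulation (which may differ from the paper's):
% columns of X are spectra, and the source prototypes F are constrained to
% be convex combinations of the data themselves, F = XW.
\min_{\mathbf{W} \ge 0,\; \mathbf{G} \ge 0}
\; \left\lVert \mathbf{X} - \mathbf{X}\mathbf{W}\mathbf{G}^{\top} \right\rVert_F^{2},
\qquad \mathbf{F} = \mathbf{X}\mathbf{W}
```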

  7. Investigation of automated feature extraction using multiple data sources

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Perkins, Simon J.; Pope, Paul A.; Theiler, James P.; David, Nancy A.; Porter, Reid B.

    2003-04-01

    An increasing number and variety of platforms are now capable of collecting remote sensing data over a particular scene. For many applications, the information available from any individual sensor may be incomplete, inconsistent or imprecise. However, other sources may provide complementary and/or additional data. Thus, for an application such as image feature extraction or classification, fusing the multiple data sources can lead to more consistent and reliable results. Unfortunately, with the increased complexity of the fused data, the search space of feature-extraction or classification algorithms also greatly increases. With a single data source, the determination of a suitable algorithm may be a significant challenge for an image analyst. With the fused data, the search for suitable algorithms can go far beyond the capabilities of a human in a realistic time frame, and becomes the realm of machine learning, where the computational power of modern computers can be harnessed to the task at hand. We describe experiments in which we investigate the ability of a suite of automated feature extraction tools developed at Los Alamos National Laboratory to make use of multiple data sources for various feature extraction tasks. We compare and contrast this software's capabilities on 1) individual data sets from different data sources, 2) fused data sets from multiple data sources, and 3) fusion of results from multiple individual data sources.

  8. Toward a complete dataset of drug-drug interaction information from publicly available sources.

    PubMed

    Ayvaz, Serkan; Horn, John; Hassanzadeh, Oktie; Zhu, Qian; Stan, Johann; Tatonetti, Nicholas P; Vilar, Santiago; Brochhausen, Mathias; Samwald, Matthias; Rastegar-Mojarad, Majid; Dumontier, Michel; Boyce, Richard D

    2015-06-01

    Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) Corpora, and 5 Bioinformatics/Pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
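
    The overlap analysis described above can be illustrated with a small sketch that computes pairwise Jaccard overlap between normalized drug-pair sets; the miniature sources below are invented, not the study's fourteen sources.

```python
from itertools import combinations

def normalize(pairs):
    """Treat interactions as unordered drug pairs."""
    return {frozenset(p) for p in pairs}

# Hypothetical miniature PDDI sources; the real merged dataset combines
# fourteen sources with far larger interaction lists.
sources = {
    "source_a": normalize([("warfarin", "aspirin"), ("simvastatin", "amiodarone")]),
    "source_b": normalize([("warfarin", "aspirin"), ("digoxin", "verapamil")]),
    "source_c": normalize([("digoxin", "verapamil")]),
}

for (name1, s1), (name2, s2) in combinations(sources.items(), 2):
    jaccard = len(s1 & s2) / len(s1 | s2)
    print(f"{name1} vs {name2}: overlap = {jaccard:.2f}")
```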

  9. Text mining for adverse drug events: the promise, challenges, and state of the art.

    PubMed

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H

    2014-10-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources, such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs, that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

  10. Automated extraction and semantic analysis of mutation impacts from the biomedical literature

    PubMed Central

    2012-01-01

    Background Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing the mutational information and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. Results We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguation of genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data and end-user and developer documentation is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner. Conclusion We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions. PMID:22759648

  11. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction.

    PubMed

    Jonnalagadda, Siddhartha; Gonzalez, Graciela

    2010-11-13

    BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested our tool's impact on the task of PPI extraction: it improved the F-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net.

  12. Single-trial event-related potential extraction through one-unit ICA-with-reference

    NASA Astrophysics Data System (ADS)

    Lih Lee, Wee; Tan, Tele; Falkmer, Torbjörn; Leung, Yee Hong

    2016-12-01

    Objective. In recent years, ICA has been one of the more popular methods for extracting event-related potentials (ERPs) at the single-trial level. It is a blind source separation technique that allows the extraction of an ERP without making strong assumptions on the temporal and spatial characteristics of the ERP. However, the problem with traditional ICA is that the extraction is not direct and is time-consuming due to the need for source selection processing. In this paper, the application of a one-unit ICA-with-Reference (ICA-R), a constrained ICA method, is proposed. Approach. In cases where the time-region of the desired ERP is known a priori, this time information is utilized to generate a reference signal, which is then used to guide the one-unit ICA-R to extract the source signal of the desired ERP directly. Main results. Our results showed that, compared to traditional ICA, ICA-R is a more effective method for analysing ERPs because it avoids manual source selection and requires less computation, resulting in faster ERP extraction. Significance. In addition, since the method is automated, it reduces the risk of subjective bias in the ERP analysis. It is also a potential tool for extracting ERPs in online applications.
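
    A sketch of the one-unit ICA-R formulation in the usual constrained-ICA form: a negentropy contrast is maximized subject to a closeness constraint between the estimated source and the reference signal. The notation follows common constrained-ICA write-ups and may differ from the paper's.

```latex
% One-unit ICA-with-Reference, sketched in the usual constrained-ICA form.
% y = w^T x is the estimated source, r(t) the reference built from the known
% ERP time-region, G a nonquadratic contrast, nu a standard Gaussian variable,
% epsilon(y, r) a closeness measure, and xi its threshold.
\max_{\mathbf{w}} \; J(y) \approx \rho\,\bigl[\,E\{G(y)\} - E\{G(\nu)\}\,\bigr]^{2}
\quad \text{s.t.} \quad \varepsilon(y, r) - \xi \le 0, \qquad E\{y^{2}\} = 1
```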

  13. Single-trial event-related potential extraction through one-unit ICA-with-reference.

    PubMed

    Lee, Wee Lih; Tan, Tele; Falkmer, Torbjörn; Leung, Yee Hong

    2016-12-01

    In recent years, ICA has been one of the more popular methods for extracting event-related potentials (ERPs) at the single-trial level. It is a blind source separation technique that allows the extraction of an ERP without making strong assumptions on the temporal and spatial characteristics of the ERP. However, the problem with traditional ICA is that the extraction is not direct and is time-consuming due to the need for source selection processing. In this paper, the application of a one-unit ICA-with-Reference (ICA-R), a constrained ICA method, is proposed. In cases where the time-region of the desired ERP is known a priori, this time information is utilized to generate a reference signal, which is then used to guide the one-unit ICA-R to extract the source signal of the desired ERP directly. Our results showed that, compared to traditional ICA, ICA-R is a more effective method for analysing ERPs because it avoids manual source selection and requires less computation, resulting in faster ERP extraction. In addition, since the method is automated, it reduces the risk of subjective bias in the ERP analysis. It is also a potential tool for extracting ERPs in online applications.

  14. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium

    PubMed Central

    Pathak, Jyotishman; Bailey, Kent R; Beebe, Calvin E; Bethard, Steven; Carrell, David S; Chen, Pei J; Dligach, Dmitriy; Endle, Cory M; Hart, Lacey A; Haug, Peter J; Huff, Stanley M; Kaggal, Vinod C; Li, Dingcheng; Liu, Hongfang; Marchant, Kyle; Masanz, James; Miller, Timothy; Oniki, Thomas A; Palmer, Martha; Peterson, Kevin J; Rea, Susan; Savova, Guergana K; Stancl, Craig R; Sohn, Sunghwan; Solbrig, Harold R; Suesse, Dale B; Tao, Cui; Taylor, David P; Westberg, Les; Wu, Stephen; Zhuo, Ning; Chute, Christopher G

    2013-01-01

    Research objective To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction. Materials and methods Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems—Mayo Clinic and Intermountain Healthcare—were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine. Results Using CEMs and open-source natural language processing and terminology services engines—namely, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2)—we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria. Conclusions End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts. PMID:24190931
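
    The quality-measure logic described (patients aged 18-75 with diabetes whose most recent LDL result is below 100 mg/dL) can be sketched on toy records as below; this is an illustration only, not the SHARPn/CEM implementation.

```python
def ldl_measure(patients):
    """Return (denominator, numerator) counts for the measure described:
    patients aged 18-75 with diabetes (denominator) whose most recent LDL
    result is < 100 mg/dL (numerator). Toy records, not CEM-normalized data."""
    denominator = [p for p in patients if 18 <= p["age"] <= 75 and p["has_diabetes"]]
    numerator = [
        p for p in denominator
        if p["ldl_results"] and sorted(p["ldl_results"])[-1][1] < 100
    ]
    return len(denominator), len(numerator)

cohort = [
    {"age": 64, "has_diabetes": True,  "ldl_results": [("2012-01-10", 130), ("2012-07-02", 92)]},
    {"age": 59, "has_diabetes": True,  "ldl_results": [("2012-05-14", 118)]},
    {"age": 41, "has_diabetes": False, "ldl_results": [("2012-03-08", 95)]},
]
print(ldl_measure(cohort))  # (2, 1)
```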

  15. Social network extraction based on Web: 3. the integrated superficial method

    NASA Astrophysics Data System (ADS)

    Nasution, M. K. M.; Sitompul, O. S.; Noah, S. A.

    2018-03-01

    The Web as a source of information has become part of the record of social behavior. Although it relies only on the limited information disclosed by search engines, in the form of hit counts, snippets, and URL addresses of web pages, the integrated extraction method produces a social network that is not only trusted but enriched. Unintegrated extraction methods may produce social networks without explanation, resulting in poor supplemental information, or in social networks laden with surmise and consequently unrepresentative social structures. The integrated superficial method, in addition to generating the core social network, also generates an expanded network so as to reach the scope of relation clues, with a number of edges computationally approaching n(n - 1)/2 for n social actors.

  16. Automated concept-level information extraction to reduce the need for custom software and rules development.

    PubMed

    D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D

    2011-01-01

    Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks. Discussion With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation is available for download.

  17. Early Warning and Outbreak Detection Using Social Networking Websites: The Potential of Twitter

    NASA Astrophysics Data System (ADS)

    de Quincey, Ed; Kostkova, Patty

    Epidemic Intelligence is being used to gather information about potential disease outbreaks from both formal and, increasingly, informal sources. A potential addition to these informal sources is social networking sites such as Facebook and Twitter. In this paper we describe a method for extracting messages, called "tweets," from the Twitter website and the results of a pilot study which collected over 135,000 tweets in a week during the current Swine Flu pandemic.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gydesen, S.P.

    The purpose of this letter report is to reconstruct, from available information, the data that can be used to develop a daily reactor operating history for 1960--1964. The information needed for source term calculations (as determined by the Source Terms Task Leader) was extracted and included in this report. The data on the amount of uranium dissolved by the separations plants (expressed both as tons and as MW) are also included in this compilation.

  19. Supercritical fluid extraction of the non-polar organic compounds in meteorites

    NASA Astrophysics Data System (ADS)

    Sephton, M. A.; Pillinger, C. T.; Gilmour, I.

    2001-01-01

    The carbonaceous chondrite meteorites contain a variety of extraterrestrial organic molecules. These organic components provide a valuable insight into the formation and evolution of the solar system. Attempts at obtaining and interpreting this information source are hampered by the small sample sizes available for study and the interferences from terrestrial contamination. Supercritical fluid extraction represents an efficient and contamination-free means of isolating extraterrestrial molecules. Gas chromatography-mass spectrometry analyses of extracts from Orgueil and Cold Bokkeveld reveal a complex mixture of free non-polar organic molecules which include normal alkanes, isoprenoid alkanes, tetrahydronaphthalenes and aromatic hydrocarbons. These organic assemblages imply contributions from both terrestrial and extraterrestrial sources.

  20. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations

    PubMed Central

    2017-01-01

    Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessed in order to help dietitians follow the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They have focused on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER method that consists of two phases. The first involves the detection and determination of entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. The evaluation showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations. PMID:28644863
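    The two-phase idea described above can be illustrated with a minimal, hypothetical sketch: the lexicon, cue patterns, and example sentence below are invented for illustration and are not drNER's actual rules.

```python
# Hedged sketch of a two-phase, rule-based NER pass in the spirit of drNER:
# phase 1 detects candidate entity mentions with lexicon rules, phase 2 keeps
# only candidates that co-occur with a quantity and a recommendation cue.
import re

FOOD_NUTRIENT_LEXICON = {"fiber", "sodium", "vitamin d", "whole grains", "saturated fat"}
RECOMMENDATION_CUES = re.compile(r"\b(should|recommended|limit|at least|per day)\b", re.I)
QUANTITY = re.compile(r"\b\d+(\.\d+)?\s*(g|mg|iu|servings?)\b", re.I)

def detect_candidates(sentence):
    """Phase 1: find lexicon terms appearing in the sentence."""
    lowered = sentence.lower()
    return [term for term in FOOD_NUTRIENT_LEXICON if term in lowered]

def select_entities(sentence, candidates):
    """Phase 2: keep candidates only if the sentence looks like a recommendation."""
    if RECOMMENDATION_CUES.search(sentence) and QUANTITY.search(sentence):
        return candidates
    return []

sentence = "Adults should consume at least 25 g of fiber per day."
print(select_entities(sentence, detect_candidates(sentence)))  # ['fiber']
```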

  1. Automatic generation of Web mining environments

    NASA Astrophysics Data System (ADS)

    Cibelli, Maurizio; Costagliola, Gennaro

    1999-02-01

    The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.

  2. The Role of Mother in Informing Girls About Puberty: A Meta-Analysis Study

    PubMed Central

    Sooki, Zahra; Shariati, Mohammad; Chaman, Reza; Khosravi, Ahmad; Effatpanah, Mohammad; Keramat, Afsaneh

    2016-01-01

    Context: Family, especially the mother, has the most important role in the education, transmission of information, and health behaviors of girls so that they can have a healthy transition through the critical stage of puberty, but there are different views in this regard. Objectives: Considering the various findings about the source of information about puberty, a meta-analysis study was conducted to investigate the extent of the mother's role in informing girls about puberty. Data Sources: This meta-analysis was based on English articles published from 2000 to February 2015 in the Scopus, PubMed, and ScienceDirect databases and on Persian articles in the SID, Magiran, and IranMedex databases, identified with determined key words and their MeSH equivalents. Study Selection: Quantitative cross-sectional articles were extracted by two independent researchers, and 46 articles were finally selected based on the inclusion criteria. The STROBE checklist was used for evaluation of the studies. Data Extraction: The percentage of mothers as the current and preferred source of information about the process of puberty, menarche, and menstruation from the perspective of adolescent girls was extracted from the articles. The results of the studies were analyzed using meta-analysis (random effects model), and the studies' heterogeneity was analyzed using the I2 index. Variance between studies was analyzed using tau squared (Tau2) and Review Manager 5 software. Results: The results showed that, from the perspective of teenage girls in Iran and other countries, in 56% of cases the mother was the current source of information about the process of puberty, menarche, and menstruation. The preferred source of information about the process of puberty, menarche, and menstruation was the mother in all studies, at 60% (Iran 57%, other countries 66%). Conclusions: According to the findings of this study, it is essential that health professionals and officials of the ministry of health train mothers about the time, trends, and factors affecting the start of puberty using a multi-dimensional approach that involves religious organizations, community groups, and peer groups. PMID:27331056
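    The heterogeneity statistics named above (I2 and tau squared under a random-effects model) can be illustrated with a small sketch using invented study proportions; the numbers below are not taken from the review.

```python
# Hedged sketch of DerSimonian-Laird random-effects pooling with I2 and tau2.
# The three "studies" are invented numbers for illustration only.
import numpy as np

# proportion of girls naming the mother as information source, and sample size
p = np.array([0.55, 0.62, 0.48])
n = np.array([200, 150, 300])

est = p                             # study estimates
var = p * (1 - p) / n               # binomial variance of each proportion
w = 1.0 / var                       # fixed-effect (inverse-variance) weights

fixed = np.sum(w * est) / np.sum(w)
Q = np.sum(w * (est - fixed) ** 2)          # Cochran's Q
df = len(est) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

w_re = 1.0 / (var + tau2)                   # random-effects weights
pooled = np.sum(w_re * est) / np.sum(w_re)
print(f"pooled proportion={pooled:.2f}, I2={I2:.1f}%, tau2={tau2:.5f}")
```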

  3. Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art

    PubMed Central

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H.

    2014-01-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. Text mining is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text-mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance. PMID:25151493

  4. Considering context: reliable entity networks through contextual relationship extraction

    NASA Astrophysics Data System (ADS)

    David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.

    2016-05-01

    Existing information extraction techniques can only partially address the problem of exploiting amounts of text far too large to read. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on whether something is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables Warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) more precise event and relationship interpretation, 2) more detailed information about extracted events and relationships, and 3) more reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.

  5. Noninvasive Electromagnetic Source Imaging and Granger Causality Analysis: An Electrophysiological Connectome (eConnectome) Approach

    PubMed Central

    Sohrabpour, Abbas; Ye, Shuai; Worrell, Gregory A.; Zhang, Wenbo

    2016-01-01

    Objective: Combined source imaging techniques and directional connectivity analysis can provide useful information about the underlying brain networks in a non-invasive fashion. Source imaging techniques have previously been used successfully either to determine the source of activity or to extract source time-courses for Granger causality analysis. In this work, we utilize source imaging algorithms both to find the network nodes (regions of interest) and to extract the activation time series for further Granger causality analysis. The aim of this work is to find network nodes objectively from noninvasive electromagnetic signals, extract activation time-courses, and apply Granger analysis on the extracted series to study brain networks under realistic conditions. Methods: Source imaging methods are used to identify network nodes and extract time-courses, and then Granger causality analysis is applied to delineate the directional functional connectivity of underlying brain networks. Computer simulation studies where the underlying network (nodes and connectivity pattern) is known were performed; additionally, this approach has been evaluated in partial epilepsy patients to study epilepsy networks from inter-ictal and ictal signals recorded by EEG and/or MEG. Results: Localization errors of network nodes were less than 5 mm, and normalized connectivity errors were ~20%, in estimating underlying brain networks in simulation studies. Additionally, two focal epilepsy patients were studied, and the identified nodes driving the epileptic network were concordant with clinical findings from intracranial recordings or surgical resection. Conclusion: Our study indicates that combining source imaging algorithms with Granger causality analysis can identify underlying networks precisely (both in terms of network node location and internodal connectivity). Significance: The combined source imaging and Granger analysis technique is an effective tool for studying normal or pathological brain conditions. PMID:27740473

  6. Noninvasive Electromagnetic Source Imaging and Granger Causality Analysis: An Electrophysiological Connectome (eConnectome) Approach.

    PubMed

    Sohrabpour, Abbas; Ye, Shuai; Worrell, Gregory A; Zhang, Wenbo; He, Bin

    2016-12-01

    Combined source-imaging techniques and directional connectivity analysis can provide useful information about the underlying brain networks in a noninvasive fashion. Source-imaging techniques have previously been used successfully either to determine the source of activity or to extract source time-courses for Granger causality analysis. In this work, we utilize source-imaging algorithms both to find the network nodes [regions of interest (ROI)] and to extract the activation time series for further Granger causality analysis. The aim of this work is to find network nodes objectively from noninvasive electromagnetic signals, extract activation time-courses, and apply Granger analysis on the extracted series to study brain networks under realistic conditions. Source-imaging methods are used to identify network nodes and extract time-courses, and then Granger causality analysis is applied to delineate the directional functional connectivity of underlying brain networks. Computer simulation studies where the underlying network (nodes and connectivity pattern) is known were performed; additionally, this approach has been evaluated in partial epilepsy patients to study epilepsy networks from interictal and ictal signals recorded by EEG and/or magnetoencephalography (MEG). Localization errors of network nodes were less than 5 mm, and normalized connectivity errors were ∼20%, in estimating underlying brain networks in simulation studies. Additionally, two focal epilepsy patients were studied, and the identified nodes driving the epileptic network were concordant with clinical findings from intracranial recordings or surgical resection. Our study indicates that combining source-imaging algorithms with Granger causality analysis can identify underlying networks precisely (both in terms of network node location and internodal connectivity). The combined source imaging and Granger analysis technique is an effective tool for studying normal or pathological brain conditions.
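    The connectivity step described in the two records above can be illustrated with a minimal sketch, assuming the source time-courses have already been extracted; it uses the standard Granger test from statsmodels on simulated series and is not the authors' pipeline.

```python
# Hedged sketch: given two source time-courses (simulated here), test whether
# series y drives series x with a standard Granger causality test. This only
# illustrates the connectivity step, not the eConnectome source-imaging pipeline.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500
y = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    # x depends on the past of y, so y should Granger-cause x
    x[t] = 0.6 * y[t - 1] + 0.2 * x[t - 1] + 0.1 * rng.standard_normal()

data = np.column_stack([x, y])        # tests whether column 2 (y) Granger-causes column 1 (x)
results = grangercausalitytests(data, maxlag=2)
pval = results[1][0]["ssr_ftest"][1]  # p-value of the F-test at lag 1
print(f"p-value (y -> x, lag 1): {pval:.4f}")
```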

  7. Extracting spatial information from large aperture exposures of diffuse sources

    NASA Technical Reports Server (NTRS)

    Clarke, J. T.; Moos, H. W.

    1981-01-01

    The spatial properties of large aperture exposures of diffuse emission can be used both to investigate spatial variations in the emission and to filter out camera noise in exposures of weak emission sources. Spatial imaging can be accomplished both parallel and perpendicular to dispersion with a resolution of 5-6 arc sec, and a narrow median filter running perpendicular to dispersion across a diffuse image selectively filters out point source features, such as reseaux marks and fast particle hits. Spatial information derived from observations of solar system objects is presented.
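    A minimal sketch of the narrow median filter idea follows, assuming a synthetic image in which axis 0 stands in for the direction perpendicular to dispersion; the window size and data are illustrative.

```python
# Hedged sketch: run a narrow median filter across the spatial axis (perpendicular
# to dispersion) so that point-like artifacts such as reseaux marks or particle
# hits are suppressed while smooth diffuse emission survives.
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(1)
image = np.full((64, 256), 10.0) + rng.normal(0, 0.5, (64, 256))  # diffuse emission + noise
image[30, 100] += 50.0   # a point-source-like artifact (e.g. a particle hit)

# 5-pixel median window along axis 0 (perpendicular to dispersion), 1 pixel along dispersion
filtered = median_filter(image, size=(5, 1))

print(image[30, 100], filtered[30, 100])  # the spike is strongly suppressed
```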

  8. High-resolution extraction of particle size via Fourier Ptychography

    NASA Astrophysics Data System (ADS)

    Li, Shengfu; Zhao, Yu; Chen, Guanghua; Luo, Zhenxiong; Ye, Yan

    2017-11-01

    This paper proposes a method that can extract particle size information with a resolution beyond λ/NA. This is achieved by applying Fourier Ptychographic (FP) ideas to the present problem. In a typical FP imaging platform, a 2D LED array is used as the light source for angle-varied illuminations, and a series of low-resolution images is taken by a full sequential scan of the array of LEDs. Here, we demonstrate that the particle size information can be extracted by turning on single LEDs lying on a circle. The simulated results show that the proposed method can reduce the total number of images without loss of reliability in the results.

  9. Limited Role of Contextual Information in Adult Word Recognition. Technical Report No. 411.

    ERIC Educational Resources Information Center

    Durgunoglu, Aydin Y.

    Recognizing a word in a meaningful text involves processes that combine information from many different sources, and both bottom-up processes (such as feature extraction and letter recognition) and top-down processes (contextual information) are thought to interact when skilled readers recognize words. Two similar experiments investigated word…

  10. The Gender Puzzle: Toddlers' Use of Articles to Access Noun Information

    ERIC Educational Resources Information Center

    Arias-Trejo, Natalia; Falcon, Alberto; Alva-Canto, Elda A.

    2013-01-01

    Grammatical gender embedded in determiners, nouns and adjectives allows indirect and more rapid processing of the referents implied in sentences. However in a language such as Spanish, this useful information cannot be reliably retrieved from a single source of information. Instead, noun gender may be extracted either from phono-morphological,…

  11. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry

    PubMed Central

    AAlAbdulsalam, Abdulrahman K.; Garvin, Jennifer H.; Redd, Andrew; Carter, Marjorie E.; Sweeny, Carol; Meystre, Stephane M.

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). PMID:29888032
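    A minimal, hypothetical sketch of pulling TNM-style mentions out of free text with a regular expression follows; the pattern and example note are assumptions for illustration, whereas the study itself combined NLP with machine-learning classification.

```python
# Hedged sketch: a regex pass over registry free text that pulls out TNM-style
# stage mentions (e.g. "pT2N0M0"). The pattern and the example sentence are
# illustrative assumptions, not the published extraction pipeline.
import re

TNM_PATTERN = re.compile(
    r"\b[ypc]{0,2}T(?:[0-4][a-c]?|is|x)\s*N(?:[0-3][a-c]?|x)\s*M(?:[01][a-c]?|x)\b",
    re.IGNORECASE,
)

note = "Final pathology consistent with pT2 N0 M0 adenocarcinoma; prior note listed cT3NxM0."
mentions = TNM_PATTERN.findall(note)
print(mentions)  # ['pT2 N0 M0', 'cT3NxM0']
```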

  12. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry.

    PubMed

    AAlAbdulsalam, Abdulrahman K; Garvin, Jennifer H; Redd, Andrew; Carter, Marjorie E; Sweeny, Carol; Meystre, Stephane M

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%-98.4% and classification sensitivity: 83.5%-87%).

  13. Functional source separation and hand cortical representation for a brain–computer interface feature extraction

    PubMed Central

    Tecchio, Franca; Porcaro, Camillo; Barbati, Giulia; Zappasodi, Filippo

    2007-01-01

    A brain–computer interface (BCI) can be defined as any system that can track a person's intent, which is embedded in his or her brain activity, and from it alone translate the intention into commands of a computer. Among the brain signal monitoring systems best suited for this challenging task, electroencephalography (EEG) and magnetoencephalography (MEG) are the most realistic, since both are non-invasive, EEG is portable, and MEG could provide more specific information that could later be exploited also through EEG signals. The first two BCI steps are to set up the appropriate experimental protocol while recording the brain signal and then to extract interesting features from the recorded cerebral activity. To provide information useful in these BCI stages, our aim is to give an overview of a new procedure we recently developed, named functional source separation (FSS). As it derives from blind source separation algorithms, it exploits the most valuable information provided by the electrophysiological techniques, i.e. the waveform signal properties, while remaining blind to the biophysical nature of the signal sources. FSS returns the single-trial source activity, estimates the time course of a neuronal pool across different experimental states on the basis of a specific functional requirement in a specific time period, and uses simulated annealing as the optimization procedure, which allows non-differentiable functional constraints to be exploited. Moreover, a minor section is included, devoted to information acquired by MEG in stroke patients, to guide BCI applications aiming at sustaining motor behaviour in these patients. Relevant BCI features – spatial and time-frequency properties – are in fact altered by a stroke in the regions devoted to hand control. Moreover, a method to investigate the relationship between sensory and motor hand cortical network activities is described, providing information useful for developing BCI feedback control systems. This review provides a description of the FSS technique, a promising tool for the BCI community for online electrophysiological feature extraction, and offers interesting information for developing BCI applications to sustain hand control in stroke patients. PMID:17331989

  14. Clinical records anonymisation and text extraction (CRATE): an open-source software system.

    PubMed

    Cardinal, Rudolf N

    2017-04-26

    Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
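    The pseudonym-mapping function described above can be sketched with a keyed hash; the key handling and identifier below are simplified assumptions and do not reproduce CRATE's actual implementation or configuration.

```python
# Hedged sketch of keyed-hash pseudonym generation of the kind CRATE describes
# (mapping a patient identifier to a research identifier with a secret key).
import hmac
import hashlib

# Assumed key management: in practice the key is held only by the anonymisation service.
SECRET_KEY = b"replace-with-a-long-random-key-held-by-the-anonymisation-service"

def pseudonymise(patient_id: str) -> str:
    """Deterministically map an identifier to a research pseudonym via HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

print(pseudonymise("NHS1234567890"))  # hypothetical identifier, same input -> same pseudonym
```

    The same input always yields the same pseudonym, so records can be linked within the research database, while anyone without the key cannot regenerate or reverse the mapping.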

  15. Profiling of poorly stratified smoky atmospheres with scanning lidar

    Treesearch

    Vladimir Kovalev; Cyle Wold; Alexander Petkov; Wei Min Hao

    2012-01-01

    A multiangle data processing technique is considered that uses the signal measured at zenith (or close to zenith) as the core source for extracting information about the vertical atmospheric aerosol loading. The multiangle signals are used as auxiliary data to extract the vertical transmittance profile from the zenith signal. Simulated and experimental...

  16. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

    PubMed Central

    Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G

    2010-01-01

    We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies—the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853

  17. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association

    PubMed Central

    Ma, Jian; Casey, Cameron P.; Zheng, Xueyun; Ibrahim, Yehia M.; Wilkins, Christopher S.; Renslow, Ryan S.; Thomas, Dennis G.; Payne, Samuel H.; Monroe, Matthew E.; Smith, Richard D.; Teeguarden, Justin G.; Baker, Erin S.; Metz, Thomas O.

    2017-01-01

    Motivation: Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. Availability and implementation: PIXiE is an open-source tool, freely available on GitHub. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library. Contact: erin.baker@pnnl.gov or thomas.metz@pnnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28505286

  18. Combined rule extraction and feature elimination in supervised classification.

    PubMed

    Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E

    2012-09-01

    There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.

  19. Apparatus And Method For Osl-Based, Remote Radiation Monitoring And Spectrometry

    DOEpatents

    Miller, Steven D.; Smith, Leon Eric; Skorpik, James R.

    2006-03-07

    Compact, OSL-based devices for long-term, unattended radiation detection and spectroscopy are provided. In addition, a method for extracting spectroscopic information from these devices is taught. The devices can comprise OSL pixels and at least one radiation filter surrounding at least a portion of the OSL pixels. The filter can modulate an incident radiation flux. The devices can further comprise a light source and a detector, both proximally located to the OSL pixels, as well as a power source and a wireless communication device, each operably connected to the light source and the detector. Power consumption of the device ranges from ultra-low to zero. The OSL pixels can retain data regarding incident radiation events as trapped charges. The data can be extracted wirelessly or manually. The method for extracting spectroscopic data comprises optically stimulating the exposed OSL pixels, detecting a readout luminescence, and reconstructing an incident-energy spectrum from the luminescence.

  20. Apparatus and method for OSL-based, remote radiation monitoring and spectrometry

    DOEpatents

    Smith, Leon Eric [Richland, WA; Miller, Steven D [Richland, WA; Bowyer, Theodore W [Oakton, VA

    2008-05-20

    Compact, OSL-based devices for long-term, unattended radiation detection and spectroscopy are provided. In addition, a method for extracting spectroscopic information from these devices is taught. The devices can comprise OSL pixels and at least one radiation filter surrounding at least a portion of the OSL pixels. The filter can modulate an incident radiation flux. The devices can further comprise a light source and a detector, both proximally located to the OSL pixels, as well as a power source and a wireless communication device, each operably connected to the light source and the detector. Power consumption of the device ranges from ultra-low to zero. The OSL pixels can retain data regarding incident radiation events as trapped charges. The data can be extracted wirelessly or manually. The method for extracting spectroscopic data comprises optically stimulating the exposed OSL pixels, detecting a readout luminescence, and reconstructing an incident-energy spectrum from the luminescence.

  1. A Semantic Graph Query Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  2. Knowledge guided information fusion for segmentation of multiple sclerosis lesions in MRI images

    NASA Astrophysics Data System (ADS)

    Zhu, Chaozhe; Jiang, Tianzi

    2003-05-01

    In this work, T1-, T2- and PD-weighted MR images of multiple sclerosis (MS) patients, which provide information on tissue properties from different aspects, are treated as three independent information sources for the detection and segmentation of MS lesions. Based on information fusion theory, a knowledge-guided information fusion framework is proposed to accomplish 3-D segmentation of MS lesions. This framework consists of three parts: (1) information extraction, (2) information fusion, and (3) decision. Information provided by the different spectral images is extracted and modeled separately in each spectrum using fuzzy sets, with the aim of managing the uncertainty and ambiguity in the images due to noise and partial volume effects. In the second part, a possible fuzzy map of MS lesions in each spectral image is constructed from the extracted information under the guidance of experts' knowledge, and then the final fuzzy map of MS lesions is constructed through the fusion of the fuzzy maps obtained from the different spectra. Finally, the 3-D segmentation of MS lesions is derived from the final fuzzy map. Experimental results show that this method is fast and accurate.
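    A minimal sketch of the fusion idea follows, assuming invented per-spectrum fuzzy membership values and simple min/max fuzzy operators; the paper's knowledge-guided fusion is richer than this illustration.

```python
# Hedged sketch of the fusion step: per-voxel fuzzy "lesion" membership maps
# derived from T1-, T2- and PD-weighted images are combined with simple fuzzy
# operators. Membership values below are invented for a tiny 2x2 patch.
import numpy as np

mu_t1 = np.array([[0.2, 0.7], [0.1, 0.9]])
mu_t2 = np.array([[0.3, 0.8], [0.2, 0.8]])
mu_pd = np.array([[0.1, 0.6], [0.3, 0.9]])

conservative = np.minimum.reduce([mu_t1, mu_t2, mu_pd])  # all spectra must agree (fuzzy AND)
permissive = np.maximum.reduce([mu_t1, mu_t2, mu_pd])    # any spectrum suffices (fuzzy OR)

segmentation = conservative > 0.5   # crisp decision on the fused map
print(segmentation)
```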

  3. The Herschel-SPIRE Point Source Catalog Version 2

    NASA Astrophysics Data System (ADS)

    Schulz, Bernhard; Marton, Gábor; Valtchanov, Ivan; María Pérez García, Ana; Pintér, Sándor; Appleton, Phil; Kiss, Csaba; Lim, Tanya; Lu, Nanyao; Papageorgiou, Andreas; Pearson, Chris; Rector, John; Sánchez Portal, Miguel; Shupe, David; Tóth, Viktor L.; Van Dyk, Schuyler; Varga-Verebélyi, Erika; Xu, Kevin

    2018-01-01

    The Herschel-SPIRE instrument mapped about 8% of the sky in Submillimeter broad-band filters centered at 250, 350, and 500 microns (1199, 857, 600 GHz) with spatial resolutions of 17.9”, 24.2”, and 35.4” respectively. We present here the 2nd version of the SPIRE Point Source Catalog (SPSC). Stacking on WISE 22 micron catalog sources led to the identification of 108 maps, out of 6878, that had astrometry offsets of greater than 5”. After fixing these deviations and re-derivation of all affected map-mosaics, we repeated the systematic and homogeneous source extraction performed on all maps, using an improved version of the 4 different photometry extraction methods that were already employed in the generation of the first version catalog. Only regions affected by strong Galactic emission, mostly in the Galactic Plane, were excluded, as they exceeded the limits of the available source extraction methods. Aimed primarily at point sources, that allow for the best photometric accuracy, the catalog contains also significant fractions of slightly extended sources. With most SPIRE maps being confusion limited, uncertainties in flux densities were established as a function of structure noise and flux density, based on the results of artificial source insertion experiments into real data along a range of celestial backgrounds. Many sources have been rejected that do not pass the imposed SNR threshold, especially at flux densities approaching the extragalactic confusion limit. A range of additional flags provide information on the reliability of the flux information, as well as the spatial extent and orientation of a source. The catalog should be particularly helpful for determining cold dust content in extragalactic and galactic sources with low to moderate background confusion. We present an overview of catalog construction, detailed content, and validation results, with focus on the improvements achieved in the second version that is soon to be released.

  4. SIDECACHE: Information access, management and dissemination framework for web services.

    PubMed

    Doderer, Mark S; Burkhardt, Cory; Robbins, Kay A

    2011-06-14

    Many bioinformatics algorithms and data sets are deployed using web services so that the results can be explored via the Internet and easily integrated into other tools and services. These services often include data from other sites that is accessed either dynamically or through file downloads. Developers of these services face several problems because of the dynamic nature of the information from the upstream services. Many publicly available repositories of bioinformatics data frequently update their information. When such an update occurs, the developers of the downstream service may also need to update. For file downloads, this process is typically performed manually followed by web service restart. Requests for information obtained by dynamic access of upstream sources is sometimes subject to rate restrictions. SideCache provides a framework for deploying web services that integrate information extracted from other databases and from web sources that are periodically updated. This situation occurs frequently in biotechnology where new information is being continuously generated and the latest information is important. SideCache provides several types of services including proxy access and rate control, local caching, and automatic web service updating. We have used the SideCache framework to automate the deployment and updating of a number of bioinformatics web services and tools that extract information from remote primary sources such as NCBI, NCIBI, and Ensembl. The SideCache framework also has been used to share research results through the use of a SideCache derived web service.

  5. VizieR Online Data Catalog: The Chandra Source Catalog, Release 1.1 (Evans+ 2012)

    NASA Astrophysics Data System (ADS)

    Evans, I. N.; Primini, F. A.; Glotfelty, C. S.; Anderson, C. S.; Bonaventura, N. R.; Chen, J. C.; Davis, J. E.; Doe, S. M.; Evans, J. D.; Fabbiano, G.; Galle, E. C.; Gibbs, D. G.; Grier, J. D.; Hain, R. M.; Hall, D. M.; Harbo, P. N.; He, X.; Houck, J. C.; Karovska, M.; Kashyap, V. L.; Lauer, J.; McCollough, M. L.; McDowell, J. C.; Miller, J. B.; Mitschang, A. W.; Morgan, D. L.; Mossman, A. E.; Nichols, J. S.; Nowak, M. A.; Plummer, D. A.; Refsdal, B. L.; Rots, A. H.; Siemiginowska, A.; Sundheim, B. A.; Tibbetts, M. S.; van Stone, D. W.; Winkelman, S. L.; Zografou, P.

    2014-01-01

    This version of the catalog is release 1.1. It includes the information contained in release 1.0.1, plus point and compact source data extracted from HRC imaging observations, and catch-up ACIS observations released publicly prior to the end of 2009. (1 data file).

  6. Data Fusion for Enhanced Aircraft Engine Prognostics and Health Management

    NASA Technical Reports Server (NTRS)

    Volponi, Al

    2005-01-01

    Aircraft gas-turbine engine data is available from a variety of sources, including on-board sensor measurements, maintenance histories, and component models. An ultimate goal of Propulsion Health Management (PHM) is to maximize the amount of meaningful information that can be extracted from disparate data sources to obtain comprehensive diagnostic and prognostic knowledge regarding the health of the engine. Data fusion is the integration of data or information from multiple sources for the achievement of improved accuracy and more specific inferences than can be obtained from the use of a single sensor alone. The basic tenet underlying the data/ information fusion concept is to leverage all available information to enhance diagnostic visibility, increase diagnostic reliability and reduce the number of diagnostic false alarms. This report describes a basic PHM data fusion architecture being developed in alignment with the NASA C-17 PHM Flight Test program. The challenge of how to maximize the meaningful information extracted from disparate data sources to obtain enhanced diagnostic and prognostic information regarding the health and condition of the engine is the primary goal of this endeavor. To address this challenge, NASA Glenn Research Center, NASA Dryden Flight Research Center, and Pratt & Whitney have formed a team with several small innovative technology companies to plan and conduct a research project in the area of data fusion, as it applies to PHM. Methodologies being developed and evaluated have been drawn from a wide range of areas including artificial intelligence, pattern recognition, statistical estimation, and fuzzy logic. This report will provide a chronology and summary of the work accomplished under this research contract.

  7. The Effects of Age and Set Size on the Fast Extraction of Egocentric Distance

    PubMed Central

    Gajewski, Daniel A.; Wallin, Courtney P.; Philbeck, John W.

    2016-01-01

    Angular direction is a source of information about the distance to floor-level objects that can be extracted from brief glimpses (near one's threshold for detection). Age and set size are two factors known to impact the viewing time needed to directionally localize an object, and these were posited to similarly govern the extraction of distance. The question here was whether viewing durations sufficient to support object detection (controlled for age and set size) would also be sufficient to support well-constrained judgments of distance. Regardless of viewing duration, distance judgments were more accurate (less biased towards underestimation) when multiple potential targets were presented, suggesting that the relative angular declinations between the objects are an additional source of useful information. Distance judgments were more precise with additional viewing time, but the benefit did not depend on set size and accuracy did not improve with longer viewing durations. The overall pattern suggests that distance can be efficiently derived from direction for floor-level objects. Controlling for age-related differences in the viewing time needed to support detection was sufficient to support distal localization but only when brief and longer glimpse trials were interspersed. Information extracted from longer glimpse trials presumably supported performance on subsequent trials when viewing time was more limited. This outcome suggests a particularly important role for prior visual experience in distance judgments for older observers. PMID:27398065
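    The geometric relationship behind using angular declination as a distance cue can be made concrete with a small worked example, assuming illustrative values for eye height and declination angle.

```python
# Hedged sketch of the geometry behind the cue: for a floor-level target seen at
# declination angle theta below the horizon from eye height h, the egocentric
# distance is d = h / tan(theta). The numbers are illustrative assumptions.
import math

eye_height_m = 1.6
declination_deg = 12.0

distance_m = eye_height_m / math.tan(math.radians(declination_deg))
print(f"estimated distance: {distance_m:.2f} m")   # about 7.5 m
```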

  8. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    ERIC Educational Resources Information Center

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  9. Markov Logic Networks for Adverse Drug Event Extraction from Text.

    PubMed

    Natarajan, Sriraam; Bangera, Vishal; Khot, Tushar; Picado, Jose; Wazalwar, Anurag; Costa, Vitor Santos; Page, David; Caldwell, Michael

    2017-05-01

    Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society. A diverse set of techniques from epidemiology, statistics, and computer science are being proposed and studied for ADE discovery from observational health data (e.g., EHR and claims data), social network data (e.g., Google and Twitter posts), and other information sources. Methodologies are needed for evaluating, quantitatively measuring, and comparing the ability of these various approaches to accurately discover ADEs. This work is motivated by the observation that text sources such as the Medline/Medinfo library provide a wealth of information on human health. Unfortunately, ADEs often result from unexpected interactions, and the connection between conditions and drugs is not explicit in these sources. Thus, in this work we address the question of whether we can quantitatively estimate relationships between drugs and conditions from the medical literature. This paper proposes and studies a state-of-the-art NLP-based extraction of ADEs from text.

  10. Text Detection, Tracking and Recognition in Video: A Comprehensive Survey.

    PubMed

    Yin, Xu-Cheng; Zuo, Ze-Yu; Tian, Shu; Liu, Cheng-Lin

    2016-04-14

    Intelligent analysis of video data is currently in wide demand because video is a major source of sensory data in our lives. Text is a prominent and direct source of information in video, while recent surveys of text detection and recognition in imagery [1], [2] focus mainly on text extraction from scene images. Here, this paper presents a comprehensive survey of text detection, tracking and recognition in video with three major contributions. First, a generic framework is proposed for video text extraction that uniformly describes detection, tracking, recognition, and their relations and interactions. Second, within this framework, a variety of methods, systems and evaluation protocols of video text extraction are summarized, compared, and analyzed. Existing text tracking techniques, tracking based detection and recognition techniques are specifically highlighted. Third, related applications, prominent challenges, and future directions for video text extraction (especially from scene videos and web videos) are also thoroughly discussed.

  11. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper).

    PubMed

    Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S

    2016-10-01

    Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of biomedical domain and lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe). The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata as compared to existing NLP pipelines such as MetaMap.

  12. Development of an Information Fusion System for Engine Diagnostics and Health Management

    NASA Technical Reports Server (NTRS)

    Volponi, Allan J.; Brotherton, Tom; Luppold, Robert; Simon, Donald L.

    2004-01-01

    Aircraft gas-turbine engine data are available from a variety of sources including on-board sensor measurements, maintenance histories, and component models. An ultimate goal of Propulsion Health Management (PHM) is to maximize the amount of meaningful information that can be extracted from disparate data sources to obtain comprehensive diagnostic and prognostic knowledge regarding the health of the engine. Data Fusion is the integration of data or information from multiple sources, to achieve improved accuracy and more specific inferences than can be obtained from the use of a single sensor alone. The basic tenet underlying the data/information fusion concept is to leverage all available information to enhance diagnostic visibility, increase diagnostic reliability and reduce the number of diagnostic false alarms. This paper describes a basic PHM Data Fusion architecture being developed in alignment with the NASA C17 Propulsion Health Management (PHM) Flight Test program. The challenge of how to maximize the meaningful information extracted from disparate data sources to obtain enhanced diagnostic and prognostic information regarding the health and condition of the engine is the primary goal of this endeavor. To address this challenge, NASA Glenn Research Center (GRC), NASA Dryden Flight Research Center (DFRC) and Pratt & Whitney (P&W) have formed a team with several small innovative technology companies to plan and conduct a research project in the area of data fusion as applied to PHM. Methodologies being developed and evaluated have been drawn from a wide range of areas including artificial intelligence, pattern recognition, statistical estimation, and fuzzy logic. This paper will provide a broad overview of this work, discuss some of the methodologies employed and give some illustrative examples.

  13. Extracting Information from Narratives: An Application to Aviation Safety Reports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Posse, Christian; Matzke, Brett D.; Anderson, Catherine M.

    2005-05-12

    Aviation safety reports are the best available source of information about why a flight incident happened. However, stream-of-consciousness writing permeates the narratives, making automation of the information extraction task difficult. We propose an approach and infrastructure based on a common pattern specification language to capture relevant information via normalized template expression matching in context. Template expression matching handles variants of multi-word expressions. Normalization improves the likelihood of correct hits by standardizing and cleaning the vocabulary used in narratives. Checking for the presence of negative modifiers in the proximity of a potential hit reduces the chance of false hits. We present the above approach in the context of a specific application, which is the extraction of human performance factors from NASA ASRS reports. While knowledge infusion from experts plays a critical role during the learning phase, early results show that, in a production mode, the automated process provides information that is consistent with analyses by human subjects.
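    A minimal, hypothetical sketch of normalized template matching with a negation check in a proximity window follows; the template, negation list, and narratives are assumptions, not the pattern specification language used in the ASRS work.

```python
# Hedged sketch: normalize the narrative, match a multi-word template for a
# human-performance factor, and reject the hit when a negation modifier appears
# within a few preceding tokens.
import re

NEGATIONS = {"no", "not", "without", "denied"}

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)   # strip punctuation
    return re.sub(r"\s+", " ", text).strip()

def match_factor(narrative: str, template: str, window: int = 3) -> bool:
    """True if the template matches and is not negated within `window` preceding tokens."""
    tokens = normalize(narrative).split()
    pattern = normalize(template).split()
    for i in range(len(tokens) - len(pattern) + 1):
        if tokens[i:i + len(pattern)] == pattern:
            preceding = tokens[max(0, i - window):i]
            if not NEGATIONS.intersection(preceding):
                return True
    return False

print(match_factor("Crew reported fatigue during the approach.", "fatigue"))   # True
print(match_factor("There was no fatigue reported by the crew.", "fatigue"))   # False
```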

  14. Information Retrieval Using Hadoop Big Data Analysis

    NASA Astrophysics Data System (ADS)

    Motwani, Deepak; Madan, Madan Lal

    This paper concerns big data analysis, the cognitive operation of probing huge amounts of information in an attempt to uncover unseen patterns. Through big data analytics applications, public and private sector organizations have made the strategic decision to turn big data into competitive advantage. The primary task of extracting value from big data gives rise to a process used to pull information from multiple different sources; this process is known as extract, transform, and load (ETL). Our approach extracts information from log files and research papers, reducing the effort required for pattern finding and for summarizing documents from several locations. The work helps users better understand basic Hadoop concepts and improves the user experience for research. In this paper, we propose an approach for analyzing log files with Hadoop to find concise information that is useful and time-saving. The proposed approach will be applied to different research papers in a specific domain to obtain summarized content for further improvement and new content creation.

  15. Multivariate EMD and full spectrum based condition monitoring for rotating machinery

    NASA Astrophysics Data System (ADS)

    Zhao, Xiaomin; Patel, Tejas H.; Zuo, Ming J.

    2012-02-01

    Early assessment of machinery health condition is of paramount importance today. A sensor network with sensors in multiple directions and locations is usually employed for monitoring the condition of rotating machinery. Extraction of health condition information from these sensors for effective fault detection and fault tracking is always challenging. Empirical mode decomposition (EMD) is an advanced signal processing technology that has been widely used for this purpose. Standard EMD has the limitation in that it works only for a single real-valued signal. When dealing with data from multiple sensors and multiple health conditions, standard EMD faces two problems. First, because of the local and self-adaptive nature of standard EMD, the decomposition of signals from different sources may not match in either number or frequency content. Second, it may not be possible to express the joint information between different sensors. The present study proposes a method of extracting fault information by employing multivariate EMD and full spectrum. Multivariate EMD can overcome the limitations of standard EMD when dealing with data from multiple sources. It is used to extract the intrinsic mode functions (IMFs) embedded in raw multivariate signals. A criterion based on mutual information is proposed for selecting a sensitive IMF. A full spectral feature is then extracted from the selected fault-sensitive IMF to capture the joint information between signals measured from two orthogonal directions. The proposed method is first explained using simple simulated data, and then is tested for the condition monitoring of rotating machinery applications. The effectiveness of the proposed method is demonstrated through monitoring damage on the vane trailing edge of an impeller and rotor-stator rub in an experimental rotor rig.

  16. EDGE COMPUTING AND CONTEXTUAL INFORMATION FOR THE INTERNET OF THINGS SENSORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klein, Levente

    Interpreting sensor data requires knowledge about sensor placement and the surrounding environment. For a single sensor measurement, it is easy to document the context by visual observation; however, for millions of sensors reporting data back to a server, the contextual information needs to be extracted automatically, either from data analysis or by leveraging complementary data sources. Data layers that overlap spatially or temporally with sensor locations can be used to extract the context and to validate the measurement. To minimize the amount of data transmitted through the internet while preserving signal information content, two methods are explored: computation at the edge and compressed sensing. We validate the above methods on wind and chemical sensor data by (1) eliminating redundant measurements from wind sensors and (2) extracting the peak value of a chemical sensor measuring a methane plume. We present a general cloud-based framework to validate sensor data based on statistical and physical modeling and on contextual data extracted from geospatial data.
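    The two edge-side reductions mentioned above can be sketched with invented samples: a change threshold that suppresses redundant wind readings, and a peak extraction for a methane plume; thresholds and data are illustrative assumptions.

```python
# Hedged sketch of two edge-side reductions: (1) suppress redundant wind readings
# that change less than a threshold, and (2) forward only the peak of a methane
# time series. Values and thresholds are invented for illustration.
wind_speed = [3.1, 3.1, 3.2, 3.1, 5.8, 5.9, 3.0]          # m/s, sampled at the sensor
methane_ppm = [1.9, 2.0, 2.4, 7.8, 12.3, 6.1, 2.1]        # a passing plume

def changed_enough(samples, threshold=0.5):
    """Keep a sample only when it differs from the last transmitted value by > threshold."""
    kept, last = [], None
    for s in samples:
        if last is None or abs(s - last) > threshold:
            kept.append(s)
            last = s
    return kept

print(changed_enough(wind_speed))   # [3.1, 5.8, 3.0] -> far fewer packets to transmit
print(max(methane_ppm))             # 12.3 -> only the plume peak is reported
```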

  17. XID+: Next generation XID development

    NASA Astrophysics Data System (ADS)

    Hurley, Peter

    2017-04-01

    XID+ is a prior-based source extraction tool which carries out photometry in the Herschel SPIRE (Spectral and Photometric Imaging Receiver) maps at the positions of known sources. It uses a probabilistic Bayesian framework that provides a natural framework in which to include prior information, and uses the Bayesian inference tool Stan to obtain the full posterior probability distribution on flux estimates.

  18. PDF text classification to leverage information extraction from publication reports.

    PubMed

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved the performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively to categorize texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents.
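    A minimal sketch of a multi-pass sieve classifier over PDF text snippets follows; the ordered rules below are guesses at plausible cues and are not the published algorithm's sieves.

```python
# Hedged sketch of a multi-pass sieve: ordered, high-precision rules are applied
# one after another, and the first rule that fires assigns the category.
import re

def classify_snippet(text: str) -> str:
    stripped = text.strip()
    # Pass 1: publication metadata (DOIs, copyright lines, received dates)
    if re.search(r"\bdoi:|©|copyright|received \d{1,2} \w+ \d{4}", stripped, re.I):
        return "METADATA"
    # Pass 2: short title-case lines
    if len(stripped.split()) <= 12 and stripped.istitle():
        return "TITLE"
    # Pass 3: labelled abstract sections
    if re.match(r"(abstract|background|objective)s?\b[:.]?", stripped, re.I):
        return "ABSTRACT"
    # Pass 4: semi-structured content such as table rows or key: value lines
    if stripped.count("\t") >= 2 or re.match(r"^[\w ()%/-]+:\s+\S", stripped):
        return "SEMISTRUCTURE"
    # Fallback: ordinary narrative text
    return "BODYTEXT"

print(classify_snippet("A Randomised Trial Of Early Mobilisation"))     # TITLE
print(classify_snippet("Abstract: We conducted a randomised trial..."))  # ABSTRACT
```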

  19. Information-Based Analysis of Data Assimilation (Invited)

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.; Gupta, H. V.; Crow, W. T.; Gong, W.

    2013-12-01

    Data assimilation is defined as the Bayesian conditioning of uncertain model simulations on observations for the purpose of reducing uncertainty about model states. Practical data assimilation methods make the application of Bayes' law tractable either by employing assumptions about the prior, posterior and likelihood distributions (e.g., the Kalman family of filters) or by using resampling methods (e.g., bootstrap filter). We propose to quantify the efficiency of these approximations in an OSSE setting using information theory and, in an OSSE or real-world validation setting, to measure the amount - and more importantly, the quality - of information extracted from observations during data assimilation. To analyze DA assumptions, uncertainty is quantified as the Shannon-type entropy of a discretized probability distribution. The maximum amount of information that can be extracted from observations about model states is the mutual information between states and observations, which is equal to the reduction in entropy in our estimate of the state due to Bayesian filtering. The difference between this potential and the actual reduction in entropy due to Kalman (or other type of) filtering measures the inefficiency of the filter assumptions. Residual uncertainty in DA posterior state estimates can be attributed to three sources: (i) non-injectivity of the observation operator, (ii) noise in the observations, and (iii) filter approximations. The contribution of each of these sources is measurable in an OSSE setting. The amount of information extracted from observations by data assimilation (or system identification, including parameter estimation) can also be measured by Shannon's theory. Since practical filters are approximations of Bayes' law, it is important to know whether the information that is extracted form observations by a filter is reliable. We define information as either good or bad, and propose to measure these two types of information using partial Kullback-Leibler divergences. Defined this way, good and bad information sum to total information. This segregation of information into good and bad components requires a validation target distribution; in a DA OSSE setting, this can be the true Bayesian posterior, but in a real-world setting the validation target might be determined by a set of in situ observations.
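
    As a worked sketch of the quantities described above, the following computes the prior entropy, the mutual information between states and observations (the maximum extractable information), and the divergence between an approximate filter posterior and the exact Bayesian posterior, all on a discretized distribution. The joint histogram and the approximate posterior are synthetic illustrations, not values from the study.

      import numpy as np
      from scipy.stats import entropy

      # Discretized joint histogram of states (rows) and observations (columns).
      p_xy = np.array([[0.20, 0.05, 0.00],
                       [0.05, 0.30, 0.05],
                       [0.00, 0.05, 0.30]])
      p_x = p_xy.sum(axis=1)          # prior over states
      p_y = p_xy.sum(axis=0)          # marginal over observations

      H_prior = entropy(p_x, base=2)                            # Shannon entropy of the prior
      H_joint = entropy(p_xy.ravel(), base=2)
      mutual_info = H_prior + entropy(p_y, base=2) - H_joint    # max. information extractable from obs.

      # Exact Bayesian posterior for one observation vs. an approximate (e.g. Gaussian)
      # filter posterior: the KL divergence measures information lost to filter assumptions.
      bayes_post = p_xy[:, 1] / p_y[1]
      approx_post = np.array([0.15, 0.70, 0.15])
      filter_inefficiency = entropy(bayes_post, approx_post, base=2)

      print(mutual_info, filter_inefficiency)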

  20. Automated extraction of radiation dose information for CT examinations.

    PubMed

    Cook, Tessa S; Zimmerman, Stefan; Maidment, Andrew D A; Kim, Woojin; Boonn, William W

    2010-11-01

    Exposure to radiation as a result of medical imaging is currently in the spotlight, receiving attention from Congress as well as the lay press. Although scanner manufacturers are moving toward including effective dose information in the Digital Imaging and Communications in Medicine headers of imaging studies, there is a vast repository of retrospective CT data at every imaging center that stores dose information in an image-based dose sheet. As such, it is difficult for imaging centers to participate in the ACR's Dose Index Registry. The authors have designed an automated extraction system to query their PACS archive and parse CT examinations to extract the dose information stored in each dose sheet. First, an open-source optical character recognition program processes each dose sheet and converts the information to American Standard Code for Information Interchange (ASCII) text. Each text file is parsed, and radiation dose information is extracted and stored in a database which can be queried using an existing pathology and radiology enterprise search tool. Using this automated extraction pipeline, it is possible to perform dose analysis on the >800,000 CT examinations in the PACS archive and generate dose reports for all of these patients. It is also possible to more effectively educate technologists, radiologists, and referring physicians about exposure to radiation from CT by generating report cards for interpreted and performed studies. The automated extraction pipeline enables compliance with the ACR's reporting guidelines and greater awareness of radiation dose to patients, thus resulting in improved patient care and management. Copyright © 2010 American College of Radiology. Published by Elsevier Inc. All rights reserved.
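
    A minimal sketch of the parsing step that follows optical character recognition is shown below: once a dose sheet has been converted to plain text, per-series dose values can be pulled out with regular expressions. The line layout, column order, and field names are assumptions for illustration, not the vendor-specific dose-sheet formats handled by the authors' pipeline.

      import re

      # Assumed dose-sheet line layout: series number, description, CTDIvol, DLP.
      DOSE_LINE = re.compile(
          r"(?P<series>\d+)\s+(?P<description>.+?)\s+"
          r"(?P<ctdivol>\d+(?:\.\d+)?)\s+(?P<dlp>\d+(?:\.\d+)?)", re.I)

      def parse_dose_sheet(ocr_text):
          """Return one dict per series with CTDIvol (mGy) and DLP (mGy*cm)."""
          records = []
          for line in ocr_text.splitlines():
              m = DOSE_LINE.search(line)
              if m:
                  rec = m.groupdict()
                  rec["ctdivol"] = float(rec["ctdivol"])
                  rec["dlp"] = float(rec["dlp"])
                  records.append(rec)
          return records

      example = "1  HELICAL CHEST  12.45  430.2"
      print(parse_dose_sheet(example))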

  1. Database integration in a multimedia-modeling environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dorow, Kevin E.

    2002-09-02

    Integration of data from disparate remote sources has direct applicability to modeling, which can support Brownfield assessments. To accomplish this task, a data integration framework needs to be established. A key element in this framework is the metadata that creates the relationship between the pieces of information that are important in the multimedia modeling environment and the information that is stored in the remote data source. The design philosophy is to allow modelers and database owners to collaborate by defining this metadata in such a way that allows interaction between their components. The main parts of this framework include tools to facilitate metadata definition, database extraction plan creation, automated extraction plan execution / data retrieval, and a central clearing house for metadata and modeling / database resources. Cross-platform compatibility (using Java) and standard communications protocols (http / https) allow these parts to run in a wide variety of computing environments (Local Area Networks, Internet, etc.), and, therefore, this framework provides many benefits. Because of the specific data relationships described in the metadata, the amount of data that have to be transferred is kept to a minimum (only the data that fulfill a specific request are provided as opposed to transferring the complete contents of a data source). This allows for real-time data extraction from the actual source. Also, the framework sets up collaborative responsibilities such that the different types of participants have control over the areas in which they have domain knowledge: the modelers are responsible for defining the data relevant to their models, while the database owners are responsible for mapping the contents of the database using the metadata definitions. Finally, the data extraction mechanism allows for the ability to control access to the data and what data are made available.

  2. Semantic Storyboard of Judicial Debates: A Novel Multimedia Summarization Environment

    ERIC Educational Resources Information Center

    Fersini, E.; Sartori, F.

    2012-01-01

    Purpose: The need of tools for content analysis, information extraction and retrieval of multimedia objects in their native form is strongly emphasized into the judicial domain: digital videos represent a fundamental informative source of events occurring during judicial proceedings that should be stored, organized and retrieved in short time and…

  3. Quantity and unit extraction for scientific and technical intelligence analysis

    NASA Astrophysics Data System (ADS)

    David, Peter; Hawes, Timothy

    2017-05-01

    Scientific and Technical (S and T) intelligence analysts consume huge amounts of data to understand how scientific progress and engineering efforts affect current and future military capabilities. One of the most important types of information S and T analysts exploit is the quantities discussed in their source material. Frequencies, ranges, size, weight, power, and numerous other properties and measurements describing the performance characteristics of systems and the engineering constraints that define them must be culled from source documents before quantified analysis can begin. Automating the process of finding and extracting the relevant quantities from a wide range of S and T documents is difficult because information about quantities and their units is often contained in unstructured text with ad hoc conventions used to convey their meaning. Currently, even a simple task, such as searching for documents discussing RF frequencies in a band of interest, is a labor-intensive and error-prone process. This research addresses the challenges facing development of a document processing capability that extracts quantities and units from S and T data, and examines how Natural Language Processing algorithms can be used to overcome these challenges.
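
    The frequency-search example in the abstract can be illustrated with a small sketch: spot numeric values with their units and normalize them to a common base so documents can be filtered by band. The unit list and conversion factors below are ordinary SI facts; the pattern itself is an illustration, not the authors' extraction system.

      import re

      UNIT_SCALE = {"hz": 1.0, "khz": 1e3, "mhz": 1e6, "ghz": 1e9}
      QUANTITY = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>GHz|MHz|kHz|Hz)", re.I)

      def extract_frequencies(text):
          """Yield (surface form, value in Hz) for every frequency mention."""
          for m in QUANTITY.finditer(text):
              hz = float(m.group("value")) * UNIT_SCALE[m.group("unit").lower()]
              yield m.group(0), hz

      text = "The radar operates between 8.5 GHz and 10.2 GHz with a 5 MHz bandwidth."
      print(list(extract_frequencies(text)))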

  4. Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds.

    PubMed

    Southan, Christopher; Várkonyi, Péter; Muresan, Sorel

    2009-07-06

    Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability of downloads and facility to select informative subsets. Where they could be calculated, extracted compounds per journal article were in the range of 12 to 19, but compound-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a big increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public) but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that, while 1 million compounds overlapped with PubChem, 1.2 million did not. On the basis of chemical structure content per se, public sources have covered an increasing proportion of commercial databases over the last two years. However, commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that both commercial and public sources are complementary for its exploration.
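
    The pair-wise overlap computation described above reduces, after structure standardization, to set operations over structure keys. The sketch below assumes each source has been collapsed to a set of standardized keys (for example InChIKeys); the source names and keys are illustrative placeholders, not the compared databases.

      # Each source collapsed to a set of standardized structure keys.
      sources = {
          "public_db":     {"KEY-A", "KEY-B", "KEY-C", "KEY-D"},
          "commercial_db": {"KEY-C", "KEY-D", "KEY-E"},
          "curated_db":    {"KEY-A", "KEY-E", "KEY-F"},
      }

      def overlap_matrix(named_sets):
          """Return {(a, b): (shared, unique_to_a, unique_to_b)} for every pair."""
          names = sorted(named_sets)
          result = {}
          for i, a in enumerate(names):
              for b in names[i + 1:]:
                  shared = named_sets[a] & named_sets[b]
                  result[(a, b)] = (len(shared),
                                    len(named_sets[a] - shared),
                                    len(named_sets[b] - shared))
          return result

      for pair, counts in overlap_matrix(sources).items():
          print(pair, counts)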

  5. ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files.

    PubMed

    Karthikeyan, Muthukumarasamy; Vyas, Renu

    2016-01-01

    Digital access to chemical journals resulted in a vast array of molecular information that is now available in the supplementary material files in PDF format. However, extracting this molecular information, generally from a PDF document format, is a daunting task. Here we present an approach to harvest 3D molecular data from the supporting information of scientific research articles that are normally available from publisher's resources. In order to demonstrate the feasibility of extracting truly computable molecules from PDF file formats in a fast and efficient manner, we have developed a Java based application, namely ChemEngine. This program recognizes textual patterns from the supplementary data and generates standard molecular structure data (bond matrix, atomic coordinates) that can be subjected to a multitude of computational processes automatically. The methodology has been demonstrated via several case studies on different formats of coordinates data stored in supplementary information files, wherein ChemEngine selectively harvested the atomic coordinates and interpreted them as molecules with high accuracy. The reusability of extracted molecular coordinate data was demonstrated by computing Single Point Energies that were in close agreement with the original computed data provided with the articles. It is envisaged that the methodology will enable large scale conversion of molecular information from supplementary files available in the PDF format into a collection of ready-to-compute molecular data to create an automated workflow for advanced computational processes. Software, along with source codes and instructions, is available at https://sourceforge.net/projects/chemengine/files/?source=navbar.
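
    The core pattern-recognition idea (spotting "element x y z" coordinate lines in text extracted from a PDF supplement and grouping consecutive lines into molecules) can be sketched in a few lines of Python. This is an illustration of the idea only; ChemEngine itself is a Java application with far more elaborate pattern rules, and the tolerance used here is an assumption.

      import re

      COORD = re.compile(r"^\s*(?P<el>[A-Z][a-z]?)\s+"
                         r"(?P<x>-?\d+\.\d+)\s+(?P<y>-?\d+\.\d+)\s+(?P<z>-?\d+\.\d+)\s*$")

      def harvest_molecules(text, min_atoms=3):
          """Group consecutive coordinate lines into candidate molecules."""
          molecules, current = [], []
          for line in text.splitlines() + [""]:
              m = COORD.match(line)
              if m:
                  current.append((m["el"], float(m["x"]), float(m["y"]), float(m["z"])))
              else:
                  if len(current) >= min_atoms:
                      molecules.append(current)
                  current = []
          return molecules

      sample = "C  0.000  0.000  0.000\nO  1.210  0.000  0.000\nH -0.940  0.540  0.000"
      print(harvest_molecules(sample))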

  6. Extracting information from the text of electronic medical records to improve case detection: a systematic review

    PubMed Central

    Carroll, John A; Smith, Helen E; Scott, Donia; Cassell, Jackie A

    2016-01-01

    Background Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall). PMID:26911811

  7. MedXN: an open source medication extraction and normalization tool for clinical text

    PubMed Central

    Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang

    2014-01-01

    Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
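
    The decomposition step described above (dictionary lookup for the medication name plus regular expressions for attributes, followed by recomposition) can be sketched as follows. The tiny dictionary, the identifier values, and the attribute patterns are placeholders for illustration; they are not RxNorm content or MedXN's actual rules, and the recomposition to a full drug-level RxCUI is omitted.

      import re

      MED_NAMES = {"lisinopril": "29046", "metformin": "6809"}   # name -> illustrative identifier
      ATTRS = {
          "strength":  re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g)", re.I),
          "frequency": re.compile(r"\b(once|twice|three times)\s+(daily|a day)\b", re.I),
          "form":      re.compile(r"\b(tablet|capsule|solution)\b", re.I),
      }

      def extract_medication(sentence):
          """Return the first medication mention with any attributes found nearby."""
          lowered = sentence.lower()
          for name, ident in MED_NAMES.items():
              if name in lowered:
                  found = {"name": name, "id": ident}
                  for attr, pattern in ATTRS.items():
                      m = pattern.search(sentence)
                      if m:
                          found[attr] = m.group(0)
                  return found
          return None

      print(extract_medication("Start Lisinopril 10 mg tablet once daily."))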

  8. Compositional and textural information from the dual inversion of visible, near and thermal infrared remotely sensed data

    NASA Technical Reports Server (NTRS)

    Brackett, Robert A.; Arvidson, Raymond E.

    1993-01-01

    A technique is presented that allows extraction of compositional and textural information from visible, near and thermal infrared remotely sensed data. Using a library of both emissivity and reflectance spectra, endmember abundances and endmember thermal inertias are extracted from AVIRIS (Airborne Visible and Infrared Imaging Spectrometer) and TIMS (Thermal Infrared Mapping Spectrometer) data over Lunar Crater Volcanic Field, Nevada, using a dual inversion. The inversion technique is motivated by upcoming Mars Observer data and the need for separation of composition and texture parameters from sub pixel mixtures of bedrock and dust. The model employed offers the opportunity to extract compositional and textural information for a variety of endmembers within a given pixel. Geologic inferences concerning grain size, abundance, and source of endmembers can be made directly from the inverted data. These parameters are of direct relevance to Mars exploration, both for Mars Observer and for follow-on missions.

  9. Using Web-Based Knowledge Extraction Techniques to Support Cultural Modeling

    NASA Astrophysics Data System (ADS)

    Smart, Paul R.; Sieck, Winston R.; Shadbolt, Nigel R.

    The World Wide Web is a potentially valuable source of information about the cognitive characteristics of cultural groups. However, attempts to use the Web in the context of cultural modeling activities are hampered by the large-scale nature of the Web and the current dominance of natural language formats. In this paper, we outline an approach to support the exploitation of the Web for cultural modeling activities. The approach begins with the development of qualitative cultural models (which describe the beliefs, concepts and values of cultural groups), and these models are subsequently used to develop an ontology-based information extraction capability. Our approach represents an attempt to combine conventional approaches to information extraction with epidemiological perspectives of culture and network-based approaches to cultural analysis. The approach can be used, we suggest, to support the development of models providing a better understanding of the cognitive characteristics of particular cultural groups.

  10. Toxics Release Inventory Chemical Hazard Information Profiles (TRI-CHIP) Dataset

    EPA Pesticide Factsheets

    The Toxics Release Inventory (TRI) Chemical Hazard Information Profiles (TRI-CHIP) dataset contains hazard information about the chemicals reported in TRI. Users can use this XML-format dataset to create their own databases and hazard analyses of TRI chemicals. The hazard information is compiled from a series of authoritative sources including the Integrated Risk Information System (IRIS). The dataset is provided as a downloadable .zip file that when extracted provides XML files and schemas for the hazard information tables.
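
    A hedged sketch of loading one of the extracted XML files into Python is shown below. The element and attribute names are assumptions for illustration only; the real tags are defined by the schemas shipped in the downloadable .zip file and should be read from there.

      import xml.etree.ElementTree as ET

      def load_hazard_records(path):
          """Parse an extracted hazard-table XML file into a list of dicts (assumed tag names)."""
          tree = ET.parse(path)
          records = []
          for chem in tree.getroot().iter("Chemical"):       # assumed element name
              records.append({
                  "cas": chem.get("CASNumber"),              # assumed attribute name
                  "name": chem.findtext("ChemicalName", default=""),
                  "hazards": [h.text for h in chem.iter("Hazard")],
              })
          return records

      # Usage (the path is hypothetical):
      # for rec in load_hazard_records("tri_chip/hazard_table.xml"):
      #     print(rec["cas"], rec["name"], len(rec["hazards"]))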

  11. Worldwide Report, Nuclear Development and Proliferation

    DTIC Science & Technology

    1984-03-05

    transmissions and broadcasts. Materials from foreign-language sources are translated; those from English-language sources are transcribed or reprinted, with... Processing indicators such as [Text] or [Excerpt] in the first line of each item, or following the last line of a brief, indicate how the original... information was processed. Where no processing indicator is given, the information was summarized or extracted. Unfamiliar names rendered

  12. Simulations of the Far-infrared Sky

    NASA Astrophysics Data System (ADS)

    Andreani, P.; Lutz, D.; Poglitsch, A.; Genzel, R.

    2001-07-01

    One of the main tasks of FIRST is to carry out shallow and deep surveys in the far-IR / submm spectral domain with unprecedented sensitivity. Selecting unbiased samples out of deep surveys will be crucial to determine the history of evolving dusty objects, and therefore of star-formation. However, the usual procedures to extract information from a survey, i.e. selection of sources, computing the number counts, the luminosity and the correlation functions, and so on, cannot lead to a fully satisfactory and rigorous determination of the source characteristics. This is especially true in the far-IR, where source identification and redshift determination are difficult. To check the reliability of results, the simulation of a large number of mock surveys is mandatory. This provides information on the observational biases and instrumental effects introduced by the observing procedures and allows one to understand how the different parameters affect the source observation and detection. The project we are undertaking consists of (1) simulating the far-IR/submm surveys as PACS (and SPIRE) will observe them, (2) extracting complete mock catalogues from these, (3) selecting high-z candidates in colour-colour diagrams for the foreseen photometric bands, and (4) testing different observing strategies to assess observational biases and understand how the different parameters affect source observation and detection.

  13. a Geographic Data Gathering System for Image Geolocalization Refining

    NASA Astrophysics Data System (ADS)

    Semaan, B.; Servières, M.; Moreau, G.; Chebaro, B.

    2017-09-01

    Image geolocalization has become an important research field during the last decade. This field is divided into two main sections. The first is image geolocalization that is used to find out which country, region or city the image belongs to. The second is refining image localization for uses that require more accuracy, such as augmented reality and three dimensional environment reconstruction using images. In this paper we present a processing chain that gathers geographic data from several sources in order to deliver a geolocalization of an image that is better than its GPS tag, together with precise camera pose parameters. To do so, we use multiple types of data. Some of this information is visible in the image and is extracted using image processing; other data can be extracted from image file headers or from related information on online image sharing platforms. Extracted information elements will not be expressive enough if they remain disconnected. We show that grouping these information elements helps finding the best geolocalization of the image.
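
    One of the sources mentioned above, the image file header, can be read directly: the sketch below pulls a GPS position out of EXIF metadata with the Pillow library. Tag numbers follow the EXIF specification (34853 is GPSInfo); _getexif is Pillow's legacy JPEG accessor, and the file name is hypothetical. This is an illustration of that single data-gathering step, not the authors' pipeline.

      from PIL import Image

      def exif_gps(path):
          """Return (lat, lon) in decimal degrees from a JPEG's EXIF GPS block, or None."""
          exif = Image.open(path)._getexif() or {}
          gps = exif.get(34853)                      # GPSInfo IFD
          if not gps:
              return None

          def to_degrees(dms, ref):
              deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
              return -deg if ref in ("S", "W") else deg

          # GPS sub-tags: 1/2 = latitude ref/value, 3/4 = longitude ref/value.
          return to_degrees(gps[2], gps[1]), to_degrees(gps[4], gps[3])

      # print(exif_gps("query_image.jpg"))   # hypothetical file -> (lat, lon) or None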

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klein, Levente

    Interpreting sensor data requires knowledge about sensor placement and the surrounding environment. For a single sensor measurement it is easy to document the context by visual observation; for millions of sensors reporting data back to a server, however, the contextual information needs to be extracted automatically, either through data analysis or by leveraging complementary data sources. Data layers that overlap spatially or temporally with sensor locations can be used to extract the context and to validate the measurement. To minimize the amount of data transmitted through the internet while preserving signal information content, two methods are explored: computation at the edge and compressed sensing. We validate these methods on wind and chemical sensor data to (1) eliminate redundant measurements from wind sensors and (2) extract the peak value of a chemical sensor measuring a methane plume. We present a general cloud-based framework to validate sensor data based on statistical and physical modeling and on contextual data extracted from geospatial data.
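
    The "computation at the edge" idea for the methane example can be sketched very simply: rather than streaming the full time series, the device reports only the peak of a detected plume event. The baseline, threshold, and trace below are illustrative assumptions, not the study's data or algorithm.

      import numpy as np

      def summarize_plume(concentrations, baseline=2.0, threshold=5.0):
          """Return (peak enhancement above baseline, peak index) if an event exceeds the threshold, else None."""
          x = np.asarray(concentrations, dtype=float)
          if not (x > threshold).any():
              return None                      # nothing worth transmitting
          i = int(np.argmax(x))
          return float(x[i] - baseline), i

      # Short synthetic methane trace (ppm): background near 2 with one plume spike.
      trace = [2.1, 2.0, 2.3, 6.8, 9.4, 7.1, 2.2, 2.0]
      print(summarize_plume(trace))            # approximately (7.4, 4)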

  15. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus

    PubMed Central

    2015-01-01

    Background Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. Methods To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Results Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. Conclusions PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus. PMID:26099853

  16. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    PubMed

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.

  17. Quality of information sources about mental disorders: a comparison of Wikipedia with centrally controlled web and printed sources.

    PubMed

    Reavley, N J; Mackinnon, A J; Morgan, A J; Alvarez-Jimenez, M; Hetrick, S E; Killackey, E; Nelson, B; Purcell, R; Yap, M B H; Jorm, A F

    2012-08-01

    Although mental health information on the internet is often of poor quality, relatively little is known about the quality of websites, such as Wikipedia, that involve participatory information sharing. The aim of this paper was to explore the quality of user-contributed mental health-related information on Wikipedia and compare this with centrally controlled information sources. Content on 10 mental health-related topics was extracted from 14 frequently accessed websites (including Wikipedia) providing information about depression and schizophrenia, Encyclopaedia Britannica, and a psychiatry textbook. The content was rated by experts according to the following criteria: accuracy, up-to-dateness, breadth of coverage, referencing and readability. Ratings varied significantly between resources according to topic. Across all topics, Wikipedia was the most highly rated in all domains except readability. The quality of information on depression and schizophrenia on Wikipedia is generally as good as, or better than, that provided by centrally controlled websites, Encyclopaedia Britannica and a psychiatry textbook.

  18. HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records

    PubMed Central

    Aggarwal, Anshul; Garhwal, Sunita

    2018-01-01

    Objectives One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data is unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. Methods A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. Results The HEDEA system works across a large set of formats to extract and analyse health information. Conclusions This tool can be used to generate analysis reports and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes. PMID:29770248

  19. HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records.

    PubMed

    Aggarwal, Anshul; Garhwal, Sunita; Kumar, Ajay

    2018-04-01

    One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data is unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. The HEDEA system works across a large set of formats to extract and analyse health information. This tool can be used to generate analysis reports and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes.
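
    Regular-expression extraction of fields from a semi-structured report, in the spirit of the approach described above, can be sketched as follows. The field labels and report layout are invented for illustration; they are not HEDEA's patterns or formats.

      import re

      FIELDS = {
          "patient_name": re.compile(r"Patient\s*Name\s*[:\-]\s*(.+)", re.I),
          "age":          re.compile(r"Age\s*[:\-]\s*(\d{1,3})", re.I),
          "hemoglobin":   re.compile(r"Ha?emoglobin\s*[:\-]?\s*(\d+(?:\.\d+)?)\s*g/dl", re.I),
      }

      def extract_record(report_text):
          """Return a dict of field values found in one report (None where absent)."""
          record = {}
          for field, pattern in FIELDS.items():
              m = pattern.search(report_text)
              record[field] = m.group(1).strip() if m else None
          return record

      report = "Patient Name: A. Sharma\nAge: 54\nHaemoglobin 11.2 g/dL"
      print(extract_record(report))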

  20. 78 FR 77670 - Information Collection Request Submitted to OMB for Review and Approval; Comment Request; NESHAP...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-24

    ...: http://www.epa.gov/dockets . Abstract: The sources subject to this rule (i.e., extraction plants, ceramic plants, foundries, incinerators, propellant plants, and machine shops which process beryllium and...

  1. Extraction of actionable information from crowdsourced disaster data.

    PubMed

    Kiatpanont, Rungsun; Tanlamai, Uthai; Chongstitvatana, Prabhas

    Natural disasters cause enormous damage to countries all over the world. To deal with these common problems, different activities are required for disaster management at each phase of the crisis. There are three groups of activities as follows: (1) make sense of the situation and determine how best to deal with it, (2) deploy the necessary resources, and (3) harmonize as many parties as possible, using the most effective communication channels. Current technological improvements and developments now enable people to act as real-time information sources. As a result, inundation with crowdsourced data poses a real challenge for a disaster manager. The problem is how to extract the valuable information from a gigantic data pool in the shortest possible time so that the information is still useful and actionable. This research proposed an actionable-data-extraction process to deal with the challenge. Twitter was selected as a test case because messages posted on Twitter are publicly available. Hashtag, an easy and very efficient technique, was also used to differentiate information. A quantitative approach to extract useful information from the tweets was supported and verified by interviews with disaster managers from many leading organizations in Thailand to understand their missions. The information classifications extracted from the collected tweets were first performed manually, and then the tweets were used to train a machine learning algorithm to classify future tweets. One particularly useful, significant, and primary section was the request for help category. The support vector machine algorithm was used to validate the results from the extraction process of 13,696 sample tweets, with over 74 percent accuracy. The results confirmed that the machine learning technique could significantly and practically assist with disaster management by dealing with crowdsourced data.
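
    The supervised step described above (train a support vector machine on manually labelled tweets, then classify new ones) can be sketched with standard scikit-learn components. The tiny training set and the label names are illustrative placeholders, not the study's data or categories.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC
      from sklearn.pipeline import make_pipeline

      train_texts = ["Need drinking water at the riverside shelter #flood",
                     "Roads to the temple are closed #flood",
                     "Please send boats, people trapped on roofs",
                     "Water level dropping this morning"]
      train_labels = ["request_for_help", "situation_update",
                      "request_for_help", "situation_update"]

      # TF-IDF features over unigrams and bigrams, fed to a linear SVM.
      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
      clf.fit(train_texts, train_labels)

      print(clf.predict(["Urgent: need food and medicine at the evacuation center"]))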

  2. Semi-Automated Approach for Mapping Urban Trees from Integrated Aerial LiDAR Point Cloud and Digital Imagery Datasets

    NASA Astrophysics Data System (ADS)

    Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-09-01

    Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from such detailed, up-to-date data sources. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building up strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints such as labour-intensive field work and high financial cost, which can be overcome by means of integrated LiDAR and digital image datasets. Compared to predominant studies on tree extraction, mainly in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a workflow for a semi-automated approach to extracting urban trees from integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper reveals that the integrated datasets are a suitable technology and a viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition and extent of trees in the study area, useful to city planners and other decision makers in order to understand how much canopy cover exists, identify new planting, removal, or reforestation opportunities, and determine which locations have the greatest need or potential to maximize benefits of return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.

  3. Maximum work extraction and implementation costs for nonequilibrium Maxwell's demons.

    PubMed

    Sandberg, Henrik; Delvenne, Jean-Charles; Newton, Nigel J; Mitter, Sanjoy K

    2014-10-01

    We determine the maximum amount of work extractable in finite time by a demon performing continuous measurements on a quadratic Hamiltonian system subjected to thermal fluctuations, in terms of the information extracted from the system. The maximum work demon is found to apply a high-gain continuous feedback involving a Kalman-Bucy estimate of the system state and operates in nonequilibrium. A simple and concrete electrical implementation of the feedback protocol is proposed, which allows for analytic expressions of the flows of energy, entropy, and information inside the demon. This allows us to show that any implementation of the demon must necessarily include an external power source, which we prove both from classical thermodynamics arguments and from a version of Landauer's memory erasure argument extended to nonequilibrium linear systems.

  4. A Review of Feature Extraction Software for Microarray Gene Expression Data

    PubMed Central

    Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini

    2014-01-01

    When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315
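
    As a short sketch of one of the reviewed methods, the following applies PCA to an expression matrix (samples by genes) to obtain a reduced representation. The random matrix stands in for real expression data; the component count is an arbitrary illustration.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      expression = rng.normal(size=(60, 5000))      # 60 samples, 5000 genes (synthetic)

      pca = PCA(n_components=10)                    # reduced representation of the genes
      reduced = pca.fit_transform(expression)       # shape (60, 10)

      print(reduced.shape, pca.explained_variance_ratio_.sum())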

  5. Extracting Behaviorally Relevant Traits from Natural Stimuli: Benefits of Combinatorial Representations at the Accessory Olfactory Bulb

    PubMed Central

    Kahan, Anat; Ben-Shaul, Yoram

    2016-01-01

    For many animals, chemosensation is essential for guiding social behavior. However, because multiple factors can modulate levels of individual chemical cues, deriving information about other individuals via natural chemical stimuli involves considerable challenges. How social information is extracted despite these sources of variability is poorly understood. The vomeronasal system provides an excellent opportunity to study this topic due to its role in detecting socially relevant traits. Here, we focus on two such traits: a female mouse’s strain and reproductive state. In particular, we measure stimulus-induced neuronal activity in the accessory olfactory bulb (AOB) in response to various dilutions of urine, vaginal secretions, and saliva, from estrus and non-estrus female mice from two different strains. We first show that all tested secretions provide information about a female’s receptivity and genotype. Next, we investigate how these traits can be decoded from neuronal activity despite multiple sources of variability. We show that individual neurons are limited in their capacity to allow trait classification across multiple sources of variability. However, simple linear classifiers sampling neuronal activity from small neuronal ensembles can provide a substantial improvement over that attained with individual units. Furthermore, we show that some traits are more efficiently detected than others, and that particular secretions may be optimized for conveying information about specific traits. Across all tested stimulus sources, discrimination between strains is more accurate than discrimination of receptivity, and detection of receptivity is more accurate with vaginal secretions than with urine. Our findings highlight the challenges of chemosensory processing of natural stimuli, and suggest that downstream readout stages decode multiple behaviorally relevant traits by sampling information from distinct but overlapping populations of AOB neurons. PMID:26938460

  6. Extracting Behaviorally Relevant Traits from Natural Stimuli: Benefits of Combinatorial Representations at the Accessory Olfactory Bulb.

    PubMed

    Kahan, Anat; Ben-Shaul, Yoram

    2016-03-01

    For many animals, chemosensation is essential for guiding social behavior. However, because multiple factors can modulate levels of individual chemical cues, deriving information about other individuals via natural chemical stimuli involves considerable challenges. How social information is extracted despite these sources of variability is poorly understood. The vomeronasal system provides an excellent opportunity to study this topic due to its role in detecting socially relevant traits. Here, we focus on two such traits: a female mouse's strain and reproductive state. In particular, we measure stimulus-induced neuronal activity in the accessory olfactory bulb (AOB) in response to various dilutions of urine, vaginal secretions, and saliva, from estrus and non-estrus female mice from two different strains. We first show that all tested secretions provide information about a female's receptivity and genotype. Next, we investigate how these traits can be decoded from neuronal activity despite multiple sources of variability. We show that individual neurons are limited in their capacity to allow trait classification across multiple sources of variability. However, simple linear classifiers sampling neuronal activity from small neuronal ensembles can provide a substantial improvement over that attained with individual units. Furthermore, we show that some traits are more efficiently detected than others, and that particular secretions may be optimized for conveying information about specific traits. Across all tested stimulus sources, discrimination between strains is more accurate than discrimination of receptivity, and detection of receptivity is more accurate with vaginal secretions than with urine. Our findings highlight the challenges of chemosensory processing of natural stimuli, and suggest that downstream readout stages decode multiple behaviorally relevant traits by sampling information from distinct but overlapping populations of AOB neurons.

  7. Outcome-Focused Market Intelligence: Extracting Better Value and Effectiveness from Strategic Sourcing

    DTIC Science & Technology

    2013-04-01

    disseminating information are not systematically taught or developed in the government’s acquisition workforce. However, a study of 30 large firms ...to keep themselves abreast of changes in the marketplace, such as technological advances, process improvements, and available sources of supply. The...and performance measurement (Monczka & Petersen, 2008). Firms that develop supply management strategic plans typically set three-to-five year

  8. Search Analytics: Automated Learning, Analysis, and Search with Open Source

    NASA Astrophysics Data System (ADS)

    Hundman, K.; Mattmann, C. A.; Hyon, J.; Ramirez, P.

    2016-12-01

    The sheer volume of unstructured scientific data makes comprehensive human analysis impossible, resulting in missed opportunities to identify relationships, trends, gaps, and outliers. As the open source community continues to grow, tools like Apache Tika, Apache Solr, Stanford's DeepDive, and Data-Driven Documents (D3) can help address this challenge. With a focus on journal publications and conference abstracts often in the form of PDF and Microsoft Office documents, we've initiated an exploratory NASA Advanced Concepts project aiming to use the aforementioned open source text analytics tools to build a data-driven justification for the HyspIRI Decadal Survey mission. We call this capability Search Analytics, and it fuses and augments these open source tools to enable the automatic discovery and extraction of salient information. In the case of HyspIRI, a hyperspectral infrared imager mission, key findings resulted from the extractions and visualizations of relationships from thousands of unstructured scientific documents. The relationships include links between satellites (e.g. Landsat 8), domain-specific measurements (e.g. spectral coverage) and subjects (e.g. invasive species). Using the above open source tools, Search Analytics mined and characterized a corpus of information that would be infeasible for a human to process. More broadly, Search Analytics offers insights into various scientific and commercial applications enabled through missions and instrumentation with specific technical capabilities. For example, the following phrases were extracted in close proximity within a publication: "In this study, hyperspectral images…with high spatial resolution (1 m) were analyzed to detect cutleaf teasel in two areas. …Classification of cutleaf teasel reached a users accuracy of 82 to 84%." Without reading a single paper we can use Search Analytics to automatically identify that a 1 m spatial resolution provides a cutleaf teasel detection users accuracy of 82-84%, which could have tangible, direct downstream implications for crop protection. Automatically assimilating this information expedites and supplements human analysis, and, ultimately, Search Analytics and its foundation of open source tools will result in more efficient scientific investment and research.

  9. Integration and Beyond

    PubMed Central

    Stead, William W.; Miller, Randolph A.; Musen, Mark A.; Hersh, William R.

    2000-01-01

    The vision of integrating information—from a variety of sources, into the way people work, to improve decisions and process—is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of “integration” projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction. PMID:10730596

  10. Preview of the BATSE Earth Occultation Catalog of Low Energy Gamma Ray Sources

    NASA Technical Reports Server (NTRS)

    Harmon, B. A.; Wilson, C. A.; Fishman, G. J.; McCollough, M. L.; Robinson, C. R.; Sahi, M.; Paciesas, W. S.; Zhang, S. N.

    1999-01-01

    The Burst and Transient Source Experiment (BATSE) aboard the Compton Gamma Ray Observatory (CGRO) has been detecting and monitoring point sources in the high energy sky since 1991. Although BATSE is best known for gamma ray bursts, it also monitors the sky for longer-lived sources of radiation. Using the Earth occultation technique to extract flux information, a catalog is being prepared of about 150 sources with potential emission in the large area detectors (20-1000 keV). The catalog will contain light curves, representative spectra, and parametric data for black hole and neutron star binaries, active galaxies, and supernova remnants. In this preview, we present light curves for persistent and transient sources, and also show examples of what type of information can be obtained from the BATSE Earth occultation database. Options for making the data easily accessible as an "on line" WWW document are being explored.

  11. How Information Literate Are Junior and Senior Class Biology Students?

    NASA Astrophysics Data System (ADS)

    Schiffl, Iris

    2018-03-01

    Information literacy—i.e. obtaining, evaluating and using information—is a key element of scientific literacy. However, students are frequently equipped with poor information literacy skills—even at university level—as information literacy is often not explicitly taught in schools. Little is known about students' information skills in science at junior and senior class level, and about teachers' competences in dealing with information literacy in science class. This study examines the information literacy of Austrian 8th, 10th and 12th grade students. Information literacy is important for science education in Austria, because it is listed as a basic competence in Austria's science standards. Two different aspects of information literacy are examined: obtaining information and extracting information from texts. An additional research focus of this study is teachers' competences in diagnosing information skills. The results reveal that students mostly rely on online sources for obtaining information. However, they also use books and consult with people they trust. The younger the students, the more they rely on personal sources. Students' abilities to evaluate sources are poor, especially among younger students. Although teachers claim to use information research in class, their ability to assess their students' information competences is limited.

  12. A method for automatically extracting infectious disease-related primers and probes from the literature

    PubMed Central

    2010-01-01

    Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
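
    The candidate-detection phase (phase 2 above) can be illustrated with a simple recognizer for runs of nucleotide codes, including IUPAC ambiguity codes, of primer-like length. The length thresholds and the example passage are illustrative assumptions, not the recognizers or data used in the paper.

      import re

      # Nucleotide codes plus IUPAC ambiguity codes; primer-like length 15-35.
      PRIMER_LIKE = re.compile(r"\b[ACGTURYSWKMBDHVN]{15,35}\b", re.I)

      def candidate_primers(text):
          """Return candidate primer/probe strings found in a passage of text."""
          return [m.group(0).upper() for m in PRIMER_LIKE.finditer(text)]

      passage = ("The forward primer 5'-GTGCCAGCMGCCGCGGTAA-3' and probe "
                 "TACGGGAGGCAGCAG were used for amplification.")
      print(candidate_primers(passage))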

  13. Three dimensional global modeling of atmospheric CO2

    NASA Technical Reports Server (NTRS)

    Fung, I.; Hansen, J.; Rind, D.

    1983-01-01

    A model was developed to study the prospects of extracting information on carbon dioxide sources and sinks from observed CO2 variations. The approach uses a three dimensional global transport model, based on winds from a 3-D general circulation model (GCM), to advect CO2 noninteractively, i.e., as a tracer, with specified sources and sinks of CO2 at the surface. The 3-D model employed is identified and biosphere, ocean and fossil fuel sources and sinks are discussed. Some preliminary model results are presented.

  14. Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

    PubMed

    Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

    2012-10-01

    In biological research, establishing the prior art by searching and collecting information already present in the domain is as important as the experiments themselves. To obtain a complete overview of the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms and integrated into the NER platform ProMiner, which is successfully used in the human genomics domain. Based on the performance tests done, the present system achieved a fair performance with precision 0.64, recall 0.74, and an F1 measure of 0.69 in a test scenario based on cattle literature.
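
    For reference, the precision, recall, and F1 figures quoted above follow from counts of true positives (TP), false positives (FP), and false negatives (FN) over recognized entity mentions. The counts below are invented purely to reproduce numbers close to those reported.

      def prf(tp, fp, fn):
          """Precision, recall, and F1 from entity-level counts."""
          precision = tp / (tp + fp)
          recall = tp / (tp + fn)
          f1 = 2 * precision * recall / (precision + recall)
          return precision, recall, f1

      print(prf(tp=740, fp=416, fn=260))   # approximately (0.64, 0.74, 0.69)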

  15. Development of Hospital-based Data Sets as a Vehicle for Implementation of a National Electronic Health Record

    PubMed Central

    Keikha, Leila; Farajollah, Seyede Sedigheh Seied; Safdari, Reza; Ghazisaeedi, Marjan; Mohammadzadeh, Niloofar

    2018-01-01

    Background In developing countries such as Iran, international standards offer good sources to survey and use for appropriate planning in the domain of electronic health records (EHRs). Therefore, in this study, HL7 and ASTM standards were considered as the main sources from which to extract EHR data. Objective The objective of this study was to propose a hospital data set for a national EHR consisting of data classes and data elements by adjusting data sets extracted from the standards and paper-based records. Method This comparative study was carried out in 2017 by studying the contents of the paper-based records approved by the health ministry in Iran and the international ASTM and HL7 standards in order to extract a minimum hospital data set for a national EHR. Results As a result of studying the standards and paper-based records, a total of 526 data elements in 174 classes were extracted. An examination of the data indicated that the highest number of extracted data came from the free text elements, both in the paper-based records and in the standards related to the administrative data. The major sources of data extracted from ASTM and HL7 were the E1384 and Hl7V.x standards, respectively. In the paper-based records, data were extracted from 19 forms sporadically. Discussion By declaring the confidentiality of information, the ASTM standards acknowledge the issue of confidentiality of information as one of the main challenges of EHR development, and propose new types of admission, such as teleconference, tele-video, and home visit, which are inevitable with the advent of new technology for providing healthcare and treating diseases. Data related to finance and insurance, which were scattered in different categories by three organizations, emerged as the financial category. Documenting the role and responsibility of the provider by adding the authenticator/signature data element was deemed essential. Conclusion Not only using well-defined and standardized data, but also adapting EHR systems to the local facilities and the existing social and cultural conditions, will facilitate the development of structured data sets. PMID:29618962

  16. Development of Hospital-based Data Sets as a Vehicle for Implementation of a National Electronic Health Record.

    PubMed

    Keikha, Leila; Farajollah, Seyede Sedigheh Seied; Safdari, Reza; Ghazisaeedi, Marjan; Mohammadzadeh, Niloofar

    2018-01-01

    In developing countries such as Iran, international standards offer good sources to survey and use for appropriate planning in the domain of electronic health records (EHRs). Therefore, in this study, HL7 and ASTM standards were considered as the main sources from which to extract EHR data. The objective of this study was to propose a hospital data set for a national EHR consisting of data classes and data elements by adjusting data sets extracted from the standards and paper-based records. This comparative study was carried out in 2017 by studying the contents of the paper-based records approved by the health ministry in Iran and the international ASTM and HL7 standards in order to extract a minimum hospital data set for a national EHR. As a result of studying the standards and paper-based records, a total of 526 data elements in 174 classes were extracted. An examination of the data indicated that the highest number of extracted data came from the free text elements, both in the paper-based records and in the standards related to the administrative data. The major sources of data extracted from ASTM and HL7 were the E1384 and Hl7V.x standards, respectively. In the paper-based records, data were extracted from 19 forms sporadically. By declaring the confidentiality of information, the ASTM standards acknowledge the issue of confidentiality of information as one of the main challenges of EHR development, and propose new types of admission, such as teleconference, tele-video, and home visit, which are inevitable with the advent of new technology for providing healthcare and treating diseases. Data related to finance and insurance, which were scattered in different categories by three organizations, emerged as the financial category. Documenting the role and responsibility of the provider by adding the authenticator/signature data element was deemed essential. Not only using well-defined and standardized data, but also adapting EHR systems to the local facilities and the existing social and cultural conditions, will facilitate the development of structured data sets.

  17. DEXTER: Disease-Expression Relation Extraction from Text.

    PubMed

    Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K

    2018-01-01

    Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract information on differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNA in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress.Database URL: http://biotm.cis.udel.edu/DEXTER.
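
    As a hedged sketch of the general idea (not DEXTER's actual rules or vocabulary), a single pattern of the kind such systems build on can pull the gene, the direction of change and the disease out of a simple expression statement:

      import re

      PATTERN = re.compile(
          r"(?P<gene>[A-Z0-9]{2,})\s+(?:is|was)\s+(?P<dir>up|down)-?regulated\s+in\s+"
          r"(?P<disease>[a-z][a-z -]+?(?:cancer|carcinoma|leukemia))",
          re.IGNORECASE)

      sentence = "EGFR was upregulated in non-small cell lung cancer tissues."
      m = PATTERN.search(sentence)
      if m:
          print(m.group("gene"), m.group("dir").lower(), m.group("disease"))
      # -> EGFR up non-small cell lung cancer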

  18. Natural Antioxidants in Foods and Medicinal Plants: Extraction, Assessment and Resources

    PubMed Central

    Xu, Dong-Ping; Li, Ya; Meng, Xiao; Zhou, Tong; Zhou, Yue; Zheng, Jie; Zhang, Jiao-Jiao; Li, Hua-Bin

    2017-01-01

    Natural antioxidants are widely distributed in food and medicinal plants. These natural antioxidants, especially polyphenols and carotenoids, exhibit a wide range of biological effects, including anti-inflammatory, anti-aging, anti-atherosclerosis and anticancer. The effective extraction and proper assessment of antioxidants from food and medicinal plants are crucial to explore the potential antioxidant sources and promote the application in functional foods, pharmaceuticals and food additives. The present paper provides comprehensive information on the green extraction technologies of natural antioxidants, assessment of antioxidant activity at chemical and cellular based levels and their main resources from food and medicinal plants. PMID:28067795

  19. Natural Antioxidants in Foods and Medicinal Plants: Extraction, Assessment and Resources.

    PubMed

    Xu, Dong-Ping; Li, Ya; Meng, Xiao; Zhou, Tong; Zhou, Yue; Zheng, Jie; Zhang, Jiao-Jiao; Li, Hua-Bin

    2017-01-05

    Natural antioxidants are widely distributed in food and medicinal plants. These natural antioxidants, especially polyphenols and carotenoids, exhibit a wide range of biological effects, including anti-inflammatory, anti-aging, anti-atherosclerosis and anticancer. The effective extraction and proper assessment of antioxidants from food and medicinal plants are crucial to explore the potential antioxidant sources and promote the application in functional foods, pharmaceuticals and food additives. The present paper provides comprehensive information on the green extraction technologies of natural antioxidants, assessment of antioxidant activity at chemical and cellular based levels and their main resources from food and medicinal plants.

  20. EEG source space analysis of the supervised factor analytic approach for the classification of multi-directional arm movement

    NASA Astrophysics Data System (ADS)

    Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai

    2017-08-01

    Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.
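
    The flavour of the pipeline can be sketched with off-the-shelf components; this is not the authors' supervised factor analysis, just ordinary factor analysis followed by a linear classifier on simulated source-space features, with all sizes and the class-dependent shift invented for the example:

      import numpy as np
      from sklearn.decomposition import FactorAnalysis
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      n_trials, n_dipole_feats, n_classes = 200, 60, 4     # e.g. dipole features from BA4a/BA4p/BA6
      y = rng.integers(n_classes, size=n_trials)           # movement-direction labels
      X = rng.normal(size=(n_trials, n_dipole_feats)) + y[:, None] * 0.3

      clf = make_pipeline(FactorAnalysis(n_components=10, random_state=0),
                          LinearDiscriminantAnalysis())
      print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())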

  1. MRMer, an interactive open source and cross-platform system for data extraction and visualization of multiple reaction monitoring experiments.

    PubMed

    Martin, Daniel B; Holzman, Ted; May, Damon; Peterson, Amelia; Eastham, Ashley; Eng, Jimmy; McIntosh, Martin

    2008-11-01

    Multiple reaction monitoring (MRM) mass spectrometry identifies and quantifies specific peptides in a complex mixture with very high sensitivity and speed and thus has promise for the high throughput screening of clinical samples for candidate biomarkers. We have developed an interactive software platform, called MRMer, for managing highly complex MRM-MS experiments, including quantitative analyses using heavy/light isotopic peptide pairs. MRMer parses and extracts information from MS files encoded in the platform-independent mzXML data format. It extracts and infers precursor-product ion transition pairings, computes integrated ion intensities, and permits rapid visual curation for analyses exceeding 1000 precursor-product pairs. Results can be easily output for quantitative comparison of consecutive runs. Additionally MRMer incorporates features that permit the quantitative analysis experiments including heavy and light isotopic peptide pairs. MRMer is open source and provided under the Apache 2.0 license.
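
    A hedged sketch of the parsing step (not MRMer's implementation), assuming the usual mzXML layout in which each MS2 scan element carries a precursorMz child; scans are grouped by precursor m/z as a crude stand-in for recovering transitions, and the file name is hypothetical:

      import xml.etree.ElementTree as ET
      from collections import defaultdict

      def scans_by_precursor(mzxml_path):
          groups = defaultdict(list)
          for _, elem in ET.iterparse(mzxml_path):
              if elem.tag.endswith("scan") and elem.get("msLevel") == "2":
                  prec = next((c for c in elem if c.tag.endswith("precursorMz")), None)
                  if prec is not None and prec.text:
                      groups[round(float(prec.text), 2)].append(elem.get("num"))
                  elem.clear()                 # keep memory bounded on large runs
          return groups

      # usage (hypothetical file): scans_by_precursor("experiment.mzXML")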

  2. The Use of Empirical Data Sources in HRA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruce Hallbert; David Gertman; Julie Marble

    This paper presents a review of available information related to human performance to support Human Reliability Analysis (HRA) performed for nuclear power plants (NPPs). A number of data sources are identified as potentially useful. These include NPP licensee event reports (LERs), augmented inspection team (AIT) reports, operator requalification data, results from the literature in experimental psychology, and the Aviation Safety Reporting System (ASRS). The paper discusses how utilizing such information improves our capability to model and quantify human performance. In particular, the paper discusses how information related to performance shaping factors (PSFs) can be extracted from empirical data to determine their effect sizes, their relative effects, as well as their interactions. The paper concludes that appropriate use of existing sources can help address some of the important issues we are currently facing in HRA.

  3. Information Extraction from Multiple Syntactic Sources

    DTIC Science & Technology

    2004-05-01

    Performance of SVM and KNN (k=3) on different kernel setups. Types are ordered in decreasing order of frequency of occurrence in the ACE corpus. For SVM, the...name. But it is not easy to recognize “A Real New York Bargain” as a company name. In other languages or transcripts of English speech where...symbolic rules for extraction of posted computer jobs. It only assumed simple syntactic preprocessing such as tokenization and Part-of-Speech tagging

  4. Fractal Complexity-Based Feature Extraction Algorithm of Communication Signals

    NASA Astrophysics Data System (ADS)

    Wang, Hui; Li, Jingchao; Guo, Lili; Dou, Zheng; Lin, Yun; Zhou, Ruolin

    How to analyze and identify the characteristics of radiation sources, and to estimate their threat level by means of detection, interception and location, has been a central issue of electronic support in electronic warfare, and communication signal recognition is one of the key points in solving this issue. Aiming at accurately extracting the individual characteristics of a radiation source in an increasingly complex communication electromagnetic environment, a novel feature extraction algorithm for the individual characteristics of communication radiation sources, based on the fractal complexity of the signal, is proposed. According to the complexity of the received signal and the level of environmental noise, fractal dimension features of different complexity are used to characterize the subtle features of the signal and to build a feature database, and different broadcasting stations are then identified using grey relational analysis. The simulation results demonstrate that the algorithm can achieve a recognition rate of 94% even in an environment with an SNR of -10 dB, which provides an important theoretical basis for the accurate identification of subtle signal features at low SNR in the field of information confrontation.
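
    One widely used fractal-complexity measure (not necessarily the authors' exact feature set) is Higuchi's fractal dimension; the sketch below computes it for a white-noise signal, for which a value close to 2 is expected:

      import numpy as np

      def higuchi_fd(x, kmax=10):
          x = np.asarray(x, dtype=float)
          n = len(x)
          lk = []
          for k in range(1, kmax + 1):
              lengths = []
              for m in range(k):
                  idx = np.arange(m, n, k)
                  if len(idx) < 2:
                      continue
                  # curve length of the subsampled series, normalised as in Higuchi (1988)
                  lm = np.abs(np.diff(x[idx])).sum() * (n - 1) / ((len(idx) - 1) * k) / k
                  lengths.append(lm)
              lk.append(np.mean(lengths))
          # slope of log L(k) against log(1/k) estimates the fractal dimension
          slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(lk), 1)
          return slope

      rng = np.random.default_rng(1)
      print(higuchi_fd(rng.normal(size=1000)))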

  5. Utility of linking primary care electronic medical records with Canadian census data to study the determinants of chronic disease: an example based on socioeconomic status and obesity.

    PubMed

    Biro, Suzanne; Williamson, Tyler; Leggett, Jannet Ann; Barber, David; Morkem, Rachael; Moore, Kieran; Belanger, Paul; Mosley, Brian; Janssen, Ian

    2016-03-11

    Electronic medical records (EMRs) used in primary care contain a breadth of data that can be used in public health research. Patient data from EMRs could be linked with other data sources, such as a postal code linkage with Census data, to obtain additional information on environmental determinants of health. While promising, successful linkages between primary care EMRs with geographic measures is limited due to ethics review board concerns. This study tested the feasibility of extracting full postal code from primary care EMRs and linking this with area-level measures of the environment to demonstrate how such a linkage could be used to examine the determinants of disease. The association between obesity and area-level deprivation was used as an example to illustrate inequalities of obesity in adults. The analysis included EMRs of 7153 patients aged 20 years and older who visited a single, primary care site in 2011. Extracted patient information included demographics (date of birth, sex, postal code) and weight status (height, weight). Information extraction and management procedures were designed to mitigate the risk of individual re-identification when extracting full postal code from source EMRs. Based on patients' postal codes, area-based deprivation indexes were created using the smallest area unit used in Canadian censuses. Descriptive statistics and socioeconomic disparity summary measures of linked census and adult patients were calculated. The data extraction of full postal code met technological requirements for rendering health information extracted from local EMRs into anonymized data. The prevalence of obesity was 31.6 %. There was variation of obesity between deprivation quintiles; adults in the most deprived areas were 35 % more likely to be obese compared with adults in the least deprived areas (Chi-Square = 20.24(1), p < 0.0001). Maps depicting spatial representation of regional deprivation and obesity were created to highlight high risk areas. An area based socio-economic measure was linked with EMR-derived objective measures of height and weight to show a positive association between area-level deprivation and obesity. The linked dataset demonstrates a promising model for assessing health disparities and ecological factors associated with the development of chronic diseases with far reaching implications for informing public health and primary health care interventions and services.

  6. Big Data: The Future of Biocuration

    USDA-ARS?s Scientific Manuscript database

    This report is a white paper that describes the roles of biocurators in linking biologists to their data. Roles include: to extract knowledge from published papers; to connect information from different sources in a coherent and comprehensible way; to inspect and correct automatically predicted gene...

  7. Describing knowledge encounters in healthcare: a mixed studies systematic review and development of a classification.

    PubMed

    Hurst, Dominic; Mickan, Sharon

    2017-03-14

    Implementation science seeks to promote the uptake of research and other evidence-based findings into practice, but for healthcare professionals, this is complex as practice draws on, in addition to scientific principles, rules of thumb and a store of practical wisdom acquired from a range of informational and experiential sources. The aims of this review were to identify sources of information and professional experiences encountered by healthcare workers and from this to build a classification system, for use in future observational studies, that describes influences on how healthcare professionals acquire and use information in their clinical practice. This was a mixed studies systematic review of observational studies. OVID MEDLINE and Embase and Google Scholar were searched using terms around information, knowledge or evidence and sharing, searching and utilisation combined with terms relating to healthcare groups. Studies were eligible if one of the intentions was to identify information or experiential encounters by healthcare workers. Data was extracted by one author after piloting with another. Studies were assessed using the Mixed Methods Appraisal Tool (MMAT). The primary outcome extracted was the information source or professional experience encounter. Similar encounters were grouped together as single constructs. Our synthesis involved a mixed approach using the top-down logic of the Bliss Bibliographic Classification System (BC2) to generate classification categories and a bottom-up approach to develop descriptive codes (or "facets") for each category, from the data. The generic terms of BC2 were customised by an iterative process of thematic content analysis. Facets were developed by using available theory and keeping in mind the pragmatic end use of the classification. Eighty studies were included from which 178 discrete knowledge encounters were extracted. Six classification categories were developed: what information or experience was encountered; how was the information or experience encountered; what was the mode of encounter; from whom did the information originate or with whom was the experience; how many participants were there; and where did the encounter take place. For each of these categories, relevant descriptive facets were identified. We have sought to identify and classify all knowledge encounters, and we have developed a faceted description of key categories which will support richer descriptions and interrogations of knowledge encounters in healthcare research.

  8. Automatic updating and 3D modeling of airport information from high resolution images using GIS and LIDAR data

    NASA Astrophysics Data System (ADS)

    Lv, Zheng; Sui, Haigang; Zhang, Xilin; Huang, Xianfeng

    2007-11-01

    As one of the most important geo-spatial objects and military establishments, an airport is always a key target in the fields of transportation and military affairs. Therefore, automatic recognition and extraction of airports from remote sensing images is very important and urgent for the updating of civil aviation information and for military applications. In this paper, a new multi-source data fusion approach to automatic airport information extraction, updating and 3D modeling is addressed. Corresponding key technologies, including feature extraction of airport information based on a modified Otsu algorithm, automatic change detection based on a new parallel-lines-based buffer detection algorithm, 3D modeling based on a gradual elimination of non-building points algorithm, 3D change detection between the old airport model and LIDAR data, import of typical CAD models and so on, are discussed in detail. Finally, based on these technologies, we develop a prototype system, and the results show that our method can achieve good effects.
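
    For reference, the standard (unmodified) Otsu threshold that the paper builds on can be written in a few lines; the synthetic two-cluster image below is only for illustration:

      import numpy as np

      def otsu_threshold(image):
          hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
          prob = hist / hist.sum()
          best_t, best_var = 0, -1.0
          for t in range(1, 256):
              w0, w1 = prob[:t].sum(), prob[t:].sum()
              if w0 == 0 or w1 == 0:
                  continue
              mu0 = (np.arange(t) * prob[:t]).sum() / w0
              mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
              between_var = w0 * w1 * (mu0 - mu1) ** 2     # between-class variance
              if between_var > best_var:
                  best_t, best_var = t, between_var
          return best_t

      img = np.concatenate([np.random.randint(0, 80, 500), np.random.randint(170, 255, 500)])
      print(otsu_threshold(img))                           # falls between the two clusters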

  9. Platform development for merging various information sources for water management: methodological, technical and operational aspects

    NASA Astrophysics Data System (ADS)

    Galvao, Diogo

    2013-04-01

    As a result of various economic, social and environmental factors, we can all experience the increasing importance of water resources at a global scale. As a consequence, we can also notice the increasing need for methods and systems capable of efficiently managing and combining the rich and heterogeneous data available that concern, directly or indirectly, these water resources, such as in-situ monitoring station data, Earth Observation images and measurements, meteorological modeling forecasts and hydrological modeling. Under the scope of the MyWater project, we developed a water management system capable of satisfying just such needs, built on a flexible platform capable of accommodating future challenges, not only in terms of sources of data but also in terms of the models applicable to extract information from them. From a methodological point of view, the MyWater platform obtains data from distinct sources, and in distinct formats, be they satellite images or meteorological model forecasts, and transforms and combines them in ways that allow them to be fed to a variety of hydrological models (such as MOHID Land, SIMGRO, etc.), which themselves can also be combined, using approaches such as those advocated by the OpenMI standard, to extract information in an automated and time-efficient manner. Such an approach brings its own set of challenges, and further research was carried out under this project on the best ways to combine such data and on novel approaches to hydrological modeling (like the PriceXD model). From a technical point of view, the MyWater platform is structured according to a classical SOA architecture, with a flexible, object-oriented, modular backend service responsible for all the model process management and data treatment, while the extracted information can be accessed through a variety of frontends, from a web portal and a desktop client down to mobile phone and tablet applications. From an operational point of view, a user can not only see these model results on graphically rich user interfaces, but also interact with them in ways that allow them to extract their own information. This platform was then applied to a variety of case studies in countries and regions including the Netherlands, Greece, Portugal, Brazil and Africa, to verify the practicality, accuracy and value that it brings to end users and stakeholders.

  10. Developing a hybrid dictionary-based bio-entity recognition technique.

    PubMed

    Song, Min; Yu, Hwanjo; Han, Wook-Shin

    2015-01-01

    Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall.
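
    The dictionary-matching step can be illustrated with the classic Levenshtein edit distance (the paper uses a shortest-path variant; the dictionary entries and mention below are invented):

      def edit_distance(a: str, b: str) -> int:
          prev = list(range(len(b) + 1))
          for i, ca in enumerate(a, 1):
              curr = [i]
              for j, cb in enumerate(b, 1):
                  curr.append(min(prev[j] + 1,                # deletion
                                  curr[j - 1] + 1,            # insertion
                                  prev[j - 1] + (ca != cb)))  # substitution
              prev = curr
          return prev[-1]

      dictionary = ["interleukin-2", "interleukin-6", "insulin"]
      mention = "interleukin 6"
      print(min(dictionary, key=lambda term: edit_distance(mention.lower(), term)))
      # -> interleukin-6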

  11. Developing a hybrid dictionary-based bio-entity recognition technique

    PubMed Central

    2015-01-01

    Background Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. Results The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. Conclusions The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall. PMID:26043907

  12. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  13. Detecting the red tide based on remote sensing data in optically complex East China Sea

    NASA Astrophysics Data System (ADS)

    Xu, Xiaohui; Pan, Delu; Mao, Zhihua; Tao, Bangyi; Liu, Qiong

    2012-09-01

    Red tide not only destroys marine fishery production, deteriorates the marine environment and affects the coastal tourism industry, but can also cause human poisoning, and even death, through the consumption of toxic seafood contaminated by red tide organisms. Remote sensing technology offers large-scale, synchronized and rapid monitoring, so it is one of the most important and most effective means of red tide monitoring. This paper selects the high-frequency red tide areas of the East China Sea as the study area and MODIS/Aqua L2 data as the data source, and analyses and compares the spectral differences between red tide and non-red tide water bodies for many historical events. Based on these spectral differences, the paper develops the algorithm Rrs555/Rrs488 > 1.5 to extract red tide information. Applying the algorithm to a red tide event that occurred in the East China Sea on May 28, 2009, we found that the method can effectively determine the location of the red tide occurrence; there is also a good correspondence between the extracted red tide area and the chlorophyll a concentration retrieved by remote sensing, showing that the algorithm can effectively determine the location and extract the red tide information.
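
    The ratio test itself is a one-liner on the reflectance bands; the tiny arrays below are synthetic and the variable names are assumptions, but the threshold is the one given in the abstract:

      import numpy as np

      rrs488 = np.array([[0.004, 0.006], [0.010, 0.003]])   # Rrs at 488 nm (sr^-1)
      rrs555 = np.array([[0.007, 0.006], [0.009, 0.006]])   # Rrs at 555 nm (sr^-1)

      red_tide_mask = (rrs555 / rrs488) > 1.5               # True where red tide is flagged
      print(red_tide_mask)
      # [[ True False]
      #  [False  True]]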

  14. Topographic Information Requirements and Computer-Graphic Display Techniques for Nap-of-the-Earth Flight.

    DTIC Science & Technology

    1979-12-01

    required of the Army aviator. The successful accomplishment of many of these activities depends upon the aviator’s ability to extract information from maps. (The remainder of this excerpt is a fragment of the report's function and information-requirement tables, covering nap-of-the-earth (NOE) flight functions such as Determine Position, Crew Coordination (Topographic), Radio Communication, and Post-Flight Debriefing.)

  15. Localizing the sources of two independent noises: Role of time varying amplitude differences

    PubMed Central

    Yost, William A.; Brown, Christopher A.

    2013-01-01

    Listeners localized the free-field sources of either one or two simultaneous and independently generated noise bursts. Listeners' localization performance was better when localizing one rather than two sound sources. With two sound sources, localization performance was better when the listener was provided prior information about the location of one of them. Listeners also localized two simultaneous noise bursts that had sinusoidal amplitude modulation (AM) applied, in which the modulation envelope was in-phase across the two source locations or was 180° out-of-phase. The AM was employed to investigate a hypothesis as to what process listeners might use to localize multiple sound sources. The results supported the hypothesis that localization of two sound sources might be based on temporal-spectral regions of the combined waveform in which the sound from one source was more intense than that from the other source. The interaural information extracted from such temporal-spectral regions might provide reliable estimates of the sound source location that produced the more intense sound in that temporal-spectral region. PMID:23556597

  16. Localizing the sources of two independent noises: role of time varying amplitude differences.

    PubMed

    Yost, William A; Brown, Christopher A

    2013-04-01

    Listeners localized the free-field sources of either one or two simultaneous and independently generated noise bursts. Listeners' localization performance was better when localizing one rather than two sound sources. With two sound sources, localization performance was better when the listener was provided prior information about the location of one of them. Listeners also localized two simultaneous noise bursts that had sinusoidal amplitude modulation (AM) applied, in which the modulation envelope was in-phase across the two source locations or was 180° out-of-phase. The AM was employed to investigate a hypothesis as to what process listeners might use to localize multiple sound sources. The results supported the hypothesis that localization of two sound sources might be based on temporal-spectral regions of the combined waveform in which the sound from one source was more intense than that from the other source. The interaural information extracted from such temporal-spectral regions might provide reliable estimates of the sound source location that produced the more intense sound in that temporal-spectral region.

  17. Semantic Location Extraction from Crowdsourced Data

    NASA Astrophysics Data System (ADS)

    Koswatte, S.; Mcdougall, K.; Liu, X.

    2016-06-01

    Crowdsourced Data (CSD) has recently received increased attention in many application areas, including disaster management. Convenience of production and use, data currency and abundance are some of the key reasons for this high interest. Conversely, quality issues like incompleteness, credibility and relevancy prevent the direct use of such data in important applications like disaster management. Moreover, the availability of location information in CSD is problematic, as it remains very low on many crowdsourced platforms such as Twitter. Also, the recorded location mostly relates to the mobile device or user location and often does not represent the event location. In CSD, the event location is discussed descriptively in the comments in addition to the recorded location (which is generated by means of the mobile device's GPS or the mobile communication network). This study attempts to semantically extract CSD location information with the help of an ontological gazetteer and other available resources. Tweets from the 2011 Queensland flood and Ushahidi Crowd Map data were semantically analysed to extract location information with the support of the Queensland Gazetteer, which was converted to an ontological gazetteer, and a global gazetteer. Some preliminary results show that the use of ontologies and semantics can improve the accuracy of place-name identification in CSD and the process of location information extraction.
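
    A toy sketch of gazetteer-based matching (far simpler than the ontological gazetteer described above; the place names and coordinates are illustrative only):

      GAZETTEER = {
          "brisbane": (-27.47, 153.03),
          "ipswich": (-27.61, 152.76),
          "toowoomba": (-27.56, 151.95),
      }

      def extract_locations(message: str):
          text = message.lower()
          return {name: coords for name, coords in GAZETTEER.items() if name in text}

      tweet = "Flooding reported near Ipswich and Toowoomba, roads cut #qldfloods"
      print(extract_locations(tweet))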

  18. Reporting of sources of funding in systematic reviews in periodontology and implant dentistry.

    PubMed

    Faggion, C M; Atieh, M; Zanicotti, D G

    2014-02-01

    Industry-supported clinical trials may present better outcomes than those supported by other sources. The aim of this paper was to assess whether systematic reviews (SRs) published in periodontology and implant dentistry report and discuss the influence of funding sources on study results. Two reviewers conducted a comprehensive search in PubMed and the Cochrane Database of Systematic Reviews independently and in duplicate to identify SRs published up to 11 November 2012. Speciality dental journals and the reference lists of included SRs were also scrutinised. Information on the reporting and discussion of funding sources of primary studies included in the SRs was extracted independently and in duplicate. Any disagreement regarding SR selection or data extraction was discussed until consensus was achieved. Of 146 SRs included in the assessment, only 45 (31%) reported the funding sources of primary studies. Fourteen (10%) SRs discussed the potential influence of funding sources on study results, that is, sponsorship bias. Funding sources are inadequately reported and discussed in SRs in periodontology and implant dentistry. Assessment, reporting, and critical appraisal of potential sponsorship bias of meta-analytic estimates are paramount to provide proper guidance for clinical treatments.

  19. Collaborative human-machine analysis to disambiguate entities in unstructured text and structured datasets

    NASA Astrophysics Data System (ADS)

    Davenport, Jack H.

    2016-05-01

    Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.

  20. Integration of Remotely Sensed Data Into Geospatial Reference Information Databases. UN-GGIM National Approach

    NASA Astrophysics Data System (ADS)

    Arozarena, A.; Villa, G.; Valcárcel, N.; Pérez, B.

    2016-06-01

    Remote sensing satellites, together with aerial and terrestrial platforms (mobile and fixed), produce nowadays huge amounts of data coming from a wide variety of sensors. These datasets serve as main data sources for the extraction of Geospatial Reference Information (GRI), constituting the "skeleton" of any Spatial Data Infrastructure (SDI). Since very different situations can be found around the world in terms of geographic information production and management, the generation of global GRI datasets seems extremely challenging. Remotely sensed data, due to its wide availability nowadays, is able to provide fundamental sources for any production or management system present in different countries. After several automatic and semiautomatic processes including ancillary data, the extracted geospatial information is ready to become part of the GRI databases. In order to optimize these data flows for the production of high quality geospatial information and to promote its use to address global challenges several initiatives at national, continental and global levels have been put in place, such as European INSPIRE initiative and Copernicus Programme, and global initiatives such as the Group on Earth Observation/Global Earth Observation System of Systems (GEO/GEOSS) and United Nations Global Geospatial Information Management (UN-GGIM). These workflows are established mainly by public organizations, with the adequate institutional arrangements at national, regional or global levels. Other initiatives, such as Volunteered Geographic Information (VGI), on the other hand may contribute to maintain the GRI databases updated. Remotely sensed data hence becomes one of the main pillars underpinning the establishment of a global SDI, as those datasets will be used by public agencies or institutions as well as by volunteers to extract the required spatial information that in turn will feed the GRI databases. This paper intends to provide an example of how institutional arrangements and cooperative production systems can be set up at any territorial level in order to exploit remotely sensed data in the most intensive manner, taking advantage of all its potential.

  1. Standardized data sharing in a paediatric oncology research network--a proof-of-concept study.

    PubMed

    Hochedlinger, Nina; Nitzlnader, Michael; Falgenhauer, Markus; Welte, Stefan; Hayn, Dieter; Koumakis, Lefteris; Potamias, George; Tsiknakis, Manolis; Saraceno, Davide; Rinaldi, Eugenia; Ladenstein, Ruth; Schreier, Günter

    2015-01-01

    Data that has been collected in the course of clinical trials are potentially valuable for additional scientific research questions in so called secondary use scenarios. This is of particular importance in rare disease areas like paediatric oncology. If data from several research projects need to be connected, so called Core Datasets can be used to define which information needs to be extracted from every involved source system. In this work, the utility of the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) as a format for Core Datasets was evaluated and a web tool was developed which received Source ODM XML files and--via Extensible Stylesheet Language Transformation (XSLT)--generated standardized Core Dataset ODM XML files. Using this tool, data from different source systems were extracted and pooled for joined analysis in a proof-of-concept study, facilitating both, basic syntactic and semantic interoperability.
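
    The transformation step can be pictured with a minimal sketch, assuming a heavily simplified, namespace-free ODM-like document and an invented stylesheet that keeps only one Core Dataset item; this is not the project's actual tool or mapping:

      from lxml import etree

      SOURCE_ODM = b"""<ODM><ClinicalData StudyOID="S1">
        <SubjectData SubjectKey="001"><ItemData ItemOID="AGE" Value="12"/></SubjectData>
      </ClinicalData></ODM>"""

      CORE_XSLT = b"""<xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/ODM">
          <ODM><xsl:apply-templates select="ClinicalData"/></ODM>
        </xsl:template>
        <xsl:template match="ClinicalData">
          <ClinicalData StudyOID="{@StudyOID}">
            <xsl:copy-of select=".//ItemData[@ItemOID='AGE']"/>
          </ClinicalData>
        </xsl:template>
      </xsl:stylesheet>"""

      transform = etree.XSLT(etree.fromstring(CORE_XSLT))
      core = transform(etree.fromstring(SOURCE_ODM))
      print(etree.tostring(core, pretty_print=True).decode())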

  2. Text-in-Context: A Method for Extracting Findings in Mixed-Methods Mixed Research Synthesis Studies

    PubMed Central

    Leeman, Jennifer; Knafl, Kathleen; Crandell, Jamie L.

    2012-01-01

    Aim Our purpose in this paper is to propose a new method for extracting findings from research reports included in mixed-methods mixed research synthesis studies. Background International initiatives in the domains of systematic review and evidence synthesis have been focused on broadening the conceptualization of evidence, increased methodological inclusiveness and the production of evidence syntheses that will be accessible to and usable by a wider range of consumers. Initiatives in the general mixed-methods research field have been focused on developing truly integrative approaches to data analysis and interpretation. Data source The data extraction challenges described here were encountered and the method proposed for addressing these challenges was developed, in the first year of the ongoing (2011–2016) study: Mixed-Methods Synthesis of Research on Childhood Chronic Conditions and Family. Discussion To preserve the text-in-context of findings in research reports, we describe a method whereby findings are transformed into portable statements that anchor results to relevant information about sample, source of information, time, comparative reference point, magnitude and significance and study-specific conceptions of phenomena. Implications for nursing The data extraction method featured here was developed specifically to accommodate mixed-methods mixed research synthesis studies conducted in nursing and other health sciences, but reviewers might find it useful in other kinds of research synthesis studies. Conclusion This data extraction method itself constitutes a type of integration to preserve the methodological context of findings when statements are read individually and in comparison to each other. PMID:22924808

  3. Clustering header categories extracted from web tables

    NASA Astrophysics Data System (ADS)

    Nagy, George; Embley, David W.; Krishnamoorthy, Mukkai; Seth, Sharad

    2015-01-01

    Revealing related content among heterogeneous web tables is part of our long term objective of formulating queries over multiple sources of information. Two hundred HTML tables from institutional web sites are segmented and each table cell is classified according to the fundamental indexing property of row and column headers. The categories that correspond to the multi-dimensional data cube view of a table are extracted by factoring the (often multi-row/column) headers. To reveal commonalities between tables from diverse sources, the Jaccard distances between pairs of category headers (and also table titles) are computed. We show how about one third of our heterogeneous collection can be clustered into a dozen groups that exhibit table-title and header similarities that can be exploited for queries.
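
    The pairwise measure used above is simple enough to state in code; the header sets are invented examples:

      def jaccard_distance(a: set, b: set) -> float:
          return 1.0 - len(a & b) / len(a | b)

      table1 = {"year", "department", "enrollment"}
      table2 = {"year", "faculty", "enrollment"}
      print(jaccard_distance(table1, table2))   # 0.5: two shared headers out of four distinct ones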

  4. Kudi: A free open-source python library for the analysis of properties along reaction paths.

    PubMed

    Vogt-Geisse, Stefan

    2016-05-01

    With increasing computational capabilities, an ever growing amount of data is generated in computational chemistry that contains a vast amount of chemically relevant information. It is therefore imperative to create new computational tools in order to process and extract this data in a sensible way. Kudi is an open source library that aids in the extraction of chemical properties from reaction paths. The straightforward structure of Kudi makes it easy to use for users and allows for effortless implementation of new capabilities, and extension to any quantum chemistry package. A use case for Kudi is shown for the tautomerization reaction of formic acid. Kudi is available free of charge at www.github.com/stvogt/kudi.

  5. Preliminary results concerning the simulation of beam profiles from extracted ion current distributions for mini-STRIKE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agostinetti, P., E-mail: piero.agostinetti@igi.cnr.it; Serianni, G.; Veltri, P.

    The Radio Frequency (RF) negative hydrogen ion source prototype has been chosen for the ITER neutral beam injectors due to its optimal performance and easier maintenance demonstrated at Max-Planck-Institut für Plasmaphysik, Garching, in hydrogen and deuterium. One of the key pieces of information needed to better understand the operating behavior of RF ion sources is the extracted negative ion current density distribution. This distribution—influenced by several factors like source geometry, particle drifts inside the source, cesium distribution, and layout of the cesium ovens—is not straightforward to evaluate. The main outcome of the present contribution is the development of a minimization method to estimate the extracted current distribution using the footprint of the beam recorded with mini-STRIKE (Short-Time Retractable Instrumented Kalorimeter). To accomplish this, a series of four computational models have been set up, where the output of a model is the input of the following one. These models compute the optics of the ion beam, evaluate the distribution of the heat deposited on the mini-STRIKE diagnostic calorimeter, and finally give an estimate of the temperature distribution on the back of mini-STRIKE. Several iterations with different extracted current profiles are necessary to give an estimate of the profile most compatible with the experimental data. A first test of the application of the method to the BAvarian Test Machine for Negative ions beam is given.
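
    The iterative fitting idea can be caricatured in one dimension; the Gaussian forward model, beamlet positions and noise below are invented and stand in for the actual chain of beam-optics and thermal models:

      import numpy as np
      from scipy.optimize import least_squares

      x = np.linspace(-0.1, 0.1, 81)                        # position on the calorimeter (m)

      def forward(amplitudes):                              # toy forward model: two Gaussian beamlets
          a1, a2 = amplitudes
          return a1 * np.exp(-((x + 0.03) / 0.02) ** 2) + a2 * np.exp(-((x - 0.03) / 0.02) ** 2)

      measured = forward([1.0, 0.6]) + np.random.default_rng(2).normal(0, 0.01, x.size)
      fit = least_squares(lambda a: forward(a) - measured, x0=[0.5, 0.5])
      print(fit.x)                                          # recovered amplitudes close to [1.0, 0.6]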

  6. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

    PubMed Central

    Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-01-01

    Objective The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. Materials and Methods We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising of 5728 GenBank records for the influenza A virus. Results We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Discussion Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Conclusion Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. PMID:26911818
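
    As a quick check of the reported figures, the stated precision and recall do combine to the stated F-measure:

      def f_measure(precision: float, recall: float) -> float:
          return 2 * precision * recall / (precision + recall)

      print(round(f_measure(0.832, 0.967), 3))   # 0.894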

  7. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records.

    PubMed

    Tahsin, Tasnia; Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-09-01

    The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising of 5728 GenBank records for the influenza A virus. We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles.

  8. [Application of hyper-spectral remote sensing technology in environmental protection].

    PubMed

    Zhao, Shao-Hua; Zhang, Feng; Wang, Qiao; Yao, Yun-Jun; Wang, Zhong-Ting; You, Dai-An

    2013-12-01

    Hyper-spectral remote sensing (RS) technology has been widely used in environmental protection. The present work introduces its recent applications in RS monitoring of polluting gases, greenhouse gases, algal blooms, water quality of catchment waters, safety of drinking water sources, biodiversity, vegetation classification, soil pollution, and so on. Finally, issues such as the scarcity of hyper-spectral satellites and the limits of data processing and information extraction are discussed. Some proposals are also presented, including developing follow-on satellites to the HJ-1 satellite equipped with differential optical absorption spectroscopy, greenhouse-gas spectroscopy and hyper-spectral imagers, strengthening the study of hyper-spectral data processing and information extraction, and promoting the construction of environmental application systems.

  9. Fusion of Remote Sensing Methods, UAV Photogrammetry and LiDAR Scanning products for monitoring fluvial dynamics

    NASA Astrophysics Data System (ADS)

    Lendzioch, Theodora; Langhammer, Jakub; Hartvich, Filip

    2015-04-01

    Fusion of remote sensing data is a common and rapidly developing discipline that combines data from multiple sources with different spatial and spectral resolutions, from satellite sensors, aircraft and ground platforms. Fused data contain more detailed information than each of the sources, enhance the interpretation performance and accuracy of the source data, and produce a high-quality visualisation of the final data. In fluvial geomorphology especially, it is essential to obtain imagery at sub-meter resolution to derive high-quality 2D and 3D information for the detailed identification, extraction and description of channel features of different river regimes and to perform rapid mapping of changes in river topography. In order to design, test and evaluate a new approach for the detection of river morphology, we combine research techniques ranging from remote sensing products to drone-based photogrammetry and LiDAR products (aerial LiDAR scanner and TLS). Topographic information (e.g. changes in river channel morphology, surface roughness, evaluation of floodplain inundation, mapping of gravel bars and slope characteristics) will be extracted either from a single layer or from combined layers in order to detect fluvial topographic changes before and after flood events. Besides statistical approaches for predictive geomorphological mapping and the determination of errors and uncertainties in the data, we will also provide 3D modelling of small fluvial features.

  10. HELP: XID+, the probabilistic de-blender for Herschel SPIRE maps

    NASA Astrophysics Data System (ADS)

    Hurley, P. D.; Oliver, S.; Betancourt, M.; Clarke, C.; Cowley, W. I.; Duivenvoorden, S.; Farrah, D.; Griffin, M.; Lacey, C.; Le Floc'h, E.; Papadopoulos, A.; Sargent, M.; Scudder, J. M.; Vaccari, M.; Valtchanov, I.; Wang, L.

    2017-01-01

    We have developed a new prior-based source extraction tool, XID+, to carry out photometry in the Herschel SPIRE (Spectral and Photometric Imaging Receiver) maps at the positions of known sources. XID+ is developed using a probabilistic Bayesian framework that provides a natural framework in which to include prior information, and uses the Bayesian inference tool Stan to obtain the full posterior probability distribution on flux estimates. In this paper, we discuss the details of XID+ and demonstrate the basic capabilities and performance by running it on simulated SPIRE maps resembling the COSMOS field, and comparing to the current prior-based source extraction tool DESPHOT. Not only do we show that XID+ performs better on metrics such as flux accuracy and flux uncertainty accuracy, but we also illustrate how obtaining the posterior probability distribution can help overcome some of the issues inherent with maximum-likelihood-based source extraction routines. We run XID+ on the COSMOS SPIRE maps from the Herschel Multi-Tiered Extragalactic Survey using a 24-μm catalogue as a positional prior, and a uniform flux prior ranging from 0.01 to 1000 mJy. We show the marginalized SPIRE colour-colour plot and marginalized contribution to the cosmic infrared background at the SPIRE wavelengths. XID+ is a core tool arising from the Herschel Extragalactic Legacy Project (HELP) and we discuss how additional work within HELP providing prior information on fluxes can and will be utilized. The software is available at https://github.com/H-E-L-P/XID_plus. We also provide the data product for COSMOS. We believe this is the first time that the full posterior probability of galaxy photometry has been provided as a data product.

  11. VizieR Online Data Catalog: Herschel Multi-tiered Extragalactic Survey (Oliver+, 2012)

    NASA Astrophysics Data System (ADS)

    Oliver, S. J.; Bock, J.; Altieri, B.; Amblard, A.; Arumugam, V.; Aussel, H.; Babbedge, T.; Beelen, A.; Bethermin, M.; Blain, A.; Boselli, A.; Bridge, C.; Brisbin, D.; Buat, V.; Burgarella, D.; Castro-Rodriguez, N.; Cava, A.; Chanial, P.; Cirasuolo, M.; Clements, D. L.; Conley, A.; Conversi, L.; Cooray, A.; Dowell, C. D.; Dubois, E. N.; Dwek, E.; Dye, S.; Eales, S.; Elbaz, D.; Farrah, D.; Feltre, A.; Ferrero, P.; Fiolet, N.; Fox, M.; Franceschini, A.; Gear, W.; Giovannoli, E.; Glenn, J.; Gong, Y.; Gonzalez Solares, E. A.; Griffin, M.; Halpern, M.; Harwit, M.; Hatziminaoglou, E.; Heinis, S.; Hurley, P.; Hwang, H. S.; Hyde, A.; Ibar, E.; Ilbert, O.; Isaak, K.; Ivison, R. J.; Lagache, G.; Le Floc'h, E.; Levenson, L.; Faro, B. L.; Lu, N.; Madden, S.; Maffei, B.; Magdis, G.; Mainetti, G.; Marchetti, L.; Marsden, G.; Marshall, J.; Mortier, A. M. J.; Nguyen, H. T.; O'Halloran, B.; Omont, A.; Page, M. J.; Panuzzo, P.; Papageorgiou, A.; Patel, H.; Pearson, C. P.; Perez-Fournon, I.; Pohlen, M.; Rawlings, J. I.; Raymond, G.; Rigopoulou, D.; Riguccini, L.; Rizzo, D.; Rodighiero, G.; Roseboom, I. G.; Rowan-Robinson, M.; Sanchez Portal, M.; Schulz, B.; Scott, D.; Seymour, N.; Shupe, D. L.; Smith, A. J.; Stevens, J. A.; Symeonidis, M.; Trichas, M.; Tugwell, K. E.; Vaccari, M.; Valtchanov, I.; Vieira, J. D.; Viero, M.; Vigroux, L.; Wang, L.; Ward, R.; Wardlow, J.; Wright, G.; Xu, C. K.; Zemcov, M.

    2017-03-01

    SPIRE maps (250, 350 and 500 microns) and PACS maps (100 and 160 microns) covering an area of more than 385 square degrees of sky, resulting from observations taken as part of HerMES (KPGTsoliver1), a Herschel Key Project whose main objective was to chart the formation and evolution of infrared galaxies throughout cosmic history by measuring the bolometric emission of infrared galaxies and their clustering properties. The associated catalogues extracted from these maps include over 1,200,000 entries representing over 340,000 galaxies. They consist of 'blind extraction' catalogues containing photometric information derived directly from these maps, 'band-merged' catalogues extracted at SPIRE 250 micron positions, plus 'cross-identification' catalogues based on prior Spitzer MIPS 24 micron source positions. The latest data releases also contain information derived from the complementary Herschel programmes HeLMS (GT2mviero1) and HeRS (OT2mviero2). (4 data files).

  12. Sensor-based architecture for medical imaging workflow analysis.

    PubMed

    Silva, Luís A Bastião; Campos, Samuel; Costa, Carlos; Oliveira, José Luis

    2014-08-01

    The growing use of computer systems in medical institutions has been generating a tremendous quantity of data. While these data have a critical role in assisting physicians in clinical practice, the information that can be extracted goes far beyond this use. This article proposes a platform capable of assembling multiple data sources within a medical imaging laboratory through a network of intelligent sensors. The proposed integration framework follows a hybrid SOA architecture based on an information sensor network, capable of collecting information from several sources in medical imaging laboratories. Currently, the system supports three types of sensors: DICOM repository meta-data, network workflows and examination reports. Each sensor is responsible for converting unstructured information from data sources into a common format that will then be semantically indexed in the framework engine. The platform was deployed in the Cardiology department of a central hospital, allowing identification of process characteristics and user behaviours that were unknown before this solution was used.

  13. Optimization of an Innovative Biofiltration System as a VOC Control Technology for Aircraft Painting Facilities

    DTIC Science & Technology

    2004-04-20

    EUROPE (Leson, 1991). Chemical Operations Coffee Roasting Composting Facilities Chemical Storage Coca Roasting Landfill Gas Extraction Film Coating...

  14. [A new tool for retrieving clinical data from various sources].

    PubMed

    Nielsen, Erik Waage; Hovland, Anders; Strømsnes, Oddgeir

    2006-02-23

    A doctor's tool for extracting clinical data from various sources on groups of hospital patients into one file has been in demand. For this purpose we evaluated Qlikview. Based on clinical information required by two cardiologists, an IT specialist with thorough knowledge of the hospital's data system (www.dips.no) took 30 days to assemble one Qlikview file. Data were also assembled from a pre-hospital ambulance system. The 13 Mb Qlikview file held various information on 12,430 patients admitted to the cardiac unit 26,287 times over the last 21 years. Also included were 530,912 clinical laboratory analyses from these patients during the past five years. Some information required by the cardiologists was inaccessible due to lack of coding or data storage. Some databases could not export their data; others were encrypted by the software company. A major part of the required data could be extracted to Qlikview. Searches were fast in spite of the huge amount of data. Qlikview could assemble clinical information for doctors from different data systems. Doctors from different hospitals could share and further refine empty Qlikview files for their own use. When the file is assembled, doctors can, on their own, search for answers to constantly changing clinical questions, even at odd hours.

  15. Extraction of Urban Trees from Integrated Airborne Based Digital Image and LIDAR Point Cloud Datasets - Initial Results

    NASA Astrophysics Data System (ADS)

    Dogon-yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-10-01

    Timely and accurate acquisition of information on the condition and structural changes of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which is critical to building up strategies for sustainable development. The conventional techniques used for extracting tree features include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints such as labour-intensive field work, high cost, and the influence of weather conditions and topographic cover, which can be overcome by means of integrated airborne LiDAR and very-high-resolution digital image datasets. This study presents a semi-automated approach for extracting urban trees from integrated airborne LiDAR and multispectral digital image datasets over the city of Istanbul, Turkey. The scheme includes detection and extraction of shadow-free vegetation features based on the spectral properties of the digital images, using shadow-index and NDVI techniques, and automated extraction of 3D information about vegetation features from the integrated processing of the shadow-free vegetation image and the LiDAR point cloud. The developed algorithms show promising results as an automated and cost-effective approach to estimating and delineating 3D information on urban trees. The research also showed that integrated datasets are a viable source of information for city managers to use in urban tree management.
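
    As an illustration of the spectral screening step described above, the following minimal Python sketch derives an NDVI layer and a crude brightness-based shadow mask from red, near-infrared and brightness bands and combines them into a shadow-free vegetation mask. The band arrays, index definitions and thresholds are illustrative assumptions, not the values used in the study.

      import numpy as np

      def vegetation_mask(red, nir, brightness,
                          ndvi_threshold=0.3, shadow_threshold=0.15):
          """Return a boolean mask of sunlit (shadow-free) vegetation pixels.

          red, nir, brightness: 2-D float arrays scaled to [0, 1].
          Thresholds are illustrative and would need tuning per scene.
          """
          ndvi = (nir - red) / (nir + red + 1e-9)   # normalised difference vegetation index
          shadow = brightness < shadow_threshold    # crude shadow index: very dark pixels
          return (ndvi > ndvi_threshold) & ~shadow  # vegetated and not in shadow

      # Example with synthetic 100x100 bands
      rng = np.random.default_rng(0)
      red = rng.uniform(0.0, 0.5, (100, 100))
      nir = rng.uniform(0.2, 0.9, (100, 100))
      brightness = (red + nir) / 2.0
      print("vegetation pixels:", int(vegetation_mask(red, nir, brightness).sum()))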

  16. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach

    PubMed Central

    2012-01-01

    Background Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. Methods We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. Results We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. Conclusions We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data. PMID:22759462

  17. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach.

    PubMed

    Ratkovic, Zorana; Golik, Wiktoria; Warnier, Pierre

    2012-06-26

    Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data.
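
    The lexicon-plus-pattern idea behind these systems can be illustrated with a toy Python sketch: tiny bacteria and habitat lexicons feed a lexico-syntactic pattern that links the two within a sentence. This is an illustrative reduction, not the Alvis framework; the lexicons, trigger verbs and pattern are invented for the example.

      import re

      # Toy lexicons; the real systems use curated domain resources.
      BACTERIA = {"Escherichia coli", "Bacillus subtilis"}
      LOCATIONS = {"soil", "human gut", "dairy products"}

      PATTERN = re.compile(
          r"(?P<bact>{})\s+(?:was|is|were|are)\s+(?:isolated from|found in|detected in)\s+(?:the\s+)?(?P<loc>{})".format(
              "|".join(re.escape(b) for b in BACTERIA),
              "|".join(re.escape(l) for l in LOCATIONS)),
          re.IGNORECASE)

      def extract_localizations(sentence):
          """Return (bacterium, location) pairs matched by the pattern."""
          return [(m.group("bact"), m.group("loc")) for m in PATTERN.finditer(sentence)]

      print(extract_localizations("Escherichia coli was isolated from the human gut."))
      # [('Escherichia coli', 'human gut')]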

  18. Automated detection of extended sources in radio maps: progress from the SCORPIO survey

    NASA Astrophysics Data System (ADS)

    Riggi, S.; Ingallinera, A.; Leto, P.; Cavallaro, F.; Bufano, F.; Schillirò, F.; Trigilio, C.; Umana, G.; Buemi, C. S.; Norris, R. P.

    2016-08-01

    Automated source extraction and parametrization represents a crucial challenge for the next-generation radio interferometer surveys, such as those performed with the Square Kilometre Array (SKA) and its precursors. In this paper, we present a new algorithm, called CAESAR (Compact And Extended Source Automated Recognition), to detect and parametrize extended sources in radio interferometric maps. It is based on a pre-filtering stage, allowing image denoising, compact source suppression and enhancement of diffuse emission, followed by an adaptive superpixel clustering stage for final source segmentation. A parametrization stage provides source flux information and a wide range of morphology estimators for post-processing analysis. We developed CAESAR in a modular software library, also including different methods for local background estimation and image filtering, along with alternative algorithms for both compact and diffuse source extraction. The method was applied to real radio continuum data collected at the Australian Telescope Compact Array (ATCA) within the SCORPIO project, a pathfinder of the Evolutionary Map of the Universe (EMU) survey at the Australian Square Kilometre Array Pathfinder (ASKAP). The source reconstruction capabilities were studied over different test fields in the presence of compact sources, imaging artefacts and diffuse emission from the Galactic plane and compared with existing algorithms. When compared to a human-driven analysis, the designed algorithm was found capable of detecting known target sources and regions of diffuse emission, outperforming alternative approaches over the considered fields.
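
    A greatly simplified stand-in for a detection pipeline of this kind is sketched below: a robust background and noise level are estimated, the map is median-filtered, and pixels above a significance threshold are grouped into labelled islands. It uses generic NumPy/SciPy operations and is not the CAESAR algorithm itself; in particular it omits the superpixel clustering and morphology estimators.

      import numpy as np
      from scipy import ndimage

      def extract_islands(image, nsigma=5.0, filter_size=3):
          """Detect contiguous pixel islands above a significance threshold."""
          background = np.median(image)
          rms = 1.4826 * np.median(np.abs(image - background))   # robust rms via the MAD
          filtered = ndimage.median_filter(image, size=filter_size)
          mask = filtered > background + nsigma * rms
          labels, nsources = ndimage.label(mask)
          fluxes = ndimage.sum(image - background, labels, index=range(1, nsources + 1))
          return labels, fluxes

      # Synthetic test map with one extended blob
      rng = np.random.default_rng(1)
      img = rng.normal(0.0, 1.0, (128, 128))
      img[60:70, 60:70] += 20.0
      labels, fluxes = extract_islands(img)
      print(len(fluxes), "island(s); flux of first:", round(float(fluxes[0]), 1))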

  19. Recent development of feature extraction and classification multispectral/hyperspectral images: a systematic literature review

    NASA Astrophysics Data System (ADS)

    Setiyoko, A.; Dharma, I. G. W. S.; Haryanto, T.

    2017-01-01

    Multispectral and hyperspectral data acquired from satellite sensors can detect a wide range of objects on the Earth, from low-scale to high-scale modeling. These data are increasingly being used to produce geospatial information for rapid analysis by running feature extraction or classification processes. Applying the best-suited model for this data mining is still challenging because of issues regarding accuracy and computational cost. The aim of this research is to develop a better understanding of object feature extraction and classification applied to satellite images by systematically reviewing recent related research projects. The method used in this research is based on the PRISMA statement. After deriving important points from trusted sources, pixel-based and texture-based feature extraction techniques emerge as promising candidates for further analysis in recent developments in feature extraction and classification.

  20. OLED lighting devices having multi element light extraction and luminescence conversion layer

    DOEpatents

    Krummacher, Benjamin Claus; Antoniadis, Homer

    2010-11-16

    An apparatus such as a light source has a multi-element light extraction and luminescence conversion layer disposed over a transparent layer of the light source and on the exterior of said light source. The multi-element light extraction and luminescence conversion layer includes a plurality of light extraction elements and a plurality of luminescence conversion elements. The light extraction elements diffuse the light from the light source, while the luminescence conversion elements absorb a first spectrum of light from said light source and emit a second spectrum of light.

  1. In-Situ Wave Observations in the High Resolution Air-Sea Interaction DRI

    DTIC Science & Technology

    2007-09-30

    directional spectra extracted from the Coastal Data Information Program (CDIP) Harvest buoy located in 204 m depth off Point Conception. The initial sea...frequency-directional wave spectra (source: CDIP). Upper panels: Typical summer-time South swell in the presence of a light North-West wind sea

  2. 76 FR 63359 - Endangered and Threatened Wildlife and Plants; Proposed Designation of Critical Habitat for the...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-12

    ... resource extraction (e.g., coal mining, silviculture, natural gas development), agriculture, road... channel instability, and natural gas development. Chucky Madtom The chucky madtom (Noturus crypticus) is a... information sources may include articles in peer-reviewed journals, conservation plans developed by States and...

  3. A scalable architecture for extracting, aligning, linking, and visualizing multi-Int data

    NASA Astrophysics Data System (ADS)

    Knoblock, Craig A.; Szekely, Pedro

    2015-05-01

    An analyst today has a tremendous amount of data available, but the various data sources typically exist in their own silos, so an analyst has limited ability to see an integrated view of the data and has little or no access to contextual information that could help in understanding the data. We have developed the Domain-Insight Graph (DIG) system, an innovative architecture for extracting, aligning, linking, and visualizing massive amounts of domain-specific content from unstructured sources. Under the DARPA Memex program we have already successfully applied this architecture to multiple application domains, including the enormous international problem of human trafficking, where we extracted, aligned and linked data from 50 million online Web pages. DIG builds on our Karma data integration toolkit, which makes it easy to rapidly integrate structured data from a variety of sources, including databases, spreadsheets, XML, JSON, and Web services. The ability to integrate Web services allows Karma to pull in live data from various social media sites, such as Twitter, Instagram, and OpenStreetMaps. DIG then indexes the integrated data and provides an easy-to-use interface for query, visualization, and analysis.
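
    The integration step can be illustrated with a minimal sketch that maps records from two hypothetical sources onto a shared schema before indexing. The field names and schema below are invented for the example and are unrelated to Karma's actual mapping language.

      # Minimal source-to-schema mapping sketch: each source gets a mapping
      # function onto a common record schema before indexing.
      COMMON_FIELDS = ("name", "location", "url")

      def map_web_service(record):
          """Map a hypothetical JSON record from a web service to the common schema."""
          return {"name": record.get("user_name"),
                  "location": record.get("geo", {}).get("place"),
                  "url": record.get("link")}

      def map_spreadsheet(row):
          """Map a hypothetical spreadsheet row (list of cells) to the common schema."""
          return {"name": row[0], "location": row[2], "url": row[3]}

      sources = [
          (map_web_service, [{"user_name": "acme", "geo": {"place": "Lisbon"}, "link": "http://example.org/a"}]),
          (map_spreadsheet, [["Acme Ltd", "retail", "Lisbon", "http://example.org/b"]]),
      ]
      integrated = [mapper(rec) for mapper, records in sources for rec in records]
      assert all(set(entry) == set(COMMON_FIELDS) for entry in integrated)
      print(integrated)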

  4. Shadow Detection Based on Regions of Light Sources for Object Extraction in Nighttime Video

    PubMed Central

    Lee, Gil-beom; Lee, Myeong-jin; Lee, Woo-Kyung; Park, Joo-heon; Kim, Tae-Hwan

    2017-01-01

    Intelligent video surveillance systems detect pre-configured surveillance events through background modeling, foreground and object extraction, object tracking, and event detection. Shadow regions inside video frames sometimes appear as foreground objects, interfere with ensuing processes, and finally degrade the event detection performance of the systems. Conventional studies have mostly used intensity, color, texture, and geometric information to perform shadow detection in daytime video, but these methods lack the capability of removing shadows in nighttime video. In this paper, a novel shadow detection algorithm for nighttime video is proposed; this algorithm partitions each foreground object based on the object’s vertical histogram and screens out shadow objects by validating their orientations heading toward regions of light sources. From the experimental results, it can be seen that the proposed algorithm shows more than 93.8% shadow removal and 89.9% object extraction rates for nighttime video sequences, and the algorithm outperforms conventional shadow removal algorithms designed for daytime videos. PMID:28327515
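
    The object-partitioning step can be sketched as follows: project the binary foreground mask onto the horizontal axis (the vertical histogram) and split the object at low-support columns; each part would then be screened by its orientation relative to the configured light-source regions. The sketch below implements only the histogram split, with an arbitrary gap threshold, and is not the published algorithm.

      import numpy as np

      def split_by_vertical_histogram(mask, min_fraction=0.2):
          """Split a binary foreground mask into column segments.

          Columns whose foreground count falls below min_fraction of the peak
          are treated as gaps. Returns a list of (start_col, end_col) pairs.
          """
          hist = mask.sum(axis=0)                      # vertical histogram
          active = hist >= min_fraction * hist.max()
          segments, start = [], None
          for col, flag in enumerate(active):
              if flag and start is None:
                  start = col
              elif not flag and start is not None:
                  segments.append((start, col - 1))
                  start = None
          if start is not None:
              segments.append((start, len(active) - 1))
          return segments

      # Two blobs separated by a low-support gap
      mask = np.zeros((40, 60), dtype=int)
      mask[5:35, 5:20] = 1
      mask[20:35, 35:55] = 1
      print(split_by_vertical_histogram(mask))   # [(5, 19), (35, 54)]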

  5. Component-resolved evaluation of the content of major allergens in therapeutic extracts for specific immunotherapy of honeybee venom allergy

    PubMed Central

    Blank, Simon; Etzold, Stefanie; Darsow, Ulf; Schiener, Maximilian; Eberlein, Bernadette; Russkamp, Dennis; Wolf, Sara; Graessel, Anke; Biedermann, Tilo; Ollert, Markus; Schmidt-Weber, Carsten B.

    2017-01-01

    ABSTRACT Allergen-specific immunotherapy is the only curative treatment of honeybee venom (HBV) allergy, which is able to protect against further anaphylactic sting reactions. Recent analyses on a molecular level have demonstrated that HBV represents a complex allergen source that contains more relevant major allergens than formerly anticipated. Moreover, allergic patients show very diverse sensitization profiles with the different allergens. HBV-specific immunotherapy is conducted with HBV extracts which are derived from pure venom. The allergen content of these therapeutic extracts might differ due to natural variations of the source material or different down-stream processing strategies of the manufacturers. Since variations of the allergen content of therapeutic HBV extracts might be associated with therapeutic failure, we addressed the component-resolved allergen composition of different therapeutic-grade HBV extracts which are approved for immunotherapy in numerous countries. The extracts were analyzed for their content of the major allergens Api m 1, Api m 2, Api m 3, Api m 5 and Api m 10. Using allergen-specific antibodies we were able to demonstrate the underrepresentation of relevant major allergens such as Api m 3, Api m 5 and Api m 10 in particular therapeutic extracts. Taken together, standardization of therapeutic extracts by determination of the total allergenic potency might imply the intrinsic pitfall of losing information about particular major allergens. Moreover, the variable allergen composition of different therapeutic HBV extracts might have an impact on therapy outcome and the clinical management of HBV-allergic patients with specific IgE to particular allergens. PMID:28494206

  6. Towards full waveform ambient noise inversion

    NASA Astrophysics Data System (ADS)

    Sager, Korbinian; Ermert, Laura; Boehm, Christian; Fichtner, Andreas

    2018-01-01

    In this work we investigate fundamentals of a method—referred to as full waveform ambient noise inversion—that improves the resolution of tomographic images by extracting waveform information from interstation correlation functions that cannot be used without knowing the distribution of noise sources. The fundamental idea is to drop the principle of Green function retrieval and to establish correlation functions as self-consistent observables in seismology. This involves the following steps: (1) We introduce an operator-based formulation of the forward problem of computing correlation functions. It is valid for arbitrary distributions of noise sources in both space and frequency, and for any type of medium, including 3-D elastic, heterogeneous and attenuating media. In addition, the formulation allows us to keep the derivations independent of time and frequency domain and it facilitates the application of adjoint techniques, which we use to derive efficient expressions to compute first and also second derivatives. The latter are essential for a resolution analysis that accounts for intra- and interparameter trade-offs. (2) In a forward modelling study we investigate the effect of noise sources and structure on different observables. Traveltimes are hardly affected by heterogeneous noise source distributions. On the other hand, the amplitude asymmetry of correlations is at least to first order insensitive to unmodelled Earth structure. Energy and waveform differences are sensitive to both structure and the distribution of noise sources. (3) We design and implement an appropriate inversion scheme, where the extraction of waveform information is successively increased. We demonstrate that full waveform ambient noise inversion has the potential to go beyond ambient noise tomography based on Green function retrieval and to refine noise source location, which is essential for a better understanding of noise generation. Inherent trade-offs between source and structure are quantified using Hessian-vector products.
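
    For spatially uncorrelated noise sources with power-spectral density S(\xi, \omega), the forward model for an interstation correlation can be written schematically as follows (a standard textbook form, not the specific operator notation of this work):

      C(x_1, x_2, \omega) = \int G(x_1, \xi, \omega)\, G^{*}(x_2, \xi, \omega)\, S(\xi, \omega)\, \mathrm{d}\xi

    The correlation therefore depends jointly on Earth structure, through the Green functions G, and on the noise-source distribution S, and misfit derivatives with respect to both can be obtained with the adjoint techniques mentioned above.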

  7. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association.

    PubMed

    Ma, Jian; Casey, Cameron P; Zheng, Xueyun; Ibrahim, Yehia M; Wilkins, Christopher S; Renslow, Ryan S; Thomas, Dennis G; Payne, Samuel H; Monroe, Matthew E; Smith, Richard D; Teeguarden, Justin G; Baker, Erin S; Metz, Thomas O

    2017-09-01

    Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library . erin.baker@pnnl.gov or thomas.metz@pnnl.gov. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
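
    Once an ion's mobility K has been determined from the fitted arrival times at several field strengths, its collision cross section follows from the standard Mason-Schamp relation (quoted here in its textbook form; the exact constants and unit conventions in any particular implementation may differ):

      \Omega = \frac{3 z e}{16 N} \left( \frac{2\pi}{\mu k_{B} T} \right)^{1/2} \frac{1}{K}

    where z is the ion charge state, e the elementary charge, N the buffer-gas number density, \mu the reduced mass of the ion-neutral pair, k_B the Boltzmann constant and T the gas temperature.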

  8. Natural Language Processing in Radiology: A Systematic Review.

    PubMed

    Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A

    2016-05-01

    Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. © RSNA, 2016. Online supplemental material is available for this article.

  9. Event-based text mining for biology and functional genomics

    PubMed Central

    Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

    2015-01-01

    The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365

  10. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review

    PubMed Central

    Bellet, Florelle; Asfari, Hadyl; Souvignet, Julien; Texier, Nathalie; Jaulent, Marie-Christine; Beyens, Marie-Noëlle; Burgun, Anita; Bousquet, Cédric

    2015-01-01

    Background The underreporting of adverse drug reactions (ADRs) through traditional reporting channels is a limitation in the efficiency of the current pharmacovigilance system. Patients’ experiences with drugs that they report on social media represent a new source of data that may have some value in postmarketing safety surveillance. Objective A scoping review was undertaken to explore the breadth of evidence about the use of social media as a new source of knowledge for pharmacovigilance. Methods Daudt et al’s recommendations for scoping reviews were followed. The research questions were as follows: How can social media be used as a data source for postmarketing drug surveillance? What are the available methods for extracting data? What are the different ways to use these data? We queried PubMed, Embase, and Google Scholar to extract relevant articles that were published before June 2014 and with no lower date limit. Two pairs of reviewers independently screened the selected studies and proposed two themes of review: manual ADR identification (theme 1) and automated ADR extraction from social media (theme 2). Descriptive characteristics were collected from the publications to create a database for themes 1 and 2. Results Of the 1032 citations from PubMed and Embase, 11 were relevant to the research question. An additional 13 citations were added after further research on the Internet and in reference lists. Themes 1 and 2 explored 11 and 13 articles, respectively. Ways of approaching the use of social media as a pharmacovigilance data source were identified. Conclusions This scoping review noted multiple methods for identifying target data, extracting them, and evaluating the quality of medical information from social media. It also showed some remaining gaps in the field. Studies related to the identification theme usually failed to accurately assess the completeness, quality, and reliability of the data that were analyzed from social media. Regarding extraction, no study proposed a generic approach to easily adding a new site or data source. Additional studies are required to precisely determine the role of social media in the pharmacovigilance system. PMID:26163365

  11. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review.

    PubMed

    Lardon, Jérémy; Abdellaoui, Redhouane; Bellet, Florelle; Asfari, Hadyl; Souvignet, Julien; Texier, Nathalie; Jaulent, Marie-Christine; Beyens, Marie-Noëlle; Burgun, Anita; Bousquet, Cédric

    2015-07-10

    The underreporting of adverse drug reactions (ADRs) through traditional reporting channels is a limitation in the efficiency of the current pharmacovigilance system. Patients' experiences with drugs that they report on social media represent a new source of data that may have some value in postmarketing safety surveillance. A scoping review was undertaken to explore the breadth of evidence about the use of social media as a new source of knowledge for pharmacovigilance. Daudt et al's recommendations for scoping reviews were followed. The research questions were as follows: How can social media be used as a data source for postmarketing drug surveillance? What are the available methods for extracting data? What are the different ways to use these data? We queried PubMed, Embase, and Google Scholar to extract relevant articles that were published before June 2014 and with no lower date limit. Two pairs of reviewers independently screened the selected studies and proposed two themes of review: manual ADR identification (theme 1) and automated ADR extraction from social media (theme 2). Descriptive characteristics were collected from the publications to create a database for themes 1 and 2. Of the 1032 citations from PubMed and Embase, 11 were relevant to the research question. An additional 13 citations were added after further research on the Internet and in reference lists. Themes 1 and 2 explored 11 and 13 articles, respectively. Ways of approaching the use of social media as a pharmacovigilance data source were identified. This scoping review noted multiple methods for identifying target data, extracting them, and evaluating the quality of medical information from social media. It also showed some remaining gaps in the field. Studies related to the identification theme usually failed to accurately assess the completeness, quality, and reliability of the data that were analyzed from social media. Regarding extraction, no study proposed a generic approach to easily adding a new site or data source. Additional studies are required to precisely determine the role of social media in the pharmacovigilance system.

  12. Ultra-short ion and neutron pulse production

    DOEpatents

    Leung, Ka-Ngo; Barletta, William A.; Kwan, Joe W.

    2006-01-10

    An ion source has an extraction system configured to produce ultra-short ion pulses, i.e. pulses with pulse width of about 1 μs or less, and a neutron source based on the ion source produces correspondingly ultra-short neutron pulses. To form a neutron source, a neutron generating target is positioned to receive an accelerated extracted ion beam from the ion source. To produce the ultra-short ion or neutron pulses, the apertures in the extraction system of the ion source are suitably sized to prevent ion leakage, the electrodes are suitably spaced, and the extraction voltage is controlled. The ion beam current leaving the source is regulated by applying ultra-short voltage pulses of a suitable voltage on the extraction electrode.

  13. Apparatus for proton radiography

    DOEpatents

    Martin, Ronald L.

    1976-01-01

    An apparatus for effecting diagnostic proton radiography of patients in hospitals comprises a source of negative hydrogen ions, a synchrotron for accelerating the negative hydrogen ions to a predetermined energy, a plurality of stations for stripping extraction of a radiography beam of protons, means for sweeping the extracted beam to cover a target, and means for measuring the residual range, residual energy, or percentage transmission of protons that pass through the target. The combination of information identifying the position of the beam with information about particles traversing the subject and the back absorber is performed with the aid of a computer to provide a proton radiograph of the subject. In an alternate embodiment of the invention, a back absorber comprises a plurality of scintillators which are coupled to detectors.

  14. Device structure for OLED light device having multi element light extraction and luminescence conversion layer

    DOEpatents

    Antoniadis, Homer [Mountain View, CA]; Krummacher, Benjamin Claus [Regensburg, DE]

    2008-01-22

    An apparatus such as a light source has a multi-element light extraction and luminescence conversion layer disposed over a transparent layer of the light source and on the exterior of said light source. The multi-element light extraction and luminescence conversion layer includes a plurality of light extraction elements and a plurality of luminescence conversion elements. The light extraction elements diffuse the light from the light source, while the luminescence conversion elements absorb a first spectrum of light from said light source and emit a second spectrum of light.

  15. The modification at CSNS ion source

    NASA Astrophysics Data System (ADS)

    Liu, S.; Ouyang, H.; Huang, T.; Xiao, Y.; Cao, X.; Lv, Y.; Xue, K.; Chen, W.

    2017-08-01

    The commissioning of the CSNS front end has been finished, and a beam intensity above 15 mA is obtained at the end of the RFQ. The CSNS ion source is a Penning surface-plasma ion source, similar to the ISIS ion source. To improve operational stability and reduce the spark rate, several modifications have been made to the Penning field, the extraction optics and the post-acceleration. PBGUNS is applied to optimize beam extraction; co-extracted electrons are included in the PBGUNS simulations, and various extraction structures are simulated with the aim of passing the beam through the extraction electrode without loss. The stability of the ion source is further improved.

  16. Biological activity and chemical profile of Lavatera thuringiaca L. extracts obtained by different extraction approaches.

    PubMed

    Mašković, Pavle Z; Veličković, Vesna; Đurović, Saša; Zeković, Zoran; Radojković, Marija; Cvetanović, Aleksandra; Švarc-Gajić, Jaroslava; Mitić, Milan; Vujić, Jelena

    2018-01-01

    Lavatera thuringiaca L. is a herbaceous perennial plant from the Malvaceae family, known for its biological activity and richness in polyphenolic compounds. Despite this, information regarding its biological activity and chemical profile is still insufficient. The aim of this study was to investigate the biological potential and chemical profile of Lavatera thuringiaca L., as well as the influence of the applied extraction technique on them. Two conventional and four non-conventional extraction techniques were applied in order to obtain extracts rich in bioactive compounds. The extracts were further tested for total phenolic, flavonoid, condensed tannin, gallotannin and anthocyanin contents using spectrophotometric assays. The polyphenolic profile was established using HPLC-DAD analysis. Biological activity was investigated with respect to antioxidant, cytotoxic and antibacterial activities, using four antioxidant assays, three different cell lines for cytotoxicity and fifteen bacterial strains for antibacterial activity. The results showed that subcritical water extraction (SCW) dominated over the other extraction techniques, with the SCW extract exhibiting the highest biological activity. The study indicates that Lavatera thuringiaca L. may be used as a potential source of biologically active compounds. Copyright © 2017 Elsevier GmbH. All rights reserved.

  17. Modelling spatiotemporal change using multidimensional arrays

    NASA Astrophysics Data System (ADS)

    Lu, Meng; Appel, Marius; Pebesma, Edzer

    2017-04-01

    The large variety of remote sensors, model simulations, and in-situ records provides great opportunities to model environmental change. The massive amount of high-dimensional data calls for methods to integrate data from various sources and to analyse spatiotemporal and thematic information jointly. An array is a collection of elements ordered and indexed in arbitrary dimensions, which naturally represents spatiotemporal phenomena that are identified by their geographic locations and recording time. In addition, array regridding (e.g., resampling, down-/up-scaling), dimension reduction, and spatiotemporal statistical algorithms are readily applicable to arrays. However, the role of arrays in big geoscientific data analysis has not been systematically studied: How can arrays discretise continuous spatiotemporal phenomena? How can arrays facilitate the extraction of multidimensional information? How can arrays provide a clean, scalable and reproducible change modelling process that is communicable between mathematicians, computer scientists, Earth system scientists and stakeholders? This study emphasises detecting spatiotemporal change using satellite image time series. Current change detection methods using satellite image time series commonly analyse data in separate steps: 1) forming a vegetation index, 2) conducting time series analysis on each pixel, and 3) post-processing and mapping time series analysis results, which does not consider spatiotemporal correlations and ignores much of the spectral information. Multidimensional information can be better extracted by jointly considering spatial, spectral, and temporal information. To approach this goal, we use principal component analysis to extract multispectral information and spatial autoregressive models to account for spatial correlation in residual-based time series structural change modelling. We also discuss the potential of multivariate non-parametric time series structural change methods, hierarchical modelling, and extreme event detection methods to model spatiotemporal change. We show how array operations can facilitate expressing these methods, and how the open-source array data management and analytics software SciDB and R can be used to scale the process and make it easily reproducible.

  18. Hardware independence checkout software

    NASA Technical Reports Server (NTRS)

    Cameron, Barry W.; Helbig, H. R.

    1990-01-01

    ACSI has developed a program utilizing CLIPS to assess compliance with various programming standards. Essentially, the program parses C code to extract the names of all function calls. These are asserted as CLIPS facts, which also include information about line numbers, source file names, and called functions. Rules have been devised to identify called functions that have not been defined in any of the parsed source. These are compared against lists of standards (represented as facts) using rules that check intersections and/or unions of the two sets. By piping the output into other processes, the source is appropriately commented by generating and executing parsed scripts.
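
    The same idea can be sketched outside CLIPS; the short Python example below crudely extracts apparent call names from C source text and reports those that are neither defined locally nor on an allowed standard list. The regular expression, keyword list and sample source are illustrative assumptions, not ACSI's implementation.

      import re

      # Crude call extractor: an identifier immediately followed by '('.
      CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
      KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

      def extract_calls(c_source):
          """Return the set of names that appear to be called (or defined) in C source text."""
          return {name for name in CALL_RE.findall(c_source) if name not in KEYWORDS}

      def check_against_standard(called, defined, allowed_library):
          """Report calls that are neither defined locally nor in the allowed standard list."""
          return sorted(called - defined - allowed_library)

      source = """
          int main(void) {
              init_board();
              printf("hello\\n");
              return 0;
          }
      """
      called = extract_calls(source)
      print(check_against_standard(called, defined={"main"}, allowed_library={"printf"}))
      # ['init_board'] -- called but neither defined here nor in the allowed list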

  19. The economic value of remote sensing of earth resources from space: An ERTS overview and the value of continuity of service. Volume 7: Nonreplenishable natural resources: Minerals, fossil fuels and geothermal energy sources

    NASA Technical Reports Server (NTRS)

    Lietzke, K. R.

    1974-01-01

    The application of remotely-sensed information to the mineral, fossil fuel, and geothermal energy extraction industry is investigated. Public and private cost savings are documented in geologic mapping activities. Benefits and capabilities accruing to the ERS system are assessed. It is shown that remote sensing aids in resource extraction, as well as the monitoring of several dynamic phenomena, including disturbed lands, reclamation, erosion, glaciation, and volcanic and seismic activity.

  20. Procedure for extraction of disparate data from maps into computerized data bases

    NASA Technical Reports Server (NTRS)

    Junkin, B. G.

    1979-01-01

    A procedure is presented for extracting disparate sources of data from geographic maps and for the conversion of these data into a suitable format for processing on a computer-oriented information system. Several graphic digitizing considerations are included and related to the NASA Earth Resources Laboratory's Digitizer System. Current operating procedures for the Digitizer System are given in a simplified and logical manner. The report serves as a guide to those organizations interested in converting map-based data by using a comparable map digitizing system.

  1. Open-Source Programming for Automated Generation of Graphene Raman Spectral Maps

    NASA Astrophysics Data System (ADS)

    Vendola, P.; Blades, M.; Pierre, W.; Jedlicka, S.; Rotkin, S. V.

    Raman microscopy is a useful tool for studying the structural characteristics of graphene deposited onto substrates. However, extracting useful information from the Raman spectra requires data processing and 2D map generation. An existing home-built confocal Raman microscope was optimized for graphene samples and programmed to automatically generate Raman spectral maps across a specified area. In particular, an open source data collection scheme was generated to allow the efficient collection and analysis of the Raman spectral data for future use. NSF ECCS-1509786.
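
    A map-generation step of this kind typically collapses each spectrum to a scalar, for example by integrating the intensity in a window around the graphene G band near 1580 cm^-1. The NumPy sketch below is an illustrative reduction using synthetic data, not the group's acquisition code.

      import numpy as np

      def band_intensity_map(spectra, wavenumbers, band=(1500.0, 1650.0)):
          """Collapse a grid of Raman spectra into a 2-D band-intensity map.

          spectra: array of shape (ny, nx, n_points), one spectrum per scan position.
          wavenumbers: 1-D array of length n_points (cm^-1).
          band: integration window; (1500, 1650) roughly brackets the G band.
          """
          window = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
          return spectra[:, :, window].sum(axis=-1)

      # Synthetic 20x20 scan with a G-like peak of varying height
      wn = np.linspace(1200.0, 2000.0, 400)
      rng = np.random.default_rng(2)
      heights = rng.uniform(0.5, 2.0, (20, 20))
      spectra = heights[:, :, None] * np.exp(-((wn - 1580.0) / 15.0) ** 2) \
                + rng.normal(0.0, 0.02, (20, 20, wn.size))
      print(band_intensity_map(spectra, wn).shape)   # (20, 20)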

  2. Analysis and suppression of passive noise in surface microseismic data

    NASA Astrophysics Data System (ADS)

    Forghani-Arani, Farnoush

    Surface microseismic surveys are gaining popularity in monitoring the hydraulic fracturing process. The effectiveness of these surveys, however, is strongly dependent on the signal-to-noise ratio of the acquired data. Cultural and industrial noise generated during hydraulic fracturing operations usually dominates the data, thereby decreasing the effectiveness of using these data in identifying and locating microseismic events. Hence, noise suppression is a critical step in surface microseismic monitoring. In this thesis, I focus on two important aspects of using surface-recorded microseismic data: first, I take advantage of the unwanted surface noise to understand the characteristics of this noise and extract information about the propagation medium from it; second, I propose effective techniques to suppress the surface noise while preserving the waveforms that contain information about the source of microseisms. Automated event identification on passive seismic data using only a few receivers is challenging, especially when the record lengths span long durations of time. I introduce an automatic event identification algorithm that is designed specifically for detecting events in passive data acquired with a small number of receivers. I demonstrate that the conventional STA/LTA (Short-term Average/Long-term Average) algorithm is not sufficiently effective in event detection in the common case of low signal-to-noise ratio. With a cross-correlation based method as an extension of the STA/LTA algorithm, even low signal-to-noise events (that were not detectable with conventional STA/LTA) were revealed. Surface microseismic data contains surface waves (generated primarily from hydraulic fracturing activities) and body waves in the form of microseismic events. It is challenging to analyze the surface waves in the recorded data directly because of the randomness of their source and their unknown source signatures. I use seismic interferometry to extract the surface-wave arrivals. Interferometry is a powerful tool to extract waves (including body waves and surface waves) that propagate from any receiver in the array (called a pseudo source) to the other receivers across the array. Since most of the noise sources in surface microseismic data lie on the surface, seismic interferometry yields pseudo source gathers dominated by surface-wave energy. The dispersive characteristics of these surface waves are important properties that can be used to extract information necessary for suppressing these waves. I demonstrate the application of interferometry to surface passive data recorded during the hydraulic fracturing operation of a tight gas reservoir and extract the dispersion properties of surface waves corresponding to a pseudo-shot gather. Comparison of the dispersion characteristics of the surface waves from the pseudo-shot gather with those of an active shot gather shows interesting similarities and differences. The dispersion character (e.g. velocity change with frequency) of the fundamental mode was observed to have the same behavior for both the active and passive data. However, for the higher mode surface waves, the dispersion properties are extracted at different frequency ranges.
Conventional noise suppression techniques for passive data are mostly stacking-based: they rely on reinforcing the amplitude of the signal by stacking the waveforms at the receivers and are unable to preserve the waveforms at the individual receivers, which are necessary for estimating the microseismic source location and source mechanism. Here, I introduce a technique based on the tau-p transform that effectively identifies and separates microseismic events from surface-wave noise in the tau-p domain. This technique is superior to conventional stacking-based noise suppression techniques because it preserves the waveforms at individual receivers. Application of this methodology to microseismic events with isotropic and double-couple source mechanisms shows substantial improvement in the signal-to-noise ratio. Imaging of the processed field data also shows improved imaging of the hypocenter location of the microseismic source. In the case of the double-couple source mechanism, I suggest two approaches for unifying the polarities at the receivers: a cross-correlation approach and a semblance-based prediction approach. The semblance-based approach is more effective at unifying the polarities, especially for low signal-to-noise ratio data.
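
    The STA/LTA trigger referred to above can be sketched in a few lines of Python: the ratio of a short-term to a long-term average of signal energy is compared with a threshold. The window lengths, threshold and synthetic trace below are arbitrary illustrative values, and this plain version omits the cross-correlation extension developed in the thesis.

      import numpy as np

      def sta_lta(trace, n_sta, n_lta):
          """Classic STA/LTA characteristic function computed on squared amplitudes."""
          energy = np.asarray(trace, dtype=float) ** 2
          csum = np.concatenate(([0.0], np.cumsum(energy)))
          sta = (csum[n_sta:] - csum[:-n_sta]) / n_sta   # running short-term average
          lta = (csum[n_lta:] - csum[:-n_lta]) / n_lta   # running long-term average
          # Align the two averages so that ratio[j] compares windows ending at the
          # same sample (input sample index j + n_lta - 1).
          return sta[n_lta - n_sta:] / np.maximum(lta, 1e-12)

      # Synthetic noise trace with a burst ("event") starting at sample 3000
      rng = np.random.default_rng(3)
      trace = rng.normal(0.0, 1.0, 6000)
      trace[3000:3100] += 8.0 * np.sin(np.linspace(0.0, 20.0 * np.pi, 100))
      n_sta, n_lta, threshold = 50, 500, 4.0
      ratio = sta_lta(trace, n_sta, n_lta)
      triggers = np.nonzero(ratio > threshold)[0] + n_lta - 1   # back to trace samples
      print("first trigger near sample:", int(triggers[0]) if triggers.size else None)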

  3. Assessment of Anaerobic Metabolic Activity and Microbial Diversity in a Petroleum-Contaminated Aquifer Using Push-Pull Tests in Combination With Molecular Tools and Stable Isotopes

    NASA Astrophysics Data System (ADS)

    Schroth, M. H.; Kleikemper, J.; Pombo, S. A.; Zeyer, J.

    2002-12-01

    In the past, studies on microbial communities in natural environments have typically focused on either their structure or on their metabolic function. However, linking structure and function is important for understanding microbial community dynamics, in particular in contaminated environments. We will present results of a novel combination of a hydrogeological field method (push-pull tests) with molecular tools and stable isotope analysis, which was employed to quantify anaerobic activities and associated microbial diversity in a petroleum-contaminated aquifer in Studen, Switzerland. Push-pull tests consisted of the injection of test solution containing a conservative tracer and reactants (electron acceptors, 13C-labeled carbon sources) into the aquifer anoxic zone. Following an incubation period, the test solution/groundwater mixture was extracted from the same location. Metabolic activities were computed from solute concentrations measured during extraction. Simultaneously, microbial diversity in sediment and groundwater was characterized by using fluorescence in situ hybridization (FISH), denaturing gradient gel electrophoresis (DGGE), as well as phospholipids fatty acid (PLFA) analysis in combination with 13C isotopic measurements. Results from DGGE analyses provided information on the general community structure before, during and after the tests, while FISH yielded information on active populations. Moreover, using 13C-labeling of microbial PLFA we were able to directly link carbon source assimilation in an aquifer to indigenous microorganisms while providing quantitative information on respective carbon source consumption.

  4. Bayesian source tracking via focalization and marginalization in an uncertain Mediterranean Sea environment.

    PubMed

    Dosso, Stan E; Wilmut, Michael J; Nielsen, Peter L

    2010-07-01

    This paper applies Bayesian source tracking in an uncertain environment to Mediterranean Sea data, and investigates the resulting tracks and track uncertainties as a function of data information content (number of data time-segments, number of frequencies, and signal-to-noise ratio) and of prior information (environmental uncertainties and source-velocity constraints). To track low-level sources, acoustic data recorded for multiple time segments (corresponding to multiple source positions along the track) are inverted simultaneously. Environmental uncertainty is addressed by including unknown water-column and seabed properties as nuisance parameters in an augmented inversion. Two approaches are considered: Focalization-tracking maximizes the posterior probability density (PPD) over the unknown source and environmental parameters. Marginalization-tracking integrates the PPD over environmental parameters to obtain a sequence of joint marginal probability distributions over source coordinates, from which the most-probable track and track uncertainties can be extracted. Both approaches apply track constraints on the maximum allowable vertical and radial source velocity. The two approaches are applied for towed-source acoustic data recorded at a vertical line array at a shallow-water test site in the Mediterranean Sea where previous geoacoustic studies have been carried out.
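
    The marginalization step can be written compactly in schematic notation, with m_s the source coordinates, m_e the environmental nuisance parameters and d the acoustic data:

      p(\mathbf{m}_s \mid \mathbf{d}) = \int p(\mathbf{m}_s, \mathbf{m}_e \mid \mathbf{d})\, \mathrm{d}\mathbf{m}_e

    Focalization-tracking instead keeps the single parameter set that maximizes the joint posterior over both source and environmental parameters, subject to the same track constraints.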

  5. CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources.

    PubMed

    Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio

    2012-07-01

    During the past years, the advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread out over many small databases, that standards often differ among repositories, and that some databases are no longer supported or contain overly specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information in a programmatic way. CellBase provides a solution to the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.

  6. Detecting misinformation and knowledge conflicts in relational data

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Jackobsen, Matthew; Riordan, Brian

    2014-06-01

    Information fusion is required for many mission-critical intelligence analysis tasks. Using knowledge extracted from various sources, including entities, relations, and events, intelligence analysts respond to commander's information requests, integrate facts into summaries about current situations, augment existing knowledge with inferred information, make predictions about the future, and develop action plans. However, information fusion solutions often fail because of conflicting and redundant knowledge contained in multiple sources. Most knowledge conflicts in the past were due to translation errors and reporter bias, and thus could be managed. Current and future intelligence analysis, especially in denied areas, must deal with open source data processing, where there is much greater presence of intentional misinformation. In this paper, we describe a model for detecting conflicts in multi-source textual knowledge. Our model is based on constructing semantic graphs representing patterns of multi-source knowledge conflicts and anomalies, and detecting these conflicts by matching pattern graphs against the data graph constructed using soft co-reference between entities and events in multiple sources. The conflict detection process maintains the uncertainty throughout all phases, providing full traceability and enabling incremental updates of the detection results as new knowledge or modification to previously analyzed information are obtained. Detected conflicts are presented to analysts for further investigation. In the experimental study with SYNCOIN dataset, our algorithms achieved perfect conflict detection in ideal situation (no missing data) while producing 82% recall and 90% precision in realistic noise situation (15% of missing attributes).
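
    A toy version of the conflict-detection idea, stripped of the semantic-graph machinery and soft co-reference described above, is to group multi-source assertions by entity and attribute and flag groups whose values disagree. The entities, attributes and sources in the sketch are invented for illustration.

      from collections import defaultdict

      # Toy multi-source assertions: (entity, attribute, value, source).
      assertions = [
          ("convoy_12", "location", "Aleppo",    "report_A"),
          ("convoy_12", "location", "Homs",      "report_B"),
          ("convoy_12", "size",     "20 trucks", "report_A"),
          ("leader_7",  "status",   "detained",  "report_C"),
      ]

      def find_conflicts(facts):
          """Group assertions by (entity, attribute) and flag groups with disagreeing values."""
          grouped = defaultdict(set)
          for entity, attribute, value, source in facts:
              grouped[(entity, attribute)].add((value, source))
          return {key: sorted(pairs) for key, pairs in grouped.items()
                  if len({value for value, _ in pairs}) > 1}

      print(find_conflicts(assertions))
      # {('convoy_12', 'location'): [('Aleppo', 'report_A'), ('Homs', 'report_B')]}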

  7. Cancellation of spurious arrivals in Green's function extraction and the generalized optical theorem

    USGS Publications Warehouse

    Snieder, R.; Van Wijk, K.; Haney, M.; Calvert, R.

    2008-01-01

    The extraction of the Green's function by cross correlation of waves recorded at two receivers nowadays finds much application. We show that for an arbitrary small scatterer, the cross terms of scattered waves give an unphysical wave with an arrival time that is independent of the source position. This constitutes an apparent inconsistency because theory predicts that such spurious arrivals do not arise, after integration over a complete source aperture. This puzzling inconsistency can be resolved for an arbitrary scatterer by integrating the contribution of all sources in the stationary phase approximation to show that the stationary phase contributions to the source integral cancel the spurious arrival by virtue of the generalized optical theorem. This work constitutes an alternative derivation of this theorem. When the source aperture is incomplete, the spurious arrival is not canceled and could be misinterpreted to be part of the Green's function. We give an example of how spurious arrivals provide information about the medium complementary to that given by the direct and scattered waves; the spurious waves can thus potentially be used to better constrain the medium. © 2008 The American Physical Society.
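
    In schematic form, the extraction relation at issue states that, for sources distributed over a closed surface S surrounding the receivers, the summed cross correlations reconstruct the causal plus acausal Green's function between them (medium-dependent constants omitted):

      G(x_A, x_B, \omega) + G^{*}(x_A, x_B, \omega) \;\propto\; \oint_{S} G(x_A, x, \omega)\, G^{*}(x_B, x, \omega)\, \mathrm{d}x

    When the source aperture is incomplete, the scattered-wave cross terms no longer cancel, which is the origin of the spurious arrival discussed above; their cancellation for a complete aperture is what connects the result to the generalized optical theorem.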

  8. On-board data management study for EOPAP

    NASA Technical Reports Server (NTRS)

    Davisson, L. D.

    1975-01-01

    The requirements, implementation techniques, and mission analysis associated with on-board data management for EOPAP were studied. SEASAT-A was used as a baseline, and the storage requirements, data rates, and information extraction requirements were investigated for each of the following proposed SEASAT sensors: a short pulse 13.9 GHz radar, a long pulse 13.9 GHz radar, a synthetic aperture radar, a multispectral passive microwave radiometer facility, and an infrared/visible very high resolution radiometer (VHRR). Rate distortion theory was applied to determine theoretical minimum data rates, which were compared with the rates required by practical techniques. It was concluded that practical techniques can be used which approach the theoretical optimum based upon an empirically determined source random process model. The results of these investigations were used to recommend an on-board data management system for (1) data compression through information extraction, optimal noiseless coding, source coding with distortion, data buffering, and data selection under command or as a function of data activity; (2) command handling; (3) spacecraft operation and control; and (4) experiment operation and monitoring.
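
    As a textbook illustration of the kind of theoretical minimum rate referred to above (not the report's empirically determined source model), the rate-distortion function of a memoryless Gaussian source with variance \(\sigma^2\) under a mean-squared-error constraint \(D\), in bits per sample, is

        R(D) \;=\;
        \begin{cases}
          \tfrac{1}{2}\log_2\!\dfrac{\sigma^{2}}{D}, & 0 < D \le \sigma^{2},\\[4pt]
          0, & D > \sigma^{2}.
        \end{cases}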

  9. Identification of pests and diseases of Dalbergia hainanensis based on EVI time series and classification of decision tree

    NASA Astrophysics Data System (ADS)

    Luo, Qiu; Xin, Wu; Qiming, Xiong

    2017-06-01

    In vegetation remote sensing information extraction, phenological features are often not considered and remote sensing analysis algorithms perform poorly. To address this problem, a method for extracting remote sensing vegetation information based on EVI time series and decision-tree classification with multi-source branch similarity is proposed. Firstly, to improve the stability of recognition accuracy over the time series, the seasonal features of vegetation are extracted based on the fitting span of the time series. Secondly, decision-tree similarity is assessed by adaptively selecting the path or the probability parameter of component prediction; this index evaluates the degree of task association, decides whether to migrate the multi-source decision tree, and ensures the speed of migration. Finally, the classification and recognition accuracy for pests and diseases reaches 87%–98% for commercial Dalbergia hainanensis forest, significantly better than the 80%–96% accuracy of MODIS coverage in this area, verifying the validity of the proposed method.
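
    A minimal sketch of the general approach, training a decision tree on simple seasonal features derived from EVI time series, is given below; the synthetic data, feature choices, and tree parameters are illustrative assumptions rather than the paper's exact pipeline.

        # Hedged sketch: classifying pest/disease-affected stands from EVI time series
        # with simple seasonal features and a decision tree.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        def seasonal_features(evi_series):
            """Per-pixel seasonal descriptors from a 1-D EVI time series."""
            return np.array([
                evi_series.mean(),                    # overall greenness
                evi_series.max() - evi_series.min(),  # seasonal amplitude
                float(np.argmax(evi_series)),         # timing of the seasonal peak
            ])

        # Synthetic stand-ins: 200 pixels x 23 EVI composites (e.g., 16-day series).
        season = np.sin(np.linspace(0, 2 * np.pi, 23))
        healthy = 0.50 + 0.20 * season + rng.normal(0, 0.02, (100, 23))
        infested = 0.35 + 0.10 * season + rng.normal(0, 0.02, (100, 23))
        X = np.array([seasonal_features(s) for s in np.vstack([healthy, infested])])
        y = np.array([0] * 100 + [1] * 100)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))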

  10. A data-driven approach for quality assessment of radiologic interpretations.

    PubMed

    Hsu, William; Han, Simon X; Arnold, Corey W; Bui, Alex At; Enzmann, Dieter R

    2016-04-01

    Given the increasing emphasis on delivering high-quality, cost-efficient healthcare, improved methodologies are needed to measure the accuracy and utility of ordered diagnostic examinations in achieving the appropriate diagnosis. Here, we present a data-driven approach for performing automated quality assessment of radiologic interpretations using other clinical information (e.g., pathology) as a reference standard for individual radiologists, subspecialty sections, imaging modalities, and entire departments. Downstream diagnostic conclusions from the electronic medical record are utilized as "truth" to which upstream diagnoses generated by radiology are compared. The described system automatically extracts and compares patient medical data to characterize concordance between clinical sources. Initial results are presented in the context of breast imaging, matching 18 101 radiologic interpretations with 301 pathology diagnoses and achieving a precision and recall of 84% and 92%, respectively. The presented data-driven method highlights the challenges of integrating multiple data sources and the application of information extraction tools to facilitate healthcare quality improvement. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  11. Extraction fatty acid as a source to produce biofuel in microalgae Chlorella sp. and Spirulina sp. using supercritical carbon dioxide

    NASA Astrophysics Data System (ADS)

    Tai, Do Chiem; Hai, Dam Thi Thanh; Vinh, Nguyen Hanh; Phung, Le Thi Kim

    2016-06-01

    In this research, the fatty acids of isolated microalgae were extracted using several techniques, including maceration, Soxhlet extraction, ultrasonic-assisted extraction, and supercritical fluid extraction, and were analyzed for biodiesel production using GC-MS. This work deals with the extraction of microalgae oil from dry biomass using the supercritical fluid extraction method. A complete laboratory study of the influence of several parameters on the extraction kinetics and yields, and on the composition of the oil in terms of lipid classes and profiles, is presented. Two types of microalgae were studied: Chlorella sp. and Spirulina sp. For the extraction of oil from microalgae, supercritical CO2 (SC-CO2) is of particular interest, being safer than n-hexane and offering a negligible environmental impact, a short extraction time, and a high-quality final product. While some experimental papers are available on the supercritical fluid extraction (SFE) of oil from microalgae, only limited information exists on the kinetics of the process. These results demonstrate that supercritical CO2 extraction is an efficient method for the complete recovery of the neutral lipid phase.

  12. Predicate Argument Structure Frames for Modeling Information in Operative Notes

    PubMed Central

    Wang, Yan; Pakhomov, Serguei; Melton, Genevieve B.

    2015-01-01

    The rich information about surgical procedures contained in operative notes is a valuable data source for improving the clinical evidence base and clinical research. In this study, we propose a set of Predicate Argument Structure (PAS) frames for surgical action verbs to assist in the creation of an information extraction (IE) system to automatically extract details about the techniques, equipment, and operative steps from operative notes. We created PropBank-style PAS frames for the top 30 surgical action verbs based on examination of randomly selected sample sentences from 3,000 Laparoscopic Cholecystectomy notes. To assess the completeness of the PAS frames in representing the usage of the same action verbs, we evaluated the created PAS frames on sample sentences from operative notes of six other gastrointestinal surgical procedures. Our results showed that the PAS frames created with one type of surgery can successfully denote the usage of the same verbs in operative notes of broader surgical categories. PMID:23920664

  13. A drunken search in crystallization space.

    PubMed

    Fazio, Vincent J; Peat, Thomas S; Newman, Janet

    2014-10-01

    The REMARK280 field of the Protein Data Bank is the richest open source of successful crystallization information. The REMARK280 field is optional and currently uncurated, so significant effort needs to be applied to extract reliable data. There are well over 15 000 crystallization conditions available commercially from 12 different vendors. After putting the PDB crystallization information and the commercial cocktail data into a consistent format, these data are used to extract information about the overlap between the two sets of crystallization conditions. An estimation is made as to which commercially available conditions are most appropriate for producing well diffracting crystals by looking at which commercial conditions are found unchanged (or almost unchanged) in the PDB. Further analyses include which commercial kits are the most appropriate for shotgun or more traditional approaches to crystallization screening. This analysis suggests that almost 40% of the crystallization conditions found currently in the PDB are identical or very similar to a commercial condition.

  14. Ultrafast single photon emitting quantum photonic structures based on a nano-obelisk.

    PubMed

    Kim, Je-Hyung; Ko, Young-Ho; Gong, Su-Hyun; Ko, Suk-Min; Cho, Yong-Hoon

    2013-01-01

    A key issue in a single photon source is fast and efficient generation of a single photon flux with high light extraction efficiency. Significant progress toward high-efficiency single photon sources has been demonstrated by semiconductor quantum dots, especially using narrow bandgap materials. Meanwhile, there are many obstacles, which restrict the use of wide bandgap semiconductor quantum dots as practical single photon sources in ultraviolet-visible region, despite offering free space communication and miniaturized quantum information circuits. Here we demonstrate a single InGaN quantum dot embedded in an obelisk-shaped GaN nanostructure. The nano-obelisk plays an important role in eliminating dislocations, increasing light extraction, and minimizing a built-in electric field. Based on the nano-obelisks, we observed nonconventional narrow quantum dot emission and positive biexciton binding energy, which are signatures of negligible built-in field in single InGaN quantum dots. This results in efficient and ultrafast single photon generation in the violet color region.

  15. Definition of variables required for comprehensive description of drug dosage and clinical pharmacokinetics.

    PubMed

    Medem, Anna V; Seidling, Hanna M; Eichler, Hans-Georg; Kaltschmidt, Jens; Metzner, Michael; Hubert, Carina M; Czock, David; Haefeli, Walter E

    2017-05-01

    Electronic clinical decision support systems (CDSS) require drug information that can be processed by computers. The goal of this project was to determine and evaluate a compilation of variables that comprehensively capture the information contained in the summary of product characteristic (SmPC) and unequivocally describe the drug, its dosage options, and clinical pharmacokinetics. An expert panel defined and structured a set of variables and drafted a guideline to extract and enter information on dosage and clinical pharmacokinetics from textual SmPCs as published by the European Medicines Agency (EMA). The set of variables was iteratively revised and evaluated by data extraction and variable allocation of roughly 7% of all centrally approved drugs. The information contained in the SmPC was allocated to three information clusters consisting of 260 variables. The cluster "drug characterization" specifies the nature of the drug. The cluster "dosage" provides information on approved drug dosages and defines corresponding specific conditions. The cluster "clinical pharmacokinetics" includes pharmacokinetic parameters of relevance for dosing in clinical practice. A first evaluation demonstrated that, despite the complexity of the current free text SmPCs, dosage and pharmacokinetic information can be reliably extracted from the SmPCs and comprehensively described by a limited set of variables. By proposing a compilation of variables well describing drug dosage and clinical pharmacokinetics, the project represents a step forward towards the development of a comprehensive database system serving as information source for sophisticated CDSS.

  16. DOA-informed source extraction in the presence of competing talkers and background noise

    NASA Astrophysics Data System (ADS)

    Taseska, Maja; Habets, Emanuël A. P.

    2017-12-01

    A desired speech signal in hands-free communication systems is often degraded by noise and interfering speech. Even though the number and locations of the interferers are often unknown in practice, it is justified to assume in certain applications that the direction-of-arrival (DOA) of the desired source is approximately known. Using the known DOA, fixed spatial filters such as the delay-and-sum beamformer can be steered to extract the desired source. However, it is well-known that fixed data-independent spatial filters do not provide sufficient reduction of directional interferers. Instead, the DOA information can be used to estimate the statistics of the desired and the undesired signals and to compute optimal data-dependent spatial filters. One way the DOA is exploited for optimal spatial filtering in the literature is by designing DOA-based narrowband detectors to determine whether a desired or an undesired signal is dominant at each time-frequency (TF) bin. Subsequently, the statistics of the desired and the undesired signals can be estimated during the TF bins where the respective signal is dominant. In a similar manner, a Gaussian signal model-based detector which does not incorporate DOA information has been used in scenarios where the undesired signal consists of stationary background noise. However, when the undesired signal is non-stationary, resulting for example from interfering speakers, such a Gaussian signal model-based detector is unable to robustly distinguish desired from undesired speech. To this end, we propose a DOA model-based detector to determine the dominant source at each TF bin and estimate the desired and undesired signal statistics. We demonstrate that data-dependent spatial filters that use the statistics estimated by the proposed framework achieve very good undesired signal reduction, even when using only three microphones.
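
    For reference, the fixed DOA-steered filter that the abstract contrasts with data-dependent filters can be sketched as a delay-and-sum beamformer in the STFT domain; the array geometry and parameters below are illustrative assumptions, not the authors' setup.

        # Hedged sketch of a DOA-steered delay-and-sum beamformer for a uniform
        # linear array, applied per frequency bin to one STFT frame.
        import numpy as np

        def delay_and_sum_weights(doa_deg, freqs_hz, n_mics=3, spacing_m=0.04, c=343.0):
            """Per-frequency steering weights w(f) = a(f)/M for a ULA."""
            theta = np.deg2rad(doa_deg)
            mic_positions = np.arange(n_mics) * spacing_m
            delays = mic_positions * np.sin(theta) / c            # seconds
            a = np.exp(-2j * np.pi * np.outer(freqs_hz, delays))  # (n_freqs, n_mics)
            return a / n_mics

        def apply_beamformer(stft_frame, weights):
            """stft_frame: (n_freqs, n_mics) complex STFT of one time frame."""
            return np.sum(np.conj(weights) * stft_frame, axis=1)

        freqs = np.fft.rfftfreq(512, d=1 / 16000.0)
        w = delay_and_sum_weights(doa_deg=30.0, freqs_hz=freqs)
        frame = np.random.randn(len(freqs), 3) + 1j * np.random.randn(len(freqs), 3)
        enhanced = apply_beamformer(frame, w)   # one enhanced STFT frame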

  17. Evaluation and Verification of the Global Rapid Identification of Threats System for Infectious Diseases in Textual Data Sources.

    PubMed

    Huff, Andrew G; Breit, Nathan; Allen, Toph; Whiting, Karissa; Kiley, Christopher

    2016-01-01

    The Global Rapid Identification of Threats System (GRITS) is a biosurveillance application that enables infectious disease analysts to monitor nontraditional information sources (e.g., social media, online news outlets, ProMED-mail reports, and blogs) for infectious disease threats. GRITS analyzes these textual data sources by identifying, extracting, and succinctly visualizing epidemiologic information and suggests potentially associated infectious diseases. This manuscript evaluates and verifies the diagnoses that GRITS performs and discusses novel aspects of the software package. Via GRITS' web interface, infectious disease analysts can examine dynamic visualizations of GRITS' analyses and explore historical infectious disease emergence events. The GRITS API can be used to continuously analyze information feeds, and the API enables GRITS technology to be easily incorporated into other biosurveillance systems. GRITS is a flexible tool that can be modified to conduct sophisticated medical report triaging, expanded to include customized alert systems, and tailored to address other biosurveillance needs.

  18. Evaluation and Verification of the Global Rapid Identification of Threats System for Infectious Diseases in Textual Data Sources

    PubMed Central

    Breit, Nathan

    2016-01-01

    The Global Rapid Identification of Threats System (GRITS) is a biosurveillance application that enables infectious disease analysts to monitor nontraditional information sources (e.g., social media, online news outlets, ProMED-mail reports, and blogs) for infectious disease threats. GRITS analyzes these textual data sources by identifying, extracting, and succinctly visualizing epidemiologic information and suggests potentially associated infectious diseases. This manuscript evaluates and verifies the diagnoses that GRITS performs and discusses novel aspects of the software package. Via GRITS' web interface, infectious disease analysts can examine dynamic visualizations of GRITS' analyses and explore historical infectious disease emergence events. The GRITS API can be used to continuously analyze information feeds, and the API enables GRITS technology to be easily incorporated into other biosurveillance systems. GRITS is a flexible tool that can be modified to conduct sophisticated medical report triaging, expanded to include customized alert systems, and tailored to address other biosurveillance needs. PMID:27698665

  19. Semiotic foundation for multisensor-multilook fusion

    NASA Astrophysics Data System (ADS)

    Myler, Harley R.

    1998-07-01

    This paper explores the application of semiotic principles to the design of a multisensor-multilook fusion system. Semiotics is an approach to analysis that attempts to process media in a unified way using qualitative rather than quantitative methods. The term semiotic refers to signs, or signatory data that encapsulate information. Semiotic analysis involves the extraction of signs from information sources and the subsequent processing of those signs into meaningful interpretations of the information content of the source. The multisensor fusion problem, predicated on a semiotic system structure and incorporating semiotic analysis techniques, is explored, along with the design of a multisensor system as an information fusion system. Semiotic analysis opens the possibility of using non-traditional sensor sources and modalities in the fusion process, such as verbal and textual intelligence derived from human observers. Examples of how multisensor/multimodality data might be analyzed semiotically are shown, and how a semiotic system for multisensor fusion could be realized is outlined. The architecture of a semiotic multisensor fusion processor that can accept situational awareness data is described, although an implementation has not yet been constructed.

  20. Hybrid single-source online Fourier transform coherent anti-Stokes Raman scattering/optical coherence tomography.

    PubMed

    Kamali, Tschackad; Považay, Boris; Kumar, Sunil; Silberberg, Yaron; Hermann, Boris; Werkmeister, René; Drexler, Wolfgang; Unterhuber, Angelika

    2014-10-01

    We demonstrate a multimodal optical coherence tomography (OCT) and online Fourier transform coherent anti-Stokes Raman scattering (FTCARS) platform using a single sub-12 femtosecond (fs) Ti:sapphire laser enabling simultaneous extraction of structural and chemical ("morphomolecular") information of biological samples. Spectral domain OCT prescreens the specimen providing a fast ultrahigh (4×12 μm axial and transverse) resolution wide field morphologic overview. Additional complementary intrinsic molecular information is obtained by zooming into regions of interest for fast label-free chemical mapping with online FTCARS spectroscopy. Background-free CARS is based on a Michelson interferometer in combination with a highly linear piezo stage, which allows for quick point-to-point extraction of CARS spectra in the fingerprint region in less than 125 ms with a resolution better than 4 cm(-1) without the need for averaging. OCT morphology and CARS spectral maps indicating phosphate and carbonate bond vibrations from human bone samples are extracted to demonstrate the performance of this hybrid imaging platform.

  1. Technical design and system implementation of region-line primitive association framework

    NASA Astrophysics Data System (ADS)

    Wang, Min; Xing, Jinjin; Wang, Jie; Lv, Guonian

    2017-08-01

    Apart from regions, image edge lines are an important information source, and they deserve more attention in object-based image analysis (OBIA) than they currently receive. In the region-line primitive association framework (RLPAF), we promote straight-edge lines as line primitives to achieve more powerful OBIA. Along with regions, straight lines become basic units for subsequent extraction and analysis of OBIA features. This study develops a new software system called remote-sensing knowledge finder (RSFinder) to implement RLPAF for engineering application purposes. This paper introduces the extended technical framework, a comprehensively designed feature set, key technology, and software implementation. To our knowledge, RSFinder is the world's first OBIA system based on two types of primitives, namely, regions and lines. It is fundamentally different from other well-known region-only-based OBIA systems, such as eCognition and the ENVI feature extraction module. This paper provides an important reference for the development of similarly structured OBIA systems and line-involved remote sensing information extraction algorithms.

  2. Fusion of infrared polarization and intensity images based on improved toggle operator

    NASA Astrophysics Data System (ADS)

    Zhu, Pan; Ding, Lei; Ma, Xiaoqing; Huang, Zhanhua

    2018-01-01

    Integration of infrared polarization and intensity images has become a new topic in infrared image understanding and interpretation. The abundant infrared details and targets from the infrared image and the salient edge and shape information from the polarization image should be preserved or even enhanced in the fused result. In this paper, a new fusion method is proposed for infrared polarization and intensity images based on an improved multi-scale toggle operator with spatial scale, which can effectively extract the feature information of the source images and substantially reduce redundancy among different scales. Firstly, the multi-scale image features of the infrared polarization and intensity images are extracted at different scale levels by the improved multi-scale toggle operator. Secondly, the redundancy of the features among different scales is reduced by using the spatial scale. Thirdly, the final image features are combined by simply adding all scales of feature images together, and a base image is calculated by applying a mean-value weighting method to the smoothed source images. Finally, the fused image is obtained by importing the combined image features into the base image with a suitable strategy. Both objective assessment and subjective inspection of the experimental results indicate that the proposed method performs better in preserving detail and edge information as well as improving image contrast.
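
    A minimal sketch of a classical single-scale toggle contrast operator and the feature maps it produces is shown below; the paper's improved multi-scale operator and its spatial-scale redundancy reduction are more elaborate, and the fusion strategy shown here is only a crude stand-in.

        # Hedged sketch of a morphological toggle contrast operator and a naive
        # feature-level fusion of an intensity image and a polarization image.
        import numpy as np
        from scipy import ndimage

        def toggle_contrast(image, size=3):
            """Pixel-wise switch to dilation or erosion, whichever is closer."""
            dil = ndimage.grey_dilation(image, size=(size, size))
            ero = ndimage.grey_erosion(image, size=(size, size))
            closer_to_dilation = (dil - image) < (image - ero)
            return np.where(closer_to_dilation, dil, ero)

        def toggle_features(image, sizes=(3, 5, 7)):
            """Feature maps at several scales (|toggle - image|)."""
            return [np.abs(toggle_contrast(image, s) - image) for s in sizes]

        intensity = np.random.rand(64, 64)       # infrared intensity stand-in
        polarization = np.random.rand(64, 64)    # polarization image stand-in
        fused_features = [fi + fp for fi, fp in zip(toggle_features(intensity),
                                                    toggle_features(polarization))]
        base = 0.5 * (ndimage.uniform_filter(intensity, 5) +
                      ndimage.uniform_filter(polarization, 5))
        fused = base + sum(fused_features)       # crude stand-in for the fusion strategy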

  3. Framework for automatic information extraction from research papers on nanocrystal devices

    PubMed Central

    Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

    2015-01-01

    To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%); however, precision is better (75–97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system. PMID:26665057

  4. Framework for automatic information extraction from research papers on nanocrystal devices.

    PubMed

    Dieb, Thaer M; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

    2015-01-01

    To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called " NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.

  5. KneeTex: an ontology-driven system for information extraction from MRI reports.

    PubMed

    Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

    2015-01-01

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F-measure of 97.81 %, the values of which are in line with human-like performance. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.

  6. PASTE: patient-centered SMS text tagging in a medication management system.

    PubMed

    Stenner, Shane P; Johnson, Kevin B; Denny, Joshua C

    2012-01-01

    To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages.

  7. IN VITRO DIGESTIVE FLUID EXTRACTION AS A MEASURE OF THE BIOAVAILABILITY OF SEDIMENT-ASSOCIATED POLYCYCLIC AROMATIC HYDROCARBONS: SOURCES OF VARIATION AND IMPLICATIONS FOR PARTITIONING MODELS. (R825353)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  8. Exploring Affiliation Network Models as a Collaborative Filtering Mechanism in E-Learning

    ERIC Educational Resources Information Center

    Rodriguez, Daniel; Sicilia, Miguel Angel; Sanchez-Alonso, Salvador; Lezcano, Leonardo; Garcia-Barriocanal, Elena

    2011-01-01

    The online interaction of learners and tutors in activities with concrete objectives provides a valuable source of data that can be analyzed for different purposes. One of these purposes is the use of the information extracted from that interaction to aid tutors and learners in decision making about either the configuration of further learning…

  9. Music Educator Vacancies in Faith-Based K-12 Schools in the United States: 2013-2014

    ERIC Educational Resources Information Center

    Hash, Phillip M.

    2015-01-01

    The purpose of this study was to analyze and summarize characteristics of music educator vacancies in faith-based K-12 schools in the United States for the 2013-2014 academic year. Data extracted from placement notices and supplemental sources included demographic information, job responsibilities, and employment requirements for 153 listings in…

  10. Effects of DNA Extraction Procedures on Bacteroides Profiles in Fecal Samples From Various Animals Determined by Terminal Restriction Fragment Length Polymorphism Analysis

    EPA Science Inventory

    A major assumption in microbial source tracking is that some fecal bacteria are specific to a host animal, and thus provide unique microbial fingerprints that can be used to differentiate hosts. However, the DNA information obtained from a particular sample may be biased dependi...

  11. Profiling of poorly stratified atmospheres with scanning lidar

    Treesearch

    C. E. Wold; V. A. Kovalev; A. P. Petkov; W. M. Hao

    2012-01-01

    The direct multiangle solution may allow inversion of scanning lidar data even when the requirement of a horizontally stratified atmosphere is poorly met. The solution is based on two principles: (1) the signal measured at zenith is the core source for extracting information about the atmospheric aerosol loading, and (2) the multiangle signals are used as...

  12. Layout-aware text extraction from full-text PDF of scientific articles.

    PubMed

    Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc

    2012-05-28

    The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) classifying text blocks into rhetorical categories using a rule-based method, and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with precision = 0.96, recall = 0.89, and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Fang; Williams, Travis; Hattrick-Simpers, Jason

    Investment in brighter sources and larger detectors has resulted in an explosive rise in the data collected at synchrotron facilities. Currently, human experts extract scientific information from these data, but they cannot keep pace with the rate of data collection. Here, we present three on-the-fly approaches (attribute extraction, nearest-neighbor distance, and cluster analysis) to quickly segment x-ray diffraction (XRD) data into groups with similar XRD profiles. An expert can then analyze representative spectra from each group in detail in much less time, without loss of scientific insight. On-the-fly segmentation would therefore accelerate scientific productivity.
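
    The cluster-analysis variant of such on-the-fly segmentation can be sketched as follows; the cosine metric, linkage choice, threshold, and synthetic profiles are illustrative assumptions rather than the authors' settings.

        # Hedged sketch: grouping XRD profiles by similarity so an expert inspects
        # only one representative spectrum per group.
        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import pdist

        rng = np.random.default_rng(1)
        # 50 synthetic diffraction profiles over 500 two-theta bins, two phase mixtures.
        base_a = np.exp(-0.5 * ((np.arange(500) - 150) / 5.0) ** 2)
        base_b = np.exp(-0.5 * ((np.arange(500) - 320) / 5.0) ** 2)
        profiles = np.vstack([base_a + 0.05 * rng.random(500) for _ in range(25)] +
                             [base_b + 0.05 * rng.random(500) for _ in range(25)])

        dists = pdist(profiles, metric="cosine")      # pairwise dissimilarity
        tree = linkage(dists, method="average")       # hierarchical clustering
        labels = fcluster(tree, t=0.2, criterion="distance")

        for lab in np.unique(labels):
            members = np.where(labels == lab)[0]
            print(f"group {lab}: {len(members)} spectra, representative index {members[0]}")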

  14. ABM Drag_Pass Report Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat

    2008-01-01

    dragREPORT software was developed in parallel with abmREPORT, which is described in the preceding article. Both programs were built on the capabilities created during that process. This tool generates a drag_pass report that summarizes vital information from the MRO aerobraking drag_pass build process to facilitate sequence reviews and to provide a high-level summarization of the sequence for mission management. The script extracts information from the ENV, SSF, FRF, SCMFmax, and OPTG files, presenting them in a single, easy-to-check report providing the majority of parameters needed for cross-check and verification as part of the sequence review process. Prior to dragREPORT, all the needed information was spread across a number of different files, each in a different format. This software is a Perl script that extracts vital summarization information and build-process details from a number of source files into a single, concise report format used to aid the MPST sequence review process and to provide a high-level summarization of the sequence for mission management reference. This software could be adapted for future aerobraking missions to provide similar reports, review, and summarization information.

  15. An integrated, open-source set of tools for urban vulnerability monitoring from Earth observation data

    NASA Astrophysics Data System (ADS)

    De Vecchi, Daniele; Harb, Mostapha; Dell'Acqua, Fabio; Aurelio Galeazzo, Daniel

    2015-04-01

    Aim: The paper introduces an integrated set of open-source tools designed to process medium and high-resolution imagery with the aim of extracting vulnerability indicators [1]. Problem: In the context of risk monitoring [2], a series of vulnerability proxies can be defined, such as the extension of a built-up area or building regularity [3]. Different open-source C and Python libraries are already available for image processing and geospatial information (e.g. OrfeoToolbox, OpenCV and GDAL). They include basic processing tools but not vulnerability-oriented workflows. Therefore, it is of significant importance to provide end-users with a set of tools capable of returning information at a higher level. Solution: The proposed set of Python algorithms is a combination of low-level image processing and geospatial information handling tools along with high-level workflows. In particular, two main products are released under the GPL license: developer-oriented source code and a QGIS plugin. These tools were produced within the SENSUM project framework (ended December 2014), where the main focus was on earthquake and landslide risk. Further development and maintenance are guaranteed by the decision to include them in the platform designed within the FP7 RASOR project. Conclusion: Given the lack of a unified software suite for vulnerability indicator extraction, the proposed solution can provide inputs for already available models like the Global Earthquake Model. The inclusion of the proposed set of algorithms within the RASOR platform can guarantee support and enlarge the community of end-users. Keywords: Vulnerability monitoring, remote sensing, optical imagery, open-source software tools. References [1] M. Harb, D. De Vecchi, F. Dell'Acqua, "Remote sensing-based vulnerability proxies in the EU FP7 project SENSUM", Symposium on earthquake and landslide risk in Central Asia and Caucasus: exploiting remote sensing and geo-spatial information management, 29-30th January 2014, Bishkek, Kyrgyz Republic. [2] UNISDR, "Living with Risk", Geneva, Switzerland, 2004. [3] P. Bisch, E. Carvalho, H. Degree, P. Fajfar, M. Fardis, P. Franchin, M. Kreslin, A. Pecker, "Eurocode 8: Seismic Design of Buildings", Lisbon, 2011. (SENSUM: www.sensum-project.eu, grant number: 312972) (RASOR: www.rasor-project.eu, grant number: 606888)

  16. Cluster compression algorithm: A joint clustering/data compression concept

    NASA Technical Reports Server (NTRS)

    Hilbert, E. E.

    1977-01-01

    The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simple look-up-table decoding and direct use of the extracted features, reducing user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data that describe the spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multispectral images from LANDSAT and other sources.
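
    A minimal sketch of the cluster-compression idea, i.e., per-block clustering that yields cluster features plus an integer feature map decoded by table look-up, is shown below; block size, cluster count, and the omission of the subsequent source (entropy) coding are illustrative simplifications.

        # Hedged sketch: spatially local clustering produces cluster (spectral)
        # features plus a per-pixel feature map of cluster indices, which a
        # receiver decodes with a simple table look-up.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        block = rng.random((32, 32, 4))             # one spatial block, 4 spectral bands
        pixels = block.reshape(-1, 4)

        k = 8                                       # clusters per block
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels)

        cluster_features = km.cluster_centers_      # sent once per block (k x bands)
        feature_map = km.labels_.reshape(32, 32)    # small integers, then source-coded

        # Receiver side: reconstruction is a table look-up.
        reconstructed = cluster_features[feature_map]
        mse = float(np.mean((block - reconstructed) ** 2))
        print("block MSE after look-up decoding:", mse)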

  17. Feature Vector Construction Method for IRIS Recognition

    NASA Astrophysics Data System (ADS)

    Odinokikh, G.; Fartukov, A.; Korobkin, M.; Yoo, J.

    2017-05-01

    One of the basic stages of the iris recognition pipeline is the iris feature vector construction procedure, which extracts the iris texture information relevant to subsequent comparison. Thorough investigation of feature vectors obtained from the iris showed that not all vector elements are equally relevant. Two characteristics determine the utility of a vector element: fragility and discriminability. Conventional iris feature extraction methods treat fragility as feature vector instability without regard to the origin of that instability. This work separates the sources of instability into natural and encoding-induced, which allows each source of instability to be investigated in depth independently. Based on this separation, a novel approach to iris feature vector construction is proposed. The approach consists of two steps: iris feature extraction using Gabor filtering with optimal parameters, and quantization with separate, preliminarily optimized fragility thresholds. The proposed method has been tested on two different datasets of iris images captured under changing environmental conditions. The results show that the proposed method surpasses all prior-art methods in recognition accuracy on both datasets.
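
    The conventional step that this work builds on, Gabor filtering of a normalized iris image followed by phase quantization into a binary code, can be sketched as follows; the filter parameters and the simple magnitude-based mask standing in for fragility thresholds are illustrative assumptions.

        # Hedged sketch: Gabor phase quantization of a normalized iris image into a
        # two-bit-per-pixel code with a crude stability mask.
        import numpy as np
        from scipy.signal import fftconvolve

        def gabor_kernel(ksize=31, wavelength=8.0, sigma=4.0, theta=0.0):
            """Complex 2-D Gabor kernel."""
            half = ksize // 2
            y, x = np.mgrid[-half:half + 1, -half:half + 1]
            xr = x * np.cos(theta) + y * np.sin(theta)
            envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
            carrier = np.exp(2j * np.pi * xr / wavelength)
            return envelope * carrier

        def iris_code(normalized_iris, magnitude_floor=1e-3):
            """Two bits per pixel from the sign of the response; weak responses masked."""
            response = fftconvolve(normalized_iris, gabor_kernel(), mode="same")
            code = np.stack([response.real > 0, response.imag > 0], axis=-1)
            mask = np.abs(response) > magnitude_floor  # stand-in for fragility thresholds
            return code, mask

        iris = np.random.rand(64, 256)                 # normalized (unwrapped) iris stand-in
        code, mask = iris_code(iris)
        print(code.shape, mask.mean())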

  18. A semantic model for multimodal data mining in healthcare information systems.

    PubMed

    Iakovidis, Dimitris; Smailis, Christos

    2012-01-01

    Electronic health records (EHRs) are representative examples of multimodal/multisource data collections, including measurements, images and free text. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest level of a data mining process, where information is represented by multiple features, i.e., measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap by enabling direct links between low-level features and higher-level concepts, e.g., those describing body parts, anatomies and pathological findings. The proposed model has been developed in the Web Ontology Language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. Its utility is demonstrated for automatic annotation of medical data.

  19. Multi-source feature extraction and target recognition in wireless sensor networks based on adaptive distributed wavelet compression algorithms

    NASA Astrophysics Data System (ADS)

    Hortos, William S.

    2008-04-01

    Proposed distributed wavelet-based algorithms are a means to compress sensor data received at the nodes forming a wireless sensor network (WSN) by exchanging information between neighboring sensor nodes. Local collaboration among nodes compacts the measurements, yielding a reduced fused set with equivalent information at far fewer nodes. Nodes may be equipped with multiple sensor types, each capable of sensing distinct phenomena: thermal, humidity, chemical, voltage, or image signals with low or no frequency content as well as audio, seismic or video signals within defined frequency ranges. Compression of the multi-source data through wavelet-based methods, distributed at active nodes, reduces downstream processing and storage requirements along the paths to sink nodes; it also enables noise suppression and more energy-efficient query routing within the WSN. Targets are first detected by the multiple sensors; then wavelet compression and data fusion are applied to the target returns, followed by feature extraction from the reduced data; feature data are input to target recognition/classification routines; targets are tracked during their sojourns through the area monitored by the WSN. Algorithms to perform these tasks are implemented in a distributed manner, based on a partition of the WSN into clusters of nodes. In this work, a scheme of collaborative processing is applied for hierarchical data aggregation and decorrelation, based on the sensor data itself and any redundant information, enabled by a distributed, in-cluster wavelet transform with lifting that allows multiple levels of resolution. The wavelet-based compression algorithm significantly decreases RF bandwidth and other resource use in target processing tasks. Following wavelet compression, features are extracted. The objective of feature extraction is to maximize the probabilities of correct target classification based on multi-source sensor measurements, while minimizing the resource expenditures at participating nodes. Therefore, the feature-extraction method based on the Haar DWT is presented that employs a maximum-entropy measure to determine significant wavelet coefficients. Features are formed by calculating the energy of coefficients grouped around the competing clusters. A DWT-based feature extraction algorithm used for vehicle classification in WSNs can be enhanced by an added rule for selecting the optimal number of resolution levels to improve the correct classification rate and reduce energy consumption expended in local algorithm computations. Published field trial data for vehicular ground targets, measured with multiple sensor types, are used to evaluate the wavelet-assisted algorithms. Extracted features are used in established target recognition routines, e.g., the Bayesian minimum-error-rate classifier, to compare the effects on the classification performance of the wavelet compression. Simulations of feature sets and recognition routines at different resolution levels in target scenarios indicate the impact on classification rates, while formulas are provided to estimate reduction in resource use due to distributed compression.
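
    A minimal sketch of Haar-DWT energy feature extraction with a simple entropy-guided coefficient selection is shown below; it illustrates the kind of per-node feature computation described above, not the distributed, in-cluster lifting scheme or the exact maximum-entropy rule of the paper.

        # Hedged sketch: Haar-DWT subband energy features from a 1-D sensor return,
        # keeping only coefficients selected by a simple significance rule.
        import numpy as np
        import pywt

        def haar_energy_features(signal, levels=4):
            coeffs = pywt.wavedec(signal, "haar", level=levels)
            energies, entropies = [], []
            for band in coeffs:
                p = band ** 2 / (np.sum(band ** 2) + 1e-12)        # normalized coefficient energy
                entropies.append(-np.sum(p * np.log2(p + 1e-12)))  # subband entropy measure
                significant = band[p > 1.0 / (2 * len(band))]      # crude entropy-guided keep rule
                energies.append(np.sum(significant ** 2))          # energy of retained coefficients
            return np.array(energies), np.array(entropies)

        t = np.linspace(0, 1, 512)
        vehicle_return = np.sin(2 * np.pi * 40 * t) + 0.3 * np.random.randn(512)
        features, band_entropies = haar_energy_features(vehicle_return)
        print("per-subband energies:", np.round(features, 3))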

  20. Context-based electronic health record: toward patient specific healthcare.

    PubMed

    Hsu, William; Taira, Ricky K; El-Saden, Suzie; Kangarloo, Hooshang; Bui, Alex A T

    2012-03-01

    Due to the increasingly data-intensive clinical environment, physicians now have unprecedented access to detailed clinical information from a multitude of sources. However, applying this information to guide medical decisions for a specific patient case remains challenging. One issue is related to presenting information to the practitioner: displaying a large (irrelevant) amount of information often leads to information overload. Next-generation interfaces for the electronic health record (EHR) should not only make patient data easily searchable and accessible, but also synthesize fragments of evidence documented in the entire record to understand the etiology of a disease and its clinical manifestation in individual patients. In this paper, we describe our efforts toward creating a context-based EHR, which employs biomedical ontologies and (graphical) disease models as sources of domain knowledge to identify relevant parts of the record to display. We hypothesize that knowledge (e.g., variables, relationships) from these sources can be used to standardize, annotate, and contextualize information from the patient record, improving access to relevant parts of the record and informing medical decision making. To achieve this goal, we describe a framework that aggregates and extracts findings and attributes from free-text clinical reports, maps findings to concepts in available knowledge sources, and generates a tailored presentation of the record based on the information needs of the user. We have implemented this framework in a system called Adaptive EHR, demonstrating its capabilities to present and synthesize information from neurooncology patients. This paper highlights the challenges and potential applications of leveraging disease models to improve the access, integration, and interpretation of clinical patient data. © 2012 IEEE

  1. Operation and Applications of the Boron Cathodic Arc Ion Source

    NASA Astrophysics Data System (ADS)

    Williams, J. M.; Klepper, C. C.; Chivers, D. J.; Hazelton, R. C.; Freeman, J. H.

    2008-11-01

    The boron cathodic arc ion source has been developed with a view to several applications, particularly the problem of shallow junction doping in semiconductors. Research has included not only development and operation of the boron cathode, but other cathode materials as well. Applications have included a large deposition directed toward development of a neutron detector and another deposition for an orthopedic coating, as well as the shallow ion implantation function. Operational experience is described and information pertinent to commercial operation, extracted from these experiments, is presented.

  2. Proceedings of the 8th Matched-Field Processing Workshop, 12-14 June 1996,

    DTIC Science & Technology

    1996-10-01

    and M. B. Porter; Active Matched-Field Tracking (AMFT), Homer Bucker; Matched-Field Track-Before-Detect (TBD) Processing using SWellEX... surfaces are used in a source-track search. Track-before-detect (TBD) processing makes use of this technique to extract source track information so that the

  3. Obtention and characterization of phenolic extracts from different cocoa sources.

    PubMed

    Ortega, Nàdia; Romero, Maria-Paz; Macià, Alba; Reguant, Jordi; Anglès, Neus; Morelló, José-Ramón; Motilva, Maria-Jose

    2008-10-22

    The aim of this study was to evaluate several cocoa sources to obtain a rich phenol extract for use as an ingredient in the food industry. Two types of phenolic extracts, complete and purified, from different cocoa sources (beans, nibs, liquor, and cocoa powder) were investigated. UPLC-MS/MS was used to identify and quantify the phenolic composition of the extracts, and the Folin-Ciocalteu and vanillin assays were used to determine the total phenolic and flavan-3-ol contents, respectively. The DPPH and ORAC assays were used to measure their antioxidant activity. The results of the analysis of the composition of the extracts revealed that the major fraction was procyanidins, followed by flavones and phenolic acids. From the obtained results, the nib could be considered the most interesting source for obtaining a rich phenolic cocoa extract because of its rich phenolic profile content and high antioxidant activity in comparison with the other cocoa sources.

  4. Synthesising quantitative and qualitative research in evidence-based patient information.

    PubMed

    Goldsmith, Megan R; Bankhead, Clare R; Austoker, Joan

    2007-03-01

    Systematic reviews have, in the past, focused on quantitative studies and clinical effectiveness, while excluding qualitative evidence. Qualitative research can inform evidence-based practice independently of other research methodologies but methods for the synthesis of such data are currently evolving. Synthesising quantitative and qualitative research in a single review is an important methodological challenge. This paper describes the review methods developed and the difficulties encountered during the process of updating a systematic review of evidence to inform guidelines for the content of patient information related to cervical screening. Systematic searches of 12 electronic databases (January 1996 to July 2004) were conducted. Studies that evaluated the content of information provided to women about cervical screening or that addressed women's information needs were assessed for inclusion. A data extraction form and quality assessment criteria were developed from published resources. A non-quantitative synthesis was conducted and a tabular evidence profile for each important outcome (eg "explain what the test involves") was prepared. The overall quality of evidence for each outcome was then assessed using an approach published by the GRADE working group, which was adapted to suit the review questions and modified to include qualitative research evidence. Quantitative and qualitative studies were considered separately for every outcome. 32 papers were included in the systematic review following data extraction and assessment of methodological quality. The review questions were best answered by evidence from a range of data sources. The inclusion of qualitative research, which was often highly relevant and specific to many components of the screening information materials, enabled the production of a set of recommendations that will directly affect policy within the NHS Cervical Screening Programme. A practical example is provided of how quantitative and qualitative data sources might successfully be brought together and considered in one review.

  5. WHATIF: an open-source desktop application for extraction and management of the incidental findings from next-generation sequencing variant data

    PubMed Central

    Ye, Zhan; Kadolph, Christopher; Strenn, Robert; Wall, Daniel; McPherson, Elizabeth; Lin, Simon

    2015-01-01

    Background Identification and evaluation of incidental findings in patients following whole exome sequencing (WES) or whole genome sequencing (WGS) is challenging for both practicing physicians and researchers. The American College of Medical Genetics and Genomics (ACMG) recently recommended a list of reportable incidental genetic findings. However, no informatics tools are currently available to support evaluation of incidental findings in next-generation sequencing data. Methods The Wisconsin Hierarchical Analysis Tool for Incidental Findings (WHATIF) was developed as a stand-alone Windows-based desktop executable to support the interactive analysis of incidental findings in the context of the ACMG recommendations. WHATIF integrates the European Bioinformatics Institute Variant Effect Predictor (VEP) tool for biological interpretation and the National Center for Biotechnology Information ClinVar tool for clinical interpretation. Results An open-source desktop program was created to annotate incidental findings and present the results with a user-friendly interface. Further, a meaningful index (WHATIF Index) was devised for each gene to facilitate ranking of the relative importance of the variants and estimate the potential workload associated with further evaluation of the variants. Our WHATIF application is available at: http://tinyurl.com/WHATIF-SOFTWARE Conclusions The WHATIF application offers a user-friendly interface and allows users to investigate the extracted variant information efficiently and intuitively while always accessing up-to-date information on variants via application programming interface (API) connections. WHATIF’s highly flexible design and straightforward implementation aid users in customizing the source code to meet their own special needs. PMID:25890833

  6. An annotated corpus with nanomedicine and pharmacokinetic parameters

    PubMed Central

    Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

    2017-01-01

    A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. PMID:29066897

  7. Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

    Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.

  8. Quantification method for the appearance of melanin pigmentation using independent component analysis

    NASA Astrophysics Data System (ADS)

    Ojima, Nobutoshi; Okiyama, Natsuko; Okaguchi, Saya; Tsumura, Norimichi; Nakaguchi, Toshiya; Hori, Kimihiko; Miyake, Yoichi

    2005-04-01

    In the cosmetics industry, skin color is very important because skin color gives a direct impression of the face. In particular, many people suffer from melanin pigmentation such as liver spots and freckles. However, it is very difficult to evaluate melanin pigmentation using conventional colorimetric values because these values contain information on various skin chromophores simultaneously. Therefore, it is necessary to extract information on each skin chromophore independently, as density information. The isolation of a melanin component image from a single skin image based on independent component analysis (ICA) was reported in 2003; however, that work did not provide a quantification method for melanin pigmentation. This paper introduces a quantification method based on the ICA of a skin color image to isolate melanin pigmentation. The image acquisition system we used consists of commercially available equipment such as digital cameras and lighting sources with polarized light. The images taken were analyzed using ICA to extract the melanin component images, and a Laplacian of Gaussian (LoG) filter was applied to extract the pigmented areas. The method worked well for skin images including those showing melanin pigmentation and acne. Finally, the total extracted area corresponded strongly to the subjective rating values for the appearance of pigmentation. Further analysis is needed to recognize the appearance of pigmentation with respect to the size of the pigmented area and its spatial gradation.
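
    The two processing steps named above (ICA to separate a melanin-like component, then a Laplacian-of-Gaussian filter to isolate pigmented spots) can be sketched generically in Python. This is only a minimal illustration of the technique; the optical-density transform, the component ordering and the threshold are assumptions, not details taken from the record.

      import numpy as np
      from scipy.ndimage import gaussian_laplace
      from sklearn.decomposition import FastICA

      def pigmentation_mask(rgb, sigma=3.0, threshold=0.1):
          """rgb: float array of shape (H, W, 3) with values in (0, 1]."""
          h, w, _ = rgb.shape
          # Work in optical-density (log) space, common for chromophore separation.
          density = -np.log(np.clip(rgb, 1e-6, 1.0)).reshape(-1, 3)
          # Separate two independent components (melanin- and hemoglobin-like).
          components = FastICA(n_components=2, random_state=0).fit_transform(density)
          melanin = components.reshape(h, w, 2)[..., 0]  # assumed component order
          # Laplacian of Gaussian emphasises blob-like pigmented regions.
          response = gaussian_laplace(melanin, sigma=sigma)
          return response < -threshold  # boolean mask of candidate pigmented area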

  9. Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

    PubMed

    Elayavilli, Ravikumar Komandur; Liu, Hongfang

    2016-01-01

    Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining into a formal representation that may help in constructing an ontology for ion channel events, using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), and the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers the potential to facilitate the integration of text mining into an ontological workflow, a novel aspect of this study. This work is a case study in which we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in an ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.

  10. Performance of multi-aperture grid extraction systems for an ITER-relevant RF-driven negative hydrogen ion source

    NASA Astrophysics Data System (ADS)

    Franzen, P.; Gutser, R.; Fantz, U.; Kraus, W.; Falter, H.; Fröschle, M.; Heinemann, B.; McNeely, P.; Nocentini, R.; Riedl, R.; Stäbler, A.; Wünderlich, D.

    2011-07-01

    The ITER neutral beam system requires a negative hydrogen ion beam of 48 A with an energy of 0.87 MeV, and a negative deuterium beam of 40 A with an energy of 1 MeV. The beam is extracted from a large ion source of dimension 1.9 × 0.9 m2 by an acceleration system consisting of seven grids with 1280 apertures each. Currently, apertures with a diameter of 14 mm in the first grid are foreseen. In 2007, the IPP RF source was chosen as the ITER reference source due to its reduced maintenance compared with arc-driven sources and the successful development at the BATMAN test facility, which is equipped with the small IPP prototype RF source (approximately 1/8 of the area of the ITER NBI source). These results, however, were obtained with an extraction system with 8 mm diameter apertures. This paper reports on a comparison at BATMAN of the source performance of an ITER-relevant extraction system with chamfered 14 mm diameter apertures against that of the 8 mm diameter aperture extraction system. The most important result is that there is almost no difference in the achieved current density—being consistent with ion trajectory calculations—and the amount of co-extracted electrons. Furthermore, some aspects of the beam optics of both extraction systems are discussed.

  11. Hot dry rock geothermal energy development program. Semiannual report, October 1, 1978-March 31, 1979

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, M.C.; Nunz, G.J.; Cremer, G.M.

    1979-09-01

    The potential of energy extracted from hot dry rock (HDR) was investigated as a commercially feasible alternate energy source. Run Segments 3 and 4 were completed in the prototype reservoir of the Phase I energy-extraction system at Fenton Hill, New Mexico. Results of these tests yielded significant data on the existing system and this information will be applicable to future HDR systems. Plans and operations initiating a Phase II system are underway at the Fenton Hill site. This system, a deeper, hotter commercial-size reservoir, is intended to demonstrate the longevity and economics of an HDR system. Major activity occurred in evaluation of the national resource potential and in characterizing possible future HDR geothermal sites. Work has begun in the institutional and industrial support area to assess the economics and promote commercial interest in HDR systems as an alternate energy source.

  12. Layout-aware text extraction from full-text PDF of scientific articles

    PubMed Central

    2012-01-01

    Background The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. Conclusions LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/. PMID:22640904

  13. Mining biomedical images towards valuable information retrieval in biomedical and life sciences

    PubMed Central

    Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas

    2016-01-01

    Biomedical images are helpful sources for scientists and practitioners in drawing significant hypotheses, exemplifying approaches and describing experimental results in the published biomedical literature. In recent decades, there has been an enormous increase in the production and publication of heterogeneous biomedical images, which results in a need for bioimaging platforms for feature extraction and analysis of the text and content in biomedical images, so that effective information retrieval systems can take advantage of them. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, methodologies used, results produced, accuracies achieved and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction and processing of complex natural language queries. PMID:27538578

  14. Newspaper archives + text mining = rich sources of historical geo-spatial data

    NASA Astrophysics Data System (ADS)

    Yzaguirre, A.; Smit, M.; Warren, R.

    2016-04-01

    Newspaper archives are rich sources of cultural, social, and historical information. These archives, even when digitized, are typically unstructured and organized by date rather than by subject or location, and require substantial manual effort to analyze. The effort of journalists to be accurate and precise means that there is often rich geo-spatial data embedded in the text, alongside text describing events that editors considered to be of sufficient importance to the region or the world to merit column inches. A regional newspaper can add over 100,000 articles to its database each year, and extracting information from this data for even a single country would pose a substantial Big Data challenge. In this paper, we describe a pilot study on the construction of a database of historical flood events (location(s), date, cause, magnitude) to be used in flood assessment projects, for example to calibrate models, estimate frequency, establish high water marks, or plan for future events in contexts ranging from urban planning to climate change adaptation. We then present a vision for extracting and using the rich geospatial data available in unstructured text archives, and suggest future avenues of research.

  15. UK surveillance: provision of quality assured information from combined datasets.

    PubMed

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  16. Assessment of parental awareness about malocclusion in Shiraz, Islamic Republic of Iran.

    PubMed

    Danaei, S Momeni; Oshagh, M; Pajuhi, N; Ghahremani, Y; Bushehri, Ghodsi S

    2011-07-01

    Information empowers people to take charge of their health. The aim of this study in Shiraz, Islamic Republic of Iran was to evaluate parents' knowledge about dental malocclusion, referral routes and information sources. A random sample of 1000 7-9-year-old schoolchildren were given a questionnaire to complete at home. Questionnaires were completed by 795 parents. Knowledge about malocclusion was significantly greater in families with higher levels of education and income. Most respondents (83.5%) were aware of the importance of maintaining primary teeth to prevent malocclusion, and 25.1% thought that carious primary teeth must be extracted. Half of the parents (50.6%) did not know that spaces between primary teeth are normal. Only 28.8% of the children visited dentists for annual routine check-ups. Television (43.3%) was the most common source of dental information. The level of general public awareness about malocclusion needs to be improved.

  17. Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development.

    PubMed

    McEntire, Robin; Szalkowski, Debbie; Butler, James; Kuo, Michelle S; Chang, Meiping; Chang, Man; Freeman, Darren; McQuay, Sarah; Patel, Jagruti; McGlashen, Michael; Cornell, Wendy D; Xu, Jinghai James

    2016-05-01

    External content sources such as MEDLINE(®), National Institutes of Health (NIH) grants and conference websites provide access to the latest breaking biomedical information, which can inform pharmaceutical and biotechnology company pipeline decisions. The value of the sites for industry, however, is limited by the use of the public internet, the limited synonyms, the rarity of batch searching capability and the disconnected nature of the sites. Fortunately, many sites now offer their content for download and we have developed an automated internal workflow that uses text mining and tailored ontologies for programmatic search and knowledge extraction. We believe such an efficient and secure approach provides a competitive advantage to companies needing access to the latest information for a range of use cases and complements manually curated commercial sources. Copyright © 2016. Published by Elsevier Ltd.

  18. Obtaining lutein-rich extract from microalgal biomass at preparative scale.

    PubMed

    Fernández-Sevilla, José M; Fernández, F Gabriel Acién; Grima, Emilio Molina

    2012-01-01

    Lutein extracts are in increasing demand due to their alleged role in the prevention of degenerative disorders such as age-related macular degeneration (AMD). Lutein extracts are currently obtained from plant sources, but microalgae have been demonstrated to be a competitive source likely to become an alternative. The extraction of lutein from microalgae poses specific problems that arise from the different structure and composition of the source biomass. A method is presented here for the recovery of lutein-rich carotenoid extracts from microalgal biomass at the kilogram scale.

  19. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

    PubMed

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2016-10-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes.

  20. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora

    PubMed Central

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2017-01-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes. PMID:28210517

  1. Extraction of multi-scale landslide morphological features based on local Gi* using airborne LiDAR-derived DEM

    NASA Astrophysics Data System (ADS)

    Shi, Wenzhong; Deng, Susu; Xu, Wenbing

    2018-02-01

    For automatic landslide detection, landslide morphological features should be quantitatively expressed and extracted. High-resolution Digital Elevation Models (DEMs) derived from airborne Light Detection and Ranging (LiDAR) data allow fine-scale morphological features to be extracted, but noise in DEMs influences morphological feature extraction, and the multi-scale nature of landslide features should be considered. This paper proposes a method to extract landslide morphological features characterized by homogeneous spatial patterns. Both profile and tangential curvature are utilized to quantify land surface morphology, and a local Gi* statistic is calculated for each cell to identify significant patterns of clustering of similar morphometric values. The method was tested on both synthetic surfaces simulating natural terrain and airborne LiDAR data acquired over an area dominated by shallow debris slides and flows. The test results of the synthetic data indicate that the concave and convex morphologies of the simulated terrain features at different scales and distinctness could be recognized using the proposed method, even when random noise was added to the synthetic data. In the test area, cells with large local Gi* values were extracted at a specified significance level from the profile and the tangential curvature image generated from the LiDAR-derived 1-m DEM. The morphologies of landslide main scarps, source areas and trails were clearly indicated, and the morphological features were represented by clusters of extracted cells. A comparison with the morphological feature extraction method based on curvature thresholds proved the proposed method's robustness to DEM noise. When verified against a landslide inventory, the morphological features of almost all recent (< 5 years) landslides and approximately 35% of historical (> 10 years) landslides were extracted. This finding indicates that the proposed method can facilitate landslide detection, although the cell clusters extracted from curvature images should be filtered using a filtering strategy based on supplementary information provided by expert knowledge or other data sources.
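
    As a minimal illustration of the hot-spot statistic mentioned above, the following Python sketch computes a Getis-Ord Gi* value for every cell of a curvature raster, assuming binary weights over a square moving window; the window size and critical z-value are placeholders, not the parameters used in the study.

      import numpy as np
      from scipy.ndimage import uniform_filter

      def local_gi_star(curvature, window=3):
          """Getis-Ord Gi* for each cell of a 2D curvature raster (binary weights)."""
          x = curvature.astype(float)
          n = x.size
          mean, s = x.mean(), x.std()
          k = window * window                              # neighbours incl. the centre cell
          local_sum = uniform_filter(x, size=window) * k   # windowed sum (edges reflected)
          denom = s * np.sqrt((n * k - k ** 2) / (n - 1))
          return (local_sum - mean * k) / denom

      # Cells with |Gi*| above a critical z-value (e.g. 2.58 for p < 0.01) would be
      # kept as candidate clusters of similar curvature.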

  2. Context Oriented Information Integration

    NASA Astrophysics Data System (ADS)

    Mohania, Mukesh; Bhide, Manish; Roy, Prasan; Chakaravarthy, Venkatesan T.; Gupta, Himanshu

    Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. Academics have focused on this problem, but many obstacles to its widespread use in practice remain. One of the key problems is the absence of schema in unstructured text. In this paper we present a new paradigm for integrating information that overcomes this problem - that of Context Oriented Information Integration. The goal is to integrate unstructured data with the structured data present in the enterprise and use the extracted information to generate actionable insights for the enterprise. We present two techniques that enable context oriented information integration and show how they can be used to solve real-world problems.

  3. Pattern Mining for Extraction of mentions of Adverse Drug Reactions from User Comments

    PubMed Central

    Nikfarjam, Azadeh; Gonzalez, Graciela H.

    2011-01-01

    The rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs, their effectiveness and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs on social network websites by mining a set of language patterns. The system applied association rule mining to a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We reached a precision of 70.01%, a recall of 66.32% and an F-measure of 67.96%. PMID:22195162

  4. [An Extraction and Recognition Method of the Distributed Optical Fiber Vibration Signal Based on EMD-AWPP and HOSA-SVM Algorithm].

    PubMed

    Zhang, Yanjun; Liu, Wen-zhe; Fu, Xing-hu; Bi, Wei-hong

    2016-02-01

    Given that traditional signal processing methods cannot effectively distinguish different vibration intrusion signals, a feature extraction and recognition method for vibration information is proposed based on EMD-AWPP and HOSA-SVM, for high-precision signal recognition in distributed fiber optic intrusion detection systems. When dealing with different types of vibration, the method first utilizes an adaptive wavelet processing algorithm based on empirical mode decomposition to reduce the influence of abnormal values in the sensing signal and improve the accuracy of signal feature extraction. Not only is the low-frequency part of the signal decomposed, but the details in the high-frequency part are also handled better by the time-frequency localization process. Second, it uses the bispectrum and bicoherence spectrum to accurately extract the feature vectors that characterize the different types of intrusion vibration. Finally, with a BPNN as the reference model, the SVM recognition parameters tuned by particle swarm optimization can distinguish signals of different intrusion vibrations, which endows the identification model with stronger adaptive and self-learning ability and overcomes shortcomings such as the tendency to fall into local optima. The simulation and experimental results showed that this new method can effectively extract the feature vectors of the sensing information, eliminate the influence of random noise and reduce the effects of outliers for different types of intrusion source. The predicted categories agree with the actual categories, and the vibration identification accuracy can reach above 95%. It is therefore better than the BPNN recognition algorithm and effectively improves the accuracy of the information analysis.

  5. Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation

    PubMed Central

    Audeh, Bissan; Beigbeder, Michel; Zimmermann, Antoine; Jaillon, Philippe; Bousquet, Cédric

    2017-01-01

    The extraction of information from social media is an essential yet complicated step for data analysis in multiple domains. In this paper, we present Vigi4Med Scraper, a generic open source framework for extracting structured data from web forums. Our framework is highly configurable; using a configuration file, the user can freely choose the data to extract from any web forum. The extracted data are anonymized and represented in a semantic structure using Resource Description Framework (RDF) graphs. This representation enables efficient manipulation by data analysis algorithms and allows the collected data to be directly linked to any existing semantic resource. To avoid server overload, an integrated proxy with caching functionality imposes a minimal delay between sequential requests. Vigi4Med Scraper represents the first step of Vigi4Med, a project to detect adverse drug reactions (ADRs) from social networks funded by the French drug safety agency Agence Nationale de Sécurité du Médicament (ANSM). Vigi4Med Scraper has successfully extracted more than 200 gigabytes of data from the web forums of over 20 different websites. PMID:28122056
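
    As a rough sketch of the kind of RDF output described above, the snippet below builds a graph for one anonymized forum post with rdflib; the namespace and property names are invented for illustration and do not correspond to Vigi4Med Scraper's actual schema.

      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import RDF, XSD

      EX = Namespace("http://example.org/forum#")        # illustrative vocabulary only

      g = Graph()
      post = URIRef("http://example.org/forum/post/12345")
      g.add((post, RDF.type, EX.Post))
      g.add((post, EX.author, Literal("anon_user_42")))  # pseudonymised author
      g.add((post, EX.createdAt, Literal("2016-05-01", datatype=XSD.date)))
      g.add((post, EX.body, Literal("Started drug X last week, headaches since...")))

      print(g.serialize(format="turtle"))                # rdflib 6+ returns a string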

  6. A drunken search in crystallization space

    PubMed Central

    Fazio, Vincent J.; Peat, Thomas S.; Newman, Janet

    2014-01-01

    The REMARK280 field of the Protein Data Bank is the richest open source of successful crystallization information. The REMARK280 field is optional and currently uncurated, so significant effort needs to be applied to extract reliable data. There are well over 15 000 crystallization conditions available commercially from 12 different vendors. After putting the PDB crystallization information and the commercial cocktail data into a consistent format, these data are used to extract information about the overlap between the two sets of crystallization conditions. An estimation is made as to which commercially available conditions are most appropriate for producing well diffracting crystals by looking at which commercial conditions are found unchanged (or almost unchanged) in the PDB. Further analyses include which commercial kits are the most appropriate for shotgun or more traditional approaches to crystallization screening. This analysis suggests that almost 40% of the crystallization conditions found currently in the PDB are identical or very similar to a commercial condition. PMID:25286930

  7. Review of particle-in-cell modeling for the extraction region of large negative hydrogen ion sources for fusion

    NASA Astrophysics Data System (ADS)

    Wünderlich, D.; Mochalskyy, S.; Montellano, I. M.; Revel, A.

    2018-05-01

    Particle-in-cell (PIC) codes have been used since the early 1960s for calculating self-consistently the motion of charged particles in plasmas, taking into account external electric and magnetic fields as well as the fields created by the particles themselves. Due to the very small time steps used (on the order of the inverse plasma frequency) and the small mesh size, the computational requirements can be very high, and they drastically increase with increasing plasma density and size of the calculation domain. Thus, usually small computational domains and/or reduced dimensionality are used. In the last years, the available central processing unit (CPU) power has strongly increased. Together with a massive parallelization of the codes, it is now possible to describe in 3D the extraction of charged particles from a plasma, using calculation domains with an edge length of several centimeters, consisting of one extraction aperture, the plasma in the direct vicinity of the aperture, and a part of the extraction system. Large negative hydrogen or deuterium ion sources are essential parts of the neutral beam injection (NBI) system in future fusion devices like the international fusion experiment ITER and the demonstration reactor (DEMO). For ITER NBI, RF-driven sources with a source area of 0.9 × 1.9 m2 and 1280 extraction apertures will be used. The extraction of negative ions is accompanied by the co-extraction of electrons, which are deflected onto an electron dump. Typically, the maximum extracted negative ion current is limited by the amount and the temporal instability of the co-extracted electrons, especially for operation in deuterium. Different PIC codes are available for the extraction region of large RF-driven negative ion sources for fusion. Additionally, some effort is ongoing in developing codes that describe in a simplified manner (coarser mesh or reduced dimensionality) the plasma of the whole ion source. The presentation first gives a brief overview of the current status of the ion source development for ITER NBI and of the PIC method. Different PIC codes for the extraction region are introduced, as well as the coupling to codes describing the whole source (PIC codes or fluid codes). Presented and discussed are different physical and numerical aspects of applying PIC codes to negative hydrogen ion sources for fusion, as well as selected code results. The main focus of future calculations will be the meniscus formation and identifying measures for reducing the co-extracted electrons, in particular for deuterium operation. The recent results of the 3D PIC code ONIX (calculation domain: one extraction aperture and its vicinity) for the ITER prototype source (1/8 size of the ITER NBI source) are presented.
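
    The time-step constraint mentioned above (steps on the order of the inverse plasma frequency) can be made concrete with a back-of-the-envelope Python calculation; the electron density used here is an assumed example value, not a figure from this record.

      import numpy as np
      from scipy.constants import e, epsilon_0, m_e

      n_e = 1e18                                           # electron density in m^-3 (assumed)
      omega_pe = np.sqrt(n_e * e**2 / (epsilon_0 * m_e))   # electron plasma frequency, rad/s
      dt = 1.0 / omega_pe                                  # time step must be of this order

      print(f"omega_pe ~ {omega_pe:.2e} rad/s, dt ~ {dt:.2e} s")
      # For 1e18 m^-3 this gives a time step of roughly 1e-11 s, which is why 3D
      # simulations of even a single extraction aperture are computationally heavy.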

  8. PASTE: patient-centered SMS text tagging in a medication management system

    PubMed Central

    Johnson, Kevin B; Denny, Joshua C

    2011-01-01

    Objective To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Design Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. Measurements A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Results Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Conclusion Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages. PMID:21984605

  9. Aviation obstacle auto-extraction using remote sensing information

    NASA Astrophysics Data System (ADS)

    Zimmer, N.; Lugsch, W.; Ravenscroft, D.; Schiefele, J.

    2008-10-01

    An Obstacle, in the aviation context, may be any natural, man-made, fixed or movable object, permanent or temporary. Currently, the most common way to detect relevant aviation obstacles from an aircraft or helicopter for navigation purposes and collision avoidance is the use of merged infrared and synthetic information of obstacle data. Several algorithms have been established to utilize synthetic and infrared images to generate obstacle information. There might be a situation however where the system is error-prone and may not be able to consistently determine the current environment. This situation can be avoided when the system knows the true position of the obstacle. The quality characteristics of the obstacle data strongly depends on the quality of the source data such as maps and official publications. In some countries such as newly industrializing and developing countries, quality and quantity of obstacle information is not available. The aviation world has two specifications - RTCA DO-276A and ICAO ANNEX 15 Ch. 10 - which describe the requirements for aviation obstacles. It is essential to meet these requirements to be compliant with the specifications and to support systems based on these specifications, e.g. 3D obstacle warning systems where accurate coordinates based on WGS-84 is a necessity. Existing aerial and satellite or soon to exist high quality remote sensing data makes it feasible to think about automated aviation obstacle data origination. This paper will describe the feasibility to auto-extract aviation obstacles from remote sensing data considering limitations of image and extraction technologies. Quality parameters and possible resolution of auto-extracted obstacle data will be discussed and presented.

  10. Advanced Optimal Extraction for the Spitzer/IRS

    NASA Astrophysics Data System (ADS)

    Lebouteiller, V.; Bernard-Salas, J.; Sloan, G. C.; Barry, D. J.

    2010-02-01

    We present new advances in the spectral extraction of pointlike sources adapted to the Infrared Spectrograph (IRS) on board the Spitzer Space Telescope. For the first time, we created a supersampled point-spread function of the low-resolution modules. We describe how to use the point-spread function to perform optimal extraction of a single source and of multiple sources within the slit. We also examine the case of the optimal extraction of one or several sources with a complex background. The new algorithms are gathered in a plug-in called AdOpt which is part of the SMART data analysis software.
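
    As a generic illustration of PSF-weighted ("optimal") extraction in the spirit of Horne (1986), the sketch below combines the pixels of one spatial cut using the point-spread function as weights; this is the textbook form of the technique, not the AdOpt plug-in itself.

      import numpy as np

      def optimal_extract(data, sky, variance, psf):
          """All inputs are 1D arrays along the spatial direction at one wavelength.

          psf must be normalised so that psf.sum() == 1.
          """
          weights = psf / variance
          flux = np.sum(weights * (data - sky)) / np.sum(weights * psf)
          flux_variance = 1.0 / np.sum(psf ** 2 / variance)
          return flux, flux_variance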

  11. Term Coverage of Dietary Supplements Ingredients in Product Labels.

    PubMed

    Wang, Yefeng; Adam, Terrence J; Zhang, Rui

    2016-01-01

    As the clinical application and consumption of dietary supplements have grown, their side effects and possible interactions with prescribed medications have become a serious issue. Extraction of dietary supplement related information is a critical need to support dietary supplement research. However, there is currently no existing terminology for dietary supplements, which poses a barrier to informatics research in this field. The terms related to dietary supplement ingredients should be collected and normalized before a terminology can be established to facilitate convenient searches of safety information and the control of possible adverse effects of dietary supplements. In this study, the Dietary Supplement Label Database (DSLD) was chosen as the data source from which the ingredient information was extracted and normalized. The distribution of the dietary supplements by product type and ingredient type was analyzed. The ingredient terms were then mapped to existing terminologies, including UMLS, RxNorm and NDF-RT, using MetaMap and RxMix. A large gap between existing terminologies and ingredient terms was found: only 14.67%, 19.65%, and 12.88% of ingredient terms were covered by UMLS, RxNorm and NDF-RT, respectively.

  12. Automatic identification of comparative effectiveness research from Medline citations to support clinicians’ treatment information needs

    PubMed Central

    Zhang, Mingyuan; Fiol, Guilherme Del; Grout, Randall W.; Jonnalagadda, Siddhartha; Medlin, Richard; Mishra, Rashmi; Weir, Charlene; Liu, Hongfang; Mostafa, Javed; Fiszman, Marcelo

    2014-01-01

    Online knowledge resources such as Medline can address most clinicians’ patient care information needs. Yet, significant barriers, notably lack of time, limit the use of these sources at the point of care. The most common information needs raised by clinicians are treatment-related. Comparative effectiveness studies allow clinicians to consider multiple treatment alternatives for a particular problem. Still, solutions are needed to enable efficient and effective consumption of comparative effectiveness research at the point of care. Objective Design and assess an algorithm for automatically identifying comparative effectiveness studies and extracting the interventions investigated in these studies. Methods The algorithm combines semantic natural language processing, Medline citation metadata, and machine learning techniques. We assessed the algorithm in a case study of treatment alternatives for depression. Results Both precision and recall for identifying comparative studies were 0.83. A total of 86% of the interventions extracted perfectly or partially matched the gold standard. Conclusion Overall, the algorithm achieved reasonable performance. The method provides building blocks for the automatic summarization of comparative effectiveness research to inform point-of-care decision-making. PMID:23920677

  13. Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.

    PubMed

    Ng; Wong

    1999-01-01

    We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace in genomics and proteomics research. The race to the discovery of a gene or a drug has now become increasingly dependent on how quickly a scientist can scan through voluminous amount of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from on-line text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.

  14. Revealing the properties of oils from their dissolved hydrocarbon compounds in water with an integrated sensor array system.

    PubMed

    Qi, Xiubin; Crooke, Emma; Ross, Andrew; Bastow, Trevor P; Stalvies, Charlotte

    2011-09-21

    This paper presents a system and method developed to identify a source oil's characteristic properties by testing the oil's dissolved components in water. Through close examination of the oil dissolution process in water, we hypothesise that when oil is in contact with water, the resulting oil-water extract, a complex hydrocarbon mixture, carries the signature property information of the parent oil. If the dominating differences in composition between such extracts of different oils can be identified, this information could guide the selection of various sensors capable of capturing such chemical variations. When used as an array, such a sensor system can be used to determine parent oil information from the oil-water extract. To test this hypothesis, 22 oils' water extracts were prepared and selected dominant hydrocarbons analyzed with Gas Chromatography-Mass Spectrometry (GC-MS); the subsequent Principal Component Analysis (PCA) indicates that the major difference between the extract solutions is the relative concentration between the volatile mono-aromatics and fluorescent polyaromatics. An integrated sensor array system composed of 3 volatile hydrocarbon sensors and 2 polyaromatic hydrocarbon sensors was built accordingly to capture the major and subtle differences of these extracts. It was tested by exposure to a total of 110 water extract solutions diluted from the 22 extracts. The sensor response data collected from the testing were processed with two multivariate analysis tools to reveal information retained in the response patterns of the arrayed sensors: by conducting PCA, we were able to demonstrate the ability to qualitatively identify and distinguish different oil samples from their sensor array response patterns. When a supervised counterpart of PCA, Linear Discriminant Analysis (LDA), was applied, quantitative classification could also be achieved: the multivariate model generated from the LDA achieved 89.7% successful classification of the oil sample types. By grouping the samples based on their levels of viscosity and density, we were able to reveal the correlation between the oil extracts' sensor array responses and their original oils' feature properties. The equipment and method developed in this study have promising potential to be readily applied in field studies and marine surveys for oil exploration or oil spill monitoring.
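
    The two multivariate steps described above (PCA for a qualitative view of the response patterns, LDA for supervised classification of oil type) follow a standard pattern that can be sketched with scikit-learn; the array shapes and labels below are placeholders, not the study's measurements.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(0)
      X = rng.normal(size=(110, 5))      # 110 extract solutions x 5 sensors (placeholder)
      y = rng.integers(0, 3, size=110)   # placeholder oil-type labels

      # Unsupervised view: variance captured by the first two principal components.
      pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
      print("explained variance ratio:", pca.explained_variance_ratio_)

      # Supervised view: cross-validated LDA classification of oil type.
      clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
      print("LDA accuracy:", cross_val_score(clf, X, y, cv=5).mean())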

  15. Astaxanthin: Sources, Extraction, Stability, Biological Activities and Its Commercial Applications—A Review

    PubMed Central

    Ambati, Ranga Rao; Siew Moi, Phang; Ravi, Sarada; Aswathanarayana, Ravishankar Gokare

    2014-01-01

    There is currently much interest in biologically active compounds derived from natural resources, especially compounds that can efficiently act on molecular targets, which are involved in various diseases. Astaxanthin (3,3′-dihydroxy-β, β′-carotene-4,4′-dione) is a xanthophyll carotenoid, contained in Haematococcus pluvialis, Chlorella zofingiensis, Chlorococcum, and Phaffia rhodozyma. It accumulates up to 3.8% on a dry weight basis in H. pluvialis. Our recently published data on astaxanthin extraction, analysis, stability studies, and biological activities were added to this review paper. Based on our results and the current literature, astaxanthin showed potential biological activity in in vitro and in vivo models. These studies emphasize the influence of astaxanthin and its beneficial effects on the metabolism in animals and humans. Bioavailability of astaxanthin in animals was enhanced after feeding Haematococcus biomass as a source of astaxanthin. Astaxanthin, used as a nutritional supplement, antioxidant and anticancer agent, prevents diabetes, cardiovascular diseases, and neurodegenerative disorders, and also stimulates the immune system. Astaxanthin products are used for commercial applications in dosage forms such as tablets, capsules, syrups, oils, soft gels, creams, biomass and granulated powders. Astaxanthin patent applications are available in food, feed and nutraceutical applications. The current review provides up-to-date information on astaxanthin sources, extraction, analysis, stability, biological activities and health benefits, with special attention paid to its commercial applications. PMID:24402174

  16. On-the-fly segmentation approaches for x-ray diffraction datasets for metallic glasses

    DOE PAGES

    Ren, Fang; Williams, Travis; Hattrick-Simpers, Jason; ...

    2017-08-30

    Investment in brighter sources and larger detectors has resulted in an explosive rise in the data collected at synchrotron facilities. Currently, human experts extract scientific information from these data, but they cannot keep pace with the rate of data collection. Here, we present three on-the-fly approaches—attribute extraction, nearest-neighbor distance, and cluster analysis—to quickly segment x-ray diffraction (XRD) data into groups with similar XRD profiles. An expert can then analyze representative spectra from each group in detail in much less time, but without loss of scientific insight. On-the-fly segmentation would therefore accelerate scientific productivity.
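
    One of the three approaches named above, cluster analysis, can be sketched in a few lines of scikit-learn: group diffraction profiles by shape and hand one representative per group to the expert. The data, cluster count and normalization choice are placeholders, not the pipeline used at the beamline.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.preprocessing import normalize

      rng = np.random.default_rng(1)
      spectra = rng.random((200, 1000))   # 200 diffraction patterns (placeholder data)

      X = normalize(spectra)              # compare profile shapes rather than intensities
      labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

      # One representative spectrum per cluster for detailed expert inspection.
      representatives = {c: int(np.where(labels == c)[0][0]) for c in np.unique(labels)}
      print(representatives)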

  17. Interferometric millimeter wave and THz wave doppler radar

    DOEpatents

    Liao, Shaolin; Gopalsami, Nachappa; Bakhtiari, Sasan; Raptis, Apostolos C.; Elmer, Thomas

    2015-08-11

    A mixerless high-frequency interferometric Doppler radar system and associated methods have been invented, numerically validated and experimentally tested. A continuous wave source, phase modulator (e.g., a continuously oscillating reference mirror) and intensity detector are utilized. The intensity detector measures the intensity of the combined reflected Doppler signal and the modulated reference beam. Rigorous mathematical formulas have been developed to extract both amplitude and phase from the measured intensity signal. Software in Matlab has been developed and used to extract such amplitude and phase information from the experimental data. Both amplitude and phase are calculated and the Doppler frequency signature of the object is determined.

  18. Population Estimation in Singapore Based on Remote Sensing and Open Data

    NASA Astrophysics Data System (ADS)

    Guo, H.; Cao, K.; Wang, P.

    2017-09-01

    Population estimation statistics are widely used in the government, commercial and educational sectors for a variety of purposes. With growing emphasis on real-time and detailed population information, data users have switched from traditional census data to more technology-based data sources such as LiDAR point clouds and high-resolution satellite imagery. Nevertheless, such data are costly and periodically unavailable. In this paper, the authors use West Coast District, Singapore as a case study to investigate the applicability and effectiveness of using satellite imagery from Google Earth for building footprint extraction and population estimation. At the same time, volunteered geographic information (VGI) is also utilized as ancillary data for building footprint extraction; open data such as OpenStreetMap (OSM) could be employed to enhance the extraction process. In view of the challenges in building shadow extraction, this paper discusses several methods, including buffer, mask and shape index, to improve accuracy. It also illustrates population estimation methods based on estimates of building height and number of floors. The results show that the accuracy of the housing unit method for population estimation can reach 92.5%, which is remarkably accurate. This paper thus provides insights into techniques for building extraction and fine-scale population estimation, which will benefit users such as urban planners in policymaking and the urban planning of Singapore.
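
    The housing unit method mentioned above follows a simple arithmetic pattern: estimated dwelling units times average household size. The toy Python sketch below illustrates that pattern only; every number in it is assumed and none is taken from this record.

      def estimate_population(buildings, floor_area_per_unit=90.0, persons_per_unit=3.3):
          """buildings: iterable of (footprint_area_m2, n_floors) tuples (assumed inputs)."""
          total = 0.0
          for footprint, floors in buildings:
              units = (footprint * floors) / floor_area_per_unit   # estimated dwelling units
              total += units * persons_per_unit
          return total

      # e.g. two residential blocks extracted from imagery (assumed dimensions):
      print(estimate_population([(600.0, 12), (450.0, 20)]))   # roughly 594 people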

  19. Walker Ranch 3D seismic images

    DOE Data Explorer

    Robert J. Mellors

    2016-03-01

    Amplitude images (both vertical and depth slices) extracted from 3D seismic reflection survey over area of Walker Ranch area (adjacent to Raft River). Crossline spacing of 660 feet and inline of 165 feet using a Vibroseis source. Processing included depth migration. Micro-earthquake hypocenters on images. Stratigraphic information and nearby well tracks added to images. Images are embedded in a Microsoft Word document with additional information. Exact location and depth restricted for proprietary reasons. Data collection and processing funded by Agua Caliente. Original data remains property of Agua Caliente.

  20. Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources.

    PubMed

    Mao, Jin; Moore, Lisa R; Blank, Carrine E; Wu, Elvis Hsin-Hui; Ackerman, Marcia; Ranade, Sonali; Cui, Hong

    2016-12-13

    The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and a student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students. MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.
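
    The supervised step described above (an SVM assigning sentences to character categories) follows a standard text-classification pattern; the scikit-learn sketch below shows that pattern with invented example sentences and labels, and is not MicroPIE's own implementation.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      sentences = [
          "Optimal growth occurs at 37 degrees C.",
          "Cells are rod-shaped, 0.5 by 2.0 micrometers.",
          "Growth is observed with glucose as the sole carbon source.",
      ]
      labels = ["optimum growth temperature", "cell shape", "metabolic substrate"]

      # Tf-idf features over word unigrams and bigrams, fed to a linear-kernel SVM.
      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
      clf.fit(sentences, labels)
      print(clf.predict(["The optimum temperature for growth is 28-30 degrees C."]))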

  1. Estimation of Potential Shale Gas Yield Amount and Land Degradation in China by Landcover Distribution regarding Water-Food-Energy and Forest

    NASA Astrophysics Data System (ADS)

    Kim, N.; Heo, S.; Lim, C. H.; Lee, W. K.

    2017-12-01

    Shale gas is gaining attention due to the tremendous reserves beneath the earth. The two largest known reservoirs are located in the United States and China. According to the U.S. Energy Information Administration, China has an estimated 7,299 trillion cubic feet of recoverable shale gas, ranking first in the world, while the United States, with 665 trillion cubic feet, ranks fourth. Unlike traditional fossil fuels, shale gas is considered to be widely distributed spatially, and the reserve amounts and locations make the resource a candidate energy source for the next generation. The United States has dramatically increased its shale gas production; shale gas now composes more than 50% of total U.S. natural gas production, whereas China and Canada produce very small amounts. According to a U.S. Energy Information Administration report, in 2014 the United States produced almost 40 billion cubic feet of shale gas per day, but China produced only 0.25 billion cubic feet per day. Recently, Chinese policy has shifted toward reducing coal power plants to curb air pollution, while energy stress in China keeps increasing. Shale gas produces less air pollution when generating energy and is considered a cleaner energy source. Considering China's situation and the characteristics of shale gas, demand for shale gas in China is expected to increase soon. The United States invested 71.7 billion dollars in 2013, but the Chinese government is proceeding only with fundamental investment because of land degradation, limited water resources, and the geological locations of the reservoirs. In this study, we first reviewed the current systems and technologies for shale gas extraction, such as hydraulic fracturing. Second, we listed the possible environmental damages, land degradation, and resource demands associated with shale gas extraction. Third, we investigated the potential shale gas extraction amount in China based on the locations of shale gas reservoirs and the limited resources available for extraction. Fourth, we investigated the potential land degradation of agricultural land, surface water, and forest under a shale gas development scenario. In conclusion, we suggest the possible environmental damages and social impacts of shale gas extraction in China.

  2. Utilization of ontology look-up services in information retrieval for biomedical literature.

    PubMed

    Vishnyakova, Dina; Pasche, Emilie; Lovis, Christian; Ruch, Patrick

    2013-01-01

    The vast amount of biomedical data makes it necessary to improve information retrieval processes in the biomedical domain. The use of biomedical ontologies facilitates the combination of various data sources (e.g., scientific literature, clinical data repositories) by increasing the quality of information retrieval and reducing maintenance efforts. In this context, we developed Ontology Look-up Services (OLS) based on the NEWT and MeSH vocabularies. Our services were applied to information retrieval tasks such as gene and disease normalization. The implementation of the OLS services significantly accelerated the extraction of particular biomedical facts by structuring and enriching the data context, and precision in the normalization tasks improved by about 20%.

  3. Comparison of Protein Extracts from Various Unicellular Green Sources.

    PubMed

    Teuling, Emma; Wierenga, Peter A; Schrama, Johan W; Gruppen, Harry

    2017-09-13

    Photosynthetic unicellular organisms are considered promising alternative protein sources. The aim of this study is to understand the extent to which these green sources differ with respect to their gross composition and how these differences affect the final protein isolate. Using mild isolation techniques, proteins were extracted and isolated from four different unicellular sources (Arthrospira (spirulina) maxima, Nannochloropsis gaditana, Tetraselmis impellucida, and Scenedesmus dimorphus). Despite differences in the protein content of the sources (27-62% w/w) and in protein extractability (17-74% w/w), the final protein isolates had similar protein contents (62-77% w/w) and protein yields (3-9% w/w). Protein solubility as a function of pH, and its dependence on ionic strength, differed between the sources, especially at pH < 4.0. Overall, the characterization and extraction protocol used allows a relatively fast and well-described isolation of purified proteins from novel protein sources.

  4. Comparison of Protein Extracts from Various Unicellular Green Sources

    PubMed Central

    2017-01-01

    Photosynthetic unicellular organisms are considered promising alternative protein sources. The aim of this study is to understand the extent to which these green sources differ with respect to their gross composition and how these differences affect the final protein isolate. Using mild isolation techniques, proteins were extracted and isolated from four different unicellular sources (Arthrospira (spirulina) maxima, Nannochloropsis gaditana, Tetraselmis impellucida, and Scenedesmus dimorphus). Despite differences in the protein content of the sources (27–62% w/w) and in protein extractability (17–74% w/w), the final protein isolates had similar protein contents (62–77% w/w) and protein yields (3–9% w/w). Protein solubility as a function of pH, and its dependence on ionic strength, differed between the sources, especially at pH < 4.0. Overall, the characterization and extraction protocol used allows a relatively fast and well-described isolation of purified proteins from novel protein sources. PMID:28701042

  5. Infrared imaging of WENSS radio sources

    NASA Astrophysics Data System (ADS)

    Villani, D.; di Serego Alighieri, S.

    1999-03-01

    We have performed deep imaging in the IR J and K bands for three sub-samples of radio sources extracted from the Westerbork Northern Sky Survey (WENSS), a large low-frequency radio survey: Ultra Steep Spectrum (USS), Gigahertz Peaked Spectrum (GPS) and Flat Spectrum (FS) sources. We present the results of these IR observations, carried out with the ARcetri Near Infrared CAmera (ARNICA) at the Nordic Optical Telescope (NOT), which provide photometric and morphological information on high-redshift radio galaxies and quasars. We find that the radio galaxies contained in our sample do not show the pronounced radio/IR alignment claimed for 3CR sources. IR photometric measurements of the gravitational lens system 1600+434 are also presented. This paper is based on data obtained at the Nordic Optical Telescope on La Palma (Canary Islands).

  6. Iron deficiency chlorosis in plants as related to Fe sources in soil

    NASA Astrophysics Data System (ADS)

    Díaz, I.; Delgado, A.; de Santiago, A.; del Campillo, M. C.; Torrent, J.

    2012-04-01

    Iron deficiency chlorosis (IDC) is a relevant agricultural problem in many areas of the world where calcareous soils are dominant. Although this problem has traditionally been ascribed to the pH-buffering effect of soil carbonates, the content and type of Fe oxides in soil help explain Fe uptake by plants and the incidence of the problem. During the last two decades, it has been demonstrated that Fe extraction with oxalate, related to the content of poorly crystalline Fe oxides, is well correlated with the chlorophyll content of plants and thus with the incidence of IDC. This reveals the contribution of poorly crystalline Fe oxides to Fe availability to plants in calcareous soils, previously shown in microcosm experiments using ferrihydrite as the Fe source in the growing media. In order to provide additional information on how soil Fe sources explain the incidence of IDC, and to develop accurate methods to predict it, we performed a set of experiments combining different methods of extracting soil Fe with pot cultivation of plants, correlating the amounts of extracted Fe with the chlorophyll content of the plants (measured using the SPAD chlorophyll meter). The first experiment involved 21 soils and white lupin cultivation, sequential Fe extraction in soil to study Fe forms, and single extractions (DTPA, rapid oxalate and non-buffered hydroxylamine). After that, a set of pot experiments with grapevine rootstocks, chickpea, and sunflower was performed, in this case with single soil extractions only. The Fe fraction most closely related to chlorophyll content in plants (r = 0.5, p < 0.05) was the citrate + ascorbate (CA) extraction, which releases most of the Fe related to poorly crystalline Fe oxides, revealing the key role of these compounds in Fe supply to plants. Fe extracted with CA was better correlated with chlorophyll content in plants than oxalate-extractable Fe, probably due to a more selective dissolution of poorly crystalline oxides by the former extractant. In general terms, the best correlation between extractable Fe and chlorophyll content in plants was observed with hydroxylamine, which explained 21 to 72% of the variance observed in chlorophyll content, greater than the variance explained by the rapid oxalate extraction (11 to 60%, not always significant) or the classical active calcium carbonate determination (6 to 56%, not always significant). Extraction with DTPA provided the worst results, explaining 18 to 36% of the variance in chlorophyll content. The good predictive value of the hydroxylamine extraction was explained by its correlation with Fe in poorly crystalline Fe oxides (estimated as CA-extractable Fe) and by its negative correlation with the active calcium carbonate content of soils.

  7. Sources of Information and Behavioral Patterns in Online Health Forums: Observational Study

    PubMed Central

    Friede, Tim; Grabowski, Jens; Koschack, Janka; Makedonski, Philip; Himmel, Wolfgang

    2014-01-01

    Background Increasing numbers of patients are raising their voice in online forums. This shift is welcome as an act of patient autonomy, reflected in the term “expert patient”. At the same time, there is considerable concern that patients can be easily misguided by pseudoscientific research and debate. Little is known about the sources of information used in health-related online forums, how users apply this information, and how they behave in such forums. Objective The intent of the study was to identify (1) the sources of information used in online health-related forums, and (2) the roles and behavior of active forum visitors in introducing and disseminating this information. Methods This observational study used the largest German multiple sclerosis (MS) online forum as a database, analyzing the user debate about the recently proposed and controversial Chronic Cerebrospinal Venous Insufficiency (CCSVI) hypothesis. After extracting all posts and then filtering relevant CCSVI posts between 01 January 2008 and 17 August 2012, we first identified hyperlinks to scientific publications and other information sources used or referenced in the posts. Employing k-means clustering, we then analyzed the users’ preference for sources of information and their general posting habits. Results Of 139,912 posts from 11,997 threads, 8628 posts discussed or at least mentioned CCSVI. We detected hyperlinks pointing to CCSVI-related scientific publications in 31 posts. In contrast, 2829 different URLs were posted to the forum, most frequently referring to social media, such as YouTube or Facebook. We identified a total of 6 different roles of hyperlink posters including Social Media Fans, Organization Followers, and Balanced Source Users. Apart from the large and nonspecific residual category of the “average user”, several specific behavior patterns were identified, such as the small but relevant groups of CCSVI-Focused Responders or CCSVI Activators. Conclusions The bulk of the observed contributions were not based on scientific results, but on various social media sources. These sources seem to contain mostly opinions and personal experience. A small group of people with distinct behavioral patterns played a core role in fuelling the discussion about CCSVI. PMID:24425598
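
    The study clusters forum users by their posting habits with k-means; the following is a minimal sketch of that step under assumed features (post counts, hyperlinks shared, CCSVI-related posts per user), which are not the authors' actual feature set.

      # Minimal sketch of clustering forum users by posting behaviour with k-means.
      # The feature columns (posts, hyperlinks shared, CCSVI-related posts) and the
      # number of clusters are illustrative assumptions, not the study's choices.
      import numpy as np
      from sklearn.preprocessing import StandardScaler
      from sklearn.cluster import KMeans

      # rows = users, columns = [total posts, hyperlinks posted, CCSVI-related posts]
      user_features = np.array([
          [250, 40, 120],
          [12, 1, 0],
          [480, 5, 300],
          [35, 30, 2],
          [8, 0, 1],
      ], dtype=float)

      X = StandardScaler().fit_transform(user_features)
      labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
      print(labels)   # cluster assignment per user, later interpreted as behavioural roles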

  8. Clinical data integration of distributed data sources using Health Level Seven (HL7) v3-RIM mapping

    PubMed Central

    2011-01-01

    Background Health information exchange and health information integration have become top priorities for healthcare systems across institutions and hospitals. Most organizations and establishments implement health information exchange and integration in order to support meaningful information retrieval among their disparate healthcare systems. The challenges that prevent efficient health information integration for heterogeneous data sources are the lack of a common standard to support mapping across distributed data sources and the numerous and diverse healthcare domains. Health Level Seven (HL7) is a standards development organization; its Reference Information Model (RIM), developed by HL7's technical committees, is a standardized abstract representation of HL7 data across all the domains of health care. In this article, we present a design and a prototype implementation of HL7 v3-RIM mapping for information integration of distributed clinical data sources. The implementation enables the user to retrieve and search information that has been integrated using HL7 v3-RIM technology from disparate health care systems. Method and results We designed and developed a prototype implementation of an HL7 v3-RIM mapping function to integrate distributed clinical data sources, using R-MIM classes from HL7 v3-RIM as a global view along with a collaborative centralized web-based mapping tool to tackle the evolution of both global and local schemas. Our prototype was implemented and integrated with a clinical data management system (CDMS) as a plug-in module. We tested the prototype system with use case scenarios for distributed clinical data sources across several legacy CDMSs. The results have been effective in improving information delivery, completing tasks that would otherwise have been difficult to accomplish, and reducing the time required to finish tasks involved in collaborative information retrieval and sharing with other systems. Conclusions We created a prototype implementation of HL7 v3-RIM mapping for information integration between distributed clinical data sources to promote collaborative healthcare and translational research. The prototype has effectively and efficiently ensured the accuracy of the information and knowledge extraction for the systems that have been integrated. PMID:22104558

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liscom, W.L.

    This book presents a complete graphic and statistical portrait of the dramatic shifts in global energy flows during the 1970s and the resultant transfer of economic and political power from the industrial nations to the oil-producing states. The information was extracted from government-source documents and compiled in a computer data base. Computer graphics were combined with the data base to produce over 400 full-color graphs. The energy commodities covered are oil, natural gas, coal, nuclear, and conventional electric-power generation. Also included are data on hydroelectric and geothermal power, oil shale, tar sands, and other alternative energy sources. 72 references.

  10. Client-side Skype forensics: an overview

    NASA Astrophysics Data System (ADS)

    Meißner, Tina; Kröger, Knut; Creutzburg, Reiner

    2013-03-01

    IT security and computer forensics are important components of information technology. In the present study, a client-side forensic analysis of Skype is performed. It is designed to explain which kinds of user data are stored on a computer and which tools allow the extraction of those data for a forensic investigation. Both methods are described: a manual analysis and an analysis with (mainly) open source tools.

  11. Characteristics of hemolytic activity induced by the aqueous extract of the Mexican fire coral Millepora complanata.

    PubMed

    García-Arredondo, Alejandro; Murillo-Esquivel, Luis J; Rojas, Alejandra; Sanchez-Rodriguez, Judith

    2014-01-01

    Millepora complanata is a plate-like fire coral common throughout the Caribbean. Contact with this species usually provokes burning pain, erythema and urticariform lesions. Our previous study suggested that the aqueous extract of M. complanata contains non-protein hemolysins that are soluble in water and ethanol. In general, the local damage induced by cnidarian venoms has been associated with hemolysins. The characterization of the effects of these components is important for the understanding of the defense mechanisms of fire corals. In addition, this information could lead to better care for victims of envenomation accidents. An ethanolic extract from the lyophilized aqueous extract was prepared and its hemolytic activity was compared with the hemolysis induced by the denatured aqueous extract. Based on the finding that ethanol failed to induce nematocyst discharge, ethanolic extracts were prepared from artificially bleached and normal M. complanata fragments and their hemolytic activity was tested in order to obtain information about the source of the heat-stable hemolysins. Rodent erythrocytes were more susceptible to the aqueous extract than chicken and human erythrocytes. Hemolytic activity started at ten minutes of incubation and was relatively stable within the range of 28-50°C. When the aqueous extract was preincubated at temperatures over 60°C, hemolytic activity was significantly reduced. The denatured extract induced a slow hemolytic activity (HU50 = 1,050.00 ± 45.85 μg/mL), detectable four hours after incubation, which was similar to that induced by the ethanolic extract prepared from the aqueous extract (HU50 = 1,167.00 ± 54.95 μg/mL). No significant differences were observed between hemolysis induced by ethanolic extracts from bleached and normal fragments, although both activities were more potent than hemolysis induced by the denatured extract. The results showed that the aqueous extract of M. complanata possesses one or more powerful heat-labile hemolytic proteins that are slightly more resistant to temperature than jellyfish venoms. This extract also contains slow thermostable hemolysins highly soluble in ethanol that are probably derived from the body tissues of the hydrozoan.

  12. Indirect tissue electrophoresis: a new method for analyzing solid tissue protein.

    PubMed

    Smith, A C

    1988-01-01

    1. The eye lens core (nucleus) has been a valuable source of molecular biologic information. 2. In these studies, lens nuclei are usually homogenized so that any protein information related to anatomical subdivisions, or layers, of the nucleus is lost. 3. The present report is of a new method, indirect tissue electrophoresis (ITE), which, when applied to fish lens nuclei, permitted (a) automatic correlation of protein information with anatomic layer, (b) production of large, clear electrophoretic patterns even from small tissue samples and (c) detection of more proteins than in liquid extracts of homogenized tissues. 4. ITE seems potentially applicable to a variety of solid tissues.

  13. Improving life sciences information retrieval using semantic web technology.

    PubMed

    Quan, Dennis

    2007-05-01

    The ability to retrieve relevant information is at the heart of every aspect of research and development in the life sciences industry. Information is often distributed across multiple systems and recorded in a way that makes it difficult to piece together the complete picture. Differences in data formats, naming schemes and network protocols amongst information sources, both public and private, must be overcome, and user interfaces not only need to be able to tap into these diverse information sources but must also assist users in filtering out extraneous information and highlighting the key relationships hidden within an aggregated set of information. The Semantic Web community has made great strides in proposing solutions to these problems, and many efforts are underway to apply Semantic Web techniques to the problem of information retrieval in the life sciences space. This article gives an overview of the principles underlying a Semantic Web-enabled information retrieval system: creating a unified abstraction for knowledge using the RDF semantic network model; designing semantic lenses that extract contextually relevant subsets of information; and assembling semantic lenses into powerful information displays. Furthermore, concrete examples of how these principles can be applied to life science problems including a scenario involving a drug discovery dashboard prototype called BioDash are provided.
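
    To make the idea of a semantic lens concrete, here is a minimal sketch that extracts a contextually relevant subset of an RDF graph with a SPARQL query using rdflib; the tiny graph and the example predicate names are invented for illustration and are not BioDash's actual model.

      # Sketch of a "semantic lens": pull a contextually relevant subset out of an
      # RDF graph with a SPARQL query. The graph and predicate names are invented
      # placeholders for illustration only.
      from rdflib import Graph, Namespace, Literal

      EX = Namespace("http://example.org/")
      g = Graph()
      g.add((EX.aspirin, EX.inhibits, EX.COX1))
      g.add((EX.aspirin, EX.label, Literal("aspirin")))
      g.add((EX.COX1, EX.label, Literal("prostaglandin G/H synthase 1")))
      g.add((EX.ibuprofen, EX.inhibits, EX.COX2))

      lens = g.query("""
          PREFIX ex: <http://example.org/>
          SELECT ?drugLabel ?targetLabel WHERE {
              ?drug ex:inhibits ex:COX1 ;
                    ex:label ?drugLabel .
              ex:COX1 ex:label ?targetLabel .
          }
      """)
      for drug_label, target_label in lens:
          print(drug_label, "->", target_label)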

  14. The virtual library: Coming of age

    NASA Technical Reports Server (NTRS)

    Hunter, Judy F.; Cotter, Gladys A.

    1994-01-01

    With the high-speed networking capabilities, multiple media options, and massive amounts of information that exist in electronic format today, the concept of a 'virtual' library or 'library without walls' is becoming viable. In a virtual library environment, the information processed goes beyond the traditional definition of documents to include the results of scientific and technical research and development (reports, software, data) recorded in any format or medium: electronic, audio, video, or scanned images. Network access to information must include tools to help locate information sources and navigate the networks to connect to the sources, as well as methods to extract the relevant information. Graphical User Interfaces (GUIs) that are intuitive and navigational tools such as Intelligent Gateway Processors (IGPs) will provide users with seamless and transparent use of high-speed networks to access, organize, and manage information. Traditional libraries will become points of electronic access to information on multiple media. The emphasis will be on unique collections of information at each library rather than entire collections at every library. It is no longer a question of whether there is enough information available; it is more a question of how to manage the vast volumes of information. The future equation will involve being able to organize knowledge, manage information, and provide access at the point of origin.

  15. Characterizing Urban Volumetry Using LIDAR Data

    NASA Astrophysics Data System (ADS)

    Santos, T.; Rodrigues, A. M.; Tenedório, J. A.

    2013-05-01

    Urban indicators are efficient tools designed to simplify, quantify and communicate relevant information to land planners. Since urban data have a strong spatial representation, geographical data can be used as the basis for constructing information regarding urban environments. One important source of information about land status is imagery collected through remote sensing. Using digital image processing techniques, thematic detail can then be extracted from those images and used to build urban indicators. The most common metrics are based on area (2D) measurements. These include indicators such as impervious area per capita or the surface occupied by green areas, usually derived from a spectral image obtained through a satellite or airborne camera. More recently, laser scanning data have become available for large-scale applications. Such sensors acquire altimetric information and are used to produce Digital Surface Models (DSM). In this context, LiDAR data available for the city are explored along with demographic information, a framework to produce volumetric (3D) urban indexes is proposed, and measures such as built volume per capita, volumetric density and volumetric homogeneity are computed.
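
    A volumetric indicator of this kind can be computed from LiDAR-derived rasters; the sketch below assumes a DSM and a DTM on the same grid plus a building-footprint mask, with toy arrays and an assumed population figure rather than data from the study.

      # Minimal sketch of a volumetric urban indicator from LiDAR-derived rasters.
      # Assumes a DSM and a DTM on the same grid plus a building-footprint mask;
      # the arrays, cell size and population figure below are illustrative only.
      import numpy as np

      cell_area_m2 = 1.0                           # assumed 1 m raster resolution
      dsm = np.array([[12.0, 12.5], [3.1, 3.0]])   # surface heights (m)
      dtm = np.array([[3.0, 3.0], [3.0, 3.0]])     # terrain heights (m)
      buildings = np.array([[True, True], [False, False]])

      height_above_ground = np.clip(dsm - dtm, 0.0, None)
      built_volume_m3 = float(np.sum(height_above_ground[buildings]) * cell_area_m2)

      population = 5                               # residents of the analysis unit (assumed)
      print(built_volume_m3, built_volume_m3 / population)   # total volume, volume per capita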

  16. Combining TXRF, FT-IR and GC-MS information for identification of inorganic and organic components in black pigments of rock art from Alero Hornillos 2 (Jujuy, Argentina).

    PubMed

    Vázquez, Cristina; Maier, Marta S; Parera, Sara D; Yacobaccio, Hugo; Solá, Patricia

    2008-06-01

    Archaeological samples are complex in composition since they generally comprise a mixture of materials subjected to deterioration factors that depend largely on the environmental conditions. Therefore, the integration of analytical tools such as TXRF, FT-IR and GC-MS can maximize the amount of information provided by the sample. Recently, two black rock art samples of camelid figures at Alero Hornillos 2, an archaeological site located near the town of Susques (Jujuy Province, Argentina), were investigated. TXRF, selected for inorganic information, showed the presence of manganese and iron among other elements, consistent with an iron and manganese oxide as the black pigment. Aiming at the detection of any residual organic compounds, the samples were extracted with a chloroform-methanol mixture and the extracts were analyzed by FT-IR, showing the presence of bands attributable to lipids. Analysis by GC-MS of the carboxylic acid methyl esters prepared from the sample extracts indicated that the main organic constituents were saturated (C(16:0) and C(18:0)) fatty acids in relative abundances characteristic of degraded animal fat. The presence of minor C(15:0) and C(17:0) fatty acids and branched-chain iso-C(16:0) pointed to a ruminant animal source.

  17. Quali-quantitative analysis of the phenolic fraction of the flowers of Corylus avellana, source of the Italian PGI product "Nocciola di Giffoni": Isolation of antioxidant diarylheptanoids.

    PubMed

    Masullo, Milena; Mari, Angela; Cerulli, Antonietta; Bottone, Alfredo; Kontek, Bogdan; Olas, Beata; Pizza, Cosimo; Piacente, Sonia

    2016-10-01

    There is only limited information available on the chemical composition of the non-edible parts of Corylus avellana, source of the Italian PGI product "Nocciola di Giffoni" (hazelnut). An initial LC-MS profile of the methanolic extract of the male flowers of C. avellana, cultivar 'Tonda di Giffoni' led to the isolation of 12 compounds, of which the structures were elucidated by NMR spectroscopy. These were identified as three previously undescribed diarylheptanoids, named giffonins Q-S, along with nine known compounds. Furthermore, the quantitative determination of the main compounds occurring in the methanolic extract of C. avellana flowers was carried out by an analytical approach based on LC-ESI(QqQ)MS, using the Multiple Reaction Monitoring (MRM) experiment. In order to explore the antioxidant ability of C. avellana flowers, the methanolic extract and the isolated compounds were evaluated for their inhibitory effects on human plasma lipid peroxidation induced by H2O2 and H2O2/Fe(2+), by measuring the concentration of TBARS. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Extracting remanent magnetization from magnetic data inversion

    NASA Astrophysics Data System (ADS)

    Liu, S.; Fedi, M.; Baniamerian, J.; Hu, X.

    2017-12-01

    Remanent magnetization is an important vector parameter of rock and ore magnetism; it is related to the intensity and direction of the primary geomagnetic field in past geological periods and hence provides critical evidence of tectonic movement and sedimentary evolution. We extract the remanence information from the distribution of the inverted magnetization vector. First, directions of the total magnetization vector are estimated from the reduced-to-pole anomaly (max-min algorithm) and from its correlations with other magnitude magnetic transforms such as the magnitude magnetic anomaly and the normalized source strength. We then invert the data for the magnetization intensity, and finally the intensity and direction of the remanent magnetization are separated from the total magnetization vector with a generalized formula for the apparent susceptibility based on a priori information on the Koenigsberger ratio. Our approach is used to investigate the targeted resources and geologic processes of mining areas in China.
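
    The separation rests on the vector relation M_total = M_induced + M_remanent, with the induced part given by the susceptibility and the present field; the sketch below illustrates only this generic relation and the Koenigsberger ratio, not the authors' generalized apparent-susceptibility formula, and all numbers are assumed.

      # Vector relation underlying the separation of remanent magnetization:
      # M_total = M_induced + M_remanent, with M_induced = chi * H0 along the
      # present geomagnetic field. Generic sketch; all values are illustrative.
      import numpy as np

      def unit(dec_deg, inc_deg):
          d, i = np.radians([dec_deg, inc_deg])
          return np.array([np.cos(i) * np.cos(d), np.cos(i) * np.sin(d), np.sin(i)])

      H0 = 40.0                      # ambient field strength (A/m), assumed
      chi = 0.02                     # susceptibility (SI), assumed
      field_dir = unit(0.0, 55.0)    # geomagnetic field direction, assumed

      M_total = 1.5 * unit(20.0, 30.0)          # inverted total magnetization (A/m)
      M_induced = chi * H0 * field_dir
      M_remanent = M_total - M_induced

      Q = np.linalg.norm(M_remanent) / np.linalg.norm(M_induced)   # Koenigsberger ratio
      print(M_remanent, Q)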

  19. Mining biomedical images towards valuable information retrieval in biomedical and life sciences.

    PubMed

    Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas

    2016-01-01

    Biomedical images are helpful sources for scientists and practitioners in drawing significant hypotheses, exemplifying approaches and describing experimental results in published biomedical literature. In recent decades, there has been an enormous increase in the amount of heterogeneous biomedical image production and publication, which results in a need for bioimaging platforms for feature extraction and analysis of the text and content in biomedical images that can be leveraged to implement effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, methodologies used, results produced, accuracies achieved and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction and processing of complex natural language queries. © The Author(s) 2016. Published by Oxford University Press.

  20. Information Extraction for System-Software Safety Analysis: Calendar Year 2007 Year-End Report

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.

    2008-01-01

    This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis on the models to identify possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations; 4) perform discrete-time-based simulation on the models to investigate scenarios where these paths may play a role in failures and mishaps; and 5) identify resulting candidate scenarios for software integration testing. This paper describes new challenges in a NASA abort system case, and enhancements made to develop the integrated tool set.
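
    The graph analysis step (task 3) can be illustrated with a small directed model and a path search from hazard sources to vulnerable entities; the node names and edges below are invented placeholders, not parts of the NASA case.

      # Sketch of the graph analysis step: find possible propagation paths from
      # hazard sources to vulnerable entities in an architecture model. The model,
      # node names and edges are invented placeholders, not from the NASA case.
      import networkx as nx

      model = nx.DiGraph()
      model.add_edges_from([
          ("valve_fault", "pressure_sensor"),
          ("pressure_sensor", "flight_software"),
          ("flight_software", "abort_controller"),
          ("valve_fault", "relief_line"),
      ])

      hazard_sources = ["valve_fault"]
      vulnerable = ["abort_controller"]

      for src in hazard_sources:
          for tgt in vulnerable:
              for path in nx.all_simple_paths(model, src, tgt):
                  print(" -> ".join(path))    # candidate scenario for integration testing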

  1. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900
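
    A timeline-style analytical query of the kind the system answers can be illustrated very simply: filter congestion events for a road segment within a time window. The data model below is an illustration only, not the QET system or its TQ-index.

      # Minimal sketch of a timeline-style analytical query over road sensor data:
      # filter congestion events for a segment within a time window. The data model
      # and threshold are simple illustrations, not the QET system or its TQ-index.
      from datetime import datetime

      events = [  # (segment_id, start, end, mean_speed_kmh)
          ("S1", datetime(2015, 6, 1, 7, 50), datetime(2015, 6, 1, 9, 10), 18.0),
          ("S1", datetime(2015, 6, 1, 12, 0), datetime(2015, 6, 1, 12, 20), 35.0),
          ("S2", datetime(2015, 6, 1, 8, 5), datetime(2015, 6, 1, 8, 40), 22.0),
      ]

      def congestion_timeline(events, segment, t_from, t_to, speed_threshold=30.0):
          return [e for e in events
                  if e[0] == segment
                  and e[1] < t_to and e[2] > t_from      # overlaps the query window
                  and e[3] < speed_threshold]            # slow enough to count as congestion

      print(congestion_timeline(events, "S1",
                                datetime(2015, 6, 1, 7, 0), datetime(2015, 6, 1, 10, 0)))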

  2. Systematically Extracting Metal- and Solvent-Related Occupational Information from Free-Text Responses to Lifetime Occupational History Questionnaires

    PubMed Central

    Friesen, Melissa C.; Locke, Sarah J.; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A.; Purdue, Mark; Colt, Joanne S.

    2014-01-01

    Objectives: Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants’ jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Methods: Our study population comprised 2408 subjects, reporting 11991 jobs, from a case–control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert’s independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Results: Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44–51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9–14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Conclusions: Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. PMID:24590110
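
    The keyword-based step can be illustrated outside SAS as well; the Python sketch below flags jobs whose free-text responses match an a priori phrase list for one exposure scenario. The phrase list and job records are illustrative, not the study's actual lists or data.

      # Sketch of the keyword-based step: flag jobs whose free-text task/tool/chemical
      # responses match an a priori list of phrases for an exposure scenario. The
      # phrase list and job records are illustrative, not the study's actual lists.
      import re

      TCE_PHRASES = ["degreas", "vapor degreaser", "trichloroethylene", "metal cleaning"]

      jobs = [
          {"job_id": 1, "task": "cleaned metal parts in a vapor degreaser", "chemicals": ""},
          {"job_id": 2, "task": "filed patient charts", "chemicals": "none"},
      ]

      def flag_scenario(job, phrases):
          text = " ".join(str(v) for v in job.values()).lower()
          return any(re.search(re.escape(p), text) for p in phrases)

      for job in jobs:
          job["possible_tce_scenario"] = flag_scenario(job, TCE_PHRASES)
      print(jobs)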

  3. Usability-driven pruning of large ontologies: the case of SNOMED CT.

    PubMed

    López-García, Pablo; Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan

    2012-06-01

    To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Graph-traversal heuristics provided high coverage (71-96% of terms in the test sets of discharge summaries) at the expense of subset size (17-51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24-55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available.
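
    The combination of graph traversal and frequency filtering can be sketched on a toy hierarchy; the fragment below builds the upward closure of a signature and then drops low-frequency concepts. The hierarchy and the MEDLINE-style counts are placeholders, not SNOMED CT content.

      # Sketch of usability-driven pruning: traverse upward from signature concepts
      # to build a subset, then drop concepts whose corpus frequency is low. The
      # toy hierarchy and MEDLINE-style counts are placeholders, not SNOMED CT data.
      parents = {                      # child -> parents (is-a)
          "myocardial infarction": ["heart disease"],
          "heart disease": ["disorder"],
          "angina": ["heart disease"],
          "disorder": [],
      }
      corpus_freq = {"myocardial infarction": 90000, "heart disease": 250000,
                     "angina": 40000, "disorder": 5}

      def upward_closure(signature):
          subset, stack = set(), list(signature)
          while stack:
              c = stack.pop()
              if c not in subset:
                  subset.add(c)
                  stack.extend(parents.get(c, []))
          return subset

      subset = upward_closure(["myocardial infarction", "angina"])
      filtered = {c for c in subset if corpus_freq.get(c, 0) >= 100}   # frequency filter
      print(subset, filtered, sep="\n")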

  4. Investigating the Capability to Extract Impulse Response Functions From Ambient Seismic Noise Using a Mine Collapse Event

    NASA Astrophysics Data System (ADS)

    Kwak, Sangmin; Song, Seok Goo; Kim, Geunyoung; Cho, Chang Soo; Shin, Jin Soo

    2017-10-01

    Using recordings of a mine collapse event (Mw 4.2) in South Korea in January 2015, we demonstrated that the phase and amplitude information of impulse response functions (IRFs) can be effectively retrieved using seismic interferometry. This event is equivalent to a single downward force at shallow depth. Using quantitative metrics, we compared three different seismic interferometry techniques—deconvolution, coherency, and cross correlation—to extract the IRFs between two distant stations with ambient seismic noise data. The azimuthal dependency of the source distribution of the ambient noise was also evaluated. We found that deconvolution is the best method for extracting IRFs from ambient seismic noise within the period band of 2-10 s. The coherency method is also effective if appropriate spectral normalization or whitening schemes are applied during the data processing.
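
    Deconvolution interferometry between two stations amounts to a spectral division stabilised by a water level; the sketch below applies it to synthetic traces, so the sampling rate, noise and impulse response are assumptions rather than the Korean data.

      # Sketch of deconvolution interferometry between two stations: estimate the
      # impulse response by spectral division with a water level to stabilise it.
      # Synthetic traces stand in for the real ambient-noise records.
      import numpy as np

      fs = 20.0                                   # sampling rate (Hz), assumed
      rng = np.random.default_rng(0)
      src = rng.standard_normal(2048)             # noise recorded at the "virtual source"
      irf_true = np.zeros(64); irf_true[12] = 1.0; irf_true[30] = 0.4
      rec = np.convolve(src, irf_true)[:2048]     # receiver trace = source * IRF

      S = np.fft.rfft(src)
      R = np.fft.rfft(rec)
      water = 0.01 * np.mean(np.abs(S) ** 2)      # water-level regularisation
      irf_est = np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + water))

      print(np.argmax(irf_est[:64]) / fs)         # travel-time pick of the main pulse (s)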

  5. New approaches to health promotion and informatics education using Internet in the Czech Republic.

    PubMed

    Zvárová, J

    2005-01-01

    The paper describes current information technology skills in the Czech Republic. It focuses on informatics education using the Internet, the ECDL concept and the links between computer literacy among health care professionals and the quality of health care. It is widely understood that the main source of wealth of any nation is information management and the efficient transformation of information into knowledge, and completely new decisive factors for the economy of the near future are emerging, based on the circulation and exchange of information. It is clear that modern health care cannot be built without information and communication technologies. We discuss several approaches to selected topics of the information society in health care, namely the role of the electronic health record, structured information, extraction of information from free medical texts and sharing of knowledge stored in medical guidelines.

  6. Evaluation of airborne thermal infrared imagery for locating mine drainage sites in the Lower Kettle Creek and Cooks Run Basins, Pennsylvania, USA

    USGS Publications Warehouse

    Sams, James I.; Veloski, Garret

    2003-01-01

    High-resolution airborne thermal infrared (TIR) imagery data were collected over 90.6 km2 (35 mi2) of remote and rugged terrain in the Kettle Creek and Cooks Run Basins, tributaries of the West Branch of the Susquehanna River in north-central Pennsylvania. The purpose of this investigation was to evaluate the effectiveness of TIR for identifying sources of acid mine drainage (AMD) associated with abandoned coal mines. Coal mining from the late 1800s resulted in many AMD sources from abandoned mines in the area. However, very little detailed mine information was available, particularly on the source locations of AMD sites. Potential AMD sources were extracted from airborne TIR data employing custom image processing algorithms and GIS data analysis. Based on field reconnaissance of 103 TIR anomalies, 53 sites (51%) were classified as AMD. The AMD sources had low pH (<4) and elevated concentrations of iron and aluminum. Of the 53 sites, approximately 26 sites could be correlated with sites previously documented as AMD. The other 27 mine discharges identified in the TIR data were previously undocumented. This paper presents a summary of the procedures used to process the TIR data and extract potential mine drainage sites, methods used for field reconnaissance and verification of TIR data, and a brief summary of water-quality data.

  7. Antimicrobial potential of Australian macrofungi extracts against foodborne and other pathogens.

    PubMed

    Bala, Neeraj; Aitken, Elizabeth A B; Cusack, Andrew; Steadman, Kathryn J

    2012-03-01

    Basidiomycetous macrofungi have therapeutic potential due to antimicrobial activity, but little information is available for Australian macrofungi. Therefore, the present study investigated 12 Australian basidiomycetous macrofungi, previously shown to have promising activity against Staphylococcus aureus and Escherichia coli, for their antimicrobial potential against a range of other clinically relevant micro-organisms. Fruiting bodies were collected from across Queensland, Australia, freeze-dried and sequentially extracted with water and ethanol. The crude extracts were tested at 10 mg/mL and 1 mg/mL against six pathogens (two Gram-positive bacteria, two Gram-negative bacteria and two fungi) using a high-throughput 96-well microplate bioassay. A degree of specificity in activity was exhibited by the water extract of Ramaria sp. (Gomphaceae) and the ethanol extracts of Psathyrella sp. (Psathyrellaceae) and Hohenbuehelia sp., which inhibited the growth of the two fungal pathogens used in the assay. Similarly, the ethanol extract of Fomitopsis lilacinogilva (Fomitopsidaceae) was active only against the Gram-positive bacterium B. cereus. Activity against a wider range of the microorganisms used in the assay was exhibited by the ethanol extract of Ramaria sp. and the water extract of Hohenbuehelia sp. (Pleurotaceae). These macrofungi can serve as new sources for the discovery and development of much needed new antimicrobials. Copyright © 2011 John Wiley & Sons, Ltd.

  8. Botany, ethnomedicines, phytochemistry and pharmacology of Himalayan paeony (Paeonia emodi Royle.).

    PubMed

    Ahmad, Mushtaq; Malik, Khafsa; Tariq, Akash; Zhang, Guolin; Yaseen, Ghulam; Rashid, Neelam; Sultana, Shazia; Zafar, Muhammad; Ullah, Kifayat; Khan, Muhammad Pukhtoon Zada

    2018-06-28

    Himalayan paeony (Paeonia emodi Royle.) is an important species used to treat various diseases. This study aimed to compile detailed traditional medicinal uses, phytochemistry, pharmacology and toxicological investigations of P. emodi. It also highlights taxonomic validity, the quality of experimental designs and shortcomings in previously reported information on Himalayan paeony. The data were extracted from unpublished theses (Pakistan, China, India and Nepal) and from published research articles on pharmacology, phytochemistry and antimicrobial activities, retrieved from different databases using specific keywords. The information extracted covered medicinal uses, taxonomic/common names, part used, collection and identification source, authentication, voucher specimen number, plant extracts and their characterization, isolation and identification of phytochemicals, study method (in silico, in vivo or in vitro), model organism used, dose and duration, minimal active concentration, zone of inhibition (antimicrobial studies), bioactive compound(s), mechanism of action on single or multiple targets, and toxicological information. P. emodi is reported for diverse medicinal uses, with pharmacological properties including antioxidant, nephroprotective, lipoxygenase inhibitory, cognition-enhancing and oxidative stress-relieving, cytotoxic, anti-inflammatory, antiepileptic, anticonvulsant, haemagglutination, alpha-chymotrypsin inhibitory, hepatoprotective, effects on hepatic cytochrome expression and the pharmacokinetics of carbamazepine, β-glucuronidase inhibitory, spasmolytic and spasmogenic, and airway relaxant activities. Regarding taxonomic validity, only 10% of studies used the correct taxonomic name, while 90% used incorrect taxonomic, pharmacopeial or common names. The literature reviewed lacks the collection source in 11 reports and a proper source of identification in 15 reports; 33 studies give no voucher specimen number, 26 reports lack information on submission to an authentic herbarium, and most studies (90%) did not validate taxonomic names using recognized databases. In the reported methods, 67% of studies did not characterize the extracts, 25% lacked a proper dose, 40% gave no duration and 31% lacked information on proper controls. Similarly, only 18% of studies report the active compound(s) responsible for the pharmacological activities, 14% report a minimal active concentration and only 2.5% report a mechanism of action on a target, while none of the reports used an in silico approach. P. emodi is endemic to the Himalayan region (Pakistan, China, India and Nepal) and has diverse traditional therapeutic uses. The majority of the reviewed studies showed confusion about its taxonomic validity, incomplete methodologies and ambiguous findings. Keeping in view the extensive uses of P. emodi in various traditional medicinal systems, holistic pharmacological approaches in combination with reverse pharmacology, systems biology and "omics" technologies are recommended to improve the quality of research, leading to natural drug discovery and development from a global perspective. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. DGIdb 3.0: a redesign and expansion of the drug-gene interaction database.

    PubMed

    Cotto, Kelsy C; Wagner, Alex H; Feng, Yang-Yang; Kiwala, Susanna; Coffman, Adam C; Spies, Gregory; Wollam, Alex; Spies, Nicholas C; Griffith, Obi L; Griffith, Malachi

    2018-01-04

    The drug-gene interaction database (DGIdb, www.dgidb.org) consolidates, organizes and presents drug-gene interactions and gene druggability information from papers, databases and web resources. DGIdb normalizes content from 30 disparate sources and allows for user-friendly advanced browsing, searching and filtering for ease of access through an intuitive web user interface, application programming interface (API) and public cloud-based server image. DGIdb v3.0 represents a major update of the database. Nine of the previously included 24 sources were updated and six new resources were added, bringing the total number of sources to 30. These updates and additions have cumulatively resulted in 56,309 interaction claims and have substantially expanded the comprehensive catalogue of druggable genes and anti-neoplastic drug-gene interactions included in DGIdb. Along with these content updates, v3.0 has received a major overhaul of its codebase, including an updated user interface, preset interaction search filters, consolidation of interaction information into interaction groups, greatly improved search response times and an upgraded underlying web application framework. In addition, the expanded API features new endpoints that allow users to extract more detailed information about queried drugs, genes and drug-gene interactions, including listings of PubMed IDs, interaction type and other interaction metadata.
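
    Programmatic access might look like the following; the endpoint path and response fields are assumptions based on the public DGIdb documentation for this release and may have changed, so treat the sketch as illustrative rather than authoritative.

      # Sketch of querying the DGIdb API for drug-gene interactions. The endpoint
      # path and response fields are assumptions based on the public documentation
      # for this release and may have changed; treat as illustrative only.
      import requests

      resp = requests.get(
          "https://dgidb.org/api/v2/interactions.json",
          params={"genes": "BRAF"},
          timeout=30,
      )
      resp.raise_for_status()
      data = resp.json()

      for term in data.get("matchedTerms", []):
          for interaction in term.get("interactions", []):
              # drug name, interaction types and supporting PubMed IDs (if present)
              print(term.get("geneName"),
                    interaction.get("drugName"),
                    interaction.get("interactionTypes"),
                    interaction.get("pmids"))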

  10. Decisional needs assessment regarding Down syndrome prenatal testing: a systematic review of the perceptions of women, their partners and health professionals.

    PubMed

    St-Jacques, Sylvie; Grenier, Sonya; Charland, Marc; Forest, Jean-Claude; Rousseau, François; Légaré, France

    2008-12-01

    To identify the decisional needs of women, their partners and health professionals regarding prenatal testing for Down syndrome through a systematic review. Articles reporting original data from real clinical situations on sources of difficulty and/or ease in making decisions regarding prenatal testing for Down syndrome were selected. Data were extracted using a taxonomy adapted from the Ottawa Decision-Support Framework, and the quality of the studies was assessed using the validated Qualsyst tools. In all, 40 publications covering 32 unique studies were included; the majority concerned women. The most often reported sources of difficulty for decision-making were, in women, pressure from others, emotions and lack of information; in partners, emotion; and in health professionals, lack of information, length of consultation, and personal values. The most important sources of ease were, in women, personal values, understanding and confidence in the medical system; in partners, personal values, information from external sources, and income; and in health professionals, peer support and scientific meetings. Interventions regarding a decision about prenatal testing for Down syndrome should address many decisional needs, which may vary among the parties involved, whether women, their partners or health professionals. Very little is known about the decisional needs of partners and health professionals.

  11. Studies on the Extraction Region of the Type VI RF Driven H- Ion Source

    NASA Astrophysics Data System (ADS)

    McNeely, P.; Bandyopadhyay, M.; Franzen, P.; Heinemann, B.; Hu, C.; Kraus, W.; Riedl, R.; Speth, E.; Wilhelm, R.

    2002-11-01

    IPP Garching has spent several years developing an RF driven H- ion source intended to be an alternative to the current ITER (International Thermonuclear Experimental Reactor) reference design ion source. An RF driven source offers a number of advantages to ITER in terms of reduced costs and maintenance requirements. Although the RF driven ion source has shown itself to be competitive with a standard arc filament ion source for positive ions, many questions still remain on the physics behind the production of the H- ion beam extracted from the source. With the improvements that have been implemented at the BATMAN (Bavarian Test Machine for Negative Ions) facility over the last two years, it is now possible to study both the extracted ion beam and the plasma in the vicinity of the extraction grid in greater detail. This paper shows the effect of changing the extraction and acceleration voltage on both the current and the shape of the beam as measured on the calorimeter some 1.5 m downstream from the source. The extraction voltage required to operate in the plasma limit is 3 kV. The perveance optimum for the extraction system was determined to be 2.2 x 10^-6 A/V^(3/2) and occurs at 2.7 kV extraction voltage. The horizontal and vertical beam half widths vary as a function of the extracted ion current, and the horizontal half width is generally smaller than the vertical. The effect of reducing the co-extracted electron current via plasma grid biasing on the extractable H- current and the beam profile is shown. In the case of a silver contaminated plasma it is possible to reduce the co-extracted electron current to 20% of the initial value by applying a bias of 12 V. When argon is present in the plasma, biasing is observed to have minimal effect on the beam half width, but in a pure hydrogen plasma the beam half width increases as the bias voltage increases. New Langmuir probe studies carried out parallel to the plasma grid (in the vicinity of the peak of the external magnetic filter field), and changes to source parameters as a function of power and argon addition, are reported. The behaviour of the electron density is different when the plasma is argon seeded, showing a strong increase with RF power. The plasma potential is decreased by 2 V when argon is added to the plasma. The effect of unwanted silver sputtered from the Faraday screen by Ar+ ions on both the source performance and the plasma parameters is also presented. The silver dramatically downgraded source performance in terms of current density and produced an early saturation of current with applied RF power. Recently, a collaboration was begun with the Technical University of Augsburg to perform spectroscopic measurements on the Type VI ion source. The final results of this analysis are not yet ready, but some interesting initial observations on the gas temperature, dissociation degree and impurity ions will be presented.
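
    As a back-of-the-envelope check of the reported numbers, the perveance definition P = I/V^(3/2) with the quoted optimum and extraction voltage implies roughly 0.31 A of extracted current, assuming the quoted perveance refers to the total beam current.

      # Back-of-the-envelope check of the reported perveance optimum, assuming the
      # quoted value P = I / V**1.5 refers to the total extracted beam current.
      P_opt = 2.2e-6          # A / V**1.5, reported perveance optimum
      V_ext = 2.7e3           # V, extraction voltage at the optimum

      I_beam = P_opt * V_ext ** 1.5
      print(f"{I_beam:.3f} A")   # ~0.31 A of extracted current at the optimum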

  12. A technique for automatically extracting useful field of view and central field of view images.

    PubMed

    Pandey, Anil Kumar; Sharma, Param Dev; Aheer, Deepak; Kumar, Jay Prakash; Sharma, Sanjay Kumar; Patel, Chetan; Kumar, Rakesh; Bal, Chandra Sekhar

    2016-01-01

    It is essential to ensure the uniform response of a single photon emission computed tomography gamma camera system before using it for clinical studies, by exposing it to a uniform flood source. Vendor-specific acquisition and processing protocols provide for studying flood source images along with quantitative uniformity parameters such as integral and differential uniformity. However, a significant difficulty is that the time required to acquire a flood source image varies from 10 to 35 min, depending on both the activity of the Cobalt-57 flood source and the prespecified counts in the vendor's protocol (usually 4000K-10,000K counts). If the acquired total counts are less than the prespecified total counts, the vendor's uniformity processing protocol does not proceed with the computation of the quantitative uniformity parameters. In this study, we have developed and verified a technique for reading the flood source image, removing unwanted information, and automatically extracting and saving the useful field of view (UFOV) and central field of view (CFOV) images for the calculation of the uniformity parameters. This was implemented using MATLAB R2013b running on the Ubuntu operating system and was verified by applying it to simulated and real flood source images. The accuracy of the technique was found to be encouraging, especially in view of practical difficulties with vendor-specific protocols. It may be used as a preprocessing step while calculating uniformity parameters of the gamma camera in less time and with fewer constraints.
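
    The cropping step can be sketched as follows: threshold the flood image to find the exposed detector area, crop it as the UFOV, and take the central 75% of its dimensions as the CFOV (the usual NEMA convention). The thresholding details are illustrative assumptions, not the authors' MATLAB implementation.

      # Minimal sketch of extracting UFOV and CFOV sub-images from a flood image:
      # threshold to find the exposed detector area, crop it as the UFOV, and take
      # the central 75% of its dimensions as the CFOV (the usual NEMA convention).
      # The thresholding details are illustrative, not the authors' MATLAB method.
      import numpy as np

      flood = np.zeros((64, 64))
      flood[8:56, 8:56] = 1000 + np.random.default_rng(1).normal(0, 30, (48, 48))

      mask = flood > 0.1 * flood.max()             # pixels receiving counts
      rows, cols = np.where(mask)
      ufov = flood[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

      h, w = ufov.shape
      dh, dw = int(round(0.125 * h)), int(round(0.125 * w))   # trim 12.5% per side
      cfov = ufov[dh:h - dh, dw:w - dw]

      print(ufov.shape, cfov.shape)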

  13. The intermediate wavelength magnetic anomaly field of the north Pacific and possible source distributions

    NASA Technical Reports Server (NTRS)

    Labrecque, J. L.; Cande, S. C.; Jarrard, R. D. (Principal Investigator)

    1983-01-01

    A technique that eliminates external field sources and the effects of strike aliasing was used to extract the intermediate wavelength magnetic anomaly field of the North Pacific from marine survey data. A strong correlation exists between this field and the MAGSAT field, although a directional sensitivity in the MAGSAT field can be detected. The intermediate wavelength field is correlated with tectonic features. Island arcs appear as positive anomalies of induced origin, likely due to variations in crustal thickness. Seamount chains and oceanic plateaus are also manifested by strong anomalies; the primary contribution to many of these anomalies appears to be a remanent magnetization. The source parameters for the remainder of these features remain ambiguous. The results indicate that the sea surface field is a valuable source of information for secular variation analysis and for the resolution of intermediate wavelength source parameters.

  14. Where does streamwater come from in low-relief forested watersheds? A dual-isotope approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klaus, J.; McDonnell, J. J.; Jackson, C. R.

    The time and geographic sources of streamwater in low-relief watersheds are poorly understood. This is partly due to the difficult combination of low runoff coefficients and often damped streamwater isotopic signals precluding traditional hydrograph separation and convolution integral approaches. Here we present a dual-isotope approach involving 18O and 2H of water in a low-angle forested watershed to determine streamwater source components and then build a conceptual model of streamflow generation. We focus on three headwater lowland sub-catchments draining the Savannah River Site in South Carolina, USA. Our results for a 3-year sampling period show that the slopes of the meteoric water lines/evaporation water lines (MWLs/EWLs) of the catchment water sources can be used to extract information on runoff sources in ways not considered before. Our dual-isotope approach was able to identify unique hillslope, riparian and deep groundwater, and streamflow compositions. Thus, the streams showed strong evaporative enrichment compared to the local meteoric water line (δ2H = 7.15·δ18O + 9.28‰), with slopes of 2.52, 2.84, and 2.86. Based on the unique and unambiguous slopes of the EWLs of the different water cycle components and the isotopic time series of the individual components, we were able to show how the riparian zone controls baseflow in this system and how the riparian zone "resets" the stable isotope composition of the observed streams in our low-angle, forested watersheds. Although this approach is limited in terms of quantifying mixing percentages between different end-members, our dual-isotope approach enabled the extraction of hydrologically useful information in a region with little change in individual isotope time series.
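
    An evaporation water line slope of the kind reported here is obtained by regressing δ2H on δ18O for a set of samples; the sketch below uses invented values purely for illustration, not data from the study.

      # Sketch of deriving an evaporation water line (EWL) slope from paired
      # stable-isotope samples by least-squares regression of d2H on d18O.
      # The sample values are invented for illustration, not data from the study.
      import numpy as np

      d18O = np.array([-5.2, -4.1, -3.0, -2.2, -1.5])      # per mil
      d2H = np.array([-28.0, -25.3, -22.1, -20.0, -18.2])  # per mil

      slope, intercept = np.polyfit(d18O, d2H, 1)
      print(round(slope, 2), round(intercept, 2))   # compare against the LMWL slope of ~7.15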

  15. Where does streamwater come from in low-relief forested watersheds? A dual-isotope approach

    DOE PAGES

    Klaus, J.; McDonnell, J. J.; Jackson, C. R.; ...

    2015-01-08

    The time and geographic sources of streamwater in low-relief watersheds are poorly understood. This is partly due to the difficult combination of low runoff coefficients and often damped streamwater isotopic signals, which precludes traditional hydrograph separation and convolution integral approaches. Here we present a dual-isotope approach involving ¹⁸O and ²H of water in a low-angle forested watershed to determine streamwater source components and then build a conceptual model of streamflow generation. We focus on three headwater lowland sub-catchments draining the Savannah River Site in South Carolina, USA. Our results for a 3-year sampling period show that the slopes of the meteoric water lines/evaporation water lines (MWLs/EWLs) of the catchment water sources can be used to extract information on runoff sources in ways not considered before. Our dual-isotope approach was able to identify unique hillslope, riparian, deep groundwater, and streamflow compositions. The streams showed strong evaporative enrichment relative to the local meteoric water line (δ²H = 7.15·δ¹⁸O + 9.28‰), with EWL slopes of 2.52, 2.84, and 2.86. Based on the unique and unambiguous slopes of the EWLs of the different water cycle components and the isotopic time series of the individual components, we were able to show how the riparian zone controls baseflow in this system and how it "resets" the stable isotope composition of the observed streams in our low-angle, forested watersheds. Although this approach is limited in terms of quantifying mixing percentages between different end-members, it enabled the extraction of hydrologically useful information in a region with little change in the individual isotope time series.

  16. Study on the extraction method of tidal flat area in northern Jiangsu Province based on remote sensing waterlines

    NASA Astrophysics Data System (ADS)

    Zhang, Yuanyuan; Gao, Zhiqiang; Liu, Xiangyang; Xu, Ning; Liu, Chaoshun; Gao, Wei

    2016-09-01

    Reclamation has caused significant dynamic change in the coastal zone, and the tidal flat is an unstable reserve land resource, so studying it is of considerable importance. To enable efficient extraction of tidal flat area information, this paper takes Rudong County in Jiangsu Province as the research area and uses HJ1A/1B images as the data source. Drawing on previous research experience and a literature review, an object-oriented classification method is chosen as a semi-automatic approach to generate waterlines. The waterlines are then analyzed with the DSAS software to obtain tide points, and the outer boundary points are extracted automatically with Python to determine the extent of the tidal flats of Rudong County in 2014. The extracted area was 55,182 hm², and a confusion matrix was used to verify the accuracy; the result shows a kappa coefficient of 0.945. The method addresses shortcomings of previous studies, and the free availability of its tools on the Internet makes it easy to generalize.
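
    For reference, the kappa coefficient reported here can be computed directly from a confusion matrix; a minimal Python sketch follows (the counts are illustrative, not the study's matrix).

        import numpy as np

        def cohen_kappa(cm):
            """Cohen's kappa from a confusion matrix (rows = reference, cols = classified)."""
            cm = np.asarray(cm, dtype=float)
            n = cm.sum()
            po = np.trace(cm) / n                                    # observed agreement
            pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2    # chance agreement
            return (po - pe) / (1 - pe)

        # Illustrative two-class matrix (tidal flat vs. non-tidal-flat), not the paper's data.
        print(round(cohen_kappa([[480, 20], [15, 485]]), 3))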

  17. Monitoring of potentially toxic cyanobacteria using an online multi-probe in drinking water sources.

    PubMed

    Zamyadi, A; McQuaid, N; Prévost, M; Dorner, S

    2012-02-01

    Toxic cyanobacteria threaten the water quality of drinking water sources across the globe. Two such water bodies in Canada (a reservoir on the Yamaska River and a bay of Lake Champlain in Québec) were monitored using a YSI 6600 V2-4 (YSI, Yellow Springs, Ohio, USA) submersible multi-probe measuring in vivo phycocyanin (PC) and chlorophyll-a (Chl-a) fluorescence, pH, dissolved oxygen, conductivity, temperature, and turbidity in parallel. The linearity of the in vivo fluorescence PC and Chl-a probe measurements was validated in the laboratory with Microcystis aeruginosa (r² = 0.96 and r² = 0.82, respectively). Under environmental conditions, in vivo PC fluorescence was strongly correlated with extracted PC (r = 0.79), while in vivo Chl-a fluorescence had a weaker relationship with extracted Chl-a (r = 0.23). Multiple regression analysis revealed significant correlations between extracted Chl-a, extracted PC and cyanobacterial biovolume and the in vivo fluorescence parameters measured by the sensors (i.e. turbidity and pH). This information will help water authorities select the in vivo parameters that are the most useful indicators for monitoring cyanobacteria. Despite highly toxic cyanobacterial bloom development 10 m from the drinking water treatment plant's (DWTP) intake on several sampling dates, low in vivo PC fluorescence, cyanobacterial biovolume, and microcystin concentrations were detected in the plant's untreated water. The reservoir's hydrodynamics appear to have prevented the transport of toxins and cells into the DWTP, which would otherwise have deteriorated the water quality. The multi-probe readings and toxin analyses provided critical evidence that the DWTP's untreated water was unaffected by the toxic cyanobacterial blooms present in its source water.

  18. Oxygen isotopes as a tracer of phosphate sources and cycling in aquatic systems (Invited)

    NASA Astrophysics Data System (ADS)

    Young, M. B.; Kendall, C.; Paytan, A.

    2013-12-01

    The oxygen isotopic composition of phosphate can provide valuable information about sources and processes affecting phosphorus as it moves through hydrologic systems. Applications of this technique in soil and water have become more common in recent years due to improvements in extraction methods and instrument capabilities, and studies in multiple aquatic environments have demonstrated that some phosphorus sources may have distinct isotopic compositions within a given system. Under normal environmental conditions, the oxygen-phosphorus bonds in dissolved inorganic phosphate (DIP) can only be broken by enzymatic activity. Biological cycling of DIP will bring the phosphate oxygen into a temperature-dependent equilibrium with the surrounding water, overprinting any existing isotopic source signals. However, studies conducted in a wide range of estuarine, freshwater, and groundwater systems have found that the phosphate oxygen is often out of biological equilibrium with the water, suggesting that it is common for at least a partial isotopic source signal to be retained in aquatic systems. Oxygen isotope analysis of various potential phosphate sources such as synthetic and organic fertilizers, animal waste, detergents, and septic/wastewater treatment plant effluents shows that these sources span a wide range of isotopic compositions, and although there is considerable overlap between the source groups, sources may be isotopically distinct within a given study area. Recent soil studies have shown that isotopic analysis of phosphate oxygen is also useful for understanding microbial cycling across different phosphorus pools, and may provide insights into controls on phosphorus leaching. Combining stable isotope information from soil and water studies will greatly improve our understanding of complex phosphate cycling, and the increasing use of this isotopic technique across different environments will provide new information regarding anthropogenic phosphate inputs and controls on biological cycling within hydrologic systems.

  19. Spatial-spectral preprocessing for endmember extraction on GPU's

    NASA Astrophysics Data System (ADS)

    Jimenez, Luis I.; Plaza, Javier; Plaza, Antonio; Li, Jun

    2016-10-01

    Spectral unmixing is focused on the identification of spectrally pure signatures, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. Originally concentrated on the spectral information contained in hyperspectral images, endmember extraction techniques have recently incorporated spatial information to achieve more accurate results. Several algorithms have been developed for automatic or semi-automatic identification of endmembers using spatial and spectral information, including spectral-spatial endmember extraction (SSEE), in which a preprocessing step extracts both sources of information from the hyperspectral image and uses them equally for this purpose. Previous works have implemented the SSEE technique in four main steps: 1) calculation of local eigenvectors in each sub-region into which the original hyperspectral image is divided; 2) projection of all eigenvectors over the entire hyperspectral image and computation of the maxima and minima to obtain a set of candidate pixels; 3) expansion and averaging of the signatures of the candidate set; 4) ranking based on the spectral angle distance (SAD). The result of this method is a list of candidate signatures from which the endmembers can be extracted using various spectral-based techniques, such as orthogonal subspace projection (OSP), vertex component analysis (VCA) or N-FINDR. Considering the large volume of data and the complexity of the calculations, there is a need for efficient implementations. Latest-generation hardware accelerators such as commodity graphics processing units (GPUs) offer a good chance for improving the computational performance in this context. In this paper, we develop two different implementations of the SSEE algorithm using GPUs. Both are based on the eigenvector computation within each sub-region in the first step: one uses singular value decomposition (SVD) and the other principal component analysis (PCA). Based on our experiments with hyperspectral data sets, high computational performance is observed in both cases.
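
    To make the first SSEE step concrete, the sketch below (NumPy on the CPU, not the GPU implementations developed in the paper; the block size and number of eigenvectors are assumptions) partitions a hyperspectral cube into square sub-regions and computes the local eigenvectors of each sub-region via SVD.

        import numpy as np

        def local_eigenvectors(cube, block=16, n_vec=5):
            """Step 1 of SSEE-style preprocessing: SVD eigenvectors per spatial sub-region.
            cube: (rows, cols, bands) hyperspectral array. Returns a list of (n_vec, bands) arrays."""
            rows, cols, bands = cube.shape
            eigvecs = []
            for r in range(0, rows, block):
                for c in range(0, cols, block):
                    pixels = cube[r:r + block, c:c + block, :].reshape(-1, bands)
                    pixels = pixels - pixels.mean(axis=0)      # center the spectra
                    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
                    eigvecs.append(vt[:n_vec])                 # top right-singular vectors
                    # Step 2 would project every image pixel onto these vectors
                    # and keep the extrema as endmember candidates.
            return eigvecs

        # Example on a random cube (100 x 100 pixels, 50 bands).
        blocks = local_eigenvectors(np.random.rand(100, 100, 50))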

  20. Elevational Variation in Soil Amino Acid and Inorganic Nitrogen Concentrations in Taibai Mountain, China.

    PubMed

    Cao, Xiaochuang; Ma, Qingxu; Zhong, Chu; Yang, Xin; Zhu, Lianfeng; Zhang, Junhua; Jin, Qianyu; Wu, Lianghuan

    2016-01-01

    Amino acids are important sources of soil organic nitrogen (N), which is essential for plant nutrition, but detailed information about which amino acids predominate and whether amino acid composition varies with elevation is lacking. In this study, we hypothesized that the concentrations of amino acids in soil would increase and their composition would vary along the elevational gradient of Taibai Mountain, as plant-derived organic matter accumulated and N mineralization and microbial immobilization of amino acids slowed with reduced soil temperature. Results showed that the concentrations of soil extractable total N, extractable organic N and amino acids significantly increased with elevation due to the accumulation of soil organic matter and the greater N content. Soil extractable organic N concentration was significantly greater than that of the extractable inorganic N (NO3⁻-N + NH4⁺-N). On average, soil adsorbed amino acid concentration was approximately 5-fold greater than that of the free amino acids, which indicates that adsorbed amino acids extracted with the strong salt solution likely represent a potential source for the replenishment of free amino acids. We found no appreciable evidence to suggest that amino acids with simple molecular structure were dominant at low elevations, whereas amino acids with high molecular weight and complex aromatic structure dominated the high elevations. Across the elevational gradient, the amino acid pool was dominated by alanine, aspartic acid, glycine, glutamic acid, histidine, serine and threonine. These seven amino acids accounted for approximately 68.9% of the total hydrolyzable amino acid pool. The proportions of isoleucine, tyrosine and methionine varied with elevation, while soil major amino acid composition (including alanine, arginine, aspartic acid, glycine, histidine, leucine, phenylalanine, serine, threonine and valine) did not vary appreciably with elevation (p>0.10). The compositional similarity of many amino acids across the elevational gradient suggests that soil amino acids likely originate from a common source or through similar biochemical processes.

  1. Elevational Variation in Soil Amino Acid and Inorganic Nitrogen Concentrations in Taibai Mountain, China

    PubMed Central

    Yang, Xin; Zhu, Lianfeng; Zhang, Junhua; Jin, Qianyu; Wu, Lianghuan

    2016-01-01

    Amino acids are important sources of soil organic nitrogen (N), which is essential for plant nutrition, but detailed information about which amino acids predominate and whether amino acid composition varies with elevation is lacking. In this study, we hypothesized that the concentrations of amino acids in soil would increase and their composition would vary along the elevational gradient of Taibai Mountain, as plant-derived organic matter accumulated and N mineralization and microbial immobilization of amino acids slowed with reduced soil temperature. Results showed that the concentrations of soil extractable total N, extractable organic N and amino acids significantly increased with elevation due to the accumulation of soil organic matter and the greater N content. Soil extractable organic N concentration was significantly greater than that of the extractable inorganic N (NO3⁻-N + NH4⁺-N). On average, soil adsorbed amino acid concentration was approximately 5-fold greater than that of the free amino acids, which indicates that adsorbed amino acids extracted with the strong salt solution likely represent a potential source for the replenishment of free amino acids. We found no appreciable evidence to suggest that amino acids with simple molecular structure were dominant at low elevations, whereas amino acids with high molecular weight and complex aromatic structure dominated the high elevations. Across the elevational gradient, the amino acid pool was dominated by alanine, aspartic acid, glycine, glutamic acid, histidine, serine and threonine. These seven amino acids accounted for approximately 68.9% of the total hydrolyzable amino acid pool. The proportions of isoleucine, tyrosine and methionine varied with elevation, while soil major amino acid composition (including alanine, arginine, aspartic acid, glycine, histidine, leucine, phenylalanine, serine, threonine and valine) did not vary appreciably with elevation (p>0.10). The compositional similarity of many amino acids across the elevational gradient suggests that soil amino acids likely originate from a common source or through similar biochemical processes. PMID:27337100

  2. KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process

    NASA Technical Reports Server (NTRS)

    Gettig, Gary A.

    1988-01-01

    Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.

  3. ExaCT: automatic extraction of clinical trial characteristics from journal publications

    PubMed Central

    2010-01-01

    Background Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). Methods ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. Results We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. Conclusions Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols). PMID:20920176
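
    As a rough sketch of the two-stage design described above (a statistical sentence classifier followed by simple extraction rules), the Python fragment below uses a bag-of-words classifier and a regular expression for one characteristic (sample size); the training sentences, rule, and top-k cutoff are illustrative assumptions, not ExaCT's actual models.

        import re
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        # Stage 1: rank sentences by the probability that they describe the sample size (toy data).
        train_sents = ["A total of 120 patients were randomized.",
                       "The primary outcome was mortality at 30 days.",
                       "We enrolled 480 participants across 12 sites.",
                       "Adverse events were recorded throughout follow-up."]
        train_labels = [1, 0, 1, 0]                    # 1 = sample-size sentence

        vec = TfidfVectorizer()
        clf = LogisticRegression().fit(vec.fit_transform(train_sents), train_labels)

        def extract_sample_size(sentences, top_k=5):
            probs = clf.predict_proba(vec.transform(sentences))[:, 1]
            candidates = sorted(zip(probs, sentences), reverse=True)[:top_k]
            # Stage 2: apply a weak rule to the best-ranked sentences.
            for _, sent in candidates:
                m = re.search(r"(\d+)\s+(?:patients|participants)", sent)
                if m:
                    return int(m.group(1))
            return None

        print(extract_sample_size(["Overall, 312 patients met the inclusion criteria."]))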

  4. Web-Scale Search-Based Data Extraction and Integration

    DTIC Science & Technology

    2011-10-17

    differently, posing challenges for aggregating this information. For example, for the task of finding population for cities in Benin, we were faced with...merged record. Our GeoMerging algorithm attempts to address various ambiguity challenges : • For name: The name of a hospital is not a unique...departments in the same building. For agent-extractor results from structured sources, our GeoMerging algorithm overcomes these challenges using a two

  5. Parsing and Tagging of Bilingual Dictionary

    DTIC Science & Technology

    2003-09-01

    LAMP-TR-106 CAR-TR-991 CS-TR-4529 UMIACS-TR-2003-97 PARSING AND TAGGING OF BILINGUAL DICTIONARY Huanfeng Ma1,2, Burcu Karagol-Ayan1,2, David... dictionaries hold great potential as a source of lexical resources for training and testing automated systems for optical character recognition, machine...translation, and cross-language information retrieval. In this paper, we describe a system for extracting term lexicons from printed bilingual dictionaries

  6. SPECTRAL LINE DE-CONFUSION IN AN INTENSITY MAPPING SURVEY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, Yun-Ting; Bock, James; Bradford, C. Matt

    2016-12-01

    Spectral line intensity mapping (LIM) has been proposed as a promising tool to efficiently probe the cosmic reionization and the large-scale structure. Without detecting individual sources, LIM makes use of all available photons and measures the integrated light in the source confusion limit to efficiently map the three-dimensional matter distribution on large scales as traced by a given emission line. One particular challenge is the separation of desired signals from astrophysical continuum foregrounds and line interlopers. Here we present a technique to extract large-scale structure information traced by emission lines from different redshifts, embedded in a three-dimensional intensity mapping data cube. The line redshifts are distinguished by the anisotropic shape of the power spectra when projected onto a common coordinate frame. We consider the case where high-redshift [C ii] lines are confused with multiple low-redshift CO rotational lines. We present a semi-analytic model for [C ii] and CO line estimates based on the cosmic infrared background measurements, and show that with a modest instrumental noise level and survey geometry, the large-scale [C ii] and CO power spectrum amplitudes can be successfully extracted from a confusion-limited data set, without external information. We discuss the implications and limits of this technique for possible LIM experiments.

  7. STSHV a teleinformatic system for historic seismology in Venezuela

    NASA Astrophysics Data System (ADS)

    Choy, J. E.; Palme, C.; Altez, R.; Aranguren, R.; Guada, C.; Silva, J.

    2013-05-01

    From 1997 on, when the first "Jornadas Venezolanas de Sismicidad Historica" took place, great interest arose in Venezuela in organizing the available information related to historic earthquakes. At that time only one published historic earthquake catalogue existed, the one by Centeno Grau first published in 1949, and it contained no references to its sources of information. Other catalogues existed, but they were internal reports for the petroleum companies and therefore difficult to access. In 2000 Grases et al. re-edited the Centeno-Grau catalogue, resulting in a new, very complete catalogue with all sources well referenced and updated. The next step in organizing historic seismicity data was the creation, from 2004 to 2008, of the STSHV (Sistema de teleinformacion de Sismologia Historica Venezolana, http://sismicidad.hacer.ula.ve). The idea was to bring together all information about destructive historic earthquakes in Venezuela in one place on the internet so that it could be accessed easily by a broad public. There are two ways to access the system: the first by selecting an earthquake or a list of earthquakes, and the second by selecting an information source or a list of sources. For each earthquake there is a summary of general information and additional materials: a list of the source parameters published by different authors, a list of intensities assessed by different authors, a list of information sources, a short text summarizing the historical situation at the time of the earthquake, and a list of pictures if available. There are search facilities for the seismic events, and dynamic maps can be created. The information sources are classified as books, handwritten documents, transcriptions of handwritten documents, documents published in books, journals and congress proceedings, newspapers, seismological catalogues, and electronic sources. There are facilities to find specific documents or lists of documents with common characteristics. For each document, general information is displayed together with an extract of the information relating to the earthquake; if the complete document was available and there was no problem with the publisher's rights, a PDF copy of the document was included. We found this system extremely useful for studying historic earthquakes, as one can immediately access previous research on an earthquake, and it makes it easy to check the historical information and thereby validate the intensity data. So far, the intensity data have not been completed for earthquakes after 2000; this information would be important for improving intensity-magnitude calibrations of historic events, and it is a work in progress. It is also worth mentioning that "El Catálogo Sismológico Venezolano del siglo XX" (The Venezuelan Seismological Catalogue of the 20th Century), published in 2012, updates seismic information up to 2007, and that the STSHV was one of its primary sources of information.

  8. PyEEG: an open source Python module for EEG/MEG feature extraction.

    PubMed

    Bao, Forrest Sheng; Liu, Xin; Zhang, Christina

    2011-01-01

    Computer-aided diagnosis of neural diseases from EEG signals (or other physiological signals that can be treated as time series, e.g., MEG) is an emerging field that has gained much attention in past years. Extracting features is a key component in the analysis of EEG signals. In our previous works, we have implemented many EEG feature extraction functions in the Python programming language. As Python is gaining more ground in scientific computing, an open source Python module for extracting EEG features has the potential to save much time for computational neuroscientists. In this paper, we introduce PyEEG, an open source Python module for EEG feature extraction.
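
    As an indication of the kind of feature functions such a module provides, the short NumPy sketch below computes relative spectral band power for one EEG channel; it illustrates the general idea only and does not use PyEEG's actual API, and the sampling rate and band edges are common conventions rather than values from the paper.

        import numpy as np

        def relative_band_power(signal, fs, bands=((1, 4), (4, 8), (8, 13), (13, 30))):
            """Relative power in delta/theta/alpha/beta bands (band edges are conventional)."""
            freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
            power = np.abs(np.fft.rfft(signal)) ** 2
            band_power = np.array([power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])
            return band_power / band_power.sum()

        fs = 256                                        # assumed sampling rate in Hz
        t = np.arange(0, 4, 1.0 / fs)
        eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)   # synthetic alpha-dominated trace
        print(relative_band_power(eeg, fs).round(3))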

  9. PyEEG: An Open Source Python Module for EEG/MEG Feature Extraction

    PubMed Central

    Bao, Forrest Sheng; Liu, Xin; Zhang, Christina

    2011-01-01

    Computer-aided diagnosis of neural diseases from EEG signals (or other physiological signals that can be treated as time series, e.g., MEG) is an emerging field that has gained much attention in past years. Extracting features is a key component in the analysis of EEG signals. In our previous works, we have implemented many EEG feature extraction functions in the Python programming language. As Python is gaining more ground in scientific computing, an open source Python module for extracting EEG features has the potential to save much time for computational neuroscientists. In this paper, we introduce PyEEG, an open source Python module for EEG feature extraction. PMID:21512582

  10. Comparison of ONIX simulation results with experimental data from the BATMAN testbed for the study of negative ion extraction

    NASA Astrophysics Data System (ADS)

    Mochalskyy, Serhiy; Fantz, Ursel; Wünderlich, Dirk; Minea, Tiberiu

    2016-10-01

    The development of negative ion (NI) sources for the ITER neutral beam injector is strongly supported by modelling activities. The ONIX (Orsay Negative Ion eXtraction) code simulates the formation and extraction of negative hydrogen ions and co-extracted electrons produced in caesiated sources. In this paper the 3D geometry of the BATMAN extraction system, the source characteristics such as the extraction and bias potentials, and the 3D magnetic field were integrated in the model. Calculations were performed using plasma parameters obtained experimentally on BATMAN. The comparison of the ONIX-calculated extracted NI density with the experimental results suggests that predictive calculations of NI extraction are possible. The results show that for ideal Cs conditioning the extracted hydrogen NI current density could reach ~30 mA cm-2 at 10 kV and ~20 mA cm-2 at 5 kV extraction potential, with an electron/NI current density ratio of about 1, as measured in the experiments under the same plasma and source conditions. The dependence of the extracted NI current on the NI density in the bulk plasma region was investigated in both the modelling and the experiment. The separate distributions composing the NI beam, originating from the plasma bulk region and from the PG surface, are presented for different NI plasma volume densities and NI emission rates from the plasma grid (PG) wall, respectively. The extracted current from NIs produced at the Cs-covered PG surface, which initially move towards the bulk plasma and are then bent back towards the extraction surfaces, is lower than the extracted current from surface-produced ions that are extracted directly.

  11. Assessment of spatial information for hyperspectral imaging of lesion

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Li, Gang; Lin, Ling

    2016-10-01

    Multiple diseases such as breast tumors pose a great threat to women's health and lives, while traditional detection methods are complex, costly, and unsuitable for frequent self-examination; therefore, an inexpensive, convenient, and efficient method for tumor self-inspection is urgently needed, and lesion localization is an important step. This paper proposes a self-examination method for locating a lesion. The method uses transillumination to acquire hyperspectral images and to assess the spatial information of the lesion. Firstly, multi-wavelength sources are modulated with frequency division, which makes it easy to separate images of different wavelengths; meanwhile, the sources serve as fill light for each other, improving sensitivity in low-light-level imaging. Secondly, the signal-to-noise ratio of the demodulated transmitted images is improved by frame accumulation. Next, the gray-level distributions of the transmitted images are analyzed. Gray-level differences are formed between the actual transmitted images and fitted transmitted images of tissue without a lesion, which rules out individual differences. Due to scattering, there are transition zones between tissue and lesion, and these zones change with wavelength, which helps to identify the structural details of the lesion. Finally, image segmentation is used to extract the lesion and the transition zones, and the spatial features of the lesion are confirmed according to the transition zones and the differences in the transmitted light intensity distributions. An experiment using flat-shaped tissue as an example shows that the proposed method can extract the spatial information of a lesion.
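
    The frame accumulation step mentioned above amounts to averaging N co-registered frames, which reduces uncorrelated noise by roughly a factor of sqrt(N); a minimal sketch follows (synthetic frames and an assumed Gaussian noise model, not the paper's data).

        import numpy as np

        def accumulate_frames(frames):
            """Average a stack of co-registered frames (N, H, W) to suppress uncorrelated noise."""
            return np.mean(np.asarray(frames, dtype=float), axis=0)

        # Synthetic demonstration: a weak constant signal buried in Gaussian noise.
        rng = np.random.default_rng(0)
        signal = np.full((64, 64), 10.0)
        frames = [signal + rng.normal(0, 5, signal.shape) for _ in range(100)]

        single_snr = signal.mean() / (frames[0] - signal).std()
        stacked_snr = signal.mean() / (accumulate_frames(frames) - signal).std()
        print(round(single_snr, 1), round(stacked_snr, 1))   # SNR improves by roughly sqrt(100) = 10x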

  12. PM2.5 pollution from household solid fuel burning practices in Central India: 2. Application of receptor models for source apportionment.

    PubMed

    Matawle, Jeevan Lal; Pervez, Shamsh; Deb, Manas Kanti; Shrivastava, Anjali; Tiwari, Suresh

    2018-02-01

    USEPA's UNMIX, positive matrix factorization (PMF) and effective variance-chemical mass balance (EV-CMB) receptor models were applied to chemically speciated profiles of 125 indoor PM 2.5 measurements, sampled longitudinally during 2012-2013 in low-income group households of Central India that use solid fuels for cooking. A three-step source apportionment study was carried out to generate more confident source characterization. First, UNMIX6.0 extracted an initial number of source factors, which were used to run PMF5.0 and extract source-factor profiles in the second step. Finally, locally derived source profiles analogous to these factors were supplied to EV-CMB8.2, together with the indoor receptor PM 2.5 chemical profile, to evaluate source contribution estimates (SCEs). The results of the combined use of the three receptor models clearly show that UNMIX and PMF are useful tools for identifying the types of source categories within a small receptor dataset, and that EV-CMB can pick those locally derived source profiles for source apportionment that are analogous to the PMF-extracted source categories. The source apportionment results also show a threefold higher relative contribution of solid fuel burning emissions to indoor PM 2.5 compared with measurements reported for normal households with LPG stoves. The previously reported influential source marker species were found to be comparable to those extracted from the PMF fingerprint plots. The PMF and CMB SCE results were also qualitatively similar. The performance fit measures of all three receptor models were cross-verified and validated and support each other, lending confidence to the source apportionment results.
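
    The chemical mass balance step can be illustrated as a constrained least-squares problem: given source profiles (species fractions per source) and the measured receptor concentrations, solve for non-negative source contributions. The sketch below uses made-up numbers (not the study's profiles) and plain non-negative least squares in place of the effective-variance weighting used by EV-CMB.

        import numpy as np
        from scipy.optimize import nnls

        # Rows = chemical species, columns = candidate sources (mass fractions); values are illustrative.
        profiles = np.array([[0.30, 0.02, 0.05],    # organic carbon
                             [0.10, 0.01, 0.02],    # elemental carbon
                             [0.01, 0.20, 0.03],    # crustal element
                             [0.02, 0.01, 0.15]])   # sulfate
        receptor = np.array([25.0, 8.0, 4.0, 6.0])  # measured species concentrations (ug/m3)

        contributions, residual = nnls(profiles, receptor)
        for name, c in zip(["solid fuel burning", "soil dust", "secondary aerosol"], contributions):
            print(f"{name}: {c:.1f} ug/m3")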

  13. Synthesising quantitative and qualitative research in evidence‐based patient information

    PubMed Central

    Goldsmith, Megan R; Bankhead, Clare R; Austoker, Joan

    2007-01-01

    Background Systematic reviews have, in the past, focused on quantitative studies and clinical effectiveness, while excluding qualitative evidence. Qualitative research can inform evidence‐based practice independently of other research methodologies but methods for the synthesis of such data are currently evolving. Synthesising quantitative and qualitative research in a single review is an important methodological challenge. Aims This paper describes the review methods developed and the difficulties encountered during the process of updating a systematic review of evidence to inform guidelines for the content of patient information related to cervical screening. Methods Systematic searches of 12 electronic databases (January 1996 to July 2004) were conducted. Studies that evaluated the content of information provided to women about cervical screening or that addressed women's information needs were assessed for inclusion. A data extraction form and quality assessment criteria were developed from published resources. A non‐quantitative synthesis was conducted and a tabular evidence profile for each important outcome (eg “explain what the test involves”) was prepared. The overall quality of evidence for each outcome was then assessed using an approach published by the GRADE working group, which was adapted to suit the review questions and modified to include qualitative research evidence. Quantitative and qualitative studies were considered separately for every outcome. Results 32 papers were included in the systematic review following data extraction and assessment of methodological quality. The review questions were best answered by evidence from a range of data sources. The inclusion of qualitative research, which was often highly relevant and specific to many components of the screening information materials, enabled the production of a set of recommendations that will directly affect policy within the NHS Cervical Screening Programme. Conclusions A practical example is provided of how quantitative and qualitative data sources might successfully be brought together and considered in one review. PMID:17325406

  14. Development of a data entry auditing protocol and quality assurance for a tissue bank database.

    PubMed

    Khushi, Matloob; Carpenter, Jane E; Balleine, Rosemary L; Clarke, Christine L

    2012-03-01

    Human transcription error is an acknowledged risk when extracting information from paper records for entry into a database. For a tissue bank, it is critical that accurate data are provided to researchers with approved access to tissue bank material. The challenges of tissue bank data collection include manual extraction of data from complex medical reports that are accessed from a number of sources and that differ in style and layout. As a quality assurance measure, the Breast Cancer Tissue Bank (http://www.abctb.org.au) has implemented an auditing protocol and in order to efficiently execute the process, has developed an open source database plug-in tool (eAuditor) to assist in auditing of data held in our tissue bank database. Using eAuditor, we have identified that human entry errors range from 0.01% when entering donor's clinical follow-up details, to 0.53% when entering pathological details, highlighting the importance of an audit protocol tool such as eAuditor in a tissue bank database. eAuditor was developed and tested on the Caisis open source clinical-research database; however, it can be integrated in other databases where similar functionality is required.

  15. Text-in-context: a method for extracting findings in mixed-methods mixed research synthesis studies.

    PubMed

    Sandelowski, Margarete; Leeman, Jennifer; Knafl, Kathleen; Crandell, Jamie L

    2013-06-01

    Our purpose in this paper is to propose a new method for extracting findings from research reports included in mixed-methods mixed research synthesis studies. International initiatives in the domains of systematic review and evidence synthesis have been focused on broadening the conceptualization of evidence, increased methodological inclusiveness and the production of evidence syntheses that will be accessible to and usable by a wider range of consumers. Initiatives in the general mixed-methods research field have been focused on developing truly integrative approaches to data analysis and interpretation. The data extraction challenges described here were encountered, and the method proposed for addressing these challenges was developed, in the first year of the ongoing (2011-2016) study: Mixed-Methods Synthesis of Research on Childhood Chronic Conditions and Family. To preserve the text-in-context of findings in research reports, we describe a method whereby findings are transformed into portable statements that anchor results to relevant information about sample, source of information, time, comparative reference point, magnitude and significance and study-specific conceptions of phenomena. The data extraction method featured here was developed specifically to accommodate mixed-methods mixed research synthesis studies conducted in nursing and other health sciences, but reviewers might find it useful in other kinds of research synthesis studies. This data extraction method itself constitutes a type of integration to preserve the methodological context of findings when statements are read individually and in comparison to each other. © 2012 Blackwell Publishing Ltd.

  16. Removal of power line interference of space bearing vibration signal based on the morphological filter and blind source separation

    NASA Astrophysics Data System (ADS)

    Dong, Shaojiang; Sun, Dihua; Xu, Xiangyang; Tang, Baoping

    2017-06-01

    It is difficult to extract feature information from space bearing vibration signals because of several kinds of noise, for example running trend information, high-frequency noise, and especially the large amount of power line interference (50 Hz) and its harmonics introduced by the ground-based equipment simulating the running space environment. This article proposes a combined method to eliminate them. Firstly, EMD is used to remove the running trend of the signal, eliminating the trend component that affects the signal processing accuracy. Then a morphological filter is used to eliminate the high-frequency noise. Finally, the components and characteristics of the power line interference are studied and, based on these characteristics, a revised blind source separation model is used to remove the power line interference. Analysis of simulations and a practical application suggests that the proposed method can effectively eliminate these types of noise.
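
    For comparison, the conventional way to suppress a known 50 Hz component and its harmonics is a cascade of notch filters; the sketch below shows that simpler baseline (it is not the EMD/morphological/blind-source-separation pipeline proposed in the paper), with an assumed sampling rate and filter quality factor.

        import numpy as np
        from scipy.signal import iirnotch, filtfilt

        def remove_power_line(x, fs, base=50.0, n_harmonics=3, q=30.0):
            """Suppress the power line frequency and its first harmonics with IIR notch filters."""
            y = np.asarray(x, dtype=float)
            for k in range(1, n_harmonics + 1):
                w0 = (k * base) / (fs / 2.0)          # normalized notch frequency
                if w0 >= 1.0:                         # harmonic above Nyquist: skip
                    break
                b, a = iirnotch(w0, q)
                y = filtfilt(b, a, y)                 # zero-phase filtering
            return y

        fs = 1000.0                                   # assumed sampling rate (Hz)
        t = np.arange(0, 1, 1 / fs)
        x = np.sin(2 * np.pi * 7 * t) + 0.8 * np.sin(2 * np.pi * 50 * t)   # bearing tone + interference
        clean = remove_power_line(x, fs)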

  17. GlycoRDF: an ontology to standardize glycomics data in RDF

    PubMed Central

    Ranzinger, Rene; Aoki-Kinoshita, Kiyoko F.; Campbell, Matthew P.; Kawano, Shin; Lütteke, Thomas; Okuda, Shujiro; Shinmachi, Daisuke; Shikanai, Toshihide; Sawaki, Hiromichi; Toukach, Philip; Matsubara, Masaaki; Yamada, Issaku; Narimatsu, Hisashi

    2015-01-01

    Motivation: Over the last decades several glycomics-based bioinformatics resources and databases have been created and released to the public. Unfortunately, there is no common standard in the representation of the stored information or a common machine-readable interface allowing bioinformatics groups to easily extract and cross-reference the stored information. Results: An international group of bioinformatics experts in the field of glycomics have worked together to create a standard Resource Description Framework (RDF) representation for glycomics data, focused on glycan sequences and related biological source, publications and experimental data. This RDF standard is defined by the GlycoRDF ontology and will be used by database providers to generate common machine-readable exports of the data stored in their databases. Availability and implementation: The ontology, supporting documentation and source code used by database providers to generate standardized RDF are available online (http://www.glycoinfo.org/GlycoRDF/). Contact: rene@ccrc.uga.edu or kkiyoko@soka.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388145
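
    To give a flavour of what a standardized RDF export looks like, the sketch below builds a tiny graph with rdflib; the namespace URI and property names are hypothetical placeholders for illustration only, not the actual GlycoRDF ontology terms (which are defined at the URL above).

        from rdflib import Graph, Literal, Namespace, RDF, URIRef

        # Hypothetical namespace standing in for the real GlycoRDF vocabulary.
        GLYCO = Namespace("http://example.org/glycordf#")

        g = Graph()
        g.bind("glyco", GLYCO)

        saccharide = URIRef("http://example.org/glycan/G00001")
        g.add((saccharide, RDF.type, GLYCO.Saccharide))
        g.add((saccharide, GLYCO.has_sequence, Literal("Gal(b1-4)GlcNAc")))   # illustrative sequence
        g.add((saccharide, GLYCO.from_source, Literal("Homo sapiens")))       # biological source

        print(g.serialize(format="turtle"))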

  18. [Instrumental, directive, and affective communication in hospital leaflets].

    PubMed

    Vasconcellos-Silva, Paulo Roberto; Uribe Rivera, Francisco Javier; Castiel, Luis David

    2003-01-01

    This study focuses on the typical semantic systems extracted from hospital staff communicative resources which attempt to validate information as an "object" to be transferred to patients. We describe the models of textual communication in 58 patient information leaflets from five hospital units in Brazil, gathered from 1996 to 2002. Three categories were identified, based on the theory of speech acts (Austin, Searle, and Habermas): 1) cognitive-instrumental utterances: descriptions by means of technical terms validated by self-referred, incomplete, or inaccessible argumentation, with an implicit educational function; 2) technical-directive utterances: self-referred (to the context of the source domains), with a shifting of everyday acts to a technical terrain with a disciplinary function and impersonal features; and 3) expressive modulations: need for inter-subjective connections to strengthen bonds of trust and a tendency to use childish arguments. We conclude that the three categories displayed: fragmentary sources; assumption of univocal messages and invariable use of information (idealized motivations and interests, apart from individualized perspectives); and assumption of universal interests as generators of knowledge.

  19. Social media as an information source for rapid flood inundation mapping

    NASA Astrophysics Data System (ADS)

    Fohringer, J.; Dransch, D.; Kreibich, H.; Schröter, K.

    2015-12-01

    During and shortly after a disaster, data about the hazard and its consequences are scarce and not readily available. Information provided by eyewitnesses via social media is a valuable information source, which should be explored in a more effective way. This research proposes a methodology that leverages social media content to support rapid inundation mapping, including inundation extent and water depth in the case of floods. The novelty of this approach is the utilization of quantitative data that are derived from photos from eyewitnesses extracted from social media posts and their integration with established data. Due to the rapid availability of these posts compared to traditional data sources such as remote sensing data, areas affected by a flood, for example, can be determined quickly. The challenge is to filter the large number of posts to a manageable amount of potentially useful inundation-related information, as well as to interpret and integrate the posts into mapping procedures in a timely manner. To support rapid inundation mapping we propose a methodology and develop "PostDistiller", a tool to filter geolocated posts from social media services which include links to photos. This spatially distributed, contextualized in situ information is further explored manually. In an application case study during the June 2013 flood in central Europe we evaluate the utilization of this approach to infer spatial flood patterns and inundation depths in the city of Dresden.
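
    A first filtering pass of the kind PostDistiller performs can be sketched in a few lines: keep only posts that carry both a geolocation and a link to a photo, and restrict them to the area and time window of interest. The data structure and field names below are assumptions for illustration, not PostDistiller's actual interface.

        from datetime import datetime

        def filter_candidate_posts(posts, bbox, start, end):
            """Keep geolocated posts with photo links inside a bounding box and time window.
            bbox = (min_lon, min_lat, max_lon, max_lat); post fields are hypothetical."""
            min_lon, min_lat, max_lon, max_lat = bbox
            selected = []
            for p in posts:
                if not p.get("photo_url") or p.get("lon") is None or p.get("lat") is None:
                    continue
                if not (min_lon <= p["lon"] <= max_lon and min_lat <= p["lat"] <= max_lat):
                    continue
                if not (start <= p["timestamp"] <= end):
                    continue
                selected.append(p)
            return selected

        posts = [{"photo_url": "http://example.org/img1.jpg", "lon": 13.74, "lat": 51.05,
                  "timestamp": datetime(2013, 6, 4, 10, 30), "text": "Water up to the kerb"}]
        dresden_bbox = (13.6, 50.98, 13.9, 51.12)
        hits = filter_candidate_posts(posts, dresden_bbox, datetime(2013, 6, 1), datetime(2013, 6, 15))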

  20. Social media as an information source for rapid flood inundation mapping

    NASA Astrophysics Data System (ADS)

    Fohringer, J.; Dransch, D.; Kreibich, H.; Schröter, K.

    2015-07-01

    During and shortly after a disaster, data about the hazard and its consequences are scarce and not readily available. Information provided by eye-witnesses via social media is a valuable information source, which should be explored in a more effective way. This research proposes a methodology that leverages social media content to support rapid inundation mapping, including inundation extent and water depth in case of floods. The novelty of this approach is the utilization of quantitative data that are derived from photos from eye-witnesses extracted from social media posts and their integration with established data. Due to the rapid availability of these posts compared to traditional data sources such as remote sensing data, areas affected by a flood, for example, can be determined quickly. The challenge is to filter the large number of posts to a manageable amount of potentially useful inundation-related information as well as their timely interpretation and integration in mapping procedures. To support rapid inundation mapping we propose a methodology and develop a tool to filter geo-located posts from social media services which include links to photos. This spatially distributed, contextualized in-situ information is further explored manually. In an application case study during the June 2013 flood in central Europe we evaluate the utilization of this approach to infer spatial flood patterns and inundation depths in the city of Dresden.

  1. Rapid Extraction of Lexical Tone Phonology in Chinese Characters: A Visual Mismatch Negativity Study

    PubMed Central

    Wang, Xiao-Dong; Liu, A-Ping; Wu, Yin-Yuan; Wang, Peng

    2013-01-01

    Background In alphabetic languages, emerging evidence from behavioral and neuroimaging studies shows the rapid and automatic activation of phonological information in visual word recognition. In the mapping from orthography to phonology, unlike most alphabetic languages in which there is a natural correspondence between the visual and phonological forms, in logographic Chinese, the mapping between visual and phonological forms is rather arbitrary and depends on learning and experience. The issue of whether the phonological information is rapidly and automatically extracted in Chinese characters by the brain has not yet been thoroughly addressed. Methodology/Principal Findings We continuously presented Chinese characters differing in orthography and meaning to adult native Mandarin Chinese speakers to construct a constantly varying visual stream. In the stream, most stimuli were homophones of Chinese characters: The phonological features embedded in these visual characters were the same, including consonants, vowels and the lexical tone. Occasionally, the rule of phonology was randomly violated by characters whose phonological features differed in the lexical tone. Conclusions/Significance We showed that the violation of the lexical tone phonology evoked an early, robust visual response, as revealed by whole-head electrical recordings of the visual mismatch negativity (vMMN), indicating the rapid extraction of phonological information embedded in Chinese characters. Source analysis revealed that the vMMN was involved in neural activations of the visual cortex, suggesting that the visual sensory memory is sensitive to phonological information embedded in visual words at an early processing stage. PMID:23437235

  2. Visual cues in low-level flight - Implications for pilotage, training, simulation, and enhanced/synthetic vision systems

    NASA Technical Reports Server (NTRS)

    Foyle, David C.; Kaiser, Mary K.; Johnson, Walter W.

    1992-01-01

    This paper reviews some of the sources of visual information that are available in the out-the-window scene and describes how these visual cues are important for routine pilotage and training, as well as the development of simulator visual systems and enhanced or synthetic vision systems for aircraft cockpits. It is shown how these visual cues may change or disappear under environmental or sensor conditions, and how the visual scene can be augmented by advanced displays to capitalize on the pilot's excellent ability to extract visual information from the visual scene.

  3. Electronic health information quality challenges and interventions to improve public health surveillance data and practice.

    PubMed

    Dixon, Brian E; Siegel, Jason A; Oemig, Tanya V; Grannis, Shaun J

    2013-01-01

    We examined completeness, an attribute of data quality, in the context of electronic laboratory reporting (ELR) of notifiable disease information to public health agencies. We extracted more than seven million ELR messages from multiple clinical information systems in two states. We calculated and compared the completeness of various data fields within the messages that were identified to be important to public health reporting processes. We compared unaltered, original messages from source systems with similar messages from another state as well as messages enriched by a health information exchange (HIE). Our analysis focused on calculating completeness (i.e., the number of nonmissing values) for fields deemed important for inclusion in notifiable disease case reports. The completeness of data fields for laboratory transactions varied across clinical information systems and jurisdictions. Fields identifying the patient and test results were usually complete (97%-100%). Fields containing patient demographics, patient contact information, and provider contact information were suboptimal (6%-89%). Transactions enhanced by the HIE were found to be more complete (increases ranged from 2% to 25%) than the original messages. ELR data from clinical information systems can be of suboptimal quality. Public health monitoring of data sources and augmentation of ELR message content using HIE services can improve data quality.
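
    Field-level completeness of the kind reported here is straightforward to compute once the messages are in tabular form; a minimal pandas sketch follows (the column names and toy records are assumptions, not the study's data).

        import pandas as pd

        # Toy extract of parsed ELR messages; column names are illustrative.
        msgs = pd.DataFrame({
            "patient_name":  ["DOE, JANE", "ROE, RICHARD", None, "POE, EDGAR"],
            "test_result":   ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"],
            "patient_phone": [None, None, "555-0134", None],
        })

        # Completeness = share of non-missing values per field, as a percentage.
        completeness = msgs.notna().mean().mul(100).round(1)
        print(completeness.sort_values(ascending=False))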

  4. Use of a Recombinant Fluorescent Substrate with Cleavage Sites for All Botulinum Neurotoxins in High-Throughput Screening of Natural Product Extracts for Inhibitors of Serotypes A, B, and E7

    DTIC Science & Technology

    2007-12-14

    contained dried residues from a collection of terrestrial plants , marine inver- tebrates, and various fungi. NCI plate numbers, sources of extracts, and... plants ), while Fig. 3B displays results from row G of the same plate. In these examples, wells B3, B5, B9, G9, and G12 were selected for further...sources of extracts Plate no. Source Extraction solvent 96110120 Terrestrial plants Water 96110125 Terrestrial plants CH3OH-CH2Cl2 12000707 Marine

  5. Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection

    PubMed Central

    Lossio-Ventura, Juan Antonio; Hogan, William; Modave, François; Hicks, Amanda; Hanna, Josh; Guo, Yi; He, Zhe; Bian, Jiang

    2017-01-01

    Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves their health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help to better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, which supersedes the state-of-the-art results. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993. PMID:28503356
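
    The relation-detection step described above (deciding whether two recognized entities in a sentence are related, using statistical sentence features) can be sketched with a simple supervised classifier; the features, labels, and training pairs below are toy assumptions, not the authors' feature set or data.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        # Sentences with the two candidate entities masked; label 1 means the entities are related.
        train = [("ENT1 is associated with an increased risk of ENT2", 1),
                 ("ENT1 was measured in patients regardless of ENT2 status", 0),
                 ("higher ENT1 levels were linked to ENT2 incidence", 1),
                 ("the study excluded participants with ENT1 or ENT2", 0)]
        texts, labels = zip(*train)

        vec = CountVectorizer(ngram_range=(1, 2))
        clf = MultinomialNB().fit(vec.fit_transform(texts), labels)

        def related(sentence_with_masks):
            """Predict whether the two masked entities in the sentence are related."""
            return bool(clf.predict(vec.transform([sentence_with_masks]))[0])

        print(related("ENT1 contributes to the development of ENT2"))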

  6. Palatability of water-soluble extracts of protein sources and replacement of fishmeal by a selected mixture of protein sources for juvenile turbot ( Scophthalmus maximus)

    NASA Astrophysics Data System (ADS)

    Dong, Chun; He, Gen; Mai, Kangsen; Zhou, Huihui; Xu, Wei

    2016-06-01

    Poor palatability is a limiting factor for replacing fishmeal with other protein sources in aquaculture. Water-soluble molecules with low molecular weights are the major determinants of the palatability of diets. The present study was conducted to investigate the palatability of water-soluble extracts from single protein sources (single extract pellets) and of mixtures of these extracts in different proportions (blended extract pellets) in juvenile turbot ( Scophthalmus maximus). Based on the palatability of the blended extract pellets, an optimal mixture proportion was selected, and a new protein source made from raw protein materials in the selected proportion was formulated to replace fishmeal. In summary, the palatability of the single extract pellets for turbot decreased in the order fishmeal, pet-food grade poultry by-product meal, wheat gluten meal, soybean meal, peanut meal, meat and bone meal, and corn gluten meal. Subsequently, according to the palatability of the single extract pellets, 52 kinds of blended extract pellets were designed and their palatability tested. The results showed that the pellets had remarkably different palatability, and the optimal one was Diet 52 (wheat gluten meal: pet-food grade poultry by-product meal: meat and bone meal: corn gluten meal = 1:6:1:2). The highest ingestion ratio (the number of pellets ingested/the number of pellets fed) was 0.73 ± 0.03, observed for Diet 52. Five isonitrogenous (52% crude protein) and isocaloric (20 kJ g-1 gross energy) diets were then formulated by replacing 0 (control), 35%, 50%, 65% and 80% of fishmeal with the No. 52 blending proportion. After a 10-week feeding trial, consistent feed intake was found among all replacement treatments. Replacing up to 35% of fishmeal did not significantly influence the final body weight, specific growth rate, feed efficiency ratio, or protein efficiency ratio of turbot. Therefore, the water-soluble extracts of protein sources play an important role in improving the palatability of non-fishmeal protein sources in aquafeed.

  7. Two sources of meaning in infant communication: preceding action contexts and act-accompanying characteristics

    PubMed Central

    Liszkowski, Ulf

    2014-01-01

    How do infants communicate before they have acquired a language? This paper supports the hypothesis that infants possess social–cognitive skills that run deeper than language alone, enabling them to understand others and make themselves understood. I suggested that infants, like adults, use two sources of extralinguistic information to communicate meaningfully and react to and express communicative intentions appropriately. In support, a review of relevant experiments demonstrates, first, that infants use information from preceding shared activities to tailor their comprehension and production of communication. Second, a series of novel findings from our laboratory shows that in the absence of distinguishing information from preceding routines or activities, infants use accompanying characteristics (such as prosody and posture) that mark communicative intentions to extract and transmit meaning. Findings reveal that before infants begin to speak they communicate in meaningful ways by binding preceding and simultaneous multisensory information to a communicative act. These skills are not only a precursor to language, but also an outcome of social–cognitive development and social experience in the first year of life. PMID:25092662

  8. Can multilinguality improve Biomedical Word Sense Disambiguation?

    PubMed

    Duque, Andres; Martinez-Romo, Juan; Araujo, Lourdes

    2016-12-01

    Ambiguity in the biomedical domain represents a major issue when performing Natural Language Processing tasks over the huge amount of available information in the field. For this reason, Word Sense Disambiguation is critical for achieving accurate systems able to tackle complex tasks such as information extraction, summarization or document classification. In this work we explore whether multilinguality can help to solve the problem of ambiguity, and the conditions required for a system to improve the results obtained by monolingual approaches. Also, we analyze the best ways to generate those useful multilingual resources, and study different languages and sources of knowledge. The proposed system, based on co-occurrence graphs containing biomedical concepts and textual information, is evaluated on a test dataset frequently used in biomedicine. We can conclude that multilingual resources are able to provide a clear improvement of more than 7% compared to monolingual approaches, for graphs built from a small number of documents. Also, empirical results show that automatically translated resources are a useful source of information for this particular task. Copyright © 2016 Elsevier Inc. All rights reserved.
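
    As a rough indication of the graph-based idea (a monolingual baseline only; the co-occurrence construction, toy corpus, and sense-scoring rule below are illustrative assumptions, not the multilingual system evaluated in the paper), one can build a co-occurrence graph from documents and pick the candidate sense whose neighbourhood best overlaps the ambiguous term's context.

        from collections import defaultdict
        from itertools import combinations

        def build_cooccurrence(docs):
            """Co-occurrence counts between terms appearing in the same document."""
            graph = defaultdict(lambda: defaultdict(int))
            for doc in docs:
                terms = set(doc.lower().split())
                for a, b in combinations(sorted(terms), 2):
                    graph[a][b] += 1
                    graph[b][a] += 1
            return graph

        def disambiguate(graph, context, senses):
            """Pick the candidate sense whose graph neighbourhood best overlaps the context."""
            context = set(w.lower() for w in context)
            return max(senses, key=lambda s: len(context & set(graph.get(s, {}))))

        # Toy corpus: 'cold' as a disease vs. as a temperature (illustrative only).
        docs = ["common cold virus infection symptoms", "cold temperature weather winter climate"]
        g = build_cooccurrence(docs)
        print(disambiguate(g, ["patient", "virus", "symptoms"], senses=["infection", "temperature"]))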

  9. A Visual Analytics Framework for Identifying Topic Drivers in Media Events.

    PubMed

    Lu, Yafeng; Wang, Hong; Landis, Steven; Maciejewski, Ross

    2017-09-14

    Media data have been the subject of large-scale analysis, with text mining applications providing overviews of media themes and information flows. Information extracted from media articles has also shown contextual value when integrated with other data, such as criminal records and stock market pricing. In this work, we explore linking textual media data with curated secondary textual data sources through user-guided semantic lexical matching for identifying relationships and data links. In this manner, critical information can be identified and used to annotate media timelines in order to provide a more detailed overview of events that may be driving media topics and frames. These linked events are further analyzed through causality modeling to identify temporal drivers between the data series. Such causal links are then annotated through automatic entity extraction, which enables the analyst to explore persons, locations, and organizations that may be pertinent to the media topic of interest. To demonstrate the proposed framework, two media datasets and an armed conflict event dataset are explored.

  10. Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

    PubMed Central

    Sun, Wencheng; Li, Yangyang; Liu, Fang; Fang, Shengqun; Wang, Guoyan

    2018-01-01

    Currently, medical institutions generally use electronic medical records (EMRs) to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMRs have been recognized as a valuable resource for large-scale analysis. However, EMR data are characterized by diversity, incompleteness, redundancy, and privacy concerns, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and thereby the data mining results. Different types of data require different processing technologies. Most structured data need classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contain richer health information and require more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and emphatically analyzes the key techniques. In addition, we make an in-depth study of the applications developed based on text mining, together with the open challenges and research issues for future work. PMID:29849998

  11. Determination of acoustic waveguide invariant using ships as sources of opportunity in a shallow water marine environment.

    PubMed

    Verlinden, Christopher M A; Sarkar, J; Cornuelle, B D; Kuperman, W A

    2017-02-01

    The waveguide invariant (WGI) is a property that can be used to localize acoustic radiators and extract information about the environment. Here the WGI is determined using ships as sources of opportunity, tracked using the Automatic Identification System (AIS). The relationship between range, acoustic intensity, and frequency for a ship in a known position is used to determine the WGI parameter β. These β values are interpolated and a map of β is generated. The method is demonstrated using data collected in a field experiment on a single hydrophone in a shallow water environment off the coast of Southern California.

  12. PropBase Query Layer: a single portal to UK subsurface physical property databases

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information for industry and the public was achieved by geological mapping. Pervasively available computers now mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical property data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires the capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived, data sources. These can be voxelated to incorporate the data into the models and demonstrate property variation within the subsurface geometry. All property data held by BGS have for many years been stored in relational databases to ensure their long-term continuity. However, these databases have, by necessity, complex structures; each contains positional reference data and model information, as well as metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing the analyses, it also greatly complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets, making the extraction of physical properties from these databases difficult. The PropBase Query Layer has therefore been created to allow simplified aggregation and extraction of all related data, presenting complex data in simple, mostly denormalised, tables which combine information from multiple databases into a single system. The structure from each relational database is denormalised into a generalised structure, so that each dataset can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which holds all of the data with the following attribution:
    • a unique identifier
    • the data source
    • the unique identifier from the parent database, for traceability
    • the 3D location
    • the property type
    • the property value
    • the units
    • necessary qualifiers
    • precision information and an audit trail
    Data sources, property type and units are constrained by dictionaries, a key component of the structure which defines what properties and inheritance hierarchies are to be coded and guides what is extracted from the structure and how. Data types served by the Query Layer include site investigation derived geotechnical data, hydrogeology datasets, regional geochemistry and geophysical logs, as well as lithological and borehole metadata. The size and complexity of the datasets, with multiple parent structures, requires a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) that keeps the layer synchronised with the underlying databases, either as regularly scheduled jobs (weekly, monthly, etc.) or invoked on demand.
    The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
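    A minimal sketch of what such a denormalised query layer might look like, using Python's sqlite3 in place of the Oracle/PL/SQL stack described above. Only the table name PRB_DATA and the listed attributes come from the abstract; the column names, the dictionary table, and the refresh routine are illustrative assumptions.

```python
# Hedged sketch: a stand-in for the PropBase denormalised layer using sqlite3.
# Column names, the dictionary table, and the refresh routine are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dictionary of allowed property types (stand-in for the real dictionaries).
cur.execute("CREATE TABLE prb_property_dict (property_type TEXT PRIMARY KEY, units TEXT)")
cur.executemany("INSERT INTO prb_property_dict VALUES (?, ?)",
                [("POROSITY", "%"), ("DENSITY", "g/cm3")])

# Denormalised data table combining records from multiple parent databases.
cur.execute("""
CREATE TABLE PRB_DATA (
    id             INTEGER PRIMARY KEY,
    data_source    TEXT NOT NULL,          -- which parent database
    parent_id      TEXT NOT NULL,          -- identifier in the parent, for traceability
    x REAL, y REAL, z REAL,                -- 3D location
    property_type  TEXT NOT NULL REFERENCES prb_property_dict(property_type),
    property_value REAL,
    units          TEXT,
    loaded_at      TEXT DEFAULT CURRENT_TIMESTAMP  -- audit trail
)""")

def refresh_from_parent(rows, source_name):
    """Illustrative synchronisation step: push rows from a parent database into
    the denormalised layer (the real system uses scheduled PL/SQL jobs)."""
    cur.executemany(
        "INSERT INTO PRB_DATA (data_source, parent_id, x, y, z, property_type, "
        "property_value, units) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [(source_name, *r) for r in rows])
    conn.commit()

refresh_from_parent([("BH001/1", 451200.0, 289300.0, -12.5, "POROSITY", 23.4, "%")],
                    "geotechnical_db")
print(cur.execute("SELECT data_source, property_type, property_value FROM PRB_DATA").fetchall())
```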

  13. Building Facade Reconstruction by Fusing Terrestrial Laser Points and Images

    PubMed Central

    Pu, Shi; Vosselman, George

    2009-01-01

    Laser data and optical data have a complementary nature for three-dimensional feature extraction, and efficient integration of the two data sources leads to more reliable and automated extraction of three-dimensional features. This paper presents a semiautomatic building facade reconstruction approach that efficiently combines information from terrestrial laser point clouds and close-range images. A building facade's general structure is discovered and established using the planar features from the laser data. Strong lines in the images are then extracted using the Canny detector and the Hough transform, and compared with current model edges for necessary improvement. Finally, textures with optimal visibility are selected and applied according to accurate image orientations. Solutions to several challenging problems encountered during the combined reconstruction, such as referencing between laser points and multiple images and automated texturing, are described. The limitations and remaining work of this approach are also discussed. PMID:22408539

  14. Mining marine shellfish wastes for bioactive molecules: chitin and chitosan--Part A: extraction methods.

    PubMed

    Hayes, Maria; Carney, Brian; Slater, John; Brück, Wolfram

    2008-07-01

    Legal restrictions, high costs and environmental problems regarding the disposal of marine processing wastes have led to amplified interest in biotechnology research concerning the identification and extraction of additional high-grade, low-volume by-products from shellfish waste treatments. Shellfish waste consisting of crustacean exoskeletons is currently the main source of biomass for chitin production. Chitin is a polysaccharide composed of N-acetyl-D-glucosamine units. The multidimensional utilization of chitin derivatives, including chitosan, a deacetylated derivative of chitin, stems from a number of characteristics, including their polyelectrolyte and cationic nature, the presence of reactive groups, high adsorption capacities, and bacteriostatic and fungistatic influences, which make them very versatile biomolecules. Part A of this review aims to consolidate useful information concerning the methods used to extract and characterize chitin, chitosan and glucosamine obtained through industrial, microbial and enzymatic hydrolysis of shellfish waste.

  15. [Investigation of metal element content of some European and Far Eastern herbs].

    PubMed

    Süle, Krisztina; Kurucz, Dóra; Kajári, Ágnes; May, Zoltán

    2015-08-02

    Metal elements and their excess intake have a significant influence on general health. There is little information on how Far Eastern herbs compare with European herbs regarding purity and essential metal element content. The aim of the authors was to determine metal elements in different Chinese and European herbs and extracts. The studied European herbs included Calendula officinalis petals, Achillea millefolium, Epilobium parviflorum herba, Urtica dioica leaves, and Crataegus monogyna flowers, while the Far Eastern herbs were Cordyceps sinensis, Ganoderma lucidum, Ginkgo biloba leaves, Panax ginseng and Curcuma longa roots. The analysis was performed using inductively coupled plasma optical emission spectroscopy. There was no considerable difference in essential metal elements or in the Ca:Mg concentration ratio between European and Far Eastern drugs and extracts. The extracts are preferable metal element sources, and their magnesium content is also advantageous because of a shift of the Ca:Mg concentration ratio towards magnesium.

  16. Underestimated prevalence of heart failure in hospital inpatients: a comparison of ICD codes and discharge letter information.

    PubMed

    Kaspar, Mathias; Fette, Georg; Güder, Gülmisal; Seidlmayer, Lea; Ertl, Maximilian; Dietrich, Georg; Greger, Helmut; Puppe, Frank; Störk, Stefan

    2018-04-17

    Heart failure is the predominant cause of hospitalization and among the leading causes of death in Germany. However, accurate estimates of prevalence and incidence are lacking. Reported figures originating from different information sources are compromised by factors such as economic considerations and documentation quality. We implemented a clinical data warehouse that integrates various information sources (structured parameters, plain text, data extracted by natural language processing) and enables reliable approximations to the real number of heart failure patients. The performance of ICD-based diagnosis in detecting heart failure was compared across the years 2000-2015 with (a) advanced definitions based on algorithms that integrate various sources of the hospital information system, and (b) a physician-based reference standard. Applying these methods for detecting heart failure in inpatients revealed that relying on ICD codes resulted in a marked underestimation of the true prevalence of heart failure, ranging from 44% in the validation dataset to 55% (single year) and 31% (all years) in the overall analysis. Percentages changed over the years, indicating secular changes in coding practice and efficiency. Performance was markedly improved using search and permutation algorithms, from the initial expert-specified query (F1 score of 81%) to the computer-optimized query (F1 score of 86%), or, alternatively, by optimizing precision or sensitivity depending on the search objective. Estimating the prevalence of heart failure using ICD codes as the sole data source yielded unreliable results. Diagnostic accuracy was markedly improved using dedicated search algorithms. Our approach may be transferred to other hospital information systems.
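    A minimal sketch, under illustrative assumptions, of the precision/recall/F1 comparison such an evaluation rests on: a detection query (ICD-only or ICD plus text-derived hits) is scored against a physician-adjudicated reference set. The patient identifiers and set memberships below are invented.

```python
# Hedged sketch: precision, recall, and F1 for a detection query against a
# physician-adjudicated reference standard. All patient IDs are illustrative.
def precision_recall_f1(detected, reference):
    detected, reference = set(detected), set(reference)
    tp = len(detected & reference)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy example (invented): ICD coding misses two true heart-failure cases.
reference = {"p1", "p2", "p3", "p4", "p5"}
icd_detected = {"p1", "p2", "p3"}
combined_query = {"p1", "p2", "p3", "p4", "p6"}  # adds text-derived hits, one false positive

for name, detected in [("ICD only", icd_detected), ("ICD + text query", combined_query)]:
    p, r, f = precision_recall_f1(detected, reference)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```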

  17. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  18. Developing the RAL front end test stand source to deliver a 60 mA, 50 Hz, 2 ms H- beam

    NASA Astrophysics Data System (ADS)

    Faircloth, Dan; Lawrie, Scott; Letchford, Alan; Gabor, Christoph; Perkins, Mike; Whitehead, Mark; Wood, Trevor; Tarvainen, Olli; Komppula, Jani; Kalvas, Taneli; Dudnikov, Vadim; Pereira, Hugo; Izaola, Zunbeltz; Simkin, John

    2013-02-01

    All the Front End Test Stand (FETS) beam requirements have been achieved, but not simultaneously [1]. At 50 Hz repetition rates, beam current droop becomes unacceptable for pulse lengths longer than 1 ms. This is a fundamental limitation of the present source design. Previous researchers [2] have demonstrated that using a physically larger Penning surface plasma source should overcome these limitations. The scaled source development strategy is outlined in this paper. A study of time-varying plasma behavior has been performed using a V-UV spectrometer, and initial experiments to test scaled plasma volumes are outlined. A dedicated plasma and extraction test stand (VESPA: Vessel for Extraction and Source Plasma Analysis) is being developed to allow new source and extraction designs to be appraised. The experimental work is backed up by modeling and simulations: a detailed ANSYS thermal model has been developed, and IBSimu is being used to design the extraction and beam transport. A novel 3D plasma modeling code using beamlets is being developed by Cobham Vector Fields using SCALA OPERA; early source modeling results are very promising. Hardware on FETS is also being developed in preparation for running the scaled source. A new 2 ms, 50 Hz, 25 kV pulsed extraction voltage power supply has been constructed and a new discharge power supply is being designed. The design of the post-acceleration electrode assembly has been improved.

  19. An exploratory analysis of the nature of informal knowledge underlying theories of planned action used for public health oriented knowledge translation.

    PubMed

    Kothari, Anita; Boyko, Jennifer A; Campbell-Davison, Andrea

    2015-09-09

    Informal knowledge is used in public health practice to make sense of research findings. Although knowledge translation theories highlight the importance of informal knowledge, it is not clear to what extent the same literature provides guidance in terms of how to use it in practice. The objective of this study was to address this gap by exploring what planned action theories suggest in terms of using three types of informal knowledge: local, experiential and expert. We carried out an exploratory secondary analysis of the planned action theories that informed the development of a popular knowledge translation theory. Our sample included twenty-nine (n = 29) papers. We extracted information from these papers about sources of and guidance for using informal knowledge, and then carried out a thematic analysis. We found that theories of planned action provide guidance (including sources of, methods for identifying, and suggestions for use) for using local, experiential and expert knowledge. This study builds on previous knowledge translation related work to provide insight into the practical use of informal knowledge. Public health practitioners can refer to the guidance summarized in this paper to inform their decision-making. Further research about how to use informal knowledge in public health practice is needed given the value being accorded to using informal knowledge in public health decision-making processes.

  20. Development of the front end test stand and vessel for extraction and source plasma analyses negative hydrogen ion sources at the Rutherford Appleton Laboratory.

    PubMed

    Lawrie, S R; Faircloth, D C; Letchford, A P; Perkins, M; Whitehead, M O; Wood, T; Gabor, C; Back, J

    2014-02-01

    The ISIS pulsed spallation neutron and muon facility at the Rutherford Appleton Laboratory (RAL) in the UK uses a Penning surface plasma negative hydrogen ion source. Upgrade options for the ISIS accelerator system demand a higher current, lower emittance beam with longer pulse lengths from the injector. The Front End Test Stand is being constructed at RAL to meet the upgrade requirements using a modified ISIS ion source. A new 10% duty cycle 25 kV pulsed extraction power supply has been commissioned and the first meter of 3 MeV radio frequency quadrupole has been delivered. Simultaneously, a Vessel for Extraction and Source Plasma Analyses is under construction in a new laboratory at RAL. The detailed measurements of the plasma and extracted beam characteristics will allow a radical overhaul of the transport optics, potentially yielding a simpler source configuration with greater output and lifetime.

  1. Systematically extracting metal- and solvent-related occupational information from free-text responses to lifetime occupational history questionnaires.

    PubMed

    Friesen, Melissa C; Locke, Sarah J; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A; Purdue, Mark; Colt, Joanne S

    2014-06-01

    Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants' jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Our study population comprised 2408 subjects, reporting 11991 jobs, from a case-control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert's independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44-51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9-14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.
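    A minimal sketch of the keyword-driven flagging step described above, under illustrative assumptions: the free-text task/tools/chemicals responses of each job are scanned for a priori phrases tied to an exposure scenario. The original work used a SAS macro and much larger keyword lists; the scenario names, phrases, and job records below are invented.

```python
# Hedged sketch: flag exposure scenarios in free-text occupational history
# responses using a priori keyword lists. Keywords and jobs are illustrative.
import re

SCENARIO_KEYWORDS = {
    "possible_TCE_degreasing": ["degreas", "vapor degreaser", "trichloroethylene"],
    "possible_lead_soldering": ["solder", "lead paint", "battery"],
}

def flag_scenarios(job):
    """Return the scenario names whose keywords appear in the job's free text."""
    text = " ".join(job.get(field, "") for field in ("task", "tools", "chemicals")).lower()
    return [name for name, keys in SCENARIO_KEYWORDS.items()
            if any(re.search(re.escape(k), text) for k in keys)]

jobs = [
    {"job_title": "machinist", "task": "cleaned metal parts",
     "tools": "vapor degreaser", "chemicals": "solvent"},
    {"job_title": "teacher", "task": "taught classes", "tools": "", "chemicals": ""},
]
for job in jobs:
    print(job["job_title"], "->", flag_scenarios(job) or ["no scenario flagged"])
```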

  2. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature

    PubMed Central

    Jimeno Yepes, Antonio; Verspoor, Karin

    2014-01-01

    As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature. PMID:25285203
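    A minimal sketch of the combination idea, under illustrative assumptions: the mention-level outputs of several extraction tools are merged by set union and scored against gold annotations. The tool outputs and gold mentions below are invented; the study's actual tools and corpora differ.

```python
# Hedged sketch: combine variant-mention outputs of two tools by union and
# score each set against gold annotations. All mentions here are invented.
def f_measure(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0

gold = {("doc1", "V600E"), ("doc1", "c.1799T>A"), ("doc2", "R175H")}
tool_a = {("doc1", "V600E")}                         # catches protein-level mentions
tool_b = {("doc1", "c.1799T>A"), ("doc2", "R175H")}  # catches DNA-level mentions

combined = tool_a | tool_b
print("tool A F1:", round(f_measure(tool_a, gold), 2))
print("tool B F1:", round(f_measure(tool_b, gold), 2))
print("union  F1:", round(f_measure(combined, gold), 2))
```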

  3. Beam Profile Studies for a One Eighth Betatron Wavelength Final Focusing Cell Following Phase Mixed Transport

    DTIC Science & Technology

    1988-10-26

    concentrated into this off-axis peak is then considered. Estimates of the source brightness (extraction ion diode source current density divided by the square... radioactive contamination of the accelerator. One possible scheme for avoiding this problem is to use extraction geometry ion diodes to focus the ion beams... annular region. These results will be coupled to two simple models of extraction ion diodes to determine the ion source brightness requirements. These

  4. Coherent Wave Measurement Buoy Arrays to Support Wave Energy Extraction

    NASA Astrophysics Data System (ADS)

    Spada, F.; Chang, G.; Jones, C.; Janssen, T. T.; Barney, P.; Roberts, J.

    2016-02-01

    Wave energy is the most abundant form of hydrokinetic energy in the United States and wave energy converters (WECs) are being developed to extract the maximum possible power from the prevailing wave climate. However, maximum wave energy capture is currently limited by the narrow banded frequency response of WECs as well as extended protective shutdown requirements during periods of large waves. These limitations must be overcome in order to maximize energy extraction, thus significantly decreasing the cost of wave energy and making it a viable energy source. Techno-economic studies of several WEC devices have shown significant potential to improve wave energy capture efficiency through operational control strategies that incorporate real-time information about local surface wave motions. Integral Consulting Inc., with ARPA-E support, is partnering with Sandia National Laboratories and Spoondrift LLC to develop a coherent array of wave-measuring devices to relay and enable the prediction of wave-resolved surface dynamics at a WEC location ahead of real time. This capability will provide necessary information to optimize power production of WECs through control strategies, thereby allowing for a single WEC design to perform more effectively across a wide range of wave environments. The information, data, or work presented herein was funded in part by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award Number DE-AR0000514.

  5. Review on the Extraction Methods of Crude oil from all Generation Biofuels in last few Decades

    NASA Astrophysics Data System (ADS)

    Bhargavi, G.; Nageswara Rao, P.; Renganathan, S.

    2018-03-01

    The ever-growing demand for energy fuels, the economics of oil, the depletion of energy resources and environmental protection are inevitable challenges that must be addressed in the coming decades to sustain the life of humans and other creatures. Switching to alternative fuels that are renewable, biodegradable, and economically and environmentally friendly can meet at least part of the fuel demand while also mitigating climate change. Biofuel production has therefore gained prominence. The term biofuels broadly refers to fuels derived from living matter, either animal or plant. Among the competing biofuels, biodiesel is one of the promising alternatives for diesel engines. Biodiesel is renewable, environmentally friendly, safe to use, widely applicable and biodegradable, and has consequently become a major focus of intensive global research and development on alternative energy. The present review focuses specifically on biodiesel. The major steps in biodiesel production are lipid extraction followed by esterification/transesterification. For the extraction of lipids, several techniques have been put forward across the different generations of feedstocks used. This review provides theoretical background on the two major extraction approaches, mechanical and chemical, and discusses the practical issues of each method, such as extraction efficiency, extraction time, oil sources, and their pros and cons. Consolidating information on oil extraction methods may help further research aimed at easing biofuel production.

  6. Intelligent multi-sensor integrations

    NASA Technical Reports Server (NTRS)

    Volz, Richard A.; Jain, Ramesh; Weymouth, Terry

    1989-01-01

    Growth in the intelligence of space systems requires the use and integration of data from multiple sensors. Generic tools are being developed for extracting and integrating information obtained from multiple sources. The full spectrum of issues is addressed, ranging from data acquisition to characterization of sensor data to adaptive systems for utilizing the data. In particular, there are three major aspects to the project: multisensor processing, an adaptive approach to object recognition, and distributed sensor system integration.

  7. Usability-driven pruning of large ontologies: the case of SNOMED CT

    PubMed Central

    Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan

    2012-01-01

    Objectives: To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Materials and Methods: Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Results: Graph-traversal heuristics provided high coverage (71–96% of terms in the test sets of discharge summaries) at the expense of subset size (17–51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24–55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Discussion: Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Conclusion: Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available. PMID:22268217
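    A minimal sketch of the two-step pruning idea, under illustrative assumptions: (1) walk up the is-a hierarchy from signature concepts to build a module, then (2) drop concepts whose corpus frequency falls below a threshold while protecting the signature. The tiny hierarchy and frequency counts below are invented, not SNOMED CT or MEDLINE data.

```python
# Hedged sketch: graph-traversal module extraction plus frequency filtering.
# The concept hierarchy and frequencies are invented for illustration.
PARENTS = {                      # child -> parents (is-a)
    "myocardial infarction": ["heart disease"],
    "angina": ["heart disease"],
    "heart disease": ["disorder"],
    "disorder": [],
}
FREQ = {"myocardial infarction": 12000, "angina": 8000,
        "heart disease": 45000, "disorder": 300}

def traverse_up(signature):
    """Collect the signature concepts plus all of their ancestors."""
    subset, stack = set(), list(signature)
    while stack:
        concept = stack.pop()
        if concept not in subset:
            subset.add(concept)
            stack.extend(PARENTS.get(concept, []))
    return subset

def frequency_filter(subset, min_freq, keep=()):
    """Drop low-frequency concepts, always keeping the protected set."""
    return {c for c in subset if FREQ.get(c, 0) >= min_freq or c in keep}

signature = {"myocardial infarction", "angina"}
module = traverse_up(signature)
pruned = frequency_filter(module, min_freq=1000, keep=signature)
print("module:", sorted(module))
print("after frequency filter:", sorted(pruned))
```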

  8. Effects of nitrogen and carbon sources on the production of inulinase from strain Bacillus sp. SG113

    NASA Astrophysics Data System (ADS)

    Gavrailov, Simeon; Ivanova, Viara

    2016-03-01

    The effects of carbon and nitrogen substrates on the growth of the Bacillus sp. SG113 strain were studied. The use of organic nitrogen sources (peptone, beef extract, yeast extract, casein) led to rapid cellular growth, and the best results for the Bacillus strain were obtained with casein hydrolysate. Of the inorganic nitrogen sources studied, (NH4)2SO4 proved to be the best. Casein hydrolysate and (NH4)2SO4 stimulated invertase synthesis. In the presence of Jerusalem artichoke, onion and garlic extracts as carbon sources, the strain synthesized 6 to 10 times more inulinase.

  9. A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text

    PubMed Central

    Basaruddin, T.

    2016-01-01

    One essential task in information extraction from the medical corpus is drug name recognition. Compared with text sources from other domains, medical text mining poses more challenges, for example, more unstructured text, the rapid addition of new terms, a wide range of name variations for the same drug, the lack of labeled datasets and external knowledge, and multiple token representations for a single drug name. Although many approaches have been proposed for the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation techniques to overcome some of these challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities obtained from word embedding training. The first technique is evaluated with a standard NN model, namely an MLP. The second technique involves two deep network classifiers, DBN and SAE. The third technique represents the sentence as a sequence and is evaluated with a recurrent NN model, namely an LSTM. In extracting drug name entities, the third technique gives the best F-score performance compared to the state of the art, with an average F-score of 0.8645. PMID:27843447

  10. [Application of regular expression in extracting key information from Chinese medicine literatures about re-evaluation of post-marketing surveillance].

    PubMed

    Wang, Zhifei; Xie, Yanming; Wang, Yongyan

    2011-10-01

    Computerized extraction of information from the Chinese medicine literature is more convenient than hand searching: it can simplify the search process and improve accuracy. Among the many computerized extraction methods in increasing use, regular expressions are particularly well suited to extracting useful information for research. This article focuses on applying regular expressions to extract information from the Chinese medicine literature relevant to post-marketing re-evaluation. Two practical examples are reported, in which regular expressions were used to extract "case number (non-terminology)" and "efficacy rate (subgroups for related information identification)", illustrating how information can be extracted from the Chinese medicine literature by this method.
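    A minimal sketch of this kind of pattern matching, under illustrative assumptions: two regular expressions pull a case number and an efficacy rate out of sentences. The patterns and the example sentences (given here in English rather than Chinese) are invented and far simpler than those needed for real literature.

```python
# Hedged sketch: regular expressions for "case number" and "efficacy rate"
# style fields. Patterns and sentences are illustrative only.
import re

CASE_NUMBER = re.compile(r"(\d+)\s+(?:cases|patients)")
EFFICACY_RATE = re.compile(r"(?:total\s+)?effective rate(?:\s+was)?\s+([\d.]+)\s*%")

sentences = [
    "A total of 128 cases were enrolled in the treatment group.",
    "The total effective rate was 92.3% in the observation group.",
]
for s in sentences:
    cases = CASE_NUMBER.search(s)
    rate = EFFICACY_RATE.search(s)
    if cases:
        print("case number:", cases.group(1))
    if rate:
        print("efficacy rate (%):", rate.group(1))
```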

  11. Extracting data from figures with software was faster, with higher interrater reliability than manual extraction.

    PubMed

    Jelicic Kadic, Antonia; Vucic, Katarina; Dosenovic, Svjetlana; Sapunar, Damir; Puljak, Livia

    2016-06-01

    To compare the speed and accuracy of graphical data extraction using manual estimation and open source software. Data points from eligible graphs/figures published in randomized controlled trials (RCTs) from 2009 to 2014 were extracted by two authors independently, both by manual estimation and with Plot Digitizer, an open source software tool. Corresponding authors of each RCT were contacted up to four times via e-mail to obtain the exact numbers that were used to create the graphs. The accuracy of each method was compared against the source data from which the original graphs were produced. Software data extraction was significantly faster, reducing extraction time by 47%. Percent agreement between the two raters was 51% for manual and 53.5% for software data extraction. Percent agreement between the raters and the original data was 66% vs. 75% for the first rater and 69% vs. 73% for the second rater, for manual and software extraction, respectively. Data extraction from figures should be conducted using software, whereas manual estimation should be avoided. Using software to extract data presented only in figures is faster and enables higher interrater reliability. Copyright © 2016 Elsevier Inc. All rights reserved.
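    A minimal sketch, with invented numbers, of the percent-agreement idea underlying such a comparison: two raters' digitized values are counted as agreeing when they fall within a fixed tolerance. The tolerance and the data points are illustrative assumptions, not the study's actual criterion.

```python
# Hedged sketch: percent agreement between two raters' digitized data points,
# counting values as agreeing when within a fixed tolerance. Data are invented.
def percent_agreement(rater1, rater2, tolerance=0.5):
    agree = sum(abs(a - b) <= tolerance for a, b in zip(rater1, rater2))
    return 100.0 * agree / len(rater1)

manual_r1   = [10.2, 15.8, 22.5, 30.1]
manual_r2   = [10.9, 15.1, 22.4, 31.0]
software_r1 = [10.4, 15.6, 22.5, 30.3]
software_r2 = [10.5, 15.5, 22.6, 30.2]

print("manual agreement:   %.1f%%" % percent_agreement(manual_r1, manual_r2))
print("software agreement: %.1f%%" % percent_agreement(software_r1, software_r2))
```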

  12. Development of an information retrieval tool for biomedical patents.

    PubMed

    Alves, Tiago; Rodrigues, Rúben; Costa, Hugo; Rocha, Miguel

    2018-06-01

    The volume of biomedical literature has been increasing in recent years. Patent documents have followed this trend, and they are important sources of biomedical knowledge, technical details and curated data, which are put together along the granting process. The field of biomedical text mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes searching for information a challenging task. Several BioTM techniques can be applied to patents. Among them, Information Retrieval (IR) includes processes by which relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks. The pipeline was developed within @Note2, an open-source computational framework for BioTM, adding a number of modules to the core libraries, including patent metadata and full-text retrieval, PDF-to-text conversion and optical character recognition. User interfaces were also developed for the main operations, materialized in a new @Note2 plug-in. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including Information Extraction tasks such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from the BioCreative challenges. We also show the use of the plug-in with a user query related to the production of vanillin. This work makes the relevant content of patents available to the scientific community, drastically decreasing the time required for this task, and provides graphical interfaces to ease the use of these tools. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. A system for verifying models and classification maps by extraction of information from a variety of data sources

    NASA Technical Reports Server (NTRS)

    Norikane, L.; Freeman, A.; Way, J.; Okonek, S.; Casey, R.

    1992-01-01

    Recent updates to a geographical information system (GIS) called VICAR (Video Image Communication and Retrieval)/IBIS are described. The system is designed to handle data from many different formats (vector, raster, tabular) and many different sources (models, radar images, ground truth surveys, optical images). All the data are referenced to a single georeference plane, and average or typical values for parameters defined within a polygonal region are stored in a tabular file, called an info file. The info file format allows tracking of data in time, maintenance of links between component data sets and the georeference image, conversion of pixel values to 'actual' values (e.g., radar cross-section, luminance, temperature), graph plotting, data manipulation, generation of training vectors for classification algorithms, and comparison between actual measurements and model predictions (with ground truth data as input).

  14. Internet Resources for Radio Astronomy

    NASA Astrophysics Data System (ADS)

    Andernach, H.

    A subjective overview of Internet resources for radio-astronomical information is presented. Basic observing techniques and their implications for the interpretation of publicly available radio data are described, followed by a discussion of existing radio surveys, their level of optical identification, and the nomenclature of radio sources. Various collections of source catalogues and databases for integrated radio source parameters are reviewed and compared, as well as the web interfaces to interrogate the current and ongoing large-area surveys. Links to radio observatories with archives of raw (uv-) data are presented, as well as services providing images, both of individual objects and of extracts ("cutouts") from large-scale surveys. While the emphasis is on radio continuum data, a brief list of sites providing spectral line data and atomic or molecular information is included. The major radio telescopes and surveys under construction or planning are outlined. A summary is given of a search for previously unknown optically bright radio sources, performed by students as an exercise using Internet resources only. Over 200 different links are mentioned and were verified, but despite the attempt to make this report up to date, it can only provide a snapshot of the situation as of mid-1998.

  15. Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes

    PubMed Central

    Zhang, Yaoyun; Li, Hee-Jin; Wang, Jingqi; Cohen, Trevor; Roberts, Kirk; Xu, Hua

    2018-01-01

    Mental health is increasingly recognized as an important topic in healthcare. Information concerning psychiatric symptoms is critical for the timely diagnosis of mental disorders, as well as for the personalization of interventions. However, the diversity and sparsity of psychiatric symptoms make it challenging for conventional natural language processing techniques to automatically extract such information from clinical text. To address this problem, this study takes the initiative to use and adapt word embeddings from four source domains – intensive care, biomedical literature, Wikipedia and a psychiatric forum – to recognize symptoms in the target domain of psychiatry. We investigated four different approaches: 1) using only word embeddings of the source domain, 2) directly combining data of the source and target domains to generate word embeddings, 3) assigning different weights to word embeddings, and 4) retraining the word embedding model of the source domain on a corpus of the target domain. To the best of our knowledge, this is the first work adapting multiple word embeddings from external domains to improve psychiatric symptom recognition in clinical text. Experimental results showed that the last two approaches outperformed the baseline methods, indicating the effectiveness of our new strategies for leveraging embeddings from other domains. PMID:29888086
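    A minimal sketch of the third strategy (weighting embeddings from several source domains), under illustrative assumptions: word vectors from each domain are combined by a weighted average. The toy 4-dimensional vectors, the domain weights, and the chosen word are invented; real embeddings are learned from large corpora.

```python
# Hedged sketch: weighted combination of word embeddings from several source
# domains into a single vector per word. Vectors and weights are invented.
import numpy as np

# word -> vector, per source domain (toy 4-dimensional vectors)
domain_embeddings = {
    "intensive_care": {"anxiety": np.array([0.9, 0.1, 0.0, 0.2])},
    "wikipedia":      {"anxiety": np.array([0.4, 0.5, 0.3, 0.1])},
    "psych_forum":    {"anxiety": np.array([0.8, 0.2, 0.1, 0.4])},
}
# Larger weight for domains assumed closer to psychiatric notes.
domain_weights = {"intensive_care": 0.3, "wikipedia": 0.2, "psych_forum": 0.5}

def combine(word):
    """Weighted average of the word's vectors over the domains that contain it."""
    vecs, weights = [], []
    for domain, table in domain_embeddings.items():
        if word in table:
            vecs.append(table[word])
            weights.append(domain_weights[domain])
    weights = np.array(weights) / sum(weights)
    return np.average(np.stack(vecs), axis=0, weights=weights)

print("combined vector for 'anxiety':", np.round(combine("anxiety"), 3))
```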

  16. Benchmarking for On-Scalp MEG Sensors.

    PubMed

    Xie, Minshu; Schneiderman, Justin F; Chukharkin, Maxim L; Kalabukhov, Alexei; Riaz, Bushra; Lundqvist, Daniel; Whitmarsh, Stephen; Hamalainen, Matti; Jousmaki, Veikko; Oostenveld, Robert; Winkler, Dag

    2017-06-01

    We present a benchmarking protocol for quantitatively comparing emerging on-scalp magnetoencephalography (MEG) sensor technologies to their counterparts in state-of-the-art MEG systems. As a means of validation, we compare a high-critical-temperature superconducting quantum interference device (high-Tc SQUID) with the low-Tc SQUIDs of an Elekta Neuromag TRIUX system in MEG recordings of auditory and somatosensory evoked fields (SEFs) on one human subject. We measure the expected signal gain for the auditory-evoked fields (deeper sources) and notice some unfamiliar features in the on-scalp sensor-based recordings of SEFs (shallower sources). The experimental results serve as a proof of principle for the benchmarking protocol. This approach is straightforward, general to various on-scalp MEG sensors, and convenient to use on human subjects. The unexpected features in the SEFs suggest on-scalp MEG sensors may reveal information about neuromagnetic sources that is otherwise difficult to extract from state-of-the-art MEG recordings. As the first systematically established on-scalp MEG benchmarking protocol, magnetic sensor developers can employ this method to prove the utility of their technology in MEG recordings. Further exploration of the SEFs with on-scalp MEG sensors may reveal unique information about their sources.

  17. Microlensing as a possible probe of event-horizon structure in quasars

    NASA Astrophysics Data System (ADS)

    Tomozeiu, Mihai; Mohammed, Irshad; Rabold, Manuel; Saha, Prasenjit; Wambsganss, Joachim

    2018-04-01

    In quasars which are lensed by galaxies, the point-like images sometimes show sharp and uncorrelated brightness variations (microlensing). These brightness changes are associated with the innermost region of the quasar passing through a complicated pattern of caustics produced by the stars in the lensing galaxy. In this paper, we study whether the universal properties of optical caustics could enable extraction of shape information about the central engine of quasars. We present a toy model with a crescent-shaped source crossing a fold caustic. The silhouette of a black hole over an accretion disc tends to produce roughly crescent sources. When a crescent-shaped source crosses a fold caustic, the resulting light curve is noticeably different from the case of a circular luminosity profile or Gaussian source. With good enough monitoring data, the crescent parameters, apart from one degeneracy, can be recovered.

  18. Microlensing as a Possible Probe of Event-Horizon Structure in Quasars

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tomozeiu, Mihai; Mohammed, Irshad; Rabold, Manuel

    In quasars which are lensed by galaxies, the point-like images sometimes show sharp and uncorrelated brightness variations (microlensing). These brightness changes are associated with the innermost region of the quasar passing through a complicated pattern of caustics produced by the stars in the lensing galaxy. In this paper, we study whether the universal properties of optical caustics could enable extraction of shape information about the central engine of quasars. We present a toy model with a crescent-shaped source crossing a fold caustic. The silhouette of a black hole over an accretion disk tends to produce roughly crescent sources. When a crescent-shaped source crosses a fold caustic, the resulting light curve is noticeably different from the case of a circular luminosity profile or Gaussian source. With good enough monitoring data, the crescent parameters, apart from one degeneracy, can be recovered.

  19. Microlensing as a Possible Probe of Event-Horizon Structure in Quasars

    DOE PAGES

    Tomozeiu, Mihai; Mohammed, Irshad; Rabold, Manuel; ...

    2017-12-08

    In quasars which are lensed by galaxies, the point-like images sometimes show sharp and uncorrelated brightness variations (microlensing). These brightness changes are associated with the innermost region of the quasar passing through a complicated pattern of caustics produced by the stars in the lensing galaxy. In this paper, we study whether the universal properties of optical caustics could enable extraction of shape information about the central engine of quasars. We present a toy model with a crescent-shaped source crossing a fold caustic. The silhouette of a black hole over an accretion disk tends to produce roughly crescent sources. When a crescent-shaped source crosses a fold caustic, the resulting light curve is noticeably different from the case of a circular luminosity profile or Gaussian source. With good enough monitoring data, the crescent parameters, apart from one degeneracy, can be recovered.
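    A minimal sketch of such a fold-caustic crossing, under simplifying assumptions: a point source on the bright side of a fold is magnified roughly as mu0 + k/sqrt(d), with d its distance inside the caustic, and the light curve of an extended source follows by summing this magnification over a one-dimensional brightness profile. The uniform-strip and crescent-like profiles and all constants below are illustrative, not the paper's actual model.

```python
# Hedged sketch: light curve of an extended source crossing a fold caustic,
# using the generic 1/sqrt(distance) fold magnification law. All profiles and
# constants are illustrative.
import numpy as np

def fold_magnification(d, mu0=1.0, k=1.0):
    """Point-source magnification vs signed distance d; d > 0 is inside the fold."""
    return mu0 + np.where(d > 0, k / np.sqrt(np.maximum(d, 1e-9)), 0.0)

def light_curve(profile, x, positions):
    """Total flux as the source centre moves across the caustic (caustic at 0)."""
    dx = x[1] - x[0]
    return np.array([np.sum(profile * fold_magnification(pos + x)) * dx
                     for pos in positions])

x = np.linspace(-1.0, 1.0, 2001)                   # coordinate across the source
dx = x[1] - x[0]
disc = np.where(np.abs(x) < 0.5, 1.0, 0.0)         # uniform strip of half-width 0.5
crescent = disc * (1.0 - np.exp(-((x - 0.3) ** 2) / 0.02))  # dark "silhouette" off-centre

positions = np.linspace(-2.0, 2.0, 401)
for name, prof in [("disc", disc), ("crescent", crescent)]:
    lc = light_curve(prof / (prof.sum() * dx), x, positions)  # unit flux far from caustic
    print(f"{name}: peak magnification ~ {lc.max():.2f} "
          f"at crossing position {positions[lc.argmax()]:.2f}")
```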

  20. A review on potential use of low-temperature water in the urban environment as a thermal-energy source

    NASA Astrophysics Data System (ADS)

    Laanearu, J.; Borodinecs, A.; Rimeika, M.; Palm, B.

    2017-10-01

    The thermal-energy potential of urban water sources is largely unused for meeting the current energy demands of buildings in the cities of the Baltic Sea Region. One reason is that natural and excess-heat water sources have a low temperature, so their heat must be upgraded before use. The demand for space cooling is expected to increase in the near future as buildings become better thermally insulated. There are also a number of options to recover heat from wastewater. It is proposed that a network of heat extraction and insertion, including thermal-energy recovery schemes, has the potential to be broadly implemented in this region of seasonally alternating temperatures. Mapping local conditions is essential for finding suitable regions (hot spots) for the future application of heat recovery schemes, by combining information about demands with information about available sources. The low-temperature water in the urban environment is viewed as a potential thermal-energy source. To recover thermal energy efficiently, it is also essential to ensure that it is used locally and that adverse effects on the environment and industrial processes are avoided. Some characteristics reflecting energy usage are discussed with respect to possible improvements in energy efficiency.

  1. Cortical Transformation of Spatial Processing for Solving the Cocktail Party Problem: A Computational Model

    PubMed Central

    Dong, Junzi; Colburn, H. Steven

    2016-01-01

    In multisource, “cocktail party” sound environments, human and animal auditory systems can use spatial cues to effectively separate and follow one source of sound over competing sources. While mechanisms to extract spatial cues such as interaural time differences (ITDs) are well understood in precortical areas, how such information is reused and transformed in higher cortical regions to represent segregated sound sources is not clear. We present a computational model describing a hypothesized neural network that spans spatial cue detection areas and the cortex. This network is based on recent physiological findings that cortical neurons selectively encode target stimuli in the presence of competing maskers based on source locations (Maddox et al., 2012). We demonstrate that key features of cortical responses can be generated by the model network, which exploits spatial interactions between inputs via lateral inhibition, enabling the spatial separation of target and interfering sources while allowing monitoring of a broader acoustic space when there is no competition. We present the model network along with testable experimental paradigms as a starting point for understanding the transformation and organization of spatial information from midbrain to cortex. This network is then extended to suggest engineering solutions that may be useful for hearing-assistive devices in solving the cocktail party problem. PMID:26866056

  2. Cortical Transformation of Spatial Processing for Solving the Cocktail Party Problem: A Computational Model(1,2,3).

    PubMed

    Dong, Junzi; Colburn, H Steven; Sen, Kamal

    2016-01-01

    In multisource, "cocktail party" sound environments, human and animal auditory systems can use spatial cues to effectively separate and follow one source of sound over competing sources. While mechanisms to extract spatial cues such as interaural time differences (ITDs) are well understood in precortical areas, how such information is reused and transformed in higher cortical regions to represent segregated sound sources is not clear. We present a computational model describing a hypothesized neural network that spans spatial cue detection areas and the cortex. This network is based on recent physiological findings that cortical neurons selectively encode target stimuli in the presence of competing maskers based on source locations (Maddox et al., 2012). We demonstrate that key features of cortical responses can be generated by the model network, which exploits spatial interactions between inputs via lateral inhibition, enabling the spatial separation of target and interfering sources while allowing monitoring of a broader acoustic space when there is no competition. We present the model network along with testable experimental paradigms as a starting point for understanding the transformation and organization of spatial information from midbrain to cortex. This network is then extended to suggest engineering solutions that may be useful for hearing-assistive devices in solving the cocktail party problem.

  3. Cross-Matching Source Observations from the Palomar Transient Factory (PTF)

    NASA Astrophysics Data System (ADS)

    Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.

    2009-01-01

    Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNOA star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database. We programmed the methods as database functions in the PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables. The Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table associates cross-matched primary keys in the Sources table with primary keys in the MergedSources table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
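    A minimal sketch of the "zones" cross-matching idea, under illustrative assumptions and done in plain Python rather than in PL/pgSQL on the database server: sources are bucketed into fixed-height declination zones, and a new detection is only compared against candidates in its own and neighbouring zones. The catalog rows and the matching radius below are invented.

```python
# Hedged sketch: zone-based positional cross-matching. Catalog rows and the
# matching radius are illustrative; the real system runs inside PostgreSQL.
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.5  # declination height of each zone

def zone_id(dec_deg):
    return int(math.floor(dec_deg / ZONE_HEIGHT_DEG))

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (spherical law of cosines)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    c = (math.sin(dec1) * math.sin(dec2)
         + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(min(1.0, max(-1.0, c))))

# Build the zoned catalog of previously merged sources (positions invented).
merged = [(1, 210.0012, 54.3501), (2, 210.0100, 54.3600), (3, 195.5000, -2.1000)]
zones = defaultdict(list)
for src_id, ra, dec in merged:
    zones[zone_id(dec)].append((src_id, ra, dec))

def cross_match(ra, dec, radius_deg=2.0 / 3600.0):
    """Return IDs of merged sources within the radius of a new detection."""
    z = zone_id(dec)
    candidates = [c for dz in (-1, 0, 1) for c in zones.get(z + dz, [])]
    return [sid for sid, cra, cdec in candidates
            if angular_sep_deg(ra, dec, cra, cdec) <= radius_deg]

print(cross_match(210.00125, 54.35012))  # matches merged source 1
print(cross_match(10.0, 10.0))           # no match -> would create a new merged source
```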

  4. Model-based Bayesian signal extraction algorithm for peripheral nerves

    NASA Astrophysics Data System (ADS)

    Eggers, Thomas E.; Dweiri, Yazan M.; McCallum, Grant A.; Durand, Dominique M.

    2017-10-01

    Objective. Multi-channel cuff electrodes have recently been investigated for extracting fascicular-level motor commands from mixed neural recordings. Such signals could provide volitional, intuitive control over a robotic prosthesis for amputee patients. Recent work has demonstrated success in extracting these signals in acute and chronic preparations using spatial filtering techniques. These extracted signals, however, had low signal-to-noise ratios and thus limited their utility to binary classification. In this work a new algorithm is proposed which combines previous source localization approaches to create a model based method which operates in real time. Approach. To validate this algorithm, a saline benchtop setup was created to allow the precise placement of artificial sources within a cuff and interference sources outside the cuff. The artificial source was taken from five seconds of chronic neural activity to replicate realistic recordings. The proposed algorithm, hybrid Bayesian signal extraction (HBSE), is then compared to previous algorithms, beamforming and a Bayesian spatial filtering method, on this test data. An example chronic neural recording is also analyzed with all three algorithms. Main results. The proposed algorithm improved the signal to noise and signal to interference ratio of extracted test signals two to three fold, as well as increased the correlation coefficient between the original and recovered signals by 10-20%. These improvements translated to the chronic recording example and increased the calculated bit rate between the recovered signals and the recorded motor activity. Significance. HBSE significantly outperforms previous algorithms in extracting realistic neural signals, even in the presence of external noise sources. These results demonstrate the feasibility of extracting dynamic motor signals from a multi-fascicled intact nerve trunk, which in turn could extract motor command signals from an amputee for the end goal of controlling a prosthetic limb.
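    A minimal sketch of the spatial-filtering family this work builds on, under illustrative assumptions: given an assumed mixing (spatial pattern) vector for the target fascicle and one interference source, least-squares weights recover the target time course from synthetic multi-channel recordings. This is not the paper's hybrid Bayesian signal extraction (HBSE) algorithm; the signals, patterns, and noise levels are all invented.

```python
# Hedged sketch: generic least-squares spatial filtering of multi-channel
# recordings with known mixing vectors. Everything here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples = 8, 2000

# Synthetic target and interference sources with fixed spatial patterns.
target = np.sin(2 * np.pi * 5 * np.linspace(0, 1, n_samples))
interference = rng.standard_normal(n_samples)
a_target = rng.standard_normal(n_channels)   # assumed known mixing vector
a_interf = rng.standard_normal(n_channels)
recordings = (np.outer(a_target, target) + np.outer(a_interf, interference)
              + 0.1 * rng.standard_normal((n_channels, n_samples)))

# Least-squares spatial filter: solve the two-source mixing model per sample.
A = np.column_stack([a_target, a_interf])
sources_hat, *_ = np.linalg.lstsq(A, recordings, rcond=None)
target_hat = sources_hat[0]

corr = np.corrcoef(target, target_hat)[0, 1]
print(f"correlation between true and recovered target: {corr:.3f}")
```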

  5. Phytochemical screening and in vitro bioactivities of the extracts of aerial part of Boerhavia diffusa Linn.

    PubMed

    Apu, Apurba Sarker; Liza, Mahmuda Sultana; Jamaluddin, A T M; Howlader, Md Amran; Saha, Repon Kumer; Rizwan, Farhana; Nasrin, Nishat

    2012-09-01

    To investigate the bioactivities of crude n-hexane, ethyl acetate and methanol extracts of the aerial part of Boerhavia diffusa Linn. (B. diffusa) and to perform its phytochemical analysis. The identification of phytoconstituents and the assays of antioxidant, thrombolytic, cytotoxic and antimicrobial activities were conducted using specific standard in vitro procedures. The results showed that the plant extracts were a rich source of phytoconstituents. The methanol extract showed higher antioxidant and thrombolytic activities and lower cytotoxic activity than the n-hexane and ethyl acetate extracts of B. diffusa. Among the bioactivities, the antioxidant activity was the most notable compared with the positive control, and the plant could thus be a potentially rich source of natural antioxidants. In the antimicrobial screening, crude extracts of the plant showed remarkable antibacterial activity against the tested microorganisms. All the extracts showed significant inhibitory activity against Candida albicans at a concentration of 1000 µg/disc. The present findings suggest that the plant, widely available in Bangladesh, could be a prominent source of medicinally important natural compounds.

  6. Induced lexico-syntactic patterns improve information extraction from online medical forums.

    PubMed

    Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D

    2014-01-01

    To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
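
    A minimal sketch of the dictionary-seeded pattern-induction idea is given below, in plain Python with toy seed terms, a token-window pattern, and an assumed frequency cutoff; the authors' actual patterns are richer (lexico-syntactic) and scored more carefully, and the slot fills here are single tokens only.

    ```python
    import re
    from collections import Counter

    # Tiny seed dictionaries standing in for the compiled SC/DT term lists.
    seed_sc = {"stabbing pain", "fatigue", "nausea"}
    seed_dt = {"metformin", "cinnamon pills", "insulin"}

    def find_patterns(sentences, seeds, window=3, min_count=2):
        """Collect left-context windows around known seed mentions.

        Each pattern is the `window` tokens preceding a seed mention, with the
        mention replaced by an <ENT> slot; patterns seen at least `min_count`
        times are kept as extractors (the cutoff is an illustrative choice).
        """
        counts = Counter()
        for sent in sentences:
            low = sent.lower()
            for term in seeds:
                for m in re.finditer(re.escape(term), low):
                    left = low[: m.start()].split()[-window:]
                    counts[" ".join(left) + " <ENT>"] += 1
        return {p for p, c in counts.items() if c >= min_count}

    def apply_patterns(sentences, patterns, window=3):
        """Extract candidate new terms: the single token filling the <ENT> slot."""
        candidates = Counter()
        for sent in sentences:
            toks = sent.lower().split()
            for i in range(window, len(toks)):
                ctx = " ".join(toks[i - window : i]) + " <ENT>"
                if ctx in patterns:
                    candidates[toks[i]] += 1
        return candidates
    ```

    In practice the extracted candidates would be filtered and added back to the dictionaries, and the induce-then-extract loop repeated, which is what allows terms absent from the original dictionaries to be learned.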

  7. Kurtosis-Based Blind Source Extraction of Complex Non-Circular Signals with Application in EEG Artifact Removal in Real-Time

    PubMed Central

    Javidi, Soroush; Mandic, Danilo P.; Took, Clive Cheong; Cichocki, Andrzej

    2011-01-01

    A new class of complex domain blind source extraction algorithms suitable for the extraction of both circular and non-circular complex signals is proposed. This is achieved through sequential extraction based on the degree of kurtosis and in the presence of non-circular measurement noise. The existence and uniqueness analysis of the solution is followed by a study of fast converging variants of the algorithm. The performance is first assessed through simulations on well understood benchmark signals, followed by a case study on real-time artifact removal from EEG signals, verified using both qualitative and quantitative metrics. The results illustrate the power of the proposed approach in real-time blind extraction of general complex-valued sources. PMID:22319461
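
    As a rough illustration of kurtosis-driven extraction with deflation, the following sketch implements a one-unit, real-valued fixed-point update on zero-mean, pre-whitened data; the paper's algorithms handle complex non-circular signals and measurement noise, which this simplification does not.

    ```python
    import numpy as np

    def extract_one_source(X, n_iter=200, seed=0):
        """One-unit kurtosis-based extraction (real-valued simplification).

        X : (n_channels, n_samples) array of zero-mean, whitened observations.
        Returns the unmixing vector w and the extracted source w @ X.
        """
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(X.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ X
            # fixed-point update that drives |kurtosis(y)| toward a maximum
            w_new = (X * y**3).mean(axis=1) - 3.0 * w
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < 1e-10
            w = w_new
            if converged:
                break
        return w, w @ X

    def deflate(X, w):
        """Remove the extracted component so the next source can be pulled out."""
        y = w @ X
        return X - np.outer(w, y)
    ```

    Sequentially applying extraction and deflation mirrors the "sequential extraction based on the degree of kurtosis" described above.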

  8. A study on prevention of an electric discharge at an extraction electrode of an electron cyclotron resonance ion source for cancer therapy.

    PubMed

    Kishii, Y; Kawasaki, S; Kitagawa, A; Muramatsu, M; Uchida, T

    2014-02-01

    A compact ECR ion source has been utilized for carbon radiotherapy. In order to increase the beam intensity with a higher electric field at the extraction electrode and to achieve better ion-supply stability over long periods, the electrode geometry and the surface conditions of the extraction electrode have been studied. Focusing on the black deposited substances observed around the extraction electrode after long-term use, the relation between these deposits and the electrical insulation property is investigated. The black deposited substances were inspected for deposit thickness, surface roughness, structural arrangement (examined using Raman spectroscopy), and electric-discharge characteristics in a test bench set up to simulate the ECR ion source.

  9. Time-correlated neutron analysis of a multiplying HEU source

    NASA Astrophysics Data System (ADS)

    Miller, E. C.; Kalter, J. M.; Lavelle, C. M.; Watson, S. M.; Kinlaw, M. T.; Chichester, D. L.; Noonan, W. A.

    2015-06-01

    The ability to quickly identify and characterize special nuclear material remains a national security challenge. In counter-proliferation applications, identifying the neutron multiplication of a sample can be a good indication of the level of threat. Currently, neutron multiplicity measurements are performed with moderated 3He proportional counters. These systems rely on the detection of thermalized neutrons, a process which obscures both energy and time information from the source. Fast neutron detectors, such as liquid scintillators, have the ability to detect events on nanosecond time scales, providing more information on the temporal structure of the arriving signal, and provide an alternative method for extracting information from the source. To explore this possibility, a series of measurements were performed on the Idaho National Laboratory's MARVEL assembly, a configurable HEU source. The source assembly was measured in a variety of different HEU configurations and with different reflectors, covering a range of neutron multiplications from 2 to 8. The data were collected with liquid scintillator detectors and digitized for offline analysis. A gap-based approach for identifying the bursts of detected neutrons associated with the same fission chain was used. Using this approach, we are able to study various statistical properties of individual fission chains. One of these properties is the distribution of neutron arrival times within a given burst. We have observed two interesting empirical trends. First, this distribution exhibits a weak, but definite, dependence on source multiplication. Second, there are distinctive differences in the distribution depending on the presence and type of reflector. Both of these phenomena might prove to be useful when assessing an unknown source. The physical origins of these phenomena can be illuminated with the help of MCNPX-PoliMi simulations.
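
    A gap-based burst finder of the kind described can be sketched in a few lines; the 100 ns gap threshold and the minimum burst size below are illustrative placeholders, not the values used in the measurement.

    ```python
    import numpy as np

    def find_bursts(times_ns, max_gap_ns=100.0, min_size=2):
        """Group detected-neutron time stamps into fission-chain bursts.

        A new burst starts whenever the gap to the previous detection exceeds
        `max_gap_ns`.  Returns a list of index arrays, one per burst.
        """
        t = np.sort(np.asarray(times_ns, dtype=float))
        if t.size == 0:
            return []
        # indices where the inter-arrival gap exceeds the threshold
        breaks = np.where(np.diff(t) > max_gap_ns)[0] + 1
        bursts = np.split(np.arange(t.size), breaks)
        return [b for b in bursts if b.size >= min_size]

    def intra_burst_arrival_times(times_ns, bursts):
        """Arrival times measured from the first detection in each burst."""
        t = np.sort(np.asarray(times_ns, dtype=float))
        return [t[b] - t[b][0] for b in bursts]
    ```

    The intra-burst arrival-time distributions produced this way are the statistics whose dependence on multiplication and reflector type is discussed above.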

  10. Utilising social media contents for flood inundation mapping

    NASA Astrophysics Data System (ADS)

    Schröter, Kai; Dransch, Doris; Fohringer, Joachim; Kreibich, Heidi

    2016-04-01

    Data about the hazard and its consequences are scarce and not readily available during and shortly after a disaster. An information source which should be explored in a more efficient way is eyewitness accounts via social media. This research presents a methodology that leverages social media content to support rapid inundation mapping, including inundation extent and water depth in the case of floods. It uses quantitative data that are estimated from photos extracted from social media posts, and their integration with established data. Due to the rapid availability of these posts compared to traditional data sources such as remote sensing data, areas affected by a flood, for example, can be determined quickly. Key challenges are to filter the large number of posts down to a manageable amount of potentially useful inundation-related information, and to interpret and integrate the posts into mapping procedures in a timely manner. We present a methodology and a tool ("PostDistiller") to filter geo-located posts from social media services which include links to photos, and to further explore this spatially distributed, contextualized in situ information for inundation mapping. The June 2013 flood in Dresden is used as an application case study in which we evaluate the utilization of this approach and compare the resulting spatial flood patterns and inundation depths to 'traditional' data sources and mapping approaches like water level observations and remote sensing flood masks. The outcomes of the application case are encouraging. Strengths of the proposed procedure are that information for the estimation of inundation depth is rapidly available, particularly in urban areas where it is of high interest and of great value because alternative information sources like remote sensing data analysis do not perform very well. The uncertainty of derived inundation depth data and the uncontrollable availability of the information sources are major threats to the utility of the approach.

  11. Beyond seismic interferometry: imaging the earth's interior with virtual sources and receivers inside the earth

    NASA Astrophysics Data System (ADS)

    Wapenaar, C. P. A.; Van der Neut, J.; Thorbecke, J.; Broggini, F.; Slob, E. C.; Snieder, R.

    2015-12-01

    Imagine one could place seismic sources and receivers at any desired position inside the earth. Since the receivers would record the full wave field (direct waves, up- and downward reflections, multiples, etc.), this would give a wealth of information about the local structures, material properties and processes in the earth's interior. Although in reality one cannot place sources and receivers anywhere inside the earth, it appears to be possible to create virtual sources and receivers at any desired position, which accurately mimics the desired situation. The underlying method involves some major steps beyond standard seismic interferometry. With seismic interferometry, virtual sources can be created at the positions of physical receivers, assuming these receivers are illuminated isotropically. Our proposed method does not need physical receivers at the positions of the virtual sources; moreover, it does not require isotropic illumination. To create virtual sources and receivers anywhere inside the earth, it suffices to record the reflection response with physical sources and receivers at the earth's surface. We do not need detailed information about the medium parameters; it suffices to have an estimate of the direct waves between the virtual-source positions and the acquisition surface. With these prerequisites, our method can create virtual sources and receivers, anywhere inside the earth, which record the full wave field. The up- and downward reflections, multiples, etc. in the virtual responses are extracted directly from the reflection response at the surface. The retrieved virtual responses form an ideal starting point for accurate seismic imaging, characterization and monitoring.

  12. Anticandidal, antibacterial, cytotoxic and antioxidant activities of Calendula arvensis flowers.

    PubMed

    Abudunia, A-M; Marmouzi, I; Faouzi, M E A; Ramli, Y; Taoufik, J; El Madani, N; Essassi, E M; Salama, A; Khedid, K; Ansar, M; Ibrahimi, A

    2017-03-01

    Calendula arvensis (CA) is one of the important plants used in traditional medicine in Morocco, due to its interesting chemical composition. The present study aimed to determine the anticandidal, antioxidant and antibacterial activities and the effects of extracts of CA flowers on the growth of myeloid cancer cells, and to characterize the chemical composition of the plant. Flowers of CA were collected based on ethnopharmacological information from the villages around the Rabat-Khemisset region, Morocco. The hexane and methanol extracts were obtained by Soxhlet extraction, while the aqueous extract was obtained by maceration in cold water. CA extracts were assessed for antioxidant activity using four different methods (DPPH, FRAP, TEAC, β-carotene bleaching test). Furthermore, the phenolic and flavonoid contents were measured, and the antimicrobial activity was evaluated by the well diffusion method using several bacterial and fungal strains. Finally, extract cytotoxicity was assessed using the MTT test. Phytochemical quantification of the methanolic and aqueous extracts revealed that they were rich in flavonoid and phenolic content and were found to possess considerable antioxidant activities. MIC values of the methanolic extracts were 12.5-25 μg/mL, while MIC values of the hexane extracts were between 6.25 and 12.5 μg/mL; the hexane extracts were bacteriostatic for all bacteria while the methanolic and aqueous extracts were bactericidal. In addition, the extracts exhibited no activity on Candida species except the methanolic extract, which showed antifungal activity on Candida tropicalis 1 and Candida famata 1. The methanolic and aqueous extracts also exhibited antimyeloid cancer activity (IC50 of 31 μg/mL). We conclude that the methanolic and aqueous extracts are a promising source of antioxidant, antimicrobial and cytotoxic agents. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  13. Challenges in Managing Information Extraction

    ERIC Educational Resources Information Center

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…

  14. Independent component analysis for automatic note extraction from musical trills

    NASA Astrophysics Data System (ADS)

    Brown, Judith C.; Smaragdis, Paris

    2004-05-01

    The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
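
    The separation step can be illustrated with scikit-learn's FastICA. The sketch below assumes a generic two-source mixture matrix as input (not the authors' exact feature set) and pairs a PCA whitening stage with ICA, mirroring the second-order versus higher-order contrast drawn above.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA, FastICA

    def separate_trill(mixtures, n_sources=2):
        """Separate the two alternating notes of a trill from mixed observations.

        mixtures : (n_samples, n_channels) array, e.g. multi-channel recordings
                   or short-time spectral frames (an illustrative input choice).
        Returns the estimated sources and the mixing matrix.
        """
        # second-order redundancy reduction first, as with PCA
        reduced = PCA(n_components=n_sources, whiten=True).fit_transform(mixtures)
        # then enforce higher-order statistical independence
        ica = FastICA(n_components=n_sources, random_state=0)
        sources = ica.fit_transform(reduced)      # (n_samples, n_sources)
        return sources, ica.mixing_
    ```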

  15. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

    PubMed

    Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

    2015-09-01

    Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. The authors applied this method to develop algorithms to identify patients with rheumatoid arthritis (RA) and, among those with RA, cases of coronary artery disease (CAD), from a large multi-institutional EHR. The areas under the receiver operating characteristic curves (AUCs) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared with AUCs of 0.938 and 0.929 for models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
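
    The classification step amounts to fitting a penalized logistic regression over the automatically extracted features. A minimal scikit-learn sketch is shown below; the abstract says only "penalized", so the lasso (L1) penalty, its strength, and the generic feature matrix are assumptions for illustration.

    ```python
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def train_phenotype_classifier(X, y, C=0.1):
        """Fit an L1-penalized logistic regression on NLP concept counts plus codes.

        X : (n_patients, n_features) matrix of automatically extracted feature
            counts (NLP concept mentions, billing codes); y : 0/1 phenotype labels.
        The sparsity-inducing penalty performs the feature selection step; C is
        an assumed regularization strength and would normally be tuned.
        """
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=1000)
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        return model.fit(X, y), auc
    ```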

  16. Extracting Metrics for Three-dimensional Root Systems: Volume and Surface Analysis from In-soil X-ray Computed Tomography Data.

    PubMed

    Suresh, Niraj; Stephens, Sean A; Adams, Lexor; Beck, Anthon N; McKinney, Adriana L; Varga, Tamas

    2016-04-26

    Plant roots play a critical role in plant-soil-microbe interactions that occur in the rhizosphere, as well as processes with important implications to climate change and crop management. Quantitative size information on roots in their native environment is invaluable for studying root growth and environmental processes involving plants. X-ray computed tomography (XCT) has been demonstrated to be an effective tool for in situ root scanning and analysis. We aimed to develop a costless and efficient tool that approximates the surface and volume of the root regardless of its shape from three-dimensional (3D) tomography data. The root structure of a Prairie dropseed (Sporobolus heterolepis) specimen was imaged using XCT. The root was reconstructed, and the primary root structure was extracted from the data using a combination of licensed and open-source software. An isosurface polygonal mesh was then created for ease of analysis. We have developed the standalone application imeshJ, generated in MATLAB, to calculate root volume and surface area from the mesh. The outputs of imeshJ are surface area (in mm²) and the volume (in mm³). The process, utilizing a unique combination of tools from imaging to quantitative root analysis, is described. A combination of XCT and open-source software proved to be a powerful combination to noninvasively image plant root samples, segment root data, and extract quantitative information from the 3D data. This methodology of processing 3D data should be applicable to other material/sample systems where there is connectivity between components of similar X-ray attenuation and difficulties arise with segmentation.
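
    The mesh-based measurement itself reduces to two sums over triangles. The Python sketch below reproduces that calculation (it is not the imeshJ tool, and it assumes a closed, consistently oriented triangle mesh for the volume term).

    ```python
    import numpy as np

    def mesh_surface_and_volume(vertices, faces):
        """Surface area (mm^2) and enclosed volume (mm^3) of a triangle mesh.

        vertices : (n_vertices, 3) float array of coordinates in mm.
        faces    : (n_faces, 3) integer array of vertex indices.
        """
        v0 = vertices[faces[:, 0]]
        v1 = vertices[faces[:, 1]]
        v2 = vertices[faces[:, 2]]
        cross = np.cross(v1 - v0, v2 - v0)
        # triangle areas are half the cross-product magnitudes
        area = 0.5 * np.linalg.norm(cross, axis=1).sum()
        # signed tetrahedron volumes against the origin (divergence theorem)
        volume = np.abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0
        return area, volume
    ```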

  17. Disclosure of terminal illness to patients and families: diversity of governing codes in 14 Islamic countries.

    PubMed

    Abdulhameed, Hunida E; Hammami, Muhammad M; Mohamed, Elbushra A Hameed

    2011-08-01

    The consistency of codes governing disclosure of terminal illness to patients and families in Islamic countries has not been studied until now. To review available codes on disclosure of terminal illness in Islamic countries. DATA SOURCE AND EXTRACTION: Data were extracted through searches on Google and PubMed. Codes related to disclosure of terminal illness to patients or families were abstracted, and then classified independently by the three authors. Codes for 14 Islamic countries were located. Five codes were silent regarding informing the patient, seven allowed concealment, one mandated disclosure and one prohibited disclosure. Five codes were silent regarding informing the family, four allowed disclosure and five mandated/recommended disclosure. The Islamic Organization for Medical Sciences code was silent on both issues. Codes regarding disclosure of terminal illness to patients and families differed markedly among Islamic countries. They were silent in one-third of the codes, and tended to favour a paternalistic/utilitarian, family-centred approach over an autonomous, patient-centred approach.

  18. Extending the spectrum of DNA sequences retrieved from ancient bones and teeth

    PubMed Central

    Glocke, Isabelle; Meyer, Matthias

    2017-01-01

    The number of DNA fragments surviving in ancient bones and teeth is known to decrease with fragment length. Recent genetic analyses of Middle Pleistocene remains have shown that the recovery of extremely short fragments can prove critical for successful retrieval of sequence information from particularly degraded ancient biological material. Current sample preparation techniques, however, are not optimized to recover DNA sequences from fragments shorter than ∼35 base pairs (bp). Here, we show that much shorter DNA fragments are present in ancient skeletal remains but lost during DNA extraction. We present a refined silica-based DNA extraction method that not only enables efficient recovery of molecules as short as 25 bp but also doubles the yield of sequences from longer fragments due to improved recovery of molecules with single-strand breaks. Furthermore, we present strategies for monitoring inefficiencies in library preparation that may result from co-extraction of inhibitory substances during DNA extraction. The combination of DNA extraction and library preparation techniques described here substantially increases the yield of DNA sequences from ancient remains and provides access to a yet unexploited source of highly degraded DNA fragments. Our work may thus open the door for genetic analyses on even older material. PMID:28408382

  19. Spectral Regression Based Fault Feature Extraction for Bearing Accelerometer Sensor Signals

    PubMed Central

    Xia, Zhanguo; Xia, Shixiong; Wan, Ling; Cai, Shiyu

    2012-01-01

    Bearings are not only the most important elements but also a common source of failures in rotary machinery. Bearing fault prognosis technology has been receiving more and more attention recently, in particular because it plays an increasingly important role in avoiding the occurrence of accidents. Therein, fault feature extraction (FFE) of bearing accelerometer sensor signals is essential to highlight representative features of bearing conditions for machinery fault diagnosis and prognosis. This paper proposes a spectral regression (SR)-based approach for fault feature extraction from original features including time, frequency and time-frequency domain features of bearing accelerometer sensor signals. SR is a novel regression framework for efficient regularized subspace learning and feature extraction, and it uses the least squares method to obtain the best projection direction, rather than computing the density matrix of features, so it also has an advantage in dimensionality reduction. The effectiveness of the SR-based method is validated experimentally by applying it to vibration signal data acquired from bearings. The experimental results indicate that SR can reduce the computation cost and preserve more structure information about different bearing faults and severities, and it is demonstrated that the proposed feature extraction scheme has an advantage over other similar approaches. PMID:23202017

  20. Electrical shielding box measurement of the negative hydrogen beam from Penning ion gauge ion source.

    PubMed

    Wang, T; Yang, Z; Dong, P; long, J D; He, X Z; Wang, X; Zhang, K Z; Zhang, L W

    2012-06-01

    The cold-cathode Penning ion gauge (PIG) type ion source has been used for generation of negative hydrogen (H(-)) ions as the internal ion source of a compact cyclotron. A novel method called electrical shielding box dc beam measurement is described in this paper, and the beam intensity was measured under dc extraction inside an electrical shielding box. The results of the trajectory simulation and dc H(-) beam extraction measurement were presented. The effect of gas flow rate, magnetic field strength, arc current, and extraction voltage were also discussed. In conclusion, the dc H(-) beam current of about 4 mA from the PIG ion source with the puller voltage of 40 kV and arc current of 1.31 A was extrapolated from the measurement at low extraction dc voltages.

  1. Development of a helicon ion source: Simulations and preliminary experiments.

    PubMed

    Afsharmanesh, M; Habibi, M

    2018-03-01

    In the present context, the extraction system of a helicon ion source has been simulated and constructed. Results of the ion source commissioning at up to 20 kV are presented as well as simulations of an ion beam extraction system. An argon current of more than 200 μA at up to 20 kV is extracted and is characterized with a Faraday cup and a beam profile monitoring grid. By changing different ion source parameters such as RF power, extraction voltage, and working pressure, an ion beam with a current distribution exhibiting a central core has been detected. A jump transition of the ion beam current emerges at an RF power near 700 W, which reveals that helicon mode excitation has been reached at this power. Furthermore, measuring the emission line intensity of Ar II at 434.8 nm is the other method we have used to demonstrate the mode transition from inductively coupled plasma to helicon. Due to its asymmetrical longitudinal power absorption, a half-helix helicon antenna is used for the ion source development. The modeling of the plasma part of the ion source has been carried out using a code, HELIC. Simulations are carried out by taking into account a Gaussian radial plasma density profile and for plasma densities in the range of 10^18-10^19 m^-3. The power absorption spectrum and the excited helicon mode number are obtained. Longitudinal RF power absorption for two different antenna positions is compared. Our results indicate that positioning the antenna near the plasma electrode is desirable for the ion beam extraction. The simulation of the extraction system was performed with the ion optical code IBSimu, making it the first helicon ion source extraction designed with the code. Ion beam emittance and Twiss parameters of the ellipse emittance are calculated at different iterations and mesh sizes, and the best values of the mesh size and iteration number have been obtained for the calculations. The simulated ion beam extraction system has been evaluated using optimized parameters such as the gap distance between electrodes, the electrode aperture, and the extraction voltage. The gap distance, ground electrode aperture, and extraction voltage have been changed between 3 and 9 mm, 2-6.5 mm, and 10-35 kV in the simulations, respectively.

  2. Development of a helicon ion source: Simulations and preliminary experiments

    NASA Astrophysics Data System (ADS)

    Afsharmanesh, M.; Habibi, M.

    2018-03-01

    In the present context, the extraction system of a helicon ion source has been simulated and constructed. Results of the ion source commissioning at up to 20 kV are presented as well as simulations of an ion beam extraction system. An argon current of more than 200 μA at up to 20 kV is extracted and is characterized with a Faraday cup and a beam profile monitoring grid. By changing different ion source parameters such as RF power, extraction voltage, and working pressure, an ion beam with a current distribution exhibiting a central core has been detected. A jump transition of the ion beam current emerges at an RF power near 700 W, which reveals that helicon mode excitation has been reached at this power. Furthermore, measuring the emission line intensity of Ar II at 434.8 nm is the other method we have used to demonstrate the mode transition from inductively coupled plasma to helicon. Due to its asymmetrical longitudinal power absorption, a half-helix helicon antenna is used for the ion source development. The modeling of the plasma part of the ion source has been carried out using a code, HELIC. Simulations are carried out by taking into account a Gaussian radial plasma density profile and for plasma densities in the range of 10^18-10^19 m^-3. The power absorption spectrum and the excited helicon mode number are obtained. Longitudinal RF power absorption for two different antenna positions is compared. Our results indicate that positioning the antenna near the plasma electrode is desirable for the ion beam extraction. The simulation of the extraction system was performed with the ion optical code IBSimu, making it the first helicon ion source extraction designed with the code. Ion beam emittance and Twiss parameters of the ellipse emittance are calculated at different iterations and mesh sizes, and the best values of the mesh size and iteration number have been obtained for the calculations. The simulated ion beam extraction system has been evaluated using optimized parameters such as the gap distance between electrodes, the electrode aperture, and the extraction voltage. The gap distance, ground electrode aperture, and extraction voltage have been changed between 3 and 9 mm, 2-6.5 mm, and 10-35 kV in the simulations, respectively.

  3. Binary Code Extraction and Interface Identification for Security Applications

    DTIC Science & Technology

    2009-10-02

    The functions extracted during the end-to-end applications are listed, together with some additional functions extracted from the OpenSSL library for evaluation purposes (Section 5.1 through Section 5.3). For the OpenSSL functions, the false positives and negatives are measured by comparison with the original C source code; for the malware samples, no source is available.

  4. Broadband Processing in a Noisy Shallow Ocean Environment: A Particle Filtering Approach

    DOE PAGES

    Candy, J. V.

    2016-04-14

    Here we report that when a broadband source propagates sound in a shallow ocean, the received data can become quite complicated due to temperature-related sound-speed variations and therefore a highly dispersive environment. Noise and uncertainties disrupt this already chaotic environment even further because disturbances propagate through the same inherent acoustic channel. The broadband (signal) estimation/detection problem can be decomposed into a set of narrowband solutions that are processed separately and then combined to achieve more enhancement of signal levels than that available from a single frequency, thereby allowing more information to be extracted and leading to a more reliable source detection. A Bayesian solution to the broadband modal function tracking, pressure-field enhancement, and source detection problem is developed that leads to nonparametric estimates of desired posterior distributions, enabling the estimation of useful statistics and an improved processor/detector. In conclusion, to investigate the processor capabilities, we synthesize an ensemble of noisy, broadband, shallow-ocean measurements to evaluate its overall performance using an information theoretical metric for the preprocessor and the receiver operating characteristic curve for the detector.
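
    The nonparametric Bayesian machinery referred to above is, at its core, a sequential Monte Carlo (particle) filter. Below is a generic bootstrap particle-filter sketch in Python; it is not the paper's broadband modal processor, and the scalar state model, noise levels, and particle count in the demo are arbitrary assumptions.

    ```python
    import numpy as np

    def bootstrap_particle_filter(observations, n_particles, transition, likelihood, init):
        """Generic bootstrap particle filter (sequential importance resampling).

        transition(x)   : propagates an array of particles one time step
        likelihood(y, x): returns one weight per particle, proportional to p(y | x)
        init(n)         : draws n particles from the prior
        Returns the posterior-mean state estimate at every time step.
        """
        particles = init(n_particles)
        estimates = []
        for y in observations:
            particles = transition(particles)          # predict
            weights = likelihood(y, particles)         # update
            weights = weights / weights.sum()
            idx = np.random.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]                 # resample to fight degeneracy
            estimates.append(particles.mean())
        return np.array(estimates)

    # Illustrative use: a scalar random-walk state observed in Gaussian noise.
    rng = np.random.default_rng(1)
    truth = np.cumsum(rng.normal(0.0, 0.1, 200))
    obs = truth + rng.normal(0.0, 0.5, 200)
    est = bootstrap_particle_filter(
        obs, 500,
        transition=lambda x: x + rng.normal(0.0, 0.1, x.shape),
        likelihood=lambda y, x: np.exp(-0.5 * ((y - x) / 0.5) ** 2),
        init=lambda n: rng.normal(0.0, 1.0, n),
    )
    ```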

  5. Detection of goal events in soccer videos

    NASA Astrophysics Data System (ADS)

    Kim, Hyoung-Gook; Roeber, Steffen; Samour, Amjad; Sikora, Thomas

    2005-01-01

    In this paper, we present an automatic extraction of goal events in soccer videos by using audio track features alone, without relying on expensive-to-compute video track features. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The detection of soccer video highlights using audio contents comprises three steps: 1) extraction of audio features from a video sequence, 2) candidate detection of highlight events based on the information provided by the feature extraction methods and the Hidden Markov Model (HMM), 3) goal event selection to finally determine the video intervals to be included in the summary. For this purpose we compared the performance of the well-known Mel-scale Frequency Cepstral Coefficients (MFCC) feature extraction method vs. the MPEG-7 Audio Spectrum Projection (ASP) feature extraction method based on three different decomposition methods, namely Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF). To evaluate our system we collected five soccer game videos from various sources. In total we have seven hours of soccer games consisting of eight gigabytes of data. One of the five soccer games is used as the training data (e.g., for announcers' excited speech, audience ambient speech noise, audience clapping, and environmental sounds). Our goal event detection results are encouraging.
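
    The first step (audio feature extraction) can be sketched with the librosa library; the frame-level MFCC and delta vectors produced here would then feed an HMM-based candidate detector. The sampling rate and coefficient count are illustrative choices, not the paper's settings.

    ```python
    import numpy as np
    import librosa

    def audio_features_for_highlights(wav_path, n_mfcc=13):
        """Frame-level MFCC (+ delta) features from a match's audio track.

        Returns an array of shape (n_frames, 2 * n_mfcc); only the
        feature-extraction step of the pipeline is sketched here.
        """
        y, sr = librosa.load(wav_path, sr=16000, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
        delta = librosa.feature.delta(mfcc)                       # temporal derivatives
        return np.vstack([mfcc, delta]).T
    ```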

  6. a R-Shiny Based Phenology Analysis System and Case Study Using Digital Camera Dataset

    NASA Astrophysics Data System (ADS)

    Zhou, Y. K.

    2018-05-01

    Accurate extraction of vegetation phenology information plays an important role in exploring the effects of climate change on vegetation. Repeated photos from digital cameras are a useful and huge data source for phenological analysis, but processing and mining such data are still a big challenge, and there is no single tool or universal solution for big-data processing and visualization in the field of phenology extraction. In this paper, we propose an R-Shiny based web application for vegetation phenological parameter extraction and analysis. Its main functions include phenological site distribution visualization, ROI (region of interest) selection, vegetation index calculation and visualization, data filtering, growth trajectory fitting, phenology parameter extraction, etc. The long-term observation photography data from the Freemanwood site in 2013 are processed by this system as an example. The results show that: (1) the system is capable of analyzing large data volumes using a distributed framework; (2) the combination of multiple parameter extraction and growth curve fitting methods can effectively extract the key phenology parameters, although there are discrepancies between different method combinations in particular study areas. Vegetation with a single growth peak is suited to the double logistic model for fitting the growth trajectory, while vegetation with multiple growth peaks is better handled with the spline method.
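
    For the single-peak case, the double logistic fit mentioned above can be reproduced with scipy. In this sketch the green chromatic coordinate (GCC) is assumed as the vegetation index, and the initial guesses are rough heuristics rather than the application's defaults.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def double_logistic(t, v_min, v_amp, k1, sos, k2, eos):
        """Green-up/senescence model for a single-peak growing season."""
        return v_min + v_amp * (1.0 / (1.0 + np.exp(-k1 * (t - sos)))
                                - 1.0 / (1.0 + np.exp(-k2 * (t - eos))))

    def fit_phenology(doy, gcc):
        """Fit the double logistic to a vegetation-index time series.

        doy : numpy array of day-of-year values for the camera images;
        gcc : vegetation index values.  `sos` and `eos` in the returned
        parameters approximate the start and end of the growing season.
        """
        peak = doy[np.argmax(gcc)]
        p0 = [gcc.min(), gcc.max() - gcc.min(), 0.1, peak - 40, 0.1, peak + 40]
        params, _ = curve_fit(double_logistic, doy, gcc, p0=p0, maxfev=10000)
        return dict(zip(["v_min", "v_amp", "k1", "sos", "k2", "eos"], params))
    ```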

  7. Hybrid method for building extraction in vegetation-rich urban areas from very high-resolution satellite imagery

    NASA Astrophysics Data System (ADS)

    Jayasekare, Ajith S.; Wickramasuriya, Rohan; Namazi-Rad, Mohammad-Reza; Perez, Pascal; Singh, Gaurav

    2017-07-01

    A continuous update of building information is necessary in today's urban planning. Digital images acquired by remote sensing platforms at appropriate spatial and temporal resolutions provide an excellent data source to achieve this. In particular, high-resolution satellite images are often used to retrieve objects such as rooftops using feature extraction. However, high-resolution images acquired over built-up areas are associated with noise such as shadows that reduces the accuracy of feature extraction. Feature extraction heavily relies on the reflectance purity of objects, which is difficult to perfect in complex urban landscapes. An attempt was made to increase the reflectance purity of building rooftops affected by shadows. In addition to the multispectral (MS) image, derivatives thereof, namely normalized difference vegetation index and principal component (PC) images, were incorporated in generating the probability image. This hybrid probability image generation ensured that the effect of shadows on rooftop extraction, particularly on light-colored roofs, is largely eliminated. The PC image was also used for image segmentation, which further increased the accuracy compared to segmentation performed on an MS image. Results show that the presented method can achieve higher rooftop extraction accuracy (70.4%) in vegetation-rich urban areas compared to traditional methods.

  8. Extraction of hexavalent chromium from chromated copper arsenate treated wood under alkaline conditions.

    PubMed

    Radivojevic, Suzana; Cooper, Paul A

    2008-05-15

    Information on chromium (Cr) oxidation states is essential for the assessment of environmental and health risks associated with the overall life-cycle of chromated copper arsenate (CCA) treated wood products because of differences in toxicity between trivalent [Cr(III)] and hexavalent [Cr(VI)] chromium compounds. Hypothetical Cr(VI) fixation products were investigated in CCA type C treated sawdust of aspen and red pine during or following preservative fixation by extraction with Cr(VI)-specific extractants. Cr(VI) was found only in alkaline extracts of treated wood. A major source of Cr(VI) was method-induced oxidation of fixed Cr(III) during alkaline extraction, as confirmed by demonstrated oxidation of Cr(III) from CrCl3 treated wood. Oxidation of nontoxic and immobile Cr(III) to toxic and mobile Cr(VI) was facilitated by the presence of wood at pH > 8.5. Thermodynamic equilibrium between Cr(III) and Cr(VI) is affected by pH, temperature, rates of dissolution of Cr(III) compounds, and oxygen availability. The results of this study recommend against alkaline extraction protocols for determination of Cr(VI) in treated wood. This Cr oxidation mechanism can act as a previously unrecognized route for generation of hazardous Cr(VI) if CCA treated wood is exposed to alkaline conditions during its production, use, or waste management.

  9. In-situ continuous water analyzing module

    DOEpatents

    Thompson, Cyril V.; Wise, Marcus B.

    1998-01-01

    An in-situ continuous liquid analyzing system for continuously analyzing volatile components contained in a water source comprises: a carrier gas supply, an extraction container and a mass spectrometer. The carrier gas supply continuously supplies the carrier gas to the extraction container, where it is mixed with a water sample that is continuously drawn into the extraction container. The carrier gas continuously extracts the volatile components out of the water sample. The water sample is returned to the water source after the volatile components are extracted from it. The extracted volatile components and the carrier gas are delivered continuously to the mass spectrometer and the volatile components are continuously analyzed by the mass spectrometer.

  10. Extraction of space-charge-dominated ion beams from an ECR ion source: Theory and simulation

    NASA Astrophysics Data System (ADS)

    Alton, G. D.; Bilheux, H.

    2004-05-01

    Extraction of high quality space-charge-dominated ion beams from plasma ion sources constitutes an optimization problem centered about finding an optimal concave plasma emission boundary that minimizes half-angular divergence for a given charge state, independent of the presence or lack thereof of a magnetic field in the extraction region. The curvature of the emission boundary acts to converge/diverge the low velocity beam during extraction. Beams of highest quality are extracted whenever the half-angular divergence, ω, is minimized. Under minimum half-angular divergence conditions, the plasma emission boundary has an optimum curvature and the perveance, P, current density, j+ext, and extraction gap, d, have optimum values for a given charge state, q. Optimum values for each of the independent variables (P, j+ext and d) are found to be in close agreement with those derived from elementary analytical theory for extraction with a simple two-electrode extraction system, independent of the presence of a magnetic field. The magnetic field only increases the emittances of beams through additional aberrational effects caused by increased angular divergences through coupling of the longitudinal to the transverse velocity components of particles as they pass though the mirror region of the electron cyclotron resonance (ECR) ion source. This article reviews the underlying theory of elementary extraction optics and presents results derived from simulation studies of extraction of space-charge dominated heavy-ion beams of varying mass, charge state, and intensity from an ECR ion source with emphasis on magnetic field induced effects.
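
    The optimum perveance, current density, and gap referred to above follow from the standard space-charge-limited (Child-Langmuir) relation for a planar extraction gap. The expressions below are that textbook result written in the abstract's symbols; they are not formulas quoted from the article.

    ```latex
    % Space-charge-limited (Child-Langmuir) extraction across a planar gap:
    \[
      j^{+}_{\mathrm{ext}} \;=\; \frac{4\varepsilon_0}{9}\,
      \sqrt{\frac{2q}{m}}\;\frac{V^{3/2}}{d^{2}},
      \qquad
      P \;\equiv\; \frac{I}{V^{3/2}} \;=\;
      \frac{4\varepsilon_0}{9}\,\sqrt{\frac{2q}{m}}\;\frac{A}{d^{2}},
    \]
    % where $V$ is the extraction voltage, $d$ the extraction gap, $q/m$ the
    % charge-to-mass ratio of the extracted ions, and $A$ the emission-aperture area.
    ```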

  11. a Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information

    NASA Astrophysics Data System (ADS)

    Lian, Shizhong; Chen, Jiangping; Luo, Minghai

    2016-06-01

    Water information cannot be accurately extracted from TM images in which true information is lost because of cloud cover and missing data stripes. Because water is continuously distributed under natural conditions, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different disturbances from clouds and missing data stripes are simulated. Water information is extracted using global histogram matching, local histogram matching, and the probability-based statistical method in the simulated images. Experiments show that a smaller Areal Error and a higher Boundary Recall can be obtained using this method compared with the conventional methods.

  12. Full-Scale Turbofan Engine Noise-Source Separation Using a Four-Signal Method

    NASA Technical Reports Server (NTRS)

    Hultgren, Lennart S.; Arechiga, Rene O.

    2016-01-01

    Contributions from the combustor to the overall propulsion noise of civilian transport aircraft are starting to become important due to turbofan design trends and expected advances in mitigation of other noise sources. During on-ground, static-engine acoustic tests, combustor noise is generally sub-dominant to other engine noise sources because of the absence of in-flight effects. Consequently, noise-source separation techniques are needed to extract combustor-noise information from the total noise signature in order to further progress. A novel four-signal source-separation method is applied to data from a static, full-scale engine test and compared to previous methods. The new method is, in a sense, a combination of two- and three-signal techniques and represents an attempt to alleviate some of the weaknesses of each of those approaches. This work is supported by the NASA Advanced Air Vehicles Program, Advanced Air Transport Technology Project, Aircraft Noise Reduction Subproject and the NASA Glenn Faculty Fellowship Program.

  13. The assessment of source attribution of soil pollution in a typical e-waste recycling town and its surrounding regions using the combined organic and inorganic dataset.

    PubMed

    Luo, Jie; Qi, Shihua; Xie, Xianming; Gu, X W Sophie; Wang, Jinji

    2017-01-01

    Guiyu is a well-known electronic-waste dismantling and recycling town in south China. Concentrations and distributions of the 21 mineral elements and 16 polycyclic aromatic hydrocarbons (PAHs) collected there were evaluated. Principal component analysis (PCA) applied to the data matrix of PAHs in the soil extracted three major factors explaining 85.7% of the total variability, identified as traffic emission, coal combustion, and an unidentified source. Using metallic or metalloid element concentrations as variables, five principal components (PCs) were identified, accounting for 70.4% of the information included in the initial data matrix; these can be denoted as e-waste dismantling-related contamination, two different geological origins, an anthropogenically influenced source, and marine aerosols. Combining the 21 metallic and metalloid element datasets with the 16 PAH concentrations narrowed down the coarse sources and decreased the unidentified contribution to soil in the present study, and therefore effectively assists the source identification process.
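
    The factor-extraction step is ordinary PCA on a standardized samples-by-analytes matrix. A brief scikit-learn sketch follows; the column layout and the absence of a rotation step are simplifying assumptions for illustration.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def source_factors(concentrations, n_components=5):
        """PCA-based factor extraction from a samples x analytes matrix.

        concentrations : (n_samples, n_analytes) array, columns being the
        measured metal(loid)s or PAHs.  Returns the explained-variance ratios
        and the loadings, which are inspected to label factors (traffic
        emission, coal combustion, geological origin, ...).
        """
        X = StandardScaler().fit_transform(concentrations)   # z-score each analyte
        pca = PCA(n_components=n_components).fit(X)
        loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
        return pca.explained_variance_ratio_, loadings
    ```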

  14. Entropic Profiler – detection of conservation in genomes using information theory

    PubMed Central

    Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

    2009-01-01

    Background: In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighted relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings: The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes them very efficiently by using improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows users to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples, or by resuming previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion: EP are directly related to the statistical significance of motifs and can be considered as a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
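
    The "significant motif" idea can be illustrated with a toy k-mer over-representation score. The sketch below uses a simple i.i.d. nucleotide background and a binomial z-score, which is far cruder than the weighted suffix-tree statistics of the tool above but shows the shape of the computation.

    ```python
    import math
    from collections import Counter

    def kmer_zscores(seq, k=6):
        """Over/under-representation z-scores for all k-mers in a DNA sequence.

        Expected counts assume independent, identically distributed nucleotides;
        positive scores flag over-represented motifs, negative scores
        under-represented ones.
        """
        seq = seq.upper()
        n = len(seq) - k + 1
        base_freq = Counter(seq)
        total = sum(base_freq.values())
        counts = Counter(seq[i:i + k] for i in range(n))
        scores = {}
        for motif, obs in counts.items():
            p = 1.0
            for b in motif:
                p *= base_freq[b] / total          # probability of the motif
            exp = n * p
            sd = math.sqrt(n * p * (1.0 - p))      # binomial approximation
            scores[motif] = (obs - exp) / sd if sd > 0 else 0.0
        return scores
    ```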

  15. Essential oils (EOs), pressurized liquid extracts (PLE) and carbon dioxide supercritical fluid extracts (SFE-CO2) from Algerian Thymus munbyanus as valuable sources of antioxidants to be used on an industrial level.

    PubMed

    Bendif, Hamdi; Adouni, Khaoula; Miara, Mohamed Djamel; Baranauskienė, Renata; Kraujalis, Paulius; Venskutonis, Petras Rimantas; Nabavi, Seyed Mohammad; Maggi, Filippo

    2018-09-15

    The aim of this study was to demonstrate the potential of extracts from Algerian Thymus munbyanus as a valuable source of antioxidants for use on an industrial level. To this end, a study was conducted on the composition and antioxidant activities of essential oils (EOs), pressurized liquid extracts (PLE) and supercritical fluid extracts (SFE-CO2) obtained from Thymus munbyanus subsp. coloratus (TMC) and subsp. munbyanus (TMM). EOs and SFE-CO2 extracts were analysed by GC-FID and GC×GC-TOFMS revealing significant differences. A successive extraction of the solid SFE-CO2 residue by PLE extraction with solvents of increasing polarity such as acetone, ethanol and water, was carried out. The extracts were evaluated for total phenolic content by Folin-Ciocalteu assay, while the antioxidant power was assessed by DPPH, FRAP, and ORAC assays. SFE-CO2 extracts were also analysed for their tocopherol content. The antioxidant activity of PLE extracts was found to be higher than that of SFE-CO2 extracts, and this increased with solvent polarity (water > ethanol > acetone). Overall, these results support the use of T. munbyanus as a valuable source of substances to be used on an industrial level as preservative agents. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Thalamic and cortical pathways supporting auditory processing

    PubMed Central

    Lee, Charles C.

    2012-01-01

    The neural processing of auditory information engages pathways that begin initially at the cochlea and that eventually reach forebrain structures. At these higher levels, the computations necessary for extracting auditory source and identity information rely on the neuroanatomical connections between the thalamus and cortex. Here, the general organization of these connections in the medial geniculate body (thalamus) and the auditory cortex is reviewed. In addition, we consider two models organizing the thalamocortical pathways of the non-tonotopic and multimodal auditory nuclei. Overall, the transfer of information to the cortex via the thalamocortical pathways is complemented by the numerous intracortical and corticocortical pathways. Although interrelated, the convergent interactions among thalamocortical, corticocortical, and commissural pathways enable the computations necessary for the emergence of higher auditory perception. PMID:22728130

  17. ABSOLUTE BUNCH LENGTH MEASUREMENTS AT THE ALS BY INCOHERENT SYNCHROTRON RADIATION FLUCTUATION ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sannibale, Fernando; Zolotorev, Max S.; Filippetto, Daniele

    2007-06-22

    By analysing the pulse-to-pulse intensity fluctuations of the radiation emitted by a charged particle in the incoherent part of the spectrum, it is possible to extract information about the spatial distribution of the beam. At the Advanced Light Source (ALS) of the Lawrence Berkeley National Laboratory, we have developed and tested a simple scheme based on this principle that allows for the absolute measurement of the bunch length. A description of the method and the experimental results are presented.

  18. A research framework for pharmacovigilance in health social media: Identification and evaluation of patient adverse drug event reports.

    PubMed

    Liu, Xiao; Chen, Hsinchun

    2015-12-01

    Social media offer insights into patients' medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve the current practice of pharmacovigilance. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated and high-performance patient-reported adverse drug event extraction. The framework consists of medical entity extraction for recognizing patient discussions of drugs and events, adverse drug event extraction with a shortest-dependency-path kernel based statistical learning method and semantic filtering with information from medical knowledge bases, and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments were conducted on a test bed of postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness. Our framework significantly outperforms prior work. Published by Elsevier Inc.

  19. Irrigation network extraction methodology from LiDAR DTM using Whitebox and ArcGIS

    NASA Astrophysics Data System (ADS)

    Mahor, M. A. P.; De La Cruz, R. M.; Olfindo, N. T.; Perez, A. M. C.

    2016-10-01

    Irrigation networks are important in distributing water resources to areas where rainfall is not enough to sustain agriculture. They are also crucial for redirecting vast amounts of water to decrease the risk of flooding in flat areas, especially near sources of water. Given the lack of studies on irrigation feature extraction, covering features that range from wide canals to small ditches, this study presents a method of extracting these features from LiDAR-derived digital terrain models (DTMs) using Geographic Information Systems (GIS) tools such as ArcGIS and Whitebox Geospatial Analysis Tools (Whitebox GAT). High-resolution LiDAR DTMs with 1-meter horizontal and 0.25-meter vertical accuracies were processed to generate a gully depth map. This map was then reclassified, converted to vector, and filtered according to segment length and sinuosity to isolate the irrigation features. Initial results in the test area show that the extraction completeness is greater than 80% when compared with data obtained from the National Irrigation Administration (NIA).
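
    The length-and-sinuosity filter is straightforward to express in code; the thresholds in this Python sketch are illustrative placeholders, not the values used in the study.

    ```python
    import math

    def sinuosity(polyline):
        """Ratio of along-path length to straight-line end-to-end distance.

        polyline : list of (x, y) coordinates of one vectorized gully segment.
        """
        path = sum(math.dist(a, b) for a, b in zip(polyline[:-1], polyline[1:]))
        straight = math.dist(polyline[0], polyline[-1])
        return path / straight if straight > 0 else float("inf")

    def keep_irrigation_candidates(segments, min_length=50.0, max_sinuosity=1.2):
        """Filter vector segments by length and sinuosity.

        Man-made canals tend to be long and nearly straight, so short or highly
        sinuous segments are discarded.
        """
        kept = []
        for seg in segments:
            length = sum(math.dist(a, b) for a, b in zip(seg[:-1], seg[1:]))
            if length >= min_length and sinuosity(seg) <= max_sinuosity:
                kept.append(seg)
        return kept
    ```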

  20. Human-machine interaction to disambiguate entities in unstructured text and structured datasets

    NASA Astrophysics Data System (ADS)

    Ward, Kevin; Davenport, Jack

    2017-05-01

    Creating entity network graphs is a manual, time-consuming process for an intelligence analyst. Beyond the traditional big-data problem of information overload, individuals are often referred to by multiple names and shifting titles as they advance in their organizations over time, which quickly makes simple string or phonetic alignment methods for entities insufficient. Conversely, automated methods for relationship extraction and entity disambiguation typically produce questionable results with no way for users to vet results, correct mistakes or influence the algorithm's future results. We present an entity disambiguation tool, DRADIS, which aims to bridge the gap between human-centric and machine-centric methods. DRADIS automatically extracts entities from multi-source datasets and models them as a complex set of attributes and relationships. Entities are disambiguated across the corpus using a hierarchical model executed in Spark, allowing it to scale to operational-sized data. Resolution results are presented to the analyst complete with sourcing information for each mention and relationship, allowing analysts to quickly vet the correctness of results as well as correct mistakes. Corrected results are used by the system to refine the underlying model, allowing analysts to optimize the general model to better deal with their operational data. Providing analysts with the ability to validate and correct the model to produce a system they can trust enables them to better focus their time on producing higher quality analysis products.

  1. Use of corn steep liquor as an economical nitrogen source for biosuccinic acid production by Actinobacillus succinogenes

    NASA Astrophysics Data System (ADS)

    Tan, J. P.; Jahim, J. M.; Wu, T. Y.; Harun, S.; Mumtaz, T.

    2016-06-01

    Expensive raw materials are the driving force behind the shift from petroleum-based succinic acid production to bio-based succinic acid production by microorganisms. The cost of the fermentation medium is among the main factors contributing to the total production cost of bio-succinic acid. After the carbon source, the nitrogen source is the second largest component of the fermentation medium, the cost of which has been overlooked in past years. The current study aimed at replacing yeast extract, a costly nitrogen source, with corn steep liquor for economical production of bio-succinic acid by Actinobacillus succinogenes 130Z. In this study, a final succinic acid concentration of 20.6 g/L was obtained with corn steep liquor as the nitrogen source, which was comparable with the final succinate concentration of 21.4 g/L obtained with yeast extract as the nitrogen source. In economic terms, corn steep liquor was priced at 200/ton, one fifth of the cost of yeast extract at 1000/ton. Therefore, corn steep liquor can be considered a potential nitrogen source for biochemical industries in place of the costly yeast extract.

  2. Deuterium results at the negative ion source test facility ELISE

    NASA Astrophysics Data System (ADS)

    Kraus, W.; Wünderlich, D.; Fantz, U.; Heinemann, B.; Bonomo, F.; Riedl, R.

    2018-05-01

    The ITER neutral beam system will be equipped with large radio frequency (RF) driven negative ion sources, with a cross section of 0.9 m × 1.9 m, which have to deliver extracted D- ion beams of 57 A at 1 MeV for 1 h. At the ELISE (Extraction from a Large Ion Source Experiment) test facility, a source of half this size has been operational since 2013. The goal of this experiment is to demonstrate high operational reliability and to achieve the extracted current densities and beam properties required for ITER. Technical improvements of the source design and the RF system were necessary to provide reliable operation in steady state with an RF power of up to 300 kW. While in short pulses the required D- current density has almost been reached, the performance in long pulses is determined, in particular in deuterium, by inhomogeneous and unstable currents of co-extracted electrons. By applying refined caesium evaporation and distribution procedures and by reducing and symmetrizing the electron currents, considerable progress has been made, and up to 190 A/m2 of D-, corresponding to 66% of the value required for ITER, has been extracted for 45 min.

  3. Tweets and Facebook Posts, the Novelty Techniques in the Creation of Origin-Destination Models

    NASA Astrophysics Data System (ADS)

    Malema, H. K.; Musakwa, W.

    2016-06-01

    Social media and big data have emerged as a useful source of information that can be used for planning purposes, particularly transportation planning and trip-distribution studies. Cities in developing countries such as South Africa often struggle with outdated, unreliable and cumbersome techniques such as traffic counts and household surveys to conduct origin and destination studies. The emergence of ubiquitous crowd-sourced data, big data, social media and geolocation-based services has shown huge potential in providing useful information for origin and destination studies. Such information can perhaps be utilised to determine the origin and destination of commuters using the Gautrain, a high-speed railway in Gauteng province, South Africa. To date little is known about the origins and destinations of Gautrain commuters. Accordingly, this study assesses the viability of using geolocation-based services, namely Facebook and Twitter, in mapping out the network movements of Gautrain commuters. Explorative Spatial Data Analysis (ESDA), Echo-social and ArcGIS software were used to extract social media data, i.e. tweets and Facebook posts, as well as to visualize the concentration of Gautrain commuters. The results demonstrate that big data and geolocation-based services have significant potential to predict the movement network patterns of commuters, and this information can thus be used to inform and improve transportation planning. Nevertheless, the use of crowd-sourced data and big data raises privacy concerns that still need to be addressed.

  4. Information extraction system

    DOEpatents

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations, as well as relationships and events, from text documents is described herein.

  5. Neutron sources for investigations on extracted beams in Russia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aksenov, V. L.

    An overview is presented of the current status and prospects for the development of neutron sources intended for investigations on extracted beams in Russia. The participation of Russia in international scientific organizations is demonstrated.

  6. Prompt fission gamma-ray emission spectral data for 239Pu(n,f) using fast directional neutrons from the LICORNE neutron source

    NASA Astrophysics Data System (ADS)

    Qi, L.; Wilson, J. N.; Lebois, M.; Al-Adili, A.; Chatillon, A.; Choudhury, D.; Gatera, A.; Georgiev, G.; Göök, A.; Laurent, B.; Maj, A.; Matea, I.; Oberstedt, A.; Oberstedt, S.; Rose, S. J.; Schmitt, C.; Wasilewska, B.; Zeiser, F.

    2018-03-01

    Prompt fission gamma-ray spectra (PFGS) have been measured for the 239Pu(n,f) reaction using fast neutrons at Ēn=1.81 MeV produced by the LICORNE directional neutron source. The setup makes use of LaBr3 scintillation detectors and PARIS phoswich detectors to measure the emitted prompt fission gamma rays (PFG). The mean multiplicity, average total energy release per fission and average energy of photons are extracted from the unfolded PFGS. These new measurements provide complementary information to other recent work on thermal neutron induced fission of 239Pu and spontaneous fission of 252Cf.

  7. Progress toward the development and testing of source reconstruction methods for NIF neutron imaging.

    PubMed

    Loomis, E N; Grim, G P; Wilde, C; Wilson, D C; Morgan, G; Wilke, M; Tregillis, I; Merrill, F; Clark, D; Finch, J; Fittinghoff, D; Bower, D

    2010-10-01

    Development of analysis techniques for neutron imaging at the National Ignition Facility is an important and difficult task for the detailed understanding of high-neutron-yield inertial confinement fusion implosions. Once developed, these methods must provide accurate images of the hot and cold fuels so that information about the implosion, such as symmetry and areal density, can be extracted. One method under development involves the numerical inversion of the pinhole image using knowledge of neutron transport through the pinhole aperture from Monte Carlo simulations. In this article we present results of source reconstructions based on simulated images that test the method's effectiveness with regard to pinhole misalignment.

  8. Parent experiences and information needs relating to procedural pain in children: a systematic review protocol.

    PubMed

    Gates, Allison; Shave, Kassi; Featherstone, Robin; Buckreus, Kelli; Ali, Samina; Scott, Shannon; Hartling, Lisa

    2017-06-06

    Many evidence-based interventions are available to manage procedural pain in children and neonates, yet they are severely underutilized. Parents play an important role in the management of their child's pain; however, many do not possess adequate knowledge of how to effectively do so. The purpose of the planned study is to systematically review and synthesize current knowledge of the experiences and information needs of parents with regard to the management of their child's pain and distress related to medical procedures in the emergency department. We will conduct a systematic review using rigorous methods and reporting based on the PRISMA statement. We will conduct a comprehensive search of literature published between 2000 and 2016 reporting on parents' experiences and information needs with regard to helping their child manage procedural pain and distress. Ovid MEDLINE, Ovid PsycINFO, CINAHL, and PubMed will be searched. We will also search reference lists of key studies and gray literature sources. Two reviewers will screen the articles following inclusion criteria defined a priori. One reviewer will then extract the data from each article following a data extraction form developed by the study team. The second reviewer will check the data extraction for accuracy and completeness. Any disagreements with regard to study inclusion or data extraction will be resolved via discussion. Data from qualitative studies will be summarized thematically, while those from quantitative studies will be summarized narratively. The second reviewer will confirm the overarching themes resulting from the qualitative and quantitative data syntheses. The Critical Appraisal Skills Programme Qualitative Research Checklist and the Quality Assessment Tool for Quantitative Studies will be used to assess the quality of the evidence from each included study. To our knowledge, no published review exists that comprehensively reports on the experiences and information needs of parents related to the management of their child's procedural pain and distress. A systematic review of parents' experiences and information needs will help to inform strategies to empower them with the knowledge necessary to ensure their child's comfort during a painful procedure. PROSPERO CRD42016043698.

  9. Characterization of rhamnolipids by liquid chromatography/mass spectrometry after solid-phase extraction.

    PubMed

    Behrens, Beate; Engelen, Jeannine; Tiso, Till; Blank, Lars Mathias; Hayen, Heiko

    2016-04-01

    Rhamnolipids are surface-active agents with a broad application potential that are produced in complex mixtures by bacteria of the genus Pseudomonas. Analysis from fermentation broth is often characterized by laborious sample preparation and requires hyphenated analytical techniques like liquid chromatography coupled to mass spectrometry (LC-MS) to obtain detailed information about sample composition. In this study, an analytical procedure based on chromatographic method development and characterization of rhamnolipid sample material by LC-MS as well as a comparison of two sample preparation methods, i.e., liquid-liquid extraction and solid-phase extraction, is presented. Efficient separation was achieved under reversed-phase conditions using a mixed propylphenyl and octadecylsilyl-modified silica gel stationary phase. LC-MS/MS analysis of a supernatant from Pseudomonas putida strain KT2440 pVLT33_rhlABC grown on glucose as sole carbon source and purified by solid-phase extraction revealed a total of 20 congeners of di-rhamnolipids, mono-rhamnolipids, and their biosynthetic precursors 3-(3-hydroxyalkanoyloxy)alkanoic acids (HAAs) with different carbon chain lengths from C8 to C14, including three rhamnolipids with uncommon C9 and C11 fatty acid residues. LC-MS and the orcinol assay were used to evaluate the developed solid-phase extraction method in comparison with the established liquid-liquid extraction. Solid-phase extraction exhibited higher yields and reproducibility as well as lower experimental effort.

  10. [The influence of stinging nettle (Urtica dioica L.) extracts on the activity of catalase in THP1 monocytes/macrophages].

    PubMed

    Wolska, Jolanta; Janda, Katarzyna; Szkyrpan, Sylwia; Gutowska, Izabela

    2015-01-01

    Stinging nettle (Urtica dioica L.) is one of the most valuable plants used in phytotherapy. The herbal raw materials are the herb (Urticae herba), leaves (Urticae folium), roots (Urticae radix) and seeds (Urticae semina). This plant is a good source of vitamins, minerals, fibre, protein and biologically active compounds with antioxidant properties. The literature provides limited information about the chemical composition and properties of the seed heads. No papers are available on the effect of extracts of this plant on catalase activity in human cells. The aim of this study was to investigate the impact of stinging nettle (Urtica dioica L.) extracts on the antioxidant activity of catalase in THP1 macrophages. Two types of extracts, water and alcohol, at two different concentrations, were used in the experiments. Nettle was collected in September and October 2012 in the area of Szczecin. The collected plant material was frozen and lyophilized. After these procedures, water and alcohol extracts of nettle were prepared and then added to THP1 cells. The antioxidant activity of catalase was determined with a spectrophotometric method. The study showed that both extracts (water and alcohol) significantly increased the antioxidant activity of catalase in THP1 cells. The increase in catalase activity was directly proportional to the concentration of the added alcohol extract.

  11. Information Pre-Processing using Domain Meta-Ontology and Rule Learning System

    NASA Astrophysics Data System (ADS)

    Ranganathan, Girish R.; Biletskiy, Yevgen

    Around the globe, extraordinary amounts of documents are being created by Enterprises and by users outside these Enterprises. The documents created in the Enterprises constitute the main focus of the present chapter. These documents are used for a great deal of machine processing. When these documents are used for machine processing, a lack of semantics of the information they contain may cause misinterpretation of the information, thereby inhibiting the productiveness of computer-assisted analytical work. Hence, it would be profitable for the Enterprises to use well-defined domain ontologies, which serve as rich sources of semantics for the information in the documents. These domain ontologies can be created manually, semi-automatically or fully automatically. The focus of this chapter is to propose an intermediate solution that will enable relatively easy creation of these domain ontologies. The process of extracting and capturing domain ontologies from these voluminous documents requires extensive involvement of domain experts and the application of ontology learning methods that are substantially labor intensive; therefore, some intermediate solutions which would assist in capturing domain ontologies must be developed. This chapter proposes a solution in this direction, which involves building a meta-ontology that serves both as an intermediate information source for the main domain ontology and as a rapid approach to conceptualizing a domain of interest from a huge amount of source documents. This meta-ontology can be populated with ontological concepts, attributes and relations from documents, and then refined to form a better domain ontology, either through automatic ontology learning methods or some other relevant ontology building approach.

  12. Automated extraction of chemical structure information from digital raster images

    PubMed Central

    Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro

    2009-01-01

    Background To search for chemical structures in research articles, diagrams or text representing molecules need to be translated into a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results This paper aims to provide critical reviews of these systems and to report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be run independently in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles. PMID:19196483

  13. Seismoelectric data processing for surface surveys of shallow targets

    USGS Publications Warehouse

    Haines, S.S.; Guitton, A.; Biondi, B.

    2007-01-01

    The utility of the seismoelectric method relies on the development of methods to extract the signal of interest from background and source-generated coherent noise that may be several orders of magnitude stronger. We compare data processing approaches to develop a sequence of preprocessing and signal/noise separation and to quantify the noise level from which we can extract signal events. Our preferred sequence begins with the removal of power line harmonic noise and the use of frequency filters to minimize random and source-generated noise. Mapping to the linear Radon domain with an inverse process incorporating a sparseness constraint provides good separation of signal from noise, though it is ineffective on noise that shows the same dip as the signal. Similarly, the seismoelectric signal and noise do not separate cleanly in the Fourier domain, so f-k filtering cannot remove all of the source-generated noise and it also disrupts signal amplitude patterns. We find that prediction-error filters provide the most effective method to separate signal and noise, while also preserving amplitude information, assuming that adequate pattern models can be determined for the signal and noise. These Radon-domain and prediction-error-filter methods successfully separate signal from noise up to 33 dB stronger in our test data. © 2007 Society of Exploration Geophysicists.

  14. Comparison of CO2 Emissions Data for 30 Cities from Different Sources

    NASA Astrophysics Data System (ADS)

    Nakagawa, Y.; Koide, D.; Ito, A.; Saito, M.; Hirata, R.

    2017-12-01

    Many sources suggest that cities account for a large proportion of global anthropogenic greenhouse gas emissions. Therefore, in the search for the best ways to reduce total anthropogenic greenhouse gas emissions, a focus on city emissions is crucial. In this study, we collected CO2 emissions data for 30 cities during 1990-2015 and evaluated the degree of variance between data sources. The CO2 emissions data were obtained from academic papers, municipal reports, and high-resolution emissions maps (CIDIACv2016, EDGARv4.2, ODIACv2016, and FFDASv2.0). To extract urban CO2 emissions from the high-resolution emissions maps, an urban fraction ranging from 0 to 1 was calculated for each 1×1 degree grid cell using global land cover data (SYNMAP). Total CO2 emissions from the grid cells in which the urban fraction was greater than or equal to 0.9 were regarded as urban CO2 emissions. The estimated CO2 emissions varied greatly depending on the information source, even in the same year. There was a large difference between the CO2 emissions collected from academic papers and municipal reports and those extracted from high-resolution emissions maps. One reason is that they use different city boundaries: the city proper (i.e. the political city boundary) is often defined as the city boundary in academic papers and municipal reports, whereas the urban area is used in the high-resolution emissions maps. Furthermore, there was a large variation in CO2 emissions collected from academic papers and municipal reports. These differences may be due to differences in assumptions, such as the allocation ratio of CO2 emissions to producers and consumers. In general, the consumption-based assignment of emissions gives higher estimates of urban CO2 emissions than the production-based assignment. Furthermore, there was also a large variation in CO2 emissions extracted from the high-resolution emissions maps. This difference is likely attributable to differences in the information used in the spatial disaggregation of emissions. To identify the CO2 emissions of cities, it is necessary to determine common definitions of city boundaries, the allocation ratio of CO2 emissions to consumption and production, and a refined approach to the spatial disaggregation of CO2 emissions in high-resolution emissions maps.
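
    The grid-cell thresholding rule described above (counting a cell as urban when its urban fraction is at least 0.9) can be sketched with NumPy; the gridded emissions and urban-fraction arrays below are random placeholders standing in for maps such as ODIAC and SYNMAP-derived fractions.

    ```python
    import numpy as np

    # Placeholder 1x1-degree grids: annual CO2 emissions per cell and the urban
    # fraction of each cell derived from a land-cover product (values are made up).
    rng = np.random.default_rng(0)
    emissions = rng.random((180, 360)) * 1e3      # e.g. kt CO2 per cell
    urban_fraction = rng.random((180, 360))       # fraction of cell area that is urban

    # A cell contributes to the urban total only if >= 90% of its area is urban.
    urban_mask = urban_fraction >= 0.9
    urban_co2 = emissions[urban_mask].sum()
    print(f"Urban CO2 emissions: {urban_co2:.1f} kt ({urban_mask.sum()} grid cells)")
    ```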

  15. A novel method for assessing chronic cortisol concentrations in dogs using the nail as a source.

    PubMed

    Mack, Z; Fokidis, H B

    2017-04-01

    Cortisol, a glucocorticoid secreted in response to stress, is used to assess adrenal function and mental health in clinical settings. Current methods assess cortisol sources that reflect short-term secretion, which can vary with the current stress state. Here, we present a novel method for the extraction and quantification of cortisol from the dog nail using solid phase extraction coupled to an enzyme-linked immunosorbent assay. Validation experiments demonstrated accuracy (r = 0.836, P < 0.001), precision (15.1% coefficient of variation), and repeatability (14.4% coefficient of variation) with this method. Furthermore, nail cortisol concentrations were positively correlated with an established hair cortisol method (r = 0.736, P < 0.001). Nail cortisol concentrations did not differ with dog sex, breed, age, or weight; however, sample size limitations may preclude statistical significance. Nail cortisol may provide information on cortisol secretion integrated over the time corresponding to nail growth and may be useful as a tool for diagnosing stress and adrenal disorders in dogs. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. Joint source based analysis of multiple brain structures in studying major depressive disorder

    NASA Astrophysics Data System (ADS)

    Ramezani, Mahdi; Rasoulian, Abtin; Hollenstein, Tom; Harkness, Kate; Johnsrude, Ingrid; Abolmaesumi, Purang

    2014-03-01

    We propose a joint Source-Based Analysis (jSBA) framework to identify brain structural variations in patients with Major Depressive Disorder (MDD). In this framework, features representing position, orientation and size (i.e. pose), shape, and local tissue composition are extracted. Subsequently, simultaneous analysis of these features within a joint analysis method is performed to generate the basis sources that show significant differences between subjects with MDD and healthy controls. Moreover, in a leave-one-out cross-validation experiment, we use a Fisher Linear Discriminant (FLD) classifier to identify individuals within the MDD group. Results show that we can classify the MDD subjects with an accuracy of 76% solely based on the information gathered from the joint analysis of pose, shape, and tissue composition in multiple brain structures.

  17. An object-oriented design for automated navigation of semantic networks inside a medical data dictionary.

    PubMed

    Ruan, W; Bürkle, T; Dudeck, J

    2000-01-01

    In this paper we present a data dictionary server for the automated navigation of information sources. The underlying knowledge is represented within a medical data dictionary. The mapping between medical terms and information sources is based on a semantic network. The key aspect of implementing the dictionary server is how to represent the semantic network in a way that is easy to navigate and operate on, i.e. how to abstract the semantic network and represent it in memory for various operations. This paper describes an object-oriented design based on Java that represents the semantic network as a group of objects. A node and its relationships to its neighbors are encapsulated in one object. Based on such a representation model, several operations have been implemented. They comprise the extraction of the parts of the semantic network that can be reached from a given node, as well as finding all paths between a start node and a predefined destination node. This solution is independent of any given layout of the semantic structure. Therefore the module, called the Giessen Data Dictionary Server, can act independently of a specific clinical information system. The dictionary server will be used to present clinical information, e.g. treatment guidelines or drug information sources, to the clinician in an appropriate working context. The server is invoked from clinical documentation applications which contain an infobutton. Automated navigation will guide the user to all the information relevant to her/his topic that is currently available inside our closed clinical network.
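
    The two graph operations described above (extracting the part of the network reachable from a given node, and finding all paths between a start node and a destination node) can be sketched with a minimal object model in which each node encapsulates its relationships. This is an illustrative Python sketch, not the Giessen server's actual Java implementation, and the example terms are invented.

    ```python
    from collections import deque

    class Node:
        """A semantic-network node that encapsulates its relationships to its neighbours."""
        def __init__(self, term):
            self.term = term
            self.neighbours = {}          # relation label -> list of Node

        def link(self, relation, other):
            self.neighbours.setdefault(relation, []).append(other)

    def reachable(start):
        """Extract the part of the network reachable from a given node (breadth-first)."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for targets in node.neighbours.values():
                for nxt in targets:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return seen

    def all_paths(start, goal, path=None):
        """Find all cycle-free paths between a start node and a destination node."""
        path = (path or []) + [start]
        if start is goal:
            return [path]
        paths = []
        for targets in start.neighbours.values():
            for nxt in targets:
                if nxt not in path:
                    paths.extend(all_paths(nxt, goal, path))
        return paths

    # Illustrative usage with invented terms.
    fever = Node("fever"); drug = Node("paracetamol"); guideline = Node("treatment guideline")
    fever.link("treated_by", drug); drug.link("described_in", guideline)
    print([n.term for n in reachable(fever)])
    print([[n.term for n in p] for p in all_paths(fever, guideline)])
    ```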

  18. Mapping detailed 3D information onto high resolution SAR signatures

    NASA Astrophysics Data System (ADS)

    Anglberger, H.; Speck, R.

    2017-05-01

    Due to challenges in the visual interpretation of radar signatures or in the subsequent information extraction, a fusion with other data sources can be beneficial. The most accurate basis for a fusion of any kind of remote sensing data is the mapping of the acquired 2D image space onto the true 3D geometry of the scenery. In the case of radar images this is a challenging task because the coordinate system is based on the measured range, which causes ambiguous regions due to layover effects. This paper describes a method that accurately maps the detailed 3D information of a scene to the slant-range-based coordinate system of imaging radars. Through this mapping, all the contributing geometrical parts of one resolution cell can be determined in 3D space. The proposed method is highly efficient, because computationally expensive operations can be performed directly on graphics card hardware. The described approach provides an excellent basis for sophisticated methods that extract data from multiple complementary sensors, such as radar and optical images, especially because true 3D information for whole cities will be available in the near future. The performance of the developed methods will be demonstrated with high resolution radar data acquired by the space-borne SAR sensor TerraSAR-X.

  19. Interactive access to LP DAAC satellite data archives through a combination of open-source and custom middleware web services

    USGS Publications Warehouse

    Davis, Brian N.; Werpy, Jason; Friesz, Aaron M.; Impecoven, Kevin; Quenzer, Robert; Maiersperger, Tom; Meyer, David J.

    2015-01-01

    Current methods of searching for and retrieving data from satellite land remote sensing archives do not allow for interactive information extraction. Instead, Earth science data users are required to download files over low-bandwidth networks to local workstations and process data before science questions can be addressed. New methods of extracting information from data archives need to become more interactive to meet user demands for deriving increasingly complex information from rapidly expanding archives. Moving the tools required for processing data to computer systems of data providers, and away from systems of the data consumer, can improve turnaround times for data processing workflows. The implementation of middleware services was used to provide interactive access to archive data. The goal of this middleware services development is to enable Earth science data users to access remote sensing archives for immediate answers to science questions instead of links to large volumes of data to download and process. Exposing data and metadata to web-based services enables machine-driven queries and data interaction. Also, product quality information can be integrated to enable additional filtering and sub-setting. Only the reduced content required to complete an analysis is then transferred to the user.

  20. The negative hydrogen Penning ion gauge ion source for KIRAMS-13 cyclotron

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    An, D. H.; Jung, I. S.; Kang, J.

    2008-02-15

    The cold-cathode-type Penning ion gauge (PIG) ion source used as the internal ion source of the KIRAMS-13 cyclotron has been used for the generation of negative hydrogen ions. A dc H- beam current of 650 µA from the PIG ion source, with a Dee voltage of 40 kV and an arc current of 1.0 A, is extrapolated from the dc extraction beam currents measured at low extraction dc voltages. The output optimization of the PIG ion source in the cyclotron has been carried out by using various chimneys with different sizes of the expansion gap between the plasma boundary and the chimney wall. This paper presents the results of the dc H- extraction measurement and the expansion gap experiment.

  1. Beam commission of the high intensity proton source developed at INFN-LNS for the European Spallation Source

    NASA Astrophysics Data System (ADS)

    Neri, L.; Celona, L.; Gammino, S.; Miraglia, A.; Leonardi, O.; Castro, G.; Torrisi, G.; Mascali, D.; Mazzaglia, M.; Allegra, L.; Amato, A.; Calabrese, G.; Caruso, A.; Chines, F.; Gallo, G.; Longhitano, A.; Manno, G.; Marletta, S.; Maugeri, A.; Passarello, S.; Pastore, G.; Seminara, A.; Spartà, A.; Vinciguerra, S.

    2017-07-01

    At the Istituto Nazionale di Fisica Nucleare - Laboratori Nazionali del Sud (INFN-LNS), the beam commissioning of the high intensity Proton Source for the European Spallation Source (PS-ESS) started in November 2016. Beam stability at high current intensity is one of the most important parameters for the first steps of the ongoing commissioning. Promising results have been obtained since the first start of the source with a 6 mm diameter extraction hole. Enlarging the extraction hole to 8 mm improved the PS-ESS performance and yielded the values required by the ESS accelerator. In this work, the characteristics of the extracted beam current, together with Doppler shift and emittance measurements, are presented, as well as a description of the next phases before the installation at ESS in Lund.

  2. Phytochemical screening and in vitro bioactivities of the extracts of aerial part of Boerhavia diffusa Linn.

    PubMed Central

    Apu, Apurba Sarker; Liza, Mahmuda Sultana; Jamaluddin, A.T.M.; Howlader, Md. Amran; Saha, Repon Kumer; Rizwan, Farhana; Nasrin, Nishat

    2012-01-01

    Objective To investigate the bioactivities of crude n-hexane, ethyl acetate and methanol extracts of the aerial part of Boerhavia diffusa Linn. (B. diffusa) and to perform its phytochemical analysis. Methods The identification of phytoconstituents and the assays of antioxidant, thrombolytic, cytotoxic and antimicrobial activities were conducted using specific standard in vitro procedures. Results The results showed that the plant extracts were a rich source of phytoconstituents. The methanol extract showed higher antioxidant and thrombolytic activities and lower cytotoxic activity than the n-hexane and ethyl acetate extracts of B. diffusa. Among the bioactivities, the antioxidant activity was the most notable compared to the positive control, and the extract could thus be a potentially rich source of natural antioxidants. In the antimicrobial screening, crude extracts of the plant showed remarkable antibacterial activity against the tested microorganisms. All the extracts showed significant inhibitory activity against Candida albicans at a concentration of 1000 µg/disc. Conclusions The present findings suggest that the plant, widely available in Bangladesh, could be a prominent source of medicinally important natural compounds. PMID:23569993

  3. Comparative study of mobility extraction methods in p-type polycrystalline silicon thin film transistors

    NASA Astrophysics Data System (ADS)

    Liu, Kai; Liu, Yuan; Liu, Yu-Rong; En, Yun-Fei; Li, Bin

    2017-07-01

    Channel mobility in p-type polycrystalline silicon thin film transistors (poly-Si TFTs) is extracted using the Hoffman method, the linear-region transconductance method and the multi-frequency C-V method. Because non-negligible errors arise when the dependence of the effective mobility on the gate-source voltage is neglected, the mobility extracted with the linear-region transconductance method and the Hoffman method is overestimated, especially in the lower gate-source voltage region. By considering the distribution of localized states in the band gap, the frequency-independent capacitance due to localized charges in the sub-gap states and that due to free channel electron charges in the conduction band were extracted using the multi-frequency C-V method. The channel mobility was therefore extracted accurately on the basis of charge transport theory. In addition, the effect of electric-field-dependent mobility degradation was also considered in the higher gate-source voltage region. Finally, the mobility results extracted for the poly-Si TFTs using these three methods are compared and analyzed.
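
    For orientation, the linear-region transconductance method mentioned above reduces, in its textbook form, to mu_FE = L * g_m / (W * C_ox * V_DS). The sketch below applies that relation with hypothetical device parameters; it is not the correction procedure developed in the paper.

    ```python
    # Textbook linear-region transconductance extraction (not the paper's method).
    # All device parameters below are hypothetical illustration values.
    L, W = 10e-6, 100e-6        # channel length and width (m)
    Cox = 3.45e-4               # gate dielectric capacitance per unit area (F/m^2)
    Vds = -0.1                  # small drain-source voltage (V), p-type device

    def mobility_from_gm(gm):
        """Field-effect mobility (cm^2/Vs) from a measured transconductance gm (S)."""
        mu_si = L * gm / (W * Cox * abs(Vds))   # m^2/Vs
        return mu_si * 1e4                      # convert to cm^2/Vs

    print(mobility_from_gm(2e-6))               # ~58 cm^2/Vs for gm = 2 uS
    ```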

  4. Hybrid Automatic Building Interpretation System

    NASA Astrophysics Data System (ADS)

    Pakzad, K.; Klink, A.; Müterthies, A.; Gröger, G.; Stroh, V.; Plümer, L.

    2011-09-01

    HABIS (Hybrid Automatic Building Interpretation System) is a system for the automatic reconstruction of building roofs used in virtual 3D building models. Unlike most of the commercially available systems, HABIS is able to work to a high degree automatically. The hybrid method uses different data sources, intending to exploit the advantages of each particular source. 3D point clouds usually provide good height and surface data, whereas spatially high-resolution aerial images provide important information on edges and detailed information on roof objects like dormers or chimneys. The cadastral data provide important base information about the building ground plans. The approach used in HABIS works as a multi-stage process, which starts with a coarse roof classification based on 3D point clouds. After that it continues with an image-based verification of these predicted roofs. In a further step a final classification and adjustment of the roofs is performed. In addition, some roof objects like dormers and chimneys are also extracted based on aerial images and added to the models. In this paper the methods used are described and some results are presented.

  5. Applying traditional signal processing techniques to social media exploitation for situational understanding

    NASA Astrophysics Data System (ADS)

    Abdelzaher, Tarek; Roy, Heather; Wang, Shiguang; Giridhar, Prasanna; Al Amin, Md. Tanvir; Bowman, Elizabeth K.; Kolodny, Michael A.

    2016-05-01

    Signal processing techniques such as filtering, detection, estimation and frequency domain analysis have long been applied to extract information from noisy sensor data. This paper describes the exploitation of these signal processing techniques to extract information from social networks, such as Twitter and Instagram. Specifically, we view social networks as noisy sensors that report events in the physical world. We then present a data processing stack for detection, localization, tracking, and veracity analysis of reported events using social network data. We show using a controlled experiment that the behavior of social sources as information relays varies dramatically depending on context. In benign contexts, there is general agreement on events, whereas in conflict scenarios, a significant amount of collective filtering is introduced by conflicted groups, creating a large data distortion. We describe signal processing techniques that mitigate such distortion, resulting in meaningful approximations of actual ground truth, given noisy reported observations. Finally, we briefly present an implementation of the aforementioned social network data processing stack in a sensor network analysis toolkit, called Apollo. Experiences with Apollo show that our techniques are successful at identifying and tracking credible events in the physical world.

  6. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    PubMed

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of effort to understanding the interactions between proteins and the production of RNA. This information might empower current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to its lack of explicit structure, the literature in the life sciences, one of the most important sources of this information, prevents computer-based systems from accessing it. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has recently attracted community-wide efforts. Most approaches are based on statistical models and require large-scale annotated corpora to precisely estimate the models' parameters, which are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning for biomedical event extraction is a feasible solution and is attracting more interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences but also the hidden topics embedded in the sentences are used to describe the distance. The sentences and the newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold standard corpus. Experimental results show that an improvement of more than 2.2% in F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and that the similarity between sentences can be precisely described by the hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
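
    A greatly simplified sketch of the core idea, propagating event annotations to un-annotated sentences via topic-level similarity, is shown below using scikit-learn's LDA and cosine similarity as stand-ins for the paper's hidden-topic distance (sentence structure is ignored here); the toy sentences and labels are invented.

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy corpora: annotated sentences with event labels, and un-annotated sentences.
    annotated = ["protein A phosphorylates protein B",
                 "gene X is expressed in liver cells"]
    labels = ["Phosphorylation", "Gene_expression"]
    unannotated = ["kinase C phosphorylates its substrate",
                   "the gene is transcribed in neurons"]

    # Infer hidden topics over the combined corpus and describe each sentence by
    # its topic distribution (a simplified stand-in for the paper's distance).
    vec = CountVectorizer()
    X = vec.fit_transform(annotated + unannotated)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topics = lda.fit_transform(X)

    # Assign each un-annotated sentence the label of its closest annotated sentence.
    sim = cosine_similarity(topics[len(annotated):], topics[:len(annotated)])
    for sent, row in zip(unannotated, sim):
        print(sent, "->", labels[int(np.argmax(row))])
    ```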

  7. Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports.

    PubMed

    Qiu, John X; Yoon, Hong-Jun; Fearn, Paul A; Tourassi, Georgia D

    2018-01-01

    Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study, we investigated deep learning with a convolutional neural network (CNN) for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against the more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, resulting in micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
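
    A generic one-dimensional convolutional text classifier in the spirit of this approach (not the paper's exact architecture) could be sketched in Keras as follows; the vocabulary size, sequence length, number of codes and the commented-out training call are placeholders.

    ```python
    import tensorflow as tf

    # Illustrative sizes: 12 ICD-O-3 topography codes as in the study; the rest is made up.
    VOCAB, MAXLEN, N_CODES = 20000, 400, 12

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAXLEN,)),                 # padded token-id sequences
        tf.keras.layers.Embedding(VOCAB, 128),
        tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(N_CODES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
    # model.fit(padded_token_ids, topography_code_ids, epochs=5)  # hypothetical data
    ```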

  8. How the variance of some extraction variables may affect the quality of espresso coffees served in coffee shops.

    PubMed

    Severini, Carla; Derossi, Antonio; Fiore, Anna G; De Pilli, Teresa; Alessandrino, Ofelia; Del Mastro, Arcangela

    2016-07-01

    To improve the quality of espresso coffee, the variables under the control of the barista, such as grinding grade, coffee quantity and the pressure applied to the coffee cake, as well as their variance, are of great importance. A nonlinear mixed effect model was used to obtain information on the changes in the chemical attributes of espresso coffee (EC) as a function of the variability of the extraction conditions. During extraction, the changes in volume were well described by a logistic model, whereas the chemical attributes were better fit by first-order kinetics. The major source of information was contained in the grinding grade, which accounted for 87-96% of the variance of the experimental data. The variability of the grinding produced changes in caffeine content in the range of 80.03 mg to 130.36 mg when using a constant grinding grade of 6.5. The variability in volume and chemical attributes of EC is large. Grinding had the most important effect, as the variability in particle size distribution observed for each grinding level had a profound effect on the quality of EC. Standardization of grinding would be of crucial importance for obtaining espresso coffees of consistently high quality. © 2015 Society of Chemical Industry.
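
    The two model forms mentioned above, a logistic curve for the extracted volume and first-order kinetics for a chemical attribute, can be fitted with SciPy as in the sketch below; the time points and measured values are invented placeholders, not the study's data.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # Invented extraction-time series (s) with made-up volume (mL) and caffeine (mg) values.
    t = np.array([5, 10, 15, 20, 25, 30])
    volume = np.array([4, 10, 18, 24, 27, 29])
    caffeine = np.array([40, 68, 88, 101, 110, 116])

    def logistic(t, K, r, t0):
        """Logistic growth of extracted volume with time."""
        return K / (1.0 + np.exp(-r * (t - t0)))

    def first_order(t, C_inf, k):
        """First-order kinetics for a chemical attribute approaching a plateau."""
        return C_inf * (1.0 - np.exp(-k * t))

    (K, r, t0), _ = curve_fit(logistic, t, volume, p0=[30, 0.3, 15])
    (C_inf, k), _ = curve_fit(first_order, t, caffeine, p0=[120, 0.1])
    print(f"logistic: K={K:.1f} mL, r={r:.2f} 1/s, t0={t0:.1f} s")
    print(f"first-order: C_inf={C_inf:.1f} mg, k={k:.3f} 1/s")
    ```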

  9. Screening of cocaine and its metabolites in human urine samples by direct analysis in real-time source coupled to time-of-flight mass spectrometry after online preconcentration utilizing microextraction by packed sorbent.

    PubMed

    Jagerdeo, Eshwar; Abdel-Rehim, Mohamed

    2009-05-01

    Microextraction by packed sorbent (MEPS) has been evaluated for fast screening of drugs of abuse with mass spectrometric detection. In this study, C8 (octyl-silica, useful for nonpolar to moderately polar compounds), ENV(+) (hydroxylated polystyrene-divinylbenzene copolymer, for extraction of aliphatic and aromatic polar compounds), Oasis MCX (sulfonic-poly(divinylbenzene-co-N-polyvinyl-pyrrolidone) copolymer), and Clean Screen DAU (mixed mode, ion exchanger for acidic and basic compounds) were used as sorbents for the MEPS. The focus was on fast extraction and preconcentration of the drugs with rapid analysis using a time-of-flight (TOF) mass spectrometer as the detector with a direct analysis in real time (DART) source. The combination of an analysis time of less than 1 min, the accurate mass of the first monoisotopic peak of the analyte, and the relative abundances of the peaks in the isotopic clusters provided reliable information for identification. Furthermore, the study sought to demonstrate that it is possible to quantify the analyte of interest using a DART source when an internal standard is used. Of all the sorbents used in the study, Clean Screen DAU performed best for extraction of the analytes from urine. Using Clean Screen DAU to extract spiked samples containing the drugs, linearity was demonstrated for ecgonine methyl ester, benzoylecgonine, cocaine, and cocaethylene with average ranges of 65-910, 75-1100, 95-1200, and 75-1100 ng/mL (n = 5), respectively. The limits of detection (LOD) for ecgonine methyl ester, benzoylecgonine, cocaine, and cocaethylene were 22.9 ng/mL, 23.7 ng/mL, 4.0 ng/mL, and 9.8 ng/mL, respectively, using a signal-to-noise ratio of 3:1.

  10. 40 CFR 439.21 - Special definitions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... STANDARDS PHARMACEUTICAL MANUFACTURING POINT SOURCE CATEGORY Extraction Products § 439.21 Special definitions. For the purpose of this subpart: (a) Extraction means process operations that derive pharmaceutically active ingredients from natural sources such as plant roots and leaves, animal glands, and...

  11. 40 CFR 439.21 - Special definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... STANDARDS PHARMACEUTICAL MANUFACTURING POINT SOURCE CATEGORY Extraction Products § 439.21 Special definitions. For the purpose of this subpart: (a) Extraction means process operations that derive pharmaceutically active ingredients from natural sources such as plant roots and leaves, animal glands, and...

  12. Disaster Emergency Rapid Assessment Based on Remote Sensing and Background Data

    NASA Astrophysics Data System (ADS)

    Han, X.; Wu, J.

    2018-04-01

    The period from the onset of a disaster to the stabilization of conditions is an important stage of disaster development. In addition to collecting and reporting information on the disaster situation, remote sensing images from satellites and drones and monitoring results from the disaster-stricken areas should be obtained. The fusion of multi-source background data, such as population, geography and topography, with remote sensing monitoring information can be used in geographic information system analysis to assess the disaster information quickly and objectively. According to the characteristics of different hazards, models and methods driven by the requirements of rapid assessment missions are tested and screened. Based on remote sensing images, the features of exposed elements are used to quickly determine disaster-affected areas and intensity levels, to extract key disaster information about affected hospitals and schools as well as cultivated land and crops, and to support decision-making after the emergency response with visual assessment results.

  13. Non-invasive lightweight integration engine for building EHR from autonomous distributed systems.

    PubMed

    Angulo, Carlos; Crespo, Pere; Maldonado, José A; Moner, David; Pérez, Daniel; Abad, Irene; Mandingorra, Jesús; Robles, Montserrat

    2007-12-01

    In this paper we describe Pangea-LE, a message-oriented lightweight data integration engine that allows homogeneous and concurrent access to clinical information from disperse and heterogeneous data sources. The engine extracts the information and passes it to the requesting client applications in a flexible XML format. The XML response message can be formatted on demand by appropriate Extensible Stylesheet Language (XSL) transformations in order to meet the needs of client applications. We also present a real deployment in a hospital where Pangea-LE collects and generates an XML view of all the available patient clinical information. The information is presented to healthcare professionals in an Electronic Health Record (EHR) viewer Web application with patient search and EHR browsing capabilities. Deployment in a real setting has been a success due to the non-invasive nature of Pangea-LE, which respects the existing information systems.
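
    The on-demand XSL formatting step can be sketched with lxml as below; the XML payload and the stylesheet are invented placeholders rather than Pangea-LE's real message schema.

    ```python
    from lxml import etree

    # Invented XML response and stylesheet, standing in for an engine's real output.
    xml_response = etree.XML(
        "<patient><id>123</id><lab name='glucose' value='5.4' unit='mmol/L'/></patient>")
    stylesheet = etree.XML("""
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/patient">
        <html><body>
          <h1>Patient <xsl:value-of select="id"/></h1>
          <p><xsl:value-of select="lab/@name"/>:
             <xsl:value-of select="lab/@value"/>
             <xsl:value-of select="lab/@unit"/></p>
        </body></html>
      </xsl:template>
    </xsl:stylesheet>""")

    # Compile the stylesheet and format the XML response for a client application.
    transform = etree.XSLT(stylesheet)
    print(str(transform(xml_response)))
    ```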

  14. Non-invasive light-weight integration engine for building EHR from autonomous distributed systems.

    PubMed

    Crespo Molina, Pere; Angulo Fernández, Carlos; Maldonado Segura, José A; Moner Cano, David; Robles Viejo, Montserrat

    2006-01-01

    Pangea-LE is a message-oriented light-weight integration engine allowing concurrent access to clinical information from disperse and heterogeneous data sources. The engine extracts the information and serves it to the requesting client applications in a flexible XML format. This XML response message can be formatted on demand by an appropriate XSL (Extensible Stylesheet Language) transformation in order to fit the needs of the client application. In this article we present a real use case in which Pangea-LE collects and generates on the fly a structured view of all the patient clinical information available in a healthcare organisation. This information is presented to healthcare professionals in an EHR (Electronic Health Record) viewer Web application with patient search and EHR browsing capabilities. Deployment in a real environment has been a notable success due to the non-invasive method, which fully respects the existing information systems.

  15. A graph-based approach to detect spatiotemporal dynamics in satellite image time series

    NASA Astrophysics Data System (ADS)

    Guttler, Fabio; Ienco, Dino; Nin, Jordi; Teisseire, Maguelonne; Poncelet, Pascal

    2017-08-01

    Enhancing the frequency of satellite acquisitions represents a key issue for the Earth Observation community nowadays. Repeated observations are crucial for monitoring purposes, particularly when intra-annual processes should be taken into account. Time series of images constitute a valuable source of information in these cases. The goal of this paper is to propose a new methodological framework to automatically detect and extract spatiotemporal information from satellite image time series (SITS). Existing methods dealing with such kinds of data are usually classification-oriented and cannot provide information about evolutions and temporal behaviors. In this paper we propose a graph-based strategy that combines object-based image analysis (OBIA) with data mining techniques. Image objects computed at each individual timestamp are connected across the time series, generating a set of evolution graphs. Each evolution graph is associated with a particular area within the study site and stores information about its temporal evolution. Such information can be explored in depth at the evolution graph scale or used to compare the graphs and supply a general picture at the study site scale. We validated our framework on two study sites located in the South of France involving different types of natural, semi-natural and agricultural areas. The results obtained from a Landsat SITS support the quality of the methodological approach and illustrate how the framework can be employed to extract and characterize spatiotemporal dynamics.
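
    A toy sketch of the evolution-graph construction follows: objects segmented at two consecutive timestamps are linked when their footprints overlap beyond an illustrative threshold, and each weakly connected component then corresponds to one evolution graph. The objects, threshold and overlap criterion are simplified assumptions, not the paper's exact linking rules.

    ```python
    import networkx as nx

    # Toy image objects represented as sets of pixel coordinates at two timestamps.
    objects = {
        "t0_a": {(0, 0), (0, 1), (1, 0)},
        "t0_b": {(5, 5), (5, 6)},
        "t1_a": {(0, 1), (1, 0), (1, 1)},
        "t1_b": {(5, 6), (6, 6)},
    }
    t0_ids, t1_ids = ["t0_a", "t0_b"], ["t1_a", "t1_b"]

    G = nx.DiGraph()
    G.add_nodes_from(objects)
    for prev_id in t0_ids:
        for curr_id in t1_ids:
            overlap = len(objects[prev_id] & objects[curr_id])
            if overlap / len(objects[prev_id]) >= 0.3:    # illustrative threshold
                G.add_edge(prev_id, curr_id, overlap=overlap)

    # Each weakly connected component is one evolution graph describing the
    # temporal behaviour of a particular area of the study site.
    evolution_graphs = [sorted(c) for c in nx.weakly_connected_components(G)]
    print(evolution_graphs)
    ```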

  16. Environmental monitoring and assessment of heavy metals in surface sediments at Coleroon River Estuary in Tamil Nadu, India.

    PubMed

    Venkatramanan, S; Chung, S Y; Ramkumar, T; Selvam, S

    2015-08-01

    Combined studies of grain size distribution, the organic matter content of sediments, sequential extraction and bulk concentrations of heavy metals, statistical analysis, and ecological risk assessment were carried out to investigate the contamination sources and ecological risks of surface sediments at the Coleroon River Estuary in Tamil Nadu, India. The sequential extraction of metals showed that a larger portion of the metals was associated with the residual phase, with the remainder distributed among the other fractions. Low concentrations of heavy metals were found in the exchangeable and carbonate-bound (bioavailable) phases. This revealed that the sediments of the Coleroon River Estuary were relatively unpolluted and were influenced mainly by natural sources. The observed order of bulk concentrations of heavy metals in the sediments was as follows: Fe > Mn > Zn > Cu > Pb > Cr > Ni > Co. Factor analysis indicated that the enrichment of heavy metals mostly resulted from lithogenic origins associated with anthropogenic sources. These sources were confirmed by cluster analysis. The risk assessment code (RAC) suggested that none of the metals were harmful in the monsoon season; however, Fe posed a medium risk, and Mn and Cu a low risk, in summer. According to the pollution load index (PLI) of the sediments, all heavy metals were toxic. Cu might be related to adverse biological effects on the basis of sediment quality guidelines (SQG) in both seasons. These integrated approaches were very useful for identifying the contamination sources and ecological risks of sediments in an estuarine environment. It is expected that this research can provide useful information for the remediation of heavy metals in sediments.

  17. Distance biases in the estimation of the physical properties of Hi-GAL compact sources - I. Clump properties and the identification of high-mass star-forming candidates

    NASA Astrophysics Data System (ADS)

    Baldeschi, Adriano; Elia, D.; Molinari, S.; Pezzuto, S.; Schisano, E.; Gatti, M.; Serra, A.; Merello, M.; Benedettini, M.; Di Giorgio, A. M.; Liu, J. S.

    2017-04-01

    The degradation of spatial resolution in star-forming regions observed at large distances (d ≳ 1 kpc) with Herschel can lead to estimates of the physical parameters of the detected compact sources (clumps) that do not necessarily mirror the properties of the original population of cores. This paper aims at quantifying the bias introduced in the estimation of these parameters by the distance effect. To do so, we consider Herschel maps of nearby star-forming regions taken from the Herschel Gould Belt survey, and simulate the effect of increased distance to understand what amount of information is lost when a distant star-forming region is observed at Herschel resolution. In the maps displaced to different distances we extract compact sources and derive their physical parameters as if the maps were original Herschel infrared Galactic Plane Survey maps. In this way, we are able to discuss how the main physical properties change with distance. In particular, we discuss the ability of clumps to form massive stars: we estimate the fraction of distant sources that are classified as high-mass star-forming objects on the basis of their position in the mass versus radius diagram but are only 'false positives'. We also give a threshold for high-mass star formation, M > 1282 (r/pc)^{1.42} M_⊙. In conclusion, this paper provides the astronomer dealing with Herschel maps of distant star-forming regions with a set of prescriptions to partially recover the character of the core population in unresolved clumps.
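
    Applying the quoted threshold directly, a clump of mass M and radius r is flagged as a candidate for high-mass star formation when M > 1282 (r/pc)^{1.42} M_⊙; a minimal check with invented clump values:

    ```python
    # Direct application of the quoted mass-radius threshold; example values are invented.
    def is_high_mass_candidate(mass_msun, radius_pc):
        """True if a clump lies above M > 1282 (r/pc)^1.42 Msun."""
        return mass_msun > 1282.0 * radius_pc ** 1.42

    print(is_high_mass_candidate(900.0, 0.5))   # threshold ~479 Msun at r = 0.5 pc -> True
    print(is_high_mass_candidate(200.0, 0.5))   # below the threshold -> False
    ```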

  18. Source Pulse Estimation of Mine Shock by Blind Deconvolution

    NASA Astrophysics Data System (ADS)

    Makowski, R.

    The objective of seismic signal deconvolution is to extract from the signal information concerning the rockmass or the signal at the source of the shock. In the case of blind deconvolution, we have to extract information regarding both quantities. Many deconvolution methods used in prospecting seismology were found to be of minor utility when applied to shock-induced signals recorded in the mines of the Lubin Copper District. The lack of effectiveness should be attributed to the inadequacy of the model on which the methods are based with respect to the propagation conditions for that type of signal. Each of the blind deconvolution methods involves a number of assumptions; hence, only if these assumptions are fulfilled may we expect reliable results. Consequently, we had to formulate a different model for the signals recorded in the copper mines of the Lubin District. The model is based on the following assumptions: (1) the signal emitted by the shock source is a short-term signal; (2) the signal transmitting system (rockmass) constitutes a parallel connection of elementary systems; (3) the elementary systems are of resonant type. Such a model seems to be justified by the geological structure as well as by the positions of the shock foci and seismometers. The results of time-frequency transformation also support the dominance of resonant-type propagation. Making use of the model, a new method for the blind deconvolution of seismic signals has been proposed. The adequacy of the new model, as well as the efficiency of the proposed method, has been confirmed by the results of blind deconvolution. The slight approximation errors obtained with a small number of approximating elements additionally corroborate the adequacy of the model.

  19. #Healthy Selfies: Exploration of Health Topics on Instagram.

    PubMed

    Muralidhara, Sachin; Paul, Michael J

    2018-06-29

    Social media provides a complementary source of information for public health surveillance. The dominant data source for this type of monitoring is the microblogging platform Twitter, which is convenient due to the free availability of public data. Less is known about the utility of other social media platforms, despite their popularity. This work aims to characterize the health topics that are prominently discussed on the image-sharing platform Instagram, as a step toward understanding how these data might be used for public health research. The study uses a topic modeling approach to discover topics in a dataset of 96,426 Instagram posts containing hashtags related to health. We use a polylingual topic model, initially developed for datasets in different natural languages, to model different modalities of data: hashtags, caption words, and image tags automatically extracted using a computer vision tool. We identified 47 health-related topics in the data (kappa=.77), covering ten broad categories: acute illness, alternative medicine, chronic illness and pain, diet, exercise, health care and medicine, mental health, musculoskeletal health and dermatology, sleep, and substance use. The most prevalent topics were related to diet (8,293/96,426; 8.6% of posts) and exercise (7,328/96,426; 7.6% of posts). A large and diverse set of health topics is discussed on Instagram. The extracted image tags were generally too coarse and noisy to be used for identifying posts but were in some cases accurate for identifying images relevant to studying diet and substance use. Instagram shows potential as a source of public health information, though limitations in data collection and metadata availability may limit its use in comparison to platforms like Twitter. ©Sachin Muralidhara, Michael J. Paul. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 29.06.2018.

  20. Large-scale combining signals from both biomedical literature and the FDA Adverse Event Reporting System (FAERS) to improve post-marketing drug safety signal detection

    PubMed Central

    2014-01-01

    Background Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles. Results The datasets comprise 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. Our study implemented a total of seven signal ranking algorithms. We then compared these different ranking algorithms before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach can detect many true signals that have not been included in FDA drug labels. We extracted a total of 2,787,797 drug-SE pairs from FAERS with a low initial precision of 0.025. The ranking algorithm combined signals from both FAERS and MEDLINE, significantly improving the precision from 0.025 to 0.371 for top-ranked pairs, representing a 13.8-fold elevation in precision. We showed by manual curation that drug-SE pairs that appeared in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels. Conclusions We have developed an efficient and effective drug safety signal ranking and strengthening approach. We demonstrate that combining information from FAERS and the biomedical literature at large scale can significantly contribute to drug safety surveillance. PMID:24428898

  1. Large-scale combining signals from both biomedical literature and the FDA Adverse Event Reporting System (FAERS) to improve post-marketing drug safety signal detection.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2014-01-15

    Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles. The datasets consist of 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. Our study implemented a total of seven signal ranking algorithms. We then compared these different ranking algorithms before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach can detect many true signals that have not been included in FDA drug labels. We extracted a total of 2,787,797 drug-SE pairs from FAERS with a low initial precision of 0.025. The ranking algorithm combined signals from both FAERS and MEDLINE, significantly improving the precision from 0.025 to 0.371 for top-ranked pairs, representing a 13.8 fold elevation in precision. We showed by manual curation that drug-SE pairs that appeared in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels. We have developed an efficient and effective drug safety signal ranking and strengthening approach. We demonstrate that combining information from FAERS and the biomedical literature at large scale can significantly contribute to drug safety surveillance.
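
    The abstract does not spell out the seven ranking algorithms; as a hedged illustration of the kind of disproportionality score such pipelines rank, the sketch below computes a proportional reporting ratio (PRR) from FAERS-style counts and applies a simple literature-based boost. All counts and the boosting rule are invented for illustration and are not taken from this paper.

```python
# Hypothetical illustration of disproportionality scoring plus a simple
# literature boost; the paper's actual seven ranking algorithms and its
# MEDLINE-based boosting are not reproduced here.

def prr(a, b, c, d):
    """Proportional reporting ratio for a 2x2 drug/event contingency table.
    a: reports with drug and event, b: drug without event,
    c: other drugs with event, d: other drugs without event."""
    return (a / (a + b)) / (c / (c + d))

# Invented counts for one drug-side effect pair.
score = prr(a=120, b=9880, c=450, d=989550)

# Invented boost: up-weight pairs that also co-occur in MEDLINE sentences.
medline_cooccurrence = 37          # hypothetical sentence-level co-mentions
boosted = score * (1 + 0.1 * (medline_cooccurrence > 0))
print(round(score, 2), round(boosted, 2))
```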

  2. Quantification of source impact to PM using three-dimensional weighted factor model analysis on multi-site data

    NASA Astrophysics Data System (ADS)

    Shi, Guoliang; Peng, Xing; Huangfu, Yanqi; Wang, Wei; Xu, Jiao; Tian, Yingze; Feng, Yinchang; Ivey, Cesunica E.; Russell, Armistead G.

    2017-07-01

    Source apportionment technologies are used to understand the impacts of important sources of particulate matter (PM) on air quality, and are widely used for both scientific studies and air quality management. Generally, receptor models apportion speciated PM data from a single sampling site. With the development of large-scale monitoring networks, PM speciation is observed at multiple sites in an urban area. For these situations, the models should account for three factors, or dimensions, of the PM data: the chemical species concentrations, the sampling periods, and the sampling sites, suggesting the potential power of a three-dimensional source apportionment approach. However, the ordinary three-dimensional Parallel Factor Analysis (PARAFAC) model does not always work well in real environmental situations for multi-site receptor datasets. In this work, a new three-way receptor model, called the "multi-site three-way factor analysis" model, is proposed to deal with multi-site receptor datasets. Synthetic datasets were developed and introduced into the new model to test its performance. The average absolute error (AAE, between estimated and true contributions) was less than 50% for all extracted sources. Additionally, three-dimensional ambient datasets from a Chinese mega-city, Chengdu, were analyzed using the new model to assess its application. Four factors are extracted by the multi-site WFA3 model: the secondary source has the highest contributions (64.73 and 56.24 μg/m3), followed by vehicular exhaust (30.13 and 33.60 μg/m3), crustal dust (26.12 and 29.99 μg/m3) and coal combustion (10.73 and 14.83 μg/m3). The model was also compared to PMF, with general agreement, though PMF suggested a lower crustal contribution.
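
    A minimal sketch, assuming the tensorly library, of the ordinary PARAFAC baseline that the new model is contrasted with: a synthetic species × period × site array is decomposed into four factors. The paper's weighted multi-site WFA3 model itself is not implemented here, and the array shape and rank are illustrative.

```python
# Minimal sketch of ordinary (non-negative) PARAFAC on a synthetic
# species x sampling-period x site array; the paper's weighted multi-site
# WFA3 model is not implemented here.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

rng = np.random.default_rng(0)
X = rng.random((20, 60, 2))          # 20 species, 60 periods, 2 sites (illustrative)

cp = non_negative_parafac(tl.tensor(X), rank=4, n_iter_max=200)
weights, (species_profiles, time_contributions, site_loadings) = cp

print(species_profiles.shape, time_contributions.shape, site_loadings.shape)
# (20, 4) (60, 4) (2, 4): one column per extracted factor ("source").
```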

  3. Generating disease-pertinent treatment vocabularies from MEDLINE citations.

    PubMed

    Wang, Liqin; Del Fiol, Guilherme; Bray, Bruce E; Haug, Peter J

    2017-01-01

    Healthcare communities have identified a significant need for disease-specific information. Disease-specific ontologies are useful in assisting the retrieval of disease-relevant information from various sources. However, building these ontologies is labor intensive. Our goal is to develop a system for the automated generation of disease-pertinent concepts from a popular knowledge resource for the building of disease-specific ontologies. A pipeline system was developed with an initial focus on generating disease-specific treatment vocabularies. It comprised components for disease-specific citation retrieval, predication extraction, treatment predication extraction, treatment concept extraction, and relevance ranking. A semantic schema was developed to support the extraction of treatment predications and concepts. Four ranking approaches (i.e., occurrence, interest, degree centrality, and weighted degree centrality) were proposed to measure the relevance of treatment concepts to the disease of interest. We measured the performance of the four ranking approaches in terms of mean precision at the top 100 concepts for five diseases, as well as precision-recall curves against two reference vocabularies. The performance of the system was also compared to two baseline approaches. The pipeline system achieved a mean precision of 0.80 for the top 100 concepts with the ranking by interest. There was no significant difference among the four ranking approaches (p=0.53). However, the pipeline-based system had significantly better performance than the two baselines. The pipeline system can be useful for the automated generation of disease-relevant treatment concepts from the biomedical literature. Copyright © 2016 Elsevier Inc. All rights reserved.
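
    Degree centrality is the easiest of the four ranking approaches to illustrate; a minimal sketch with networkx on an invented predication graph is shown below (concept names, counts, and the TREATS relation layout are hypothetical, not outputs of the paper's pipeline).

```python
# Toy illustration of degree-centrality style ranking of treatment concepts;
# the predications and concept names are invented, not taken from the paper.
import networkx as nx

G = nx.Graph()
# Hypothetical TREATS predications (treatment concept, disease, citation count).
predications = [
    ("metformin", "type 2 diabetes", 25),
    ("insulin", "type 2 diabetes", 18),
    ("lifestyle modification", "type 2 diabetes", 9),
    ("metformin", "polycystic ovary syndrome", 4),
]
for concept, disease, count in predications:
    G.add_edge(concept, disease, weight=count)

centrality = nx.degree_centrality(G)               # unweighted degree centrality
weighted_degree = dict(G.degree(weight="weight"))  # weighted-degree variant

treatments = sorted({c for c, _, _ in predications},
                    key=lambda c: weighted_degree[c], reverse=True)
for c in treatments:
    print(c, round(centrality[c], 3), weighted_degree[c])
```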

  4. Accessing and Visualizing Satellite Data for Fisheries Managers in the Northeast Large Marine Ecosystem

    NASA Astrophysics Data System (ADS)

    Young Morse, R.; Mecray, E. L.; Pershing, A. J.

    2015-12-01

    As interest in global changes in temperature and precipitation patterns grows, federal, state, and local agencies are turning to the delivery of 'actionable science and information' or 'information for decision-makers.' The NOAA/National Centers for Environmental Information's Regional Climate Services program builds these bridges between the users and the producers of information. Drawing on the Climate Data Records program, this study presents the extraction and use of sea-surface temperature datasets specifically for access and use by fisheries managers in the north Atlantic. The work demonstrates the staged approach of accessing the records, converting their initial data formats into maps and charts, and delivering the data as a value-added information dashboard for use by managers. The questions to be reviewed include the ease of access, the delivery of open source software for visualizing the information, and a discussion on the roles of government and the private sector in the provision of climate information at different scales.
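
    A minimal sketch of the staged access-and-visualize workflow described above, assuming xarray and matplotlib; the OPeNDAP URL, variable name, and coordinate names are placeholders rather than the actual NCEI Climate Data Record endpoint used in this work.

```python
# Sketch of pulling a gridded SST field and rendering a quick-look map.
# The URL, variable name, and coordinate conventions are placeholders,
# not the actual NCEI service.
import xarray as xr
import matplotlib.pyplot as plt

url = "https://example.org/thredds/dodsC/sst_cdr.nc"   # hypothetical endpoint
ds = xr.open_dataset(url)

# Subset to a Northeast Shelf style box (assuming lat/lon coords in degrees,
# longitudes in -180..180) and take the latest time step.
sst = ds["sst"].sel(lat=slice(38, 46), lon=slice(-76, -64)).isel(time=-1)

sst.plot(cmap="viridis")
plt.title("Sea-surface temperature (latest time step)")
plt.savefig("sst_quicklook.png", dpi=150)
```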

  5. Studies of the beam extraction system of the GTS-LHC electron cyclotron resonance ion source at CERN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Toivanen, V., E-mail: ville.aleksi.toivanen@cern.ch; Küchler, D.

    2016-02-15

    The 14.5 GHz GTS-LHC Electron Cyclotron Resonance Ion Source (ECRIS) provides multiply charged heavy ion beams for the CERN experimental program. The GTS-LHC beam formation has been studied extensively with lead, argon, and xenon beams with varied beam extraction conditions using the ion optical code IBSimu. The simulation model predicts self-consistently the formation of triangular and hollow beam structures which are often associated with ECRIS ion beams, as well as beam loss patterns which match the observed beam induced markings in the extraction region. These studies provide a better understanding of the properties of the extracted beams and a way to diagnose the extraction system performance and limitations, which is otherwise challenging due to the lack of direct diagnostics in this region and the limited availability of the ion source for development work.

  6. Studies of the beam extraction system of the GTS-LHC electron cyclotron resonance ion source at CERN.

    PubMed

    Toivanen, V; Küchler, D

    2016-02-01

    The 14.5 GHz GTS-LHC Electron Cyclotron Resonance Ion Source (ECRIS) provides multiply charged heavy ion beams for the CERN experimental program. The GTS-LHC beam formation has been studied extensively with lead, argon, and xenon beams with varied beam extraction conditions using the ion optical code IBSimu. The simulation model predicts self-consistently the formation of triangular and hollow beam structures which are often associated with ECRIS ion beams, as well as beam loss patterns which match the observed beam induced markings in the extraction region. These studies provide a better understanding of the properties of the extracted beams and a way to diagnose the extraction system performance and limitations, which is otherwise challenging due to the lack of direct diagnostics in this region and the limited availability of the ion source for development work.

  7. Multiaperture ion beam extraction from gas-dynamic electron cyclotron resonance source of multicharged ions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sidorov, A.; Dorf, M.; Zorin, V.

    2008-02-15

    An electron cyclotron resonance ion source with a quasi-gas-dynamic regime of plasma confinement (ReGIS), constructed at the Institute of Applied Physics, Russia, provides opportunities for extracting intense and high-brightness multicharged ion beams. Despite the short plasma lifetime in the magnetic trap of a ReGIS, the degree of multiple ionization may be significantly enhanced by increasing the power and frequency of the applied microwave radiation. The present work is focused on studying the quality of the intense beam from this source by the pepper-pot method. A single beamlet emittance measured by the pepper-pot method was found to be ~70 π mm mrad, and the total extracted beam current obtained at 14 kV extraction voltage was ~25 mA. The results of the numerical simulations of ion beam extraction are found to be in good agreement with experimental data.

  8. Moderate pressure plasma source of nonthermal electrons

    NASA Astrophysics Data System (ADS)

    Gershman, S.; Raitses, Y.

    2018-06-01

    Plasma sources of electrons offer control of gas and surface chemistry without the need for complex vacuum systems. The plasma electron source presented here is based on a cold cathode glow discharge (GD) operating in a dc steady state mode in a moderate pressure range of 2–10 torr. Ion-induced secondary electron emission is the source of electrons accelerated to high energies in the cathode sheath potential. The source geometry is a key to the availability and the extraction of the nonthermal portion of the electron population. The source consists of a flat and a cylindrical electrode, 1 mm apart. Our estimates show that the length of the cathode sheath in the plasma source is commensurate (~0.5–1 mm) with the inter-electrode distance so the GD operates in an obstructed regime without a positive column. Estimations of the electron energy relaxation confirm the non-local nature of this GD, hence the nonthermal portion of the electron population is available for extraction outside of the source. The use of a cylindrical anode presents a simple and promising method of extracting the high energy portion of the electron population. Langmuir probe measurements and optical emission spectroscopy confirm the presence of electrons with energies ~15 eV outside of the source. These electrons become available for surface modification and radical production outside of the source. The extraction of the electrons of specific energies by varying the anode geometry opens exciting opportunities for future exploration.

  9. Origin of information-limiting noise correlations

    PubMed Central

    Kanitscheider, Ingmar; Coen-Cagli, Ruben; Pouget, Alexandre

    2015-01-01

    The ability to discriminate between similar sensory stimuli relies on the amount of information encoded in sensory neuronal populations. Such information can be substantially reduced by correlated trial-to-trial variability. Noise correlations have been measured across a wide range of areas in the brain, but their origin is still far from clear. Here we show analytically and with simulations that optimal computation on inputs with limited information creates patterns of noise correlations that account for a broad range of experimental observations while at the same time causing information to saturate in large neural populations. With the example of a network of V1 neurons extracting orientation from a noisy image, we illustrate what is, to our knowledge, the first generative model of noise correlations that is consistent both with neurophysiology and with behavioral thresholds, without invoking suboptimal encoding or decoding or internal sources of variability such as stochastic network dynamics or cortical state fluctuations. We further show that when information is limited at the input, both suboptimal connectivity and internal fluctuations could similarly reduce the asymptotic information, but they have qualitatively different effects on correlations, leading to specific experimental predictions. Our study indicates that noise at the sensory periphery could have a major effect on cortical representations in widely studied discrimination tasks. It also provides an analytical framework to understand the functional relevance of different sources of experimentally measured correlations. PMID:26621747

  10. Estimation of the limit of detection using information theory measures.

    PubMed

    Fonollosa, Jordi; Vergara, Alexander; Huerta, Ramón; Marco, Santiago

    2014-01-31

    Definitions of the limit of detection (LOD) based on the probability of false positive and/or false negative errors have been proposed over the past years. Although such definitions are straightforward and valid for any kind of analytical system, proposed methodologies to estimate the LOD are usually simplified to signals with Gaussian noise. Additionally, there is a general misconception that two systems with the same LOD provide the same amount of information on the source regardless of the prior probability of presenting a blank/analyte sample. Based upon an analogy between an analytical system and a binary communication channel, in this paper we show that the amount of information that can be extracted from an analytical system depends on the probability of presenting the two different possible states. We propose a new definition of the LOD, based on information theory tools, that deals with noise of any kind and allows prior knowledge to be introduced easily. Unlike most traditional LOD estimation approaches, the proposed definition is based on the amount of information that the chemical instrumentation system provides on the chemical information source. Our findings indicate that benchmarking analytical systems by their ability to provide information about the presence/absence of the analyte (our proposed approach) is a more general and proper framework, while converging to the usual values when dealing with Gaussian noise. Copyright © 2013 Elsevier B.V. All rights reserved.
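
    The proposed view treats the analytical system as a binary channel between analyte presence and the detector decision; a minimal sketch of the corresponding mutual information, computed from a prior and false-positive/false-negative rates, is given below. The numerical rates and priors are illustrative, not values from the paper.

```python
# Mutual information between analyte presence X and detector decision Y,
# modeled as a binary asymmetric channel. Rates and priors are illustrative.
import math

def mutual_information(prior, fp, fn):
    """prior: P(analyte present); fp: P(detect | blank); fn: P(miss | analyte)."""
    def h(p):                      # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    p_detect = prior * (1 - fn) + (1 - prior) * fp      # P(Y = detect)
    h_y_given_x = prior * h(1 - fn) + (1 - prior) * h(fp)
    return h(p_detect) - h_y_given_x                    # I(X;Y) = H(Y) - H(Y|X)

# Two systems with the same error rates convey different amounts of
# information depending on how often the analyte is actually presented.
for prior in (0.5, 0.05):
    print(prior, round(mutual_information(prior, fp=0.05, fn=0.05), 4))
```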

  11. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia.

    PubMed

    Kavakiotis, Ioannis; Xochelli, Aliki; Agathangelidis, Andreas; Tsoumakas, Grigorios; Maglaveras, Nicos; Stamatopoulos, Kostas; Hadzidimitriou, Anastasia; Vlahavas, Ioannis; Chouvarda, Ioanna

    2016-06-06

    Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher level integration procedure, inspired by social choice theory. This is applied in the Towards Analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM. The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset using voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets, as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movements towards pseudogenes was found in all CLL subsets. This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach on many as yet unanswered, clinically relevant biological questions.

  12. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

    Constructing a virtual international strategy environment requires many kinds of information, covering the economy, politics, the military, diplomacy, culture, science, and more. It is therefore very important to build an efficient system for automatic information extraction, classification, recombination, and analysis as a foundation and component of the military strategy hall. This paper first uses an improved boosting algorithm to classify the collected initial information, and then uses a strategy intelligence extraction algorithm to extract strategic intelligence from that information in order to support strategists' analyses.
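
    The "improved Boost algorithm" is not specified in the abstract; as a hedged stand-in, the sketch below uses ordinary AdaBoost over TF-IDF features with scikit-learn to classify short texts into strategy-related categories. All texts, labels, and categories are invented.

```python
# Hedged stand-in for the paper's boosting-based information classifier:
# plain AdaBoost over TF-IDF features; categories and texts are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

texts = [
    "central bank raises interest rates",          # economy
    "new trade agreement signed with neighbors",   # diplomacy
    "naval exercises held in coastal waters",      # military
    "joint fleet drill announced for next month",  # military
]
labels = ["economy", "diplomacy", "military", "military"]

clf = make_pipeline(TfidfVectorizer(),
                    AdaBoostClassifier(n_estimators=50, random_state=0))
clf.fit(texts, labels)
print(clf.predict(["border patrol ships begin live-fire drills"]))
```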

  13. Towards a realistic 3D simulation of the extraction region in ITER NBI relevant ion source

    NASA Astrophysics Data System (ADS)

    Mochalskyy, S.; Wünderlich, D.; Fantz, U.; Franzen, P.; Minea, T.

    2015-03-01

    The development of negative ion (NI) sources for ITER is strongly accompanied by modelling activities. The ONIX code addresses the physics of formation and extraction of negative hydrogen ions at caesiated sources as well as the amount of co-extracted electrons. In order to be closer to the experimental conditions, the code has been improved. It now includes the bias potential applied to the first grid (plasma grid) of the extraction system, and the presence of Cs+ ions in the plasma. The simulation results show that such aspects play an important role for the formation of an ion-ion plasma in the boundary region by reducing the depth of the negative potential well in the vicinity of the plasma grid that limits the extraction of the NIs produced at the Cs-covered plasma grid surface. The influence of the initial temperature of the surface-produced NI and of its emission rate on the NI density in the bulk plasma, which in turn affects the beam formation region, was analysed. The formation of the plasma meniscus, the boundary between the plasma and the beam, was investigated for extraction potentials of 5 and 10 kV. At the smaller extraction potential the meniscus moves closer to the plasma grid, but, as in the 10 kV case, the deepest meniscus bend point is still outside the aperture. Finally, a plasma containing the same amount of NI and electrons (nH- = ne = 10^17 m-3), representing good source conditioning, was simulated. It is shown that at such conditions the extracted NI current can reach values of ˜32 mA cm-2 using the ITER-relevant extraction potential of 10 kV and ˜19 mA cm-2 at 5 kV. These results are in good agreement with experimental measurements performed at the small scale ITER prototype source at the test facility BATMAN.

  14. Gait Recognition Based on Convolutional Neural Networks

    NASA Astrophysics Data System (ADS)

    Sokolova, A.; Konushin, A.

    2017-05-01

    In this work we investigate the problem of recognizing people by their gait. For this task, we implement a deep learning approach using optical flow as the main source of motion information and combine neural feature extraction with an additional embedding of descriptors to improve the representation. In order to find the best heuristics, we compare several deep neural network architectures and learning and classification strategies. The experiments were made on two popular datasets for gait recognition, so we investigate their advantages and disadvantages and the transferability of the considered methods.
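
    A minimal sketch, assuming PyTorch, of a small CNN that consumes stacked optical-flow frames (two flow channels per frame) and produces a descriptor that could feed a downstream embedding or classifier; the layer sizes, frame count, and input resolution are illustrative and do not correspond to the architectures compared in the paper.

```python
# Minimal sketch of a CNN feature extractor over stacked optical-flow frames
# (2 flow channels x T frames); sizes are illustrative, not the paper's nets.
import torch
import torch.nn as nn

class FlowGaitNet(nn.Module):
    def __init__(self, num_frames: int = 10, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2 * num_frames, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.embed = nn.Linear(64, embed_dim)   # descriptor for later embedding/classification

    def forward(self, x):            # x: (batch, 2*num_frames, H, W)
        h = self.features(x).flatten(1)
        return self.embed(h)

net = FlowGaitNet()
dummy_flow = torch.randn(4, 20, 64, 44)   # 4 clips, 10 stacked flow frames each
print(net(dummy_flow).shape)              # torch.Size([4, 128])
```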

  15. Absolute Bunch Length Measurements at the ALS by Incoherent Synchrotron Radiation Fluctuation Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Filippetto, D.; /Frascati; Sannibale, F.

    2008-01-24

    By analyzing the pulse-to-pulse intensity fluctuations of the radiation emitted by a charged particle in the incoherent part of the spectrum, it is possible to extract information about the spatial distribution of the beam. At the Advanced Light Source (ALS) of the Lawrence Berkeley National Laboratory, we have developed and tested a simple scheme based on this principle that allows for the absolute measurement of the bunch length. A description of the method and the experimental results are presented.

  16. Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data

    NASA Astrophysics Data System (ADS)

    Fard, Amin Milani

    Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work a competitive deduction approach for a heterogeneous data grid environment is proposed using classic data mining and statistical methods. By applying a game theory concept in a multi-agent model, we attempt to design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system in operation, a sample multi-expert system has also been developed.

  17. An ensemble method for extracting adverse drug events from social media.

    PubMed

    Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi

    2016-06-01

    Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristic curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
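
    A minimal sketch, using scikit-learn, of two ingredients highlighted above: information-gain-style feature selection (via mutual information) and a majority-voting ensemble over a TF-IDF feature-based representation. The texts and labels are invented, and the paper's tree and graph kernels are not reproduced.

```python
# Sketch of information-gain-style feature selection plus majority voting;
# texts and labels are invented, and the paper's tree/graph kernels are not used.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

texts = [
    "this drug gave me a terrible headache",        # ADE
    "started the medication last week, no issues",  # non-ADE
    "felt dizzy and nauseous after the new pills",  # ADE
    "refilled my prescription at the pharmacy",     # non-ADE
]
labels = [1, 0, 1, 0]

ensemble = make_pipeline(
    TfidfVectorizer(),
    SelectKBest(mutual_info_classif, k=10),
    VotingClassifier(
        estimators=[("lr", LogisticRegression()), ("svm", SVC())],
        voting="hard",
    ),
)
ensemble.fit(texts, labels)
print(ensemble.predict(["my arms broke out in a rash after the dose"]))
```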

  18. Single photon source with individualized single photon certifications

    NASA Astrophysics Data System (ADS)

    Migdall, Alan L.; Branning, David A.; Castelletto, Stefania; Ware, M.

    2002-12-01

    As currently implemented, single-photon sources cannot be made to produce single photons with high probability, while simultaneously suppressing the probability of yielding two or more photons. Because of this, single photon sources cannot really produce single photons on demand. We describe a multiplexed system that allows the probabilities of producing one and more photons to be adjusted independently, enabling a much better approximation of a source of single photons on demand. The scheme uses a heralded photon source based on parametric downconversion, but by effectively breaking the trigger detector area into multiple regions, we are able to extract more information about a heralded photon than is possible with a conventional arrangement. This scheme allows photons to be produced along with a quantitative 'certification' that they are single photons. Some of the single-photon certifications can be significantly better than what is possible with conventional downconversion sources, as well as being better than faint laser sources. With such a source of more tightly certified single photons, it should be possible to improve the maximum secure bit rate possible over a quantum cryptographic link. We present an analysis of the relative merits of this method over the conventional arrangement.

  19. Two-Dimensional DOA and Polarization Estimation for a Mixture of Uncorrelated and Coherent Sources with Sparsely-Distributed Vector Sensor Array

    PubMed Central

    Si, Weijian; Zhao, Pinjiao; Qu, Zhiyu

    2016-01-01

    This paper presents an L-shaped sparsely-distributed vector sensor (SD-VS) array with four different antenna compositions. With the proposed SD-VS array, a novel two-dimensional (2-D) direction of arrival (DOA) and polarization estimation method is proposed to handle the scenario where uncorrelated and coherent sources coexist. The uncorrelated and coherent sources are separated based on the moduli of the eigenvalues. For the uncorrelated sources, coarse estimates are acquired by extracting the DOA information embedded in the steering vectors from estimated array response matrix of the uncorrelated sources, and they serve as coarse references to disambiguate fine estimates with cyclical ambiguity obtained from the spatial phase factors. For the coherent sources, four Hankel matrices are constructed, with which the coherent sources are resolved in a similar way as for the uncorrelated sources. The proposed SD-VS array requires only two collocated antennas for each vector sensor, thus the mutual coupling effects across the collocated antennas are reduced greatly. Moreover, the inter-sensor spacings are allowed beyond a half-wavelength, which results in an extended array aperture. Simulation results demonstrate the effectiveness and favorable performance of the proposed method. PMID:27258271

  20. Antimicrobial potential of macro and microalgae against pathogenic and spoilage microorganisms in food.

    PubMed

    Pina-Pérez, M C; Rivas, A; Martínez, A; Rodrigo, D

    2017-11-15

    Algae are a valuable and never-failing source of bioactive compounds. The increasing effort to use ingredients that are as natural as possible in the formulation of innovative products has given rise to the introduction of macro- and microalgae in the food industry. To date, scarce information has been published about algae ingredients as antimicrobials in food. The antimicrobial potential of algae is highly dependent on: (i) the type, brown algae being the most effective against foodborne bacteria; (ii) the solvent used in the extraction of bioactive compounds, ethanolic and methanolic extracts being highly effective against Gram-positive and Gram-negative bacteria; and (iii) the concentration of the extract. The present paper reviews the main antimicrobial potential of algal species and their bioactive compounds in reference and real food matrices. The validation of the antimicrobial potential of algae in real food matrices is still a research niche, with meat and bakery products being the most studied substrates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Automatic optical detection and classification of marine animals around MHK converters using machine vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunton, Steven

    Optical systems provide valuable information for evaluating interactions and associations between organisms and MHK energy converters and for capturing potentially rare encounters between marine organisms and MHK devices. The deluge of optical data from cabled monitoring packages makes expert review time-consuming and expensive. We propose algorithms and a processing framework to automatically extract events of interest from underwater video. The open-source software framework consists of background subtraction, filtering, feature extraction and hierarchical classification algorithms. This classification pipeline was validated on real-world data collected with an experimental underwater monitoring package. An event detection rate of 100% was achieved using robust principal components analysis (RPCA), Fourier feature extraction and a support vector machine (SVM) binary classifier. The detected events were then further classified into more complex classes – algae | invertebrate | vertebrate, one species | multiple species of fish, and interest rank. Greater than 80% accuracy was achieved using a combination of machine learning techniques.
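
    A minimal sketch of the Fourier-feature plus SVM event-detection stage on synthetic per-frame foreground traces, assuming numpy and scikit-learn; the RPCA background subtraction and the hierarchical species classifier from the full pipeline are not implemented, and all signals are invented.

```python
# Sketch of the Fourier-feature + SVM event detection stage on synthetic
# per-frame foreground traces; the RPCA background subtraction and the
# hierarchical species classifier from the paper are not implemented here.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def fourier_features(trace, n_bins=16):
    """Magnitude spectrum of a per-frame foreground-intensity trace."""
    spectrum = np.abs(np.fft.rfft(trace))
    return spectrum[:n_bins]

# Synthetic traces: "events" contain transient bursts, background is noise only.
background = [rng.normal(0, 1, 128) for _ in range(20)]
events = [rng.normal(0, 1, 128) + np.where(np.arange(128) % 32 < 8, 5.0, 0.0)
          for _ in range(20)]

X = np.array([fourier_features(t) for t in background + events])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))   # training accuracy on the toy data
```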

  2. Contour metrology using critical dimension atomic force microscopy

    NASA Astrophysics Data System (ADS)

    Orji, Ndubuisi G.; Dixson, Ronald G.; Vladár, András E.; Ming, Bin; Postek, Michael T.

    2012-03-01

    The critical dimension atomic force microscope (CD-AFM), which is used as a reference instrument in lithography metrology, has been proposed as a complementary instrument for contour measurement and verification. Although data from CD-AFM are inherently three dimensional, the planar two-dimensional data required for contour metrology are not easily extracted from the top-down CD-AFM data. This is largely due to the limitations of the CD-AFM method for controlling the tip position and scanning. We describe scanning techniques and profile extraction methods to obtain contours from CD-AFM data. We also describe how we validated our technique, and explain some of its limitations. Potential sources of error for this approach are described, and a rigorous uncertainty model is presented. Our objective is to show which data acquisition and analysis methods could yield optimum contour information while preserving some of the strengths of CD-AFM metrology. We present a comparison of contours extracted using our technique with those obtained from the scanning electron microscope (SEM) and the helium ion microscope (HIM).

  3. Security of quantum key distribution with multiphoton components

    PubMed Central

    Yin, Hua-Lei; Fu, Yao; Mao, Yingqiu; Chen, Zeng-Bing

    2016-01-01

    Most qubit-based quantum key distribution (QKD) protocols extract the secure key merely from the single-photon component of the attenuated lasers. However, with the Scarani-Acin-Ribordy-Gisin 2004 (SARG04) QKD protocol, the unconditionally secure key can be extracted from the two-photon component by modifying the classical post-processing procedure in the BB84 protocol. Employing the merits of the SARG04 QKD protocol and six-state preparation, one can extract a secure key from components of one up to four photons. In this paper, we provide the exact relations between the secure key rate and the bit error rate in a six-state SARG04 protocol with single-photon, two-photon, three-photon, and four-photon sources. By restricting the mutual information between the phase error and bit error, we obtain a higher secure bit error rate threshold for the multiphoton components than previous works. In addition, we compare the performance of the six-state SARG04 with other prepare-and-measure QKD protocols using decoy states. PMID:27383014

  4. Methods of analysis for complex organic aerosol mixtures from urban emission sources of particulate carbon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mazurek, M.A.; Hildemann, L.M.; Cass, G.R.

    1990-04-01

    Extractable organic compounds having between 6 and 40 carbon atoms comprise an important mass fraction of the fine particulate matter samples from major urban emission sources. Depending on the emission source type, this solvent-soluble fraction accounts for <20% to 100% of the total organic aerosol mass, as measured by quantitative high-resolution gas chromatography (HRGC) with flame ionization detection. In addition to total extract quantitation, HRGC can be applied to further analyses of the mass distributions of elutable organics present in the complex aerosol extract mixtures, thus generating profiles that serve as "fingerprints" for the sources of interest. This HRGC analytical method is applied to emission source samples that contain between 7 and 12,000 μg/filter organic carbon. It is shown to be a sensitive technique for analysis of carbonaceous aerosol extract mixtures having diverse mass loadings and species distributions. This study describes the analytical chemical methods that have been applied to: the construction of chemical mass balances based on the mass of fine organic aerosol emitted by major urban sources of particulate carbon; and the generation of discrete emission source chemical profiles derived from chromatographic characteristics of the organic aerosol components. 21 refs., 1 fig., 2 tabs.

  5. High-Resolution Remote Sensing Image Building Extraction Based on Markov Model

    NASA Astrophysics Data System (ADS)

    Zhao, W.; Yan, L.; Chang, Y.; Gong, L.

    2018-04-01

    As resolution increases, remote sensing images carry a greater information load, more noise, and more complex feature geometry and texture information, which makes the extraction of building information more difficult. To solve this problem, this paper designs a high-resolution remote sensing image building extraction method based on a Markov model. The method introduces Contourlet-domain map clustering and a Markov model, captures and enhances the contour and texture information of high-resolution remote sensing image features in multiple directions, and further designs a spectral feature index that can characterize "pseudo-buildings" in the building area. Through multi-scale segmentation and extraction of image features, fine extraction from the building area down to individual buildings is realized. Experiments show that this method can suppress the noise of high-resolution remote sensing images, reduce the interference of non-target ground texture information, and remove shadow, vegetation and other pseudo-building information; compared with traditional pixel-level image information extraction, it achieves better building extraction precision, accuracy and completeness.

  6. Industrially synthesized single-walled carbon nanotubes: compositional data for users, environmental risk assessments, and source apportionment

    NASA Astrophysics Data System (ADS)

    Plata, D. L.; Gschwend, P. M.; Reddy, C. M.

    2008-05-01

    Commercially available single-walled carbon nanotubes (SWCNTs) contain large percentages of metal and carbonaceous impurities. These fractions influence the SWCNT physical properties and performance, yet their chemical compositions are not well defined. This lack of information also precludes accurate environmental risk assessments for specific SWCNT stocks, which emerging local legislation requires of nanomaterial manufacturers. To address these needs, we measured the elemental, molecular, and stable carbon isotope compositions of commercially available SWCNTs. As expected, catalytic metals occurred at per cent levels (1.3-29%), but purified materials also contained unexpected metals (e.g., Cu, Pb at 0.1-0.3 ppt). Nitrogen contents (up to 0.48%) were typically greater in arc-produced SWCNTs than in those derived from chemical vapor deposition. Toluene-extractable materials contributed less than 5% of the total mass of the SWCNTs. Internal standard losses during dichloromethane extractions suggested that metals are available for reductive dehalogenation reactions, ultimately resulting in the degradation of aromatic internal standards. The carbon isotope content of the extracted material suggested that SWCNTs acquired much of their carbonaceous contamination from their storage environment. Some of the SWCNTs, themselves, were highly depleted in 13C relative to petroleum-derived chemicals. The distinct carbon isotopic signatures and unique metal 'fingerprints' may be useful as environmental tracers allowing assessment of SWCNT sources to the environment.

  7. Qualitative and quantitative analysis of Dibenzofuran, Alkyldibenzofurans, and Benzo[b]naphthofurans in crude oils and source rock extracts

    USGS Publications Warehouse

    Meijun Li,; Ellis, Geoffrey S.

    2015-01-01

    Dibenzofuran (DBF), its alkylated homologues, and benzo[b]naphthofurans (BNFs) are common oxygen-heterocyclic aromatic compounds in crude oils and source rock extracts. A series of positional isomers of alkyldibenzofuran and benzo[b]naphthofuran were identified in mass chromatograms by comparison with internal standards and standard retention indices. The response factors of dibenzofuran in relation to internal standards were obtained by gas chromatography-mass spectrometry analyses of a set of mixed solutions with different concentration ratios. Perdeuterated dibenzofuran and dibenzothiophene are optimal internal standards for quantitative analyses of furan compounds in crude oils and source rock extracts. The average concentration of the total DBFs in oils derived from siliciclastic lacustrine rock extracts from the Beibuwan Basin, South China Sea, was 518 μg/g, which is about 5 times that observed in the oils from carbonate source rocks in the Tarim Basin, Northwest China. The BNFs occur ubiquitously in source rock extracts and related oils of various origins. The results of this work suggest that the relative abundance of benzo[b]naphthofuran isomers, that is, the benzo[b]naphtho[2,1-d]furan/{benzo[b]naphtho[2,1-d]furan + benzo[b]naphtho[1,2-d]furan} ratio, may be a potential molecular geochemical parameter to indicate oil migration pathways and distances.

  8. Modeling of negative ion transport in a plasma source

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riz, David; Departement de Recherches sur la Fusion Controelee CE Cadarache, 13108 St Paul lez Durance; Pamela, Jerome

    1998-08-20

    A code called NIETZSCHE has been developed to simulate the negative ion transport in a plasma source, from their birth place to the extraction holes. The ion trajectory is calculated by numerically solving the 3-D motion equation, while the atomic processes of destruction, of elastic collision H-/H+ and of charge exchange H-/H0 are handled at each time step by a Monte-Carlo procedure. This code can be used to calculate the extraction probability of a negative ion produced at any location inside the source. Calculations performed with NIETZSCHE have made it possible to explain, either quantitatively or qualitatively, several phenomena observed in negative ion sources, such as the isotopic H-/D- effect, and the influence of the plasma grid bias or of the magnetic filter on the negative ion extraction. The code has also shown that in the type of sources contemplated for ITER, which operate at large arc power densities (>1 W cm-3), negative ions can reach the extraction region provided they are produced at a distance lower than 2 cm from the plasma grid in the case of 'volume production' (dissociative attachment processes), or if they are produced at the plasma grid surface, in the vicinity of the extraction holes.

  9. Modeling of negative ion transport in a plasma source (invited)

    NASA Astrophysics Data System (ADS)

    Riz, David; Paméla, Jérôme

    1998-02-01

    A code called NIETZSCHE has been developed to simulate the negative ion transport in a plasma source, from their birth place to the extraction holes. The H-/D- trajectory is calculated by numerically solving the 3D motion equation, while the atomic processes of destruction, of elastic collision with H+/D+ and of charge exchange with H0/D0 are handled at each time step by a Monte Carlo procedure. This code can be used to calculate the extraction probability of a negative ion produced at any location inside the source. Calculations performed with NIETZSCHE have made it possible to explain, either quantitatively or qualitatively, several phenomena observed in negative ion sources, such as the isotopic H-/D- effect, and the influence of the plasma grid bias or of the magnetic filter on the negative ion extraction. The code has also shown that, in the type of sources contemplated for ITER, which operate at large arc power densities (>1 W cm-3), negative ions can reach the extraction region provided they are produced at a distance lower than 2 cm from the plasma grid in the case of volume production (dissociative attachment processes), or if they are produced at the plasma grid surface, in the vicinity of the extraction holes.

  10. Modeling of negative ion transport in a plasma source

    NASA Astrophysics Data System (ADS)

    Riz, David; Paméla, Jérôme

    1998-08-01

    A code called NIETZSCHE has been developed to simulate the negative ion transport in a plasma source, from their birth place to the extraction holes. The ion trajectory is calculated by numerically solving the 3-D motion equation, while the atomic processes of destruction, of elastic collision H-/H+ and of charge exchange H-/H0 are handled at each time step by a Monte-Carlo procedure. This code can be used to calculate the extraction probability of a negative ion produced at any location inside the source. Calculations performed with NIETZSCHE have made it possible to explain, either quantitatively or qualitatively, several phenomena observed in negative ion sources, such as the isotopic H-/D- effect, and the influence of the plasma grid bias or of the magnetic filter on the negative ion extraction. The code has also shown that in the type of sources contemplated for ITER, which operate at large arc power densities (>1 W cm-3), negative ions can reach the extraction region provided they are produced at a distance lower than 2 cm from the plasma grid in the case of «volume production» (dissociative attachment processes), or if they are produced at the plasma grid surface, in the vicinity of the extraction holes.

  11. Evaluation of Genotoxic and Mutagenic Activity of Organic Extracts from Drinking Water Sources

    PubMed Central

    Guan, Ying; Wang, Xiaodong; Wong, Minghung; Sun, Guoping; An, Taicheng; Guo, Jun

    2017-01-01

    An increasing number of industrial, agricultural and commercial chemicals in the aquatic environment lead to various deleterious effects on organisms, which is becoming a serious global health concern. In this study, the Ames test and SOS/umu test were conducted to investigate the potential genotoxicity and mutagenicity caused by organic extracts from drinking water sources. The organic content of source water was extracted with an XAD-2 resin column and organic solvents. Four doses of the extract equivalent to 0.25, 0.5, 1 and 2 L of source water were tested for toxicity. All the water samples were collected from six different locations in Guangdong province. The results of the Ames test and SOS/umu test showed that all the organic extracts from the water samples could induce different levels of DNA damage and mutagenic potentials at the dose of 2 L in the absence of S9 mix, which demonstrated the existence of genotoxicity and mutagenicity. Additionally, we found that Salmonella typhimurium strain TA98 was more sensitive to the mutagens. Correlation analysis between genotoxicity, Organochlorine Pesticides (OCPs) and Polycyclic Aromatic Hydrocarbons (PAHs) showed that most individual OCPs were frame shift toxicants in drinking water sources, and there was no correlation with total OCPs and PAHs. PMID:28125725

  12. Speech-Message Extraction from Interference Introduced by External Distributed Sources

    NASA Astrophysics Data System (ADS)

    Kanakov, V. A.; Mironov, N. A.

    2017-08-01

    This study addresses the extraction of a speech signal originating from a given spatial point and the calculation of the intelligibility of the extracted voice message. The problem is solved by reducing the influence of interference from the other speech-message sources on the extracted signal. The method is based on introducing time delays, which depend on the spatial coordinates, into the recording channels. Audio recordings of the voices of eight different people were used as test objects during the studies. It is shown that an increase in the number of microphones improves the intelligibility of the speech message extracted from the interference.
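
    A minimal numpy sketch of the delay-compensation idea described above: each recording channel is shifted by the propagation delay from the chosen spatial point to that microphone and the channels are summed, reinforcing the target talker relative to interfering sources. The geometry, sample rate, and signals are invented, and delays are rounded to whole samples for simplicity.

```python
# Delay-and-sum sketch: align channels on a chosen source point and sum.
# Geometry, sample rate, and signals are invented; delays are rounded to samples.
import numpy as np

def delay_and_sum(channels, mic_positions, source_point, fs, c=343.0):
    """channels: (n_mics, n_samples); positions/point in metres; fs in Hz."""
    dists = np.linalg.norm(mic_positions - source_point, axis=1)
    delays = (dists - dists.min()) / c                 # relative propagation delays
    shifts = np.round(delays * fs).astype(int)
    n = channels.shape[1]
    aligned = np.zeros_like(channels)
    for i, s in enumerate(shifts):
        aligned[i, : n - s] = channels[i, s:]          # advance later arrivals
    return aligned.mean(axis=0)

fs = 16000
mic_positions = np.array([[0.0, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
source_point = np.array([1.0, 0.5])                       # hypothetical talker location
channels = np.random.default_rng(0).normal(size=(4, fs))  # placeholder recordings
print(delay_and_sum(channels, mic_positions, source_point, fs).shape)
```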

  13. Effect of three extraction techniques on submitochondrial particle and Microtox bioassays for airborne particulate matter.

    PubMed

    Torres-Pérez, Mónica I; Jiménez-Velez, Braulio D; Mansilla-Rivera, Imar; Rodríguez-Sierra, Carlos J

    2005-03-01

    The effects of three extraction techniques (Soxhlet, ultrasound, and microwave-assisted extraction) on the toxicity of organic extracts from three sources of airborne particulate matter (APM), as measured by submitochondrial particle (SMP) and Microtox assays, were compared. The extraction technique influenced the toxicity response of the APM extracts, and the effect depended on the bioassay method and the APM sample source. APM extracts from microwave-assisted extraction (MAE) were similar to or more toxic than those from the conventional extraction techniques of Soxhlet and ultrasound, thus providing an alternative extraction method. The microwave extraction technique has the advantages of using less solvent volume, requiring less extraction time, and having the capacity to simultaneously extract twelve samples. The ordering of APM toxicity was generally urban dust > diesel dust > PM10 (particles with diameter < 10 microm), reflecting the different chemical compositions of the samples. This study is the first to report the suitability of two standard in-vitro bioassays for the future toxicological characterization of APM collected from Puerto Rico, with the SMP assay generally showing better sensitivity than the well-known Microtox bioassay.

  14. EXTraS: Exploring the X-ray Transient and variable Sky

    NASA Astrophysics Data System (ADS)

    De Luca, A.; Salvaterra, R.; Tiengo, A.; D'Agostino, D.; Watson, M.; Haberl, F.; Wilms, J.

    2017-10-01

    The EXTraS project extracted all temporal domain information buried in the whole database collected by the EPIC cameras onboard the XMM-Newton mission. This included a search and characterisation of variability, both periodic and aperiodic, in hundreds of thousands of sources spanning more than eight orders of magnitude in time scale and six orders of magnitude in flux, as well as a search for fast transients, missed by standard image analysis. Phenomenological classification of variable sources, based on X-ray and multiwavelength information, has also been performed. All results and products of EXTraS are made available to the scientific community through a web public data archive. A dedicated science gateway will allow scientists to apply EXTraS pipelines on new observations. EXTraS is the most comprehensive analysis of variability, on the largest ever sample of soft X-ray sources. The resulting archive and tools disclose an enormous scientific discovery space to the community, with applications ranging from the search for rare events to population studies, with impact on the study of virtually all astrophysical source classes. EXTraS, funded within the EU/FP7 framework, is carried out by a collaboration including INAF (Italy), IUSS (Italy), CNR/IMATI (Italy), University of Leicester (UK), MPE (Germany) and ECAP (Germany).

  15. Quantifying sources of methane and light alkanes in the Los Angeles Basin, California

    NASA Astrophysics Data System (ADS)

    Peischl, Jeff; Ryerson, Thomas; Atlas, Elliot; Blake, Donald; Brioude, Jerome; Daube, Bruce; de Gouw, Joost; Frost, Gregory; Gentner, Drew; Gilman, Jessica; Goldstein, Allen; Harley, Robert; Holloway, John; Kuster, William; Santoni, Gregory; Trainer, Michael; Wofsy, Steven; Parrish, David

    2013-04-01

    We use ambient measurements to apportion the relative contributions of different source sectors to the methane (CH4) emissions budget of a U.S. megacity. This approach uses ambient measurements of methane and C2-C5 alkanes (ethane through pentanes) and includes source composition information to distinguish between methane emitted from landfills and feedlots, wastewater treatment plants, tailpipe emissions, leaks of dry natural gas in pipelines and/or local seeps, and leaks of locally produced (unprocessed) natural gas. Source composition information can be taken from existing tabulations or developed by direct sampling of emissions using a mobile platform. By including C2-C5 alkane information, a linear combination of these source signatures can be found to match the observed atmospheric enhancement ratios to determine relative emissions strengths. We apply this technique to apportion CH4 emissions in Los Angeles, CA (L.A.) using data from the CalNex field project in 2010. Our analysis of L.A. atmospheric data shows the two largest CH4 sources in the city are emissions of gas from pipelines and/or from geologic seeps (47%), and emissions from landfills (40%). Local oil and gas production is a relatively minor source of CH4, contributing 8% of total CH4 emissions in L.A. Absolute CH4 emissions rates are derived by multiplying the observed CH4/CO enhancement ratio by State of California inventory values for carbon monoxide (CO) emissions in Los Angeles. Apportioning this total suggests that emissions from the combined natural and anthropogenic gas sources account for the differences between top-down and bottom-up CH4 estimates previously published for Los Angeles. Further, total CH4 emission attributed in our analysis to local gas extraction represents 17% of local production. While a derived leak rate of 17% of local production may seem unrealistically high, it is qualitatively consistent with the 12% reported in a recent state inventory survey of the L.A. oil and gas industry.
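
    The apportionment described above amounts to expressing observed CH4 and C2-C5 enhancement ratios as a non-negative linear combination of source signatures; a minimal sketch using scipy's non-negative least squares is shown below, with all signature and observation numbers invented for illustration.

```python
# Sketch of apportioning observed alkane enhancements as a non-negative
# linear combination of source signatures; all numbers are invented.
import numpy as np
from scipy.optimize import nnls

species = ["CH4", "ethane", "propane", "n-butane"]
# Columns: hypothetical signatures (relative molar composition) for
# pipeline/seep gas, landfills, and local oil & gas production.
signatures = np.array([
    [0.95, 0.99, 0.80],   # CH4
    [0.03, 0.00, 0.10],   # ethane
    [0.01, 0.00, 0.06],   # propane
    [0.01, 0.01, 0.04],   # n-butane
])
observed = np.array([0.93, 0.04, 0.02, 0.01])   # hypothetical ambient enhancements

weights, residual = nnls(signatures, observed)
shares = weights / weights.sum()
for name, share in zip(["pipeline/seeps", "landfills", "oil & gas"], shares):
    print(f"{name}: {share:.0%}")
print("fit residual:", round(residual, 4))
```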

  16. First results of the ITER-relevant negative ion beam test facility ELISE (invited).

    PubMed

    Fantz, U; Franzen, P; Heinemann, B; Wünderlich, D

    2014-02-01

    An important step in the European R&D roadmap towards the neutral beam heating systems of ITER is the new test facility ELISE (Extraction from a Large Ion Source Experiment) for large-scale extraction from a half-size ITER RF source. The test facility was constructed in the last years at Max-Planck-Institut für Plasmaphysik Garching and is now operational. ELISE is gaining early experience of the performance and operation of large RF-driven negative hydrogen ion sources with plasma illumination of a source area of 1 × 0.9 m(2) and an extraction area of 0.1 m(2) using 640 apertures. First results in volume operation, i.e., without caesium seeding, are presented.

  17. In-situ continuous water monitoring system

    DOEpatents

    Thompson, Cyril V.; Wise, Marcus B.

    1998-01-01

    An in-situ continuous liquid monitoring system for continuously analyzing volatile components contained in a water source comprises: a carrier gas supply, an extraction container and a mass spectrometer. The carrier gas supply continuously supplies the carrier gas to the extraction container and is mixed with a water sample that is continuously drawn into the extraction container by the flow of carrier gas into the liquid directing device. The carrier gas continuously extracts the volatile components out of the water sample. The water sample is returned to the water source after the volatile components are extracted from it. The extracted volatile components and the carrier gas are delivered continuously to the mass spectrometer and the volatile components are continuously analyzed by the mass spectrometer.

  18. In-situ continuous water monitoring system

    DOEpatents

    Thompson, C.V.; Wise, M.B.

    1998-03-31

    An in-situ continuous liquid monitoring system for continuously analyzing volatile components contained in a water source comprises: a carrier gas supply, an extraction container and a mass spectrometer. The carrier gas supply continuously supplies the carrier gas to the extraction container and is mixed with a water sample that is continuously drawn into the extraction container by the flow of carrier gas into the liquid directing device. The carrier gas continuously extracts the volatile components out of the water sample. The water sample is returned to the water source after the volatile components are extracted from it. The extracted volatile components and the carrier gas are delivered continuously to the mass spectrometer and the volatile components are continuously analyzed by the mass spectrometer. 2 figs.

  19. Woody biomass: Niche position as a source of sustainable renewable chemicals and energy and kinetics of hot-water extraction/hydrolysis.

    PubMed

    Liu, Shijie

    2010-01-01

    The conversion of biomass to chemicals and energy is imperative to sustaining our way of life as we know it today. Fossil chemical and energy sources are traditionally regarded as wastes from a distant past. Petroleum, natural gas, and coal are not being regenerated in a sustainable manner. However, biomass sources such as algae, grasses, bushes and forests are continuously being replenished. Woody biomass represents the most abundant and available biomass source. Woody biomass is a reliably sustainable source of chemicals and energy that could be replenished at a rate consistent with our needs. The biorefinery is a concept describing the collection of processes used to convert biomass to chemicals and energy. Woody biomass presents more challenges than cereal grains for conversion to platform chemicals due to its stereochemical structures. Woody biomass can be thought of as comprising at least four components: extractives, hemicellulose, lignin and cellulose. Each of these four components has a different degree of resistance to chemical, thermal and biological degradation. The biorefinery concept proposed at ESF (State University of New York - College of Environmental Science and Forestry) aims at incremental, sequential deconstruction and fractionation/conversion of woody biomass to achieve efficient separation of the major components. The emphasis of this work is on the kinetics of hot-water extraction, filling the gap in the fundamental understanding, linking engineering developments, and completing the first step in the biorefinery processes. This first step removes extractives and hemicellulose fractions from woody biomass. While extractives and hemicellulose are largely removed in the extraction liquor, cellulose and lignin largely remain in the residual woody structure. Xylo-oligomers and acetic acid in the extract are the major components having the greatest potential value for development. Extraction/hydrolysis involves at least 16 general reactions that could be divided into four categories: adsorption of protons onto woody biomass, hydrolysis reactions on the woody biomass surface, dissolution of soluble substances into the extraction liquor, and hydrolysis and dehydration decomposition in the extraction liquor. The extraction/hydrolysis rates are significantly simplified when the reactivity of all the intermonomer bonds is regarded as identical within each macromolecule, and the overall reactivity is identical for all the extractable macromolecules on the surface. A pseudo-first-order extraction rate expression has been derived based on concentrations in monomer units. However, the reaction rate constant is lower at the beginning of the extraction than towards the end. Furthermore, the H-factor and/or severity factor can be applied to lump the effects of temperature and residence time on the extraction process, at least for short times. This provides a means to control and optimize the performance of the extraction process effectively. Copyright 2010 Elsevier Inc. All rights reserved.
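
    A hedged sketch of the kinetic form described above: treating all extractable intermonomer bonds as equally reactive gives a pseudo-first-order rate in monomer-unit concentration, and the temperature history can be lumped into an H-factor-style severity integral (the Arrhenius form and reference-temperature convention below are the conventional choices, not constants taken from this paper).

```latex
% Pseudo-first-order extraction in monomer-unit concentration C(t),
% with the temperature history lumped into an H-factor-style severity integral.
\begin{align}
  \frac{dC}{dt} &= -k(T)\,C ,
  &
  C(t) &= C_0 \exp\!\left(-\int_0^t k\big(T(\tau)\big)\,d\tau\right),\\
  H(t) &= \int_0^t \frac{k\big(T(\tau)\big)}{k(T_{\mathrm{ref}})}\,d\tau
        = \int_0^t \exp\!\left[\frac{E_a}{R}\left(\frac{1}{T_{\mathrm{ref}}}-\frac{1}{T(\tau)}\right)\right]d\tau ,
  &
  C(t) &= C_0\,e^{-k(T_{\mathrm{ref}})\,H(t)} .
\end{align}
```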

  20. A new framing approach in guideline development to manage different sources of knowledge.

    PubMed

    Lukersmith, Sue; Hopman, Katherine; Vine, Kristina; Krahe, Lee; McColl, Alexander

    2017-02-01

    Contemporary guideline methodology struggles to consider context and information from sources of knowledge other than quantitative research. Return-to-work programmes involve multiple components and stakeholders. If a guideline is to be relevant and practical for a complex intervention such as return to work, it is essential to use broad sources of knowledge. This paper reports on a new method in guideline development to manage different sources of knowledge. The method used framing for the return-to-work guidance within the Clinical Practice Guidelines for the Management of Rotator Cuff Syndrome in the Workplace. The development involved a multi-disciplinary working party of experts, including consumers. The researchers considered a broad range of research, expert (practice and experience) knowledge, and the individual's and workplace contexts, and used framing with the International Classification of Functioning, Disability and Health (ICF). Following a systematic database search on four clinical questions, there were seven stages of knowledge management to extract, unpack, map and pack information into the ICF domains framework. Companion graded recommendations were developed. The results include practical examples, user and consumer guides, flow charts and six graded or consensus recommendations on best practice for return-to-work intervention. Our findings suggest that framing, used in guideline methodology with internationally accepted frameworks such as the ICF, is a reliable and transparent way to manage different sources of knowledge. Future research might examine other examples and methods for managing complexity and using different sources of knowledge in guideline development. © 2016 John Wiley & Sons, Ltd.

  1. Integrating GPCR-specific information with full text articles

    PubMed Central

    2011-01-01

    Background With the continued growth in the volume both of experimental G protein-coupled receptor (GPCR) data and of the related peer-reviewed literature, the ability of GPCR researchers to keep up-to-date is becoming increasingly curtailed. Results We present work that integrates the biological data and annotations in the GPCR information system (GPCRDB) with next-generation methods for intelligently exploring, visualising and interacting with the scientific articles used to disseminate them. This solution automatically retrieves relevant information from GPCRDB and displays it both within and as an adjunct to an article. Conclusions This approach allows researchers to extract more knowledge more swiftly from literature. Importantly, it allows reinterpretation of data in articles published before GPCR structure data became widely available, thereby rescuing these valuable data from long-dormant sources. PMID:21910883

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, T.; Yang, Z.; Dong, P.

    The cold-cathode Penning ion gauge (PIG) type ion source has been used for generation of negative hydrogen (H⁻) ions as the internal ion source of a compact cyclotron. A novel method called electrical shielding box dc beam measurement is described in this paper, and the beam intensity was measured under dc extraction inside an electrical shielding box. The results of the trajectory simulation and the dc H⁻ beam extraction measurement are presented. The effects of gas flow rate, magnetic field strength, arc current, and extraction voltage are also discussed. In conclusion, a dc H⁻ beam current of about 4 mA from the PIG ion source with a puller voltage of 40 kV and an arc current of 1.31 A was extrapolated from the measurements at low extraction dc voltages.

  3. A rapid extraction of landslide disaster information research based on GF-1 image

    NASA Astrophysics Data System (ADS)

    Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na

    2015-08-01

    In recent years, landslide disasters have occurred frequently because of seismic activity, bringing great harm to people's lives and attracting close attention from the state and wide concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing imagery can effectively improve the accuracy of information extraction with its rich texture and geometric information. It is therefore feasible to extract information on earthquake-triggered landslides with serious surface damage and large scale. Taking Wenchuan county as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the estimation of scale parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution imagery and selecting spectral, textural, geometric and landform features of the image, extraction rules are established to extract landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.
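
    A minimal sketch of the object-based rule classification described above, using scikit-image SLIC superpixels as a stand-in for the multi-scale segmentation tool; the band order, thresholds and input rasters are hypothetical and only illustrate the spectral/terrain rule idea, not the authors' workflow.

      import numpy as np
      from skimage.segmentation import slic
      from skimage.measure import regionprops

      # Hypothetical 4-band GF-1-like reflectance stack and a slope raster (degrees).
      rng = np.random.default_rng(0)
      img = rng.random((512, 512, 4)).astype(np.float32)
      slope = rng.random((512, 512)) * 60.0

      # Segment the scene into image objects (stand-in for multi-scale segmentation).
      segments = slic(img, n_segments=400, compactness=0.1, channel_axis=-1)

      # NDVI from assumed red (band 3) and near-infrared (band 4) positions.
      ndvi = (img[..., 3] - img[..., 2]) / (img[..., 3] + img[..., 2] + 1e-6)

      # Illustrative rule set: bright, sparsely vegetated, steep objects are flagged as landslides.
      landslide_ids = []
      for region in regionprops(segments):
          mask = segments == region.label
          if ndvi[mask].mean() < 0.2 and img[..., 2][mask].mean() > 0.4 and slope[mask].mean() > 25:
              landslide_ids.append(region.label)

      landslide_mask = np.isin(segments, landslide_ids)
      print(f"{len(landslide_ids)} candidate objects, {int(landslide_mask.sum())} pixels")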

  4. An open-source framework for stress-testing non-invasive foetal ECG extraction algorithms.

    PubMed

    Andreotti, Fernando; Behar, Joachim; Zaunseder, Sebastian; Oster, Julien; Clifford, Gari D

    2016-05-01

    Over the past decades, many studies have been published on the extraction of non-invasive foetal electrocardiogram (NI-FECG) from abdominal recordings. Most of these contributions claim to obtain excellent results in detecting foetal QRS (FQRS) complexes in terms of location. A small subset of authors have investigated the extraction of morphological features from the NI-FECG. However, due to the shortage of available public databases, the large variety of performance measures employed and the lack of open-source reference algorithms, most contributions cannot be meaningfully assessed. This article attempts to address these issues by presenting a standardised methodology for stress testing NI-FECG algorithms, including absolute data as well as extraction and evaluation routines. To that end, a large database of realistic artificial signals was created, totalling 145.8 h of multichannel data and over one million FQRS complexes. An important characteristic of this dataset is the inclusion of several non-stationary events (e.g. foetal movements, uterine contractions and heart rate fluctuations) that are critical for evaluating extraction routines. To demonstrate our testing methodology, three classes of NI-FECG extraction algorithms were evaluated: blind source separation (BSS), template subtraction (TS) and adaptive methods (AM). Experiments were conducted to benchmark the performance of eight NI-FECG extraction algorithms on the artificial database, focusing on FQRS detection and morphological analysis (foetal QT and T/QRS ratio). The overall median FQRS detection accuracies (i.e. considering all non-stationary events) for the best performing methods in each group were 99.9% for BSS, 97.9% for AM and 96.0% for TS. Both FQRS detection and the morphological parameters were shown to depend heavily on the extraction technique and the signal-to-noise ratio. In particular, it is shown that their evaluation in the source domain, obtained after using a BSS technique, should be avoided. Data, extraction algorithms and evaluation routines were released as part of the fecgsyn toolbox on PhysioNet under a GNU GPL open-source license. This contribution provides a standard framework for benchmarking and regulatory testing of NI-FECG extraction algorithms.
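
    A minimal sketch of the kind of FQRS scoring used in such benchmarks (not the fecgsyn routines themselves): detections are matched to reference annotations within a tolerance window, and sensitivity, positive predictive value and F1 are reported; the tolerance and sample data are assumptions.

      import numpy as np

      def score_fqrs(reference, detected, tol_ms=50, fs=1000):
          """Match detections to reference FQRS locations within +/- tol_ms; return Se, PPV, F1."""
          tol = int(tol_ms * fs / 1000)
          ref, det = np.asarray(reference), np.sort(np.asarray(detected))
          used = np.zeros(det.size, dtype=bool)
          tp = 0
          for r in ref:
              if det.size == 0:
                  break
              idx = int(np.argmin(np.abs(det - r)))          # nearest detection
              if not used[idx] and abs(int(det[idx]) - int(r)) <= tol:
                  used[idx] = True
                  tp += 1
          se = tp / ref.size if ref.size else 0.0
          ppv = tp / det.size if det.size else 0.0
          f1 = 2 * se * ppv / (se + ppv) if (se + ppv) else 0.0
          return se, ppv, f1

      # Hypothetical reference and detected sample indices at fs = 1000 Hz.
      print(score_fqrs([1000, 1430, 1860], [1005, 1425, 1980, 2300]))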

  5. Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

    ERIC Educational Resources Information Center

    Sun, Chong

    2012-01-01

    More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…

  6. Italian Opuntia ficus-indica Cladodes as Rich Source of Bioactive Compounds with Health-Promoting Properties

    PubMed Central

    Pellizzoni, Marco; Lucini, Luigi

    2018-01-01

    Natural by-products, especially phenolic compounds, are in great demand by the nutra-pharmaceutical and biomedical industries. An analytical study was performed to investigate, for the first time, the presence of antioxidant constituents and the corresponding in vitro antioxidant activity in the extract of cladodes from Ficodindia di San Cono (Opuntia ficus-indica) protected designation of origin (PDO). The cladode extracts were analysed for target determination of selected constituents, i.e., β-polysaccharides and total phenolic content. Moreover, the antioxidant activity of hydro-alcoholic extracts was assessed by means of two different methods: α, α-diphenyl-β-picrylhydrazyl (DPPH) free radical scavenging method and ferric reducing antioxidant power (FRAP) assay. An untargeted UHPLC-ESI-QTOF-MS profiling approach was used to depict the phenolic profile of hydro-alcoholic cladode extracts. Interestingly, over 2 g/kg of polyphenols were detected in this matrix, and these compounds were mainly responsible for the antioxidant properties, as shown by the strong correlation between phenolic classes and antioxidant scores. Finally, this study provides basic information on the presence of bioactive compounds and in vitro antioxidant activities in cladode extracts from cactus that might recommend their novel applications at the industrial level in the field of nutraceutical products. PMID:29463028

  7. Italian Opuntia ficus-indica Cladodes as Rich Source of Bioactive Compounds with Health-Promoting Properties.

    PubMed

    Rocchetti, Gabriele; Pellizzoni, Marco; Montesano, Domenico; Lucini, Luigi

    2018-02-18

    Natural by-products, especially phenolic compounds, are in great demand by the nutra-pharmaceutical and biomedical industries. An analytical study was performed to investigate, for the first time, the presence of antioxidant constituents and the corresponding in vitro antioxidant activity in the extract of cladodes from Ficodindia di San Cono (Opuntia ficus-indica) protected designation of origin (PDO). The cladode extracts were analysed for target determination of selected constituents, i.e., β-polysaccharides and total phenolic content. Moreover, the antioxidant activity of hydro-alcoholic extracts was assessed by means of two different methods: α, α-diphenyl-β-picrylhydrazyl (DPPH) free radical scavenging method and ferric reducing antioxidant power (FRAP) assay. An untargeted UHPLC-ESI-QTOF-MS profiling approach was used to depict the phenolic profile of hydro-alcoholic cladode extracts. Interestingly, over 2 g/kg of polyphenols were detected in this matrix, and these compounds were mainly responsible for the antioxidant properties, as shown by the strong correlation between phenolic classes and antioxidant scores. Finally, this study provides basic information on the presence of bioactive compounds and in vitro antioxidant activities in cladode extracts from cactus that might recommend their novel applications at the industrial level in the field of nutraceutical products.

  8. Versatile and efficient pore network extraction method using marker-based watershed segmentation

    NASA Astrophysics Data System (ADS)

    Gostick, Jeff T.

    2017-08-01

    Obtaining structural information from tomographic images of porous materials is a critical component of porous media research. Extracting pore networks is particularly valuable since it enables pore network modeling simulations which can be useful for a host of tasks from predicting transport properties to simulating performance of entire devices. This work reports an efficient algorithm for extracting networks using only standard image analysis techniques. The algorithm was applied to several standard porous materials ranging from sandstone to fibrous mats, and in all cases agreed very well with established or known values for pore and throat sizes, capillary pressure curves, and permeability. In the case of sandstone, the present algorithm was compared to the network obtained using the current state-of-the-art algorithm, and very good agreement was achieved. Most importantly, the network extracted from an image of fibrous media correctly predicted the anisotropic permeability tensor, demonstrating the critical ability to detect key structural features. The highly efficient algorithm allows extraction on fairly large images of 500³ voxels in just over 200 s. The ability for one algorithm to match materials as varied as sandstone with 20% porosity and fibrous media with 75% porosity is a significant advancement. The source code for this algorithm is provided.
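
    The core operation described above can be sketched with standard image-analysis calls; the snippet below is a minimal marker-based watershed on a synthetic binary pore image using scikit-image and SciPy, not the published algorithm itself, and the marker spacing is an assumption.

      import numpy as np
      from scipy import ndimage as ndi
      from skimage.feature import peak_local_max
      from skimage.segmentation import watershed

      # Hypothetical binary tomogram slice: True = pore space, False = solid.
      rng = np.random.default_rng(0)
      pores = ndi.binary_opening(rng.random((256, 256)) > 0.45, iterations=2)

      # Peaks of the distance transform act as markers, roughly one per pore body.
      dist = ndi.distance_transform_edt(pores)
      peaks = peak_local_max(dist, min_distance=10, labels=pores)
      markers = np.zeros(dist.shape, dtype=int)
      markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

      # Marker-based watershed of the inverted distance map partitions the pore space;
      # adjacencies between the resulting regions define the throats of the network.
      regions = watershed(-dist, markers, mask=pores)
      print(f"{regions.max()} pore regions extracted")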

  9. Inferring Small Scale Dynamics from Aircraft Measurements of Tracers

    NASA Technical Reports Server (NTRS)

    Sparling, L. C.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    The millions of ER-2 and DC-8 aircraft measurements of long-lived tracers in the Upper Troposphere/Lower Stratosphere (UT/LS) hold enormous potential as a source of statistical information about subgrid-scale dynamics. Extracting this information, however, can be extremely difficult because the measurements are made along a 1-D transect through fields that are highly anisotropic in all three dimensions. Some of the challenges and limitations posed by both the instrumentation and the platform are illustrated within the context of the problem of using the data to obtain an estimate of the dissipation scale. This presentation will also include some tutorial remarks about the conditional and two-point statistics used in the analysis.
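
    As a small illustration of the two-point statistics mentioned above (not the authors' analysis), the sketch below estimates a second-order structure function along a 1-D transect; the along-track series is synthetic.

      import numpy as np

      def structure_function(tracer, max_lag):
          """Second-order structure function D(r) = <[x(s + r) - x(s)]^2> along a 1-D transect."""
          x = np.asarray(tracer, dtype=float)
          lags = np.arange(1, max_lag + 1)
          return lags, np.array([np.mean((x[r:] - x[:-r]) ** 2) for r in lags])

      # Hypothetical along-track tracer series (one sample per unit of flight distance).
      rng = np.random.default_rng(1)
      series = np.cumsum(rng.standard_normal(5000)) * 0.01
      lags, d2 = structure_function(series, max_lag=100)
      # The small-separation scaling of D(r) is what constrains dissipation-scale behaviour.
      print(d2[:5])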

  10. Cross-beam coherence of infrasonic signals at local and regional ranges.

    PubMed

    Alberts, W C Kirkpatrick; Tenney, Stephen M

    2017-11-01

    Signals collected by infrasound arrays require continuous analysis by skilled personnel or by automatic algorithms in order to extract usable information. Typical pieces of information gained by analysis of infrasonic signals collected by multiple sensor arrays are arrival time, line of bearing, amplitude, and duration. These can all be used, often with significant accuracy, to locate sources. A very important part of this chain is associating collected signals across multiple arrays. Here, a pairwise cross-beam coherence method is described that allows rapid signal association for high signal-to-noise-ratio events captured by multiple infrasound arrays at ranges exceeding 150 km. Methods, test cases, and results are described.
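
    A minimal sketch of the pairwise association idea (not the authors' implementation): compute the magnitude-squared coherence between the beamformed outputs of two arrays and associate the detections when the band-averaged coherence is high; the sampling rate, band and threshold are assumptions.

      import numpy as np
      from scipy.signal import coherence

      fs = 20.0                                    # Hz, assumed infrasound sampling rate
      rng = np.random.default_rng(2)
      n = int(600 * fs)                            # ten minutes of data
      common = np.sin(2 * np.pi * 1.5 * np.arange(n) / fs)

      # Hypothetical beams from two arrays: a shared 1.5 Hz arrival plus independent noise,
      # with a propagation delay between the arrays.
      beam_a = common + 0.5 * rng.standard_normal(n)
      beam_b = np.roll(common, 40) + 0.5 * rng.standard_normal(n)

      f, cxy = coherence(beam_a, beam_b, fs=fs, nperseg=1024)
      band = (f >= 0.5) & (f <= 5.0)               # illustrative band of interest
      mean_coh = float(cxy[band].mean())
      associated = mean_coh > 0.5                  # assumed association threshold
      print(f"band-averaged cross-beam coherence: {mean_coh:.2f}, associated: {associated}")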

  11. Verification of nonlinear dynamic structural test results by combined image processing and acoustic analysis

    NASA Astrophysics Data System (ADS)

    Tene, Yair; Tene, Noam; Tene, G.

    1993-08-01

    An interactive data fusion methodology combining video, audio, and nonlinear structural dynamic analysis for potential application in forensic engineering is presented. The methodology was developed and successfully demonstrated in the analysis of the collapse of a heavy transportable bridge during preparation for testing. Multiple bridge element failures were identified after the collapse, including fractures, cracks and rupture of high-performance structural materials. A videotape recording from a hand-held camcorder was the only source of information about the collapse sequence. The interactive data fusion methodology extracted relevant information from the videotape and from dynamic nonlinear structural analysis, leading to a full account of the sequence of events during the bridge collapse.

  12. Multi-element sewer slime impact pattern--a quantitative characteristic enabling identification of the source of heavy metal discharges into sewer systems.

    PubMed

    Kintrup, J; Wünsch, G

    2001-11-01

    The capability of sewer slime to accumulate heavy metals from municipal wastewater can be exploited to identify the sources of sewage sludge pollution. Former investigations of sewer slime looked for only a few elements and could, therefore, not account for deviations in the enrichment efficiency of the slime or for irregularities in sampling. Results of ICP-MS multi-element determinations were analyzed by multivariate statistical methods. A new dimensionless characteristic, the "sewer slime impact", is proposed, which is zero for unloaded samples. Patterns expressed in this data format specifically extract the information required to identify the type of pollution and the polluter more quickly and with less effort and cost than hitherto.

  13. VAUD: A Visual Analysis Approach for Exploring Spatio-Temporal Urban Data.

    PubMed

    Chen, Wei; Huang, Zhaosong; Wu, Feiran; Zhu, Minfeng; Guan, Huihua; Maciejewski, Ross

    2017-10-02

    Urban data is massive, heterogeneous, and spatio-temporal, posing a substantial challenge for visualization and analysis. In this paper, we design and implement a novel visual analytics approach, Visual Analyzer for Urban Data (VAUD), that supports the visualization, querying, and exploration of urban data. Our approach allows for cross-domain correlation from multiple data sources by leveraging spatio-temporal and social inter-connectedness features. Through our approach, the analyst is able to select, filter, and aggregate across multiple data sources and extract information that would remain hidden in any single data subset. To illustrate the effectiveness of our approach, we provide case studies on a real urban dataset that contains the cyber, physical, and social information of 14 million citizens over 22 days.

  14. Extracting Useful Semantic Information from Large Scale Corpora of Text

    ERIC Educational Resources Information Center

    Mendoza, Ray Padilla, Jr.

    2012-01-01

    Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…
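
    The abstract is truncated in this record, so the sketch below is only a generic illustration of collocation counting followed by one common weighting function, positive pointwise mutual information (PPMI); the toy corpus and window size are assumptions.

      import numpy as np
      from collections import Counter

      corpus = [["information", "extraction", "from", "text"],
                ["semantic", "information", "from", "large", "corpora"]]
      window = 2

      # Count word occurrences and windowed co-occurrences (collocations).
      pair_counts, word_counts = Counter(), Counter()
      for sent in corpus:
          for i, w in enumerate(sent):
              word_counts[w] += 1
              for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                  if j != i:
                      pair_counts[(w, sent[j])] += 1

      vocab = sorted(word_counts)
      idx = {w: k for k, w in enumerate(vocab)}
      C = np.zeros((len(vocab), len(vocab)))
      for (w, c), n in pair_counts.items():
          C[idx[w], idx[c]] = n

      # PPMI weighting: max(0, log P(w, c) / (P(w) P(c))) over the co-occurrence matrix.
      total = C.sum()
      Pw = C.sum(axis=1, keepdims=True) / total
      Pc = C.sum(axis=0, keepdims=True) / total
      with np.errstate(divide="ignore", invalid="ignore"):
          ppmi = np.maximum(0.0, np.log((C / total) / (Pw * Pc)))
      ppmi = np.nan_to_num(ppmi)
      print(ppmi.shape)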

  15. Joint spectral characterization of photon-pair sources

    NASA Astrophysics Data System (ADS)

    Zielnicki, Kevin; Garay-Palmett, Karina; Cruz-Delgado, Daniel; Cruz-Ramirez, Hector; O'Boyle, Michael F.; Fang, Bin; Lorenz, Virginia O.; U'Ren, Alfred B.; Kwiat, Paul G.

    2018-06-01

    The ability to determine the joint spectral properties of photon pairs produced by the processes of spontaneous parametric downconversion (SPDC) and spontaneous four-wave mixing (SFWM) is crucial for guaranteeing the usability of heralded single photons and polarization-entangled pairs for multi-photon protocols. In this paper, we compare six different techniques that yield either a characterization of the joint spectral intensity or of the closely related purity of heralded single photons. These six techniques include: (i) scanning monochromator measurements, (ii) a variant of Fourier transform spectroscopy designed to extract the desired information exploiting a resource-optimized technique, (iii) dispersive fibre spectroscopy, (iv) stimulated-emission-based measurement, (v) measurement of the second-order correlation function g(2) for one of the two photons, and (vi) two-source Hong-Ou-Mandel interferometry. We discuss the relative performance of these techniques for the specific cases of a SPDC source designed to be factorable and SFWM sources of varying purity, and compare the techniques' relative advantages and disadvantages.
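
    For the purity side of the comparison, a small sketch of the standard relation between a discretised joint spectral amplitude and heralded-photon purity via a Schmidt (singular value) decomposition; the double-Gaussian amplitude below is hypothetical, and note that reconstructing purity from a measured intensity alone neglects spectral phase.

      import numpy as np

      def heralded_purity(jsa):
          """Purity of a heralded photon from a discretised joint spectral amplitude."""
          s = np.linalg.svd(jsa, compute_uv=False)
          schmidt = s**2 / np.sum(s**2)        # Schmidt coefficients, sum to 1
          return float(np.sum(schmidt**2))     # purity; Schmidt number K = 1 / purity

      # Hypothetical double-Gaussian joint spectral amplitude with tunable correlation.
      w = np.linspace(-3.0, 3.0, 200)
      ws, wi = np.meshgrid(w, w)
      theta = np.deg2rad(45.0)                 # 45 degrees: strongly correlated, low purity
      jsa = np.exp(-(ws * np.cos(theta) + wi * np.sin(theta)) ** 2 / 0.1
                   - (wi * np.cos(theta) - ws * np.sin(theta)) ** 2 / 2.0)
      print(f"heralded-photon purity = {heralded_purity(jsa):.2f}")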

  16. Real-time spectral characterization of a photon pair source using a chirped supercontinuum seed.

    PubMed

    Erskine, Jennifer; England, Duncan; Kupchak, Connor; Sussman, Benjamin

    2018-02-15

    Photon pair sources have wide-ranging applications in a variety of quantum photonic experiments and protocols. Many of these protocols require well-controlled spectral correlations between the two output photons. However, due to low cross-sections, measuring the joint spectral properties of photon pair sources has historically been a challenging and time-consuming task. Here, we present an approach for the real-time measurement of the joint spectral properties of a fiber-based four-wave mixing source. We seed the four-wave mixing process using a broadband chirped pulse, studying the stimulated process to extract information regarding the spontaneous process. In addition, we compare stimulated emission measurements with the spontaneous process to confirm the technique's validity. Joint spectral measurements have historically taken many hours, and several minutes with recent techniques. Here, measurements have been demonstrated in 5-30 s depending on resolution, offering a substantial improvement. Additional benefits of this approach include flexible resolution, large measurement bandwidth, and reduced experimental overhead.

  17. Bioactive Natural Products Prioritization Using Massive Multi-informational Molecular Networks.

    PubMed

    Olivon, Florent; Allard, Pierre-Marie; Koval, Alexey; Righi, Davide; Genta-Jouve, Gregory; Neyts, Johan; Apel, Cécile; Pannecouque, Christophe; Nothias, Louis-Félix; Cachet, Xavier; Marcourt, Laurence; Roussi, Fanny; Katanaev, Vladimir L; Touboul, David; Wolfender, Jean-Luc; Litaudon, Marc

    2017-10-20

    Natural products represent an inexhaustible source of novel therapeutic agents. Their complex and constrained three-dimensional structures endow these molecules with exceptional biological properties, thereby giving them a major role in drug discovery programs. However, the search for new bioactive metabolites is hampered by the chemical complexity of the biological matrices in which they are found. The purification of single constituents from such matrices requires so much work that it should ideally be performed only on molecules of high potential value (i.e., chemical novelty and biological activity). Recent bioinformatics approaches based on mass spectrometry metabolite profiling methods are beginning to address the complex task of compound identification within complex mixtures. In parallel to these developments, however, methods providing information on the bioactivity potential of natural products prior to their isolation are still lacking, and they are of key interest for targeting the isolation of valuable natural products only. In the present investigation, we propose an integrated analysis strategy for bioactive natural product prioritization. Our approach uses massive molecular networks embedding various informational layers (bioactivity and taxonomical data) to highlight potentially bioactive scaffolds within the chemical diversity of crude extract collections. We exemplify this workflow by targeting the isolation of predicted active and inactive metabolites from two botanical sources (Bocquillonia nervosa and Neoguillauminia cleopatra) against two biological targets (the Wnt signaling pathway and chikungunya virus replication). Finally, the detection and isolation of a daphnane diterpene orthoester and four 12-deoxyphorbols that inhibit the Wnt signaling pathway and exhibit potent antiviral activity against CHIKV are detailed. Combined with efficient metabolite annotation tools, this bioactive natural product prioritization pipeline proves effective. Implementation of this approach in drug discovery programs based on natural extract screening should speed up and rationalize the isolation of bioactive natural products.

  18. Interrogating trees as archives of sulphur deposition

    NASA Astrophysics Data System (ADS)

    Wynn, P. M.; Loader, N. J.; Fairchild, I. J.

    2012-04-01

    A principal driver of climatic variability over the past 1,000 years, and an essential forcing mechanism for climate, is the change in atmospheric composition resulting from sulphur aerosols. Natural and anthropogenic aerosols released into the atmosphere disrupt the radiative balance through backscattering and absorption of incoming solar radiation, and increase cloud albedo by acting as condensation nuclei. Understanding the impact of sulphur emissions upon climate beyond the last few hundred years, however, is not straightforward, and natural archives of environmental information must be explored. Tree-rings represent one such archive, as they are widely distributed and preserve environmental information on a precisely dateable, annually resolved timescale. Until recently the sulphur contained within tree-rings has largely remained beyond the reach of environmental scientists and climate modellers owing to difficulties associated with the extraction of a robust signal and uncertainties regarding post-depositional mobility. Our recent work using synchrotron radiation has established that the majority of non-labile sulphur in two conifer species is preserved within the cellular structure of the woody tissue after uptake, and demonstrates an increasing trend in sulphur concentration during the 20th century and during known volcanic events. Due to the clear isotopic distinction between marine (+21‰), geological (+10 to +30‰), atmospheric pollution (-3 to +9‰) and volcanic (0 to +5‰) sources of sulphur, isotopic ratios provide a diagnostic tool with which changes in the source of atmospheric sulphur can be detected more reliably than with concentration alone. Sulphur isotopes should thereby provide a fingerprint of short-lived events, including volcanic activity, when extracted at high resolution and in conjunction with high-resolution S concentrations defining the event. Here we present methodologies for extracting the sulphur isotopic signal from tree-rings using both elemental analyser isotope ratio mass spectrometry and ion probe technology. Preliminary data indicate success at extracting the sulphur isotopic signal from woody tissues at 2-3 year resolution. In conjunction with analytical developments in ion probe technology, high-resolution records of localised sulphur forcing from tree-ring archives, including volcanic activity, no longer seem beyond the reach of climate scientists.
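
    As a small worked example of the source-attribution logic (illustrative numbers only, using assumed endmember values within the ranges quoted above), a two-endmember mass balance gives the volcanic fraction directly from a measured ring value:

      # delta_mix = f * delta_volcanic + (1 - f) * delta_background  (per mil, delta-34-S)
      delta_volcanic = 2.5      # assumed midpoint of the 0 to +5 volcanic range
      delta_background = 6.0    # assumed non-volcanic atmospheric baseline
      delta_ring = 4.0          # hypothetical value measured in a tree ring

      f_volcanic = (delta_ring - delta_background) / (delta_volcanic - delta_background)
      print(f"volcanic fraction = {f_volcanic:.2f}")   # 0.57 for these numbers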

  19. Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources.

    PubMed

    Huang, Yingxiang; Lee, Junghye; Wang, Shuang; Sun, Jimeng; Liu, Hongfang; Jiang, Xiaoqian

    2018-05-16

    Data sharing has been a big challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a very strong representative capability to describe medical concepts (and their context), and they have shown promise as an alternative way to support deep-learning applications without the need to disclose original data. However, contextual embedding models acquired from individual hospitals cannot be directly combined because their embedding spaces are different, and naive pooling renders combined embeddings useless. The aim of this study was to present a novel approach to address these issues and to promote the sharing of representations without sharing data. Without sacrificing privacy, we also aimed to build a global model from representations learned from local private data and to synchronize information from multiple sources. We propose a methodology that harmonizes different local contextual embeddings into a global model. We used Word2Vec to generate contextual embeddings from each source and Procrustes to fuse the different vector models into one common space, using a list of corresponding pairs as anchor points. We performed prediction analysis with the harmonized embeddings. We used sequential medical events extracted from the Medical Information Mart for Intensive Care III database to evaluate the proposed methodology in predicting the next likely diagnosis of a new patient using either structured or unstructured data. Under different experimental scenarios, we confirmed that the global model built from harmonized local models achieves more accurate predictions than local models or global models built by naive pooling. Such aggregation of local models using our unique harmonization can serve as a proxy for a global model, combining information from a wide range of institutions and information sources. It allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care. ©Yingxiang Huang, Junghye Lee, Shuang Wang, Jimeng Sun, Hongfang Liu, Xiaoqian Jiang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 16.05.2018.
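
    A minimal sketch of the harmonization step as described (local Word2Vec-style vectors aligned to a common space with an orthogonal Procrustes solution over anchor pairs); the toy matrices stand in for embeddings learned at two sites and are purely illustrative.

      import numpy as np
      from scipy.linalg import orthogonal_procrustes

      rng = np.random.default_rng(3)
      dim, n_anchor = 100, 50

      # Hypothetical embeddings of the same anchor concepts learned at two hospitals:
      # the local space is a noisy rotation of the reference space.
      reference = rng.standard_normal((n_anchor, dim))
      rotation = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
      local = reference @ rotation + 0.01 * rng.standard_normal((n_anchor, dim))

      # Orthogonal map that best aligns the local embeddings to the reference space.
      R, _ = orthogonal_procrustes(local, reference)
      harmonized = local @ R

      err = np.linalg.norm(harmonized - reference) / np.linalg.norm(reference)
      print(f"relative alignment error after harmonization: {err:.3f}")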

  20. Phase-and-amplitude recovery from a single phase-contrast image using partially spatially coherent x-ray radiation

    NASA Astrophysics Data System (ADS)

    Beltran, Mario A.; Paganin, David M.; Pelliccia, Daniele

    2018-05-01

    A simple method of phase-and-amplitude extraction is derived that corrects for image blurring induced by partially spatially coherent incident illumination using only a single intensity image as input. The method is based on Fresnel diffraction theory in the high-Fresnel-number regime, merged with the space-frequency description formalism used to quantify partially coherent fields, and assumes the object under study is composed of a single material. A priori knowledge of the object's complex refractive index and information obtained by characterizing the spatial coherence of the source are required. The algorithm was applied to propagation-based phase-contrast data measured with a laboratory-based micro-focus x-ray source. The blurring due to the finite spatial extent of the source is embedded within the algorithm as a simple correction term to the so-called Paganin algorithm, and the method is numerically stable in the presence of noise.
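
    A minimal numpy sketch of a Paganin-type single-image filter with an extra Gaussian term standing in for the source-blur correction described above; the exact form of the published correction term, and all parameter values below, are assumptions for illustration only.

      import numpy as np

      def paganin_with_blur(intensity, pixel_size, dist, delta, mu, sigma_blur):
          """Single-material Paganin-type retrieval with an assumed Gaussian source-blur term."""
          ny, nx = intensity.shape
          kx = 2 * np.pi * np.fft.fftfreq(nx, d=pixel_size)
          ky = 2 * np.pi * np.fft.fftfreq(ny, d=pixel_size)
          k2 = kx[None, :] ** 2 + ky[:, None] ** 2
          # Standard Paganin low-pass denominator plus an assumed source-blur term.
          denom = 1.0 + (dist * delta / mu + 0.5 * sigma_blur ** 2) * k2
          filtered = np.real(np.fft.ifft2(np.fft.fft2(intensity) / denom))
          return -np.log(np.clip(filtered, 1e-12, None)) / mu   # projected thickness

      # Hypothetical flat-normalised phase-contrast image and single-material parameters.
      img = np.full((256, 256), 0.9)
      thickness = paganin_with_blur(img, pixel_size=1e-6, dist=0.2, delta=1e-7,
                                    mu=50.0, sigma_blur=2e-6)
      print(f"mean recovered thickness: {thickness.mean():.4f} m")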
