NASA Astrophysics Data System (ADS)
Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia
2018-05-01
Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient approach to 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes a highly accurate method for extracting building facade features from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping between the image features and the 3D PCD, and optimization of the initial 3D PCD facade features using structural information. The new method is validated using a case study, and the results show that it extracts the 3D PCD facade features of buildings more accurately and continuously. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.
Challenges in Managing Information Extraction
ERIC Educational Resources Information Center
Shen, Warren H.
2009-01-01
This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…
The Agent of extracting Internet Information with Lead Order
NASA Astrophysics Data System (ADS)
Mo, Zan; Huang, Chuliang; Liu, Aijun
In order to carry out e-commerce better, advanced technologies for accessing business information are urgently needed. An agent is described to deal with the problems of extracting internet information caused by the non-standard and heterogeneous structure of Chinese websites. The agent comprises three modules, each responsible for one stage of the extraction process. An HTTP-tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to transform the extracted natural-language information into a structured form is also discussed.
Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction
ERIC Educational Resources Information Center
Sun, Chong
2012-01-01
More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…
Direct Estimation of Structure and Motion from Multiple Frames
1990-03-01
sequential frames in an image sequence. As a consequence, the information that can be extracted from a single optical flow field is limited to a snapshot of...researchers have developed techniques that extract motion and structure information without computation of the optical flow. Best known are the "direct...operated iteratively on a sequence of images to recover structure. It required feature extraction and matching. Broida and Chellappa [9] suggested the use of
Building an automated SOAP classifier for emergency department reports.
Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W
2012-02-01
Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step in evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found that the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found that classifiers for each SOAP class can be created with moderate to outstanding performance, with F1 scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying SOAP classification to clinical information extraction tasks.
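As a hedged illustration of the classification setup the study describes (sentence-level SOAP labeling with supervised learning), the sketch below uses a generic bag-of-words pipeline in scikit-learn; the example sentences, labels, and feature choice are invented stand-ins, not the authors' actual feature sets or data.

```python
# Hypothetical sketch of sentence-level SOAP classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Patient reports sharp chest pain since this morning.",   # subjective
    "BP 142/90, HR 88, afebrile.",                            # objective
    "Findings consistent with musculoskeletal pain.",         # assessment
    "Start ibuprofen 400 mg TID and follow up in one week.",  # plan
]
labels = ["S", "O", "A", "P"]

# Word n-grams with logistic regression stand in for the paper's
# richer linguistic feature sets.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)
print(clf.predict(["Temperature 38.2 C, lungs clear."]))
```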
An information extraction framework for cohort identification using electronic health records.
Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G
2013-01-01
Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at the point of care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high-performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report a knowledge-driven IE framework for cohort identification using EHRs, developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.
Evaluation of Ultrasonic Fiber Structure Extraction Technique Using Autopsy Specimens of Liver
NASA Astrophysics Data System (ADS)
Yamaguchi, Tadashi; Hirai, Kazuki; Yamada, Hiroyuki; Ebara, Masaaki; Hachiya, Hiroyuki
2005-06-01
It is very important to diagnose liver cirrhosis noninvasively and correctly. In our previous studies, we proposed a processing technique to detect changes in liver tissue in vivo. In this paper, we propose evaluating the relationship between liver disease and echo information using autopsy specimens of a human liver in vitro. In vitro experiments make it possible to verify the function of a processing parameter clearly and to compare the processing result with the actual human liver tissue structure. Using our processing technique, information that did not obey a Rayleigh distribution was extracted from the echo signal of the autopsy liver specimens, depending on changes in a particular processing parameter. The fiber tissue structure of the same specimen was extracted from a number of histological images of stained tissue. We constructed 3D structures using the information extracted from the echo signal and the fiber structure of the stained tissue and compared the two. By comparing the 3D structures, it is possible to evaluate the relationship between the information that does not obey a Rayleigh distribution of the echo signal and the fibrosis structure.
Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes
ERIC Educational Resources Information Center
Finch, Dezon Kile
2012-01-01
Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
Better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also be used directly to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhanced protein structural class prediction accuracy on four popular benchmarks.
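As an illustration of one ingredient named above, the sketch below computes simple autocorrelation features of a physicochemical property along a sequence; the property values and the sequence are invented, and the study's overlapped segmented-distribution features are not shown.

```python
# Hedged sketch of autocorrelation-based features: correlation of a
# physicochemical property (invented hydrophobicity values) between
# residues separated by a lag d along the sequence.
import numpy as np

hydro = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "L": 3.8, "K": -3.9}
seq = "ARNDLLKA"
x = np.array([hydro[a] for a in seq])
x = (x - x.mean()) / x.std()          # standardize the property profile

features = [np.mean(x[:-d] * x[d:]) for d in (1, 2, 3)]  # lags 1..3
print(features)
```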
Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.
Dave, Jaydev K; Gingold, Eric L
2013-01-01
The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract from these files information from the structured elements in the DICOM metadata relevant to exposure. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
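A minimal sketch of this kind of metadata extraction, assuming a DICOM Radiation Dose Structured Report readable with pydicom; the file name is hypothetical, real RDSR trees nest more deeply than this flat walk suggests, and the study's own software is not described at code level.

```python
# Hedged sketch: pulling dose-relevant numeric values out of a DICOM
# structured report with pydicom.
import pydicom

ds = pydicom.dcmread("ct_dose_report.dcm")  # hypothetical path

def walk(items, depth=0):
    """Recursively visit SR content items, printing coded name/value pairs."""
    for item in items:
        name = (item.ConceptNameCodeSequence[0].CodeMeaning
                if "ConceptNameCodeSequence" in item else "?")
        if "MeasuredValueSequence" in item:  # numeric item, e.g. CTDIvol or DLP
            mv = item.MeasuredValueSequence[0]
            print("  " * depth, name, mv.NumericValue)
        if "ContentSequence" in item:        # descend into nested containers
            walk(item.ContentSequence, depth + 1)

walk(ds.ContentSequence)
```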
A Low-Storage-Consumption XML Labeling Method for Efficient Structural Information Extraction
NASA Astrophysics Data System (ADS)
Liang, Wenxin; Takahashi, Akihiro; Yokota, Haruo
Recently, labeling methods that extract and reconstruct the structural information of XML data, which is important for many applications such as XPath query and keyword search, have become more attractive. To achieve efficient structural information extraction, in this paper we propose C-DO-VLEI code, a novel update-friendly bit-vector encoding scheme based on register-length bit operations combined with the properties of Dewey Order numbers, which cannot be implemented in other relevant existing schemes such as ORDPATH. Meanwhile, the proposed method also achieves lower storage consumption because it requires neither a prefix schema nor any reserved codes for node insertion. We performed experiments to evaluate and compare the performance and storage consumption of the proposed method with those of the ORDPATH method. Experimental results show that the execution times for extracting depth information and parent node labels using the C-DO-VLEI code are about 25% and 15% less, respectively, and the average label size is about 24% smaller, compared with ORDPATH.
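For intuition about why Dewey-style labels support efficient structural information extraction, here is a minimal sketch with plain dotted labels; C-DO-VLEI itself packs this information into register-length bit vectors, which this sketch does not reproduce.

```python
# Parent label, depth, and ancestry fall out of a Dewey-order label itself,
# with no tree traversal. Dotted strings are used purely for illustration.
def parent(label: str) -> str:
    """Parent of '1.3.2' is '1.3'."""
    return label.rsplit(".", 1)[0]

def depth(label: str) -> int:
    """Depth equals the number of Dewey components."""
    return label.count(".") + 1

def is_ancestor(a: str, b: str) -> bool:
    """a is an ancestor of b iff a's components prefix b's."""
    return b.startswith(a + ".")

assert parent("1.3.2") == "1.3"
assert depth("1.3.2") == 3
assert is_ancestor("1.3", "1.3.2")
```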
Schoeppe, Franziska; Sommer, Wieland H; Haack, Mareike; Havel, Miriam; Rheinwald, Marika; Wechtenbruch, Juliane; Fischer, Martin R; Meinel, Felix G; Sabel, Bastian O; Sommer, Nora N
2018-01-01
To compare free text reports (FTR) and structured reports (SR) of videofluoroscopic swallowing studies (VFSS) and to evaluate the satisfaction of referring otolaryngologists and speech therapists. Both standard FTR and SR of 26 patients with VFSS were acquired. A dedicated template focusing on oropharyngeal phases was created for SR using online software with clickable decision trees and concomitant generation of semantically structured reports. All reports were evaluated regarding overall quality and content, information extraction, and clinical decision support on a 10-point Likert scale (0 = I completely disagree, 10 = I completely agree). Two otorhinolaryngologists and two speech therapists evaluated FTR and SR. SR received better ratings than FTR on all items. SR were perceived to contain more details on the swallowing phases (median rating: 10 vs. 5; P < 0.001) and on penetration and aspiration (10 vs. 5; P < 0.001), and facilitated information extraction compared to FTR (10 vs. 4; P < 0.001). Overall quality was rated significantly higher for SR than FTR (P < 0.001). SR of VFSS provide more detailed information and facilitate information extraction. SR better assist clinical decision-making, might enhance the quality of the report and, thus, are recommended for the evaluation of VFSS. • Structured reports on videofluoroscopic exams of deglutition lead to improved report quality. • Information extraction is facilitated when using structured reports based on decision trees. • Template-based reports add more value to clinical decision-making than free text reports. • Structured reports receive better ratings by speech therapists and otolaryngologists. • Structured reports on videofluoroscopic exams may improve the comparability between exams.
Fernández, José M; Valencia, Alfonso
2004-10-12
Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodic dumping of information requires considerable CPU time, disk, and memory resources. YAdumper has been developed as a purpose-specific tool to handle the complete, structured download of information from relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
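A minimal sketch of the general pattern (template-driven, streaming dump of a relational table to XML), assuming an invented SQLite table; YAdumper itself is a Java application driven by an external DTD-based template, and streaming rows one at a time is what keeps memory use flat.

```python
# Hypothetical database and schema; rows are written to the output file
# as they are fetched, so memory consumption stays constant.
import sqlite3
import xml.sax.saxutils as su

conn = sqlite3.connect("proteins.db")  # invented example database
with open("dump.xml", "w", encoding="utf-8") as out:
    out.write("<proteins>\n")
    for acc, name in conn.execute("SELECT accession, name FROM protein"):
        out.write(f"  <protein accession={su.quoteattr(acc)}>"
                  f"{su.escape(name)}</protein>\n")
    out.write("</proteins>\n")
```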
Modeling ECM fiber formation: structure information extracted by analysis of 2D and 3D image sets
NASA Astrophysics Data System (ADS)
Wu, Jun; Voytik-Harbin, Sherry L.; Filmer, David L.; Hoffman, Christoph M.; Yuan, Bo; Chiang, Ching-Shoei; Sturgis, Jennis; Robinson, Joseph P.
2002-05-01
Recent evidence supports the notion that the biological functions of the extracellular matrix (ECM) are highly correlated with its structure. Understanding this fibrous structure is crucial in tissue engineering for developing the next generation of biomaterials for the restoration of tissues and organs. In this paper, we integrate confocal microscopy imaging and image-processing techniques to analyze the structural properties of ECM. We describe a 2D fiber middle-line tracing algorithm and apply it via Euclidean distance maps (EDM) to extract accurate fibrous structure information, such as fiber diameter, length, orientation, and density, from single slices. Based on the 2D tracing algorithm, we extend our analysis to 3D tracing via Euclidean distance maps to extract 3D fibrous structure information. We use computer simulation to construct 3D fibrous structures, which are subsequently used to test our tracing algorithms. After further image processing, these models are then applied to a variety of ECM constructions, from which the results of 2D and 3D traces are statistically analyzed.
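A minimal sketch of the underlying building blocks (Euclidean distance map plus middle-line extraction) on a synthetic binary fiber image; this is not the authors' tracing algorithm, just the core idea that the EDM peaks along fiber middle-lines and approximates the local radius there.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

fibers = np.zeros((64, 64), dtype=bool)
fibers[30:35, 5:60] = True             # synthetic horizontal fiber, 5 px thick

edm = distance_transform_edt(fibers)   # distance to nearest background pixel
midline = skeletonize(fibers)          # 1-px-wide middle-line of each fiber

radii = edm[midline]                   # EDM sampled on the midline ~ radius
print("estimated fiber diameter:", 2 * radii.mean())
```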
Research on Crowdsourcing Emergency Information Extraction of Based on Events' Frame
NASA Astrophysics Data System (ADS)
Yang, Bo; Wang, Jizhou; Ma, Weijun; Mao, Xi
2018-01-01
At present, common information extraction methods cannot accurately extract structured emergency event information, general information retrieval tools cannot completely identify emergency geographic information, and neither provides an accurate assessment of the extracted results. This paper therefore proposes an emergency information collection technique based on an event framework to address the problem of emergency information extraction. It mainly includes an emergency information extraction model (EIEM), a complete address recognition method (CARM), and an accuracy evaluation model of emergency information (AEMEI). EIEM extracts emergency information in structured form and compensates for the lack of network data acquisition in emergency mapping. CARM uses a hierarchical model and a shortest-path algorithm to join toponym pieces into a full address. AEMEI analyzes the results for an emergency event and summarizes the advantages and disadvantages of the event framework. Experiments show that the event-framework technique can solve the problem of emergency information extraction and provides reference cases for other applications. When a disaster is imminent, the relevant departments can query data on emergencies that have occurred in the past and make defensive, disaster-reducing arrangements ahead of schedule. The technique can decrease the number of casualties and the amount of property damage in a country and worldwide, which is of great significance to the state and society.
HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records.
Aggarwal, Anshul; Garhwal, Sunita; Kumar, Ajay
2018-04-01
One of the most important tasks for a medical practitioner treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of easily misplaced papers, but many of the records remain unstructured. Large parts of clinical reports are written text and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, the lack of structure means that the data is unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. The HEDEA system works across a large set of formats to extract and analyse health information. The tool can be used to generate analysis reports and charts from the central database. This information is only provided after prior approval has been received from the patient for medical research purposes.
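A minimal sketch of regular-expression-based extraction in the spirit described above; the field names, patterns, and sample report line are invented, not HEDEA's actual templates.

```python
import re

report = "Hemoglobin: 13.2 g/dL  Fasting Glucose: 104 mg/dL  BP: 120/80 mmHg"

patterns = {  # hypothetical field -> pattern mapping
    "hemoglobin_g_dl": r"Hemoglobin:\s*([\d.]+)\s*g/dL",
    "glucose_mg_dl":   r"Fasting Glucose:\s*([\d.]+)\s*mg/dL",
    "blood_pressure":  r"BP:\s*(\d+/\d+)\s*mmHg",
}

# Each matched pattern contributes one structured field.
record = {field: m.group(1)
          for field, rx in patterns.items()
          if (m := re.search(rx, report, re.IGNORECASE))}
print(record)  # structured output ready for a central database
```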
Meta-generalis: A novel method for structuring information from radiology reports.
Barbosa, Flavio; Traina, Agma Jucci; Muglia, Valdair Francisco
2016-08-24
A structured report for imaging exams aims at increasing precision in information retrieval and communication between physicians. However, it is more concise than free text and may limit specialists' descriptions of important findings not covered by pre-defined structures. A computational ontological structure derived from free texts written by specialists may be a solution to this problem. Therefore, the goal of our study was to develop a methodology for structuring information in radiology reports covering the specifications required for the Brazilian Portuguese language, including the terminology to be used. We gathered 1,701 radiological reports of magnetic resonance imaging (MRI) studies of the lumbosacral spine from three different institutions. Techniques of text mining and ontological conceptualization of the extracted lexical units were used to structure the information. Ten radiologists, specialists in lumbosacral MRI, evaluated the textual superstructure and the extracted terminology using an electronic questionnaire. The established methodology consists of six steps: 1) collection of radiology reports of a specific MRI examination; 2) textual decomposition; 3) normalization of lexical units; 4) identification of textual superstructures; 5) conceptualization of candidate-terms; and 6) evaluation of superstructures and extracted terminology by experts using an electronic questionnaire. Three different textual superstructures were identified, with terminological variations in the names of their textual categories. The number of candidate-terms conceptualized was 4,183, yielding 727 concepts. There were a total of 13,963 relationships between candidate-terms and concepts and 789 relationships among concepts. The proposed methodology allowed structuring information in a more intuitive and practical way. Identification of three textual superstructures and extraction, normalization, and ontological conceptualization of lexical units were achieved while maintaining references to their respective categories and to the free-text radiology reports.
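As a hedged illustration of steps 2-3 of the methodology (textual decomposition and normalization of lexical units) for Portuguese report text, with an invented sample sentence and normalization choices that are assumptions, not the authors' procedure:

```python
import unicodedata
from collections import Counter

def normalize(token: str) -> str:
    """Lowercase and strip diacritics, e.g. 'Protrusão' -> 'protrusao'."""
    stripped = unicodedata.normalize("NFD", token.lower())
    return "".join(c for c in stripped if unicodedata.category(c) != "Mn")

text = "Protrusão discal difusa em L4-L5. Protrusao discal focal em L5-S1."
units = [normalize(t) for t in text.replace(".", " ").split()]
print(Counter(units).most_common(3))  # frequent units become candidate-terms
```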
SAR matrices: automated extraction of information-rich SAR tables from large compound data sets.
Wassermann, Anne Mai; Haebel, Peter; Weskamp, Nils; Bajorath, Jürgen
2012-07-23
We introduce the SAR matrix data structure that is designed to elucidate SAR patterns produced by groups of structurally related active compounds, which are extracted from large data sets. SAR matrices are systematically generated and sorted on the basis of SAR information content. Matrix generation is computationally efficient and enables processing of large compound sets. The matrix format is reminiscent of SAR tables, and SAR patterns revealed by different categories of matrices are easily interpretable. The structural organization underlying matrix formation is more flexible than standard R-group decomposition schemes. Hence, the resulting matrices capture SAR information in a comprehensive manner.
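A minimal sketch of the matrix data structure itself (rows keyed by core structure, columns by substituent, cells holding activity); the SMILES fragments and potency values are invented, and the real method derives cores and substituents systematically from large data sets rather than from a hand-made list.

```python
from collections import defaultdict

compounds = [  # (core scaffold, R-group, pKi) - all values invented
    ("c1ccc2[nH]ccc2c1", "Cl",  7.2),
    ("c1ccc2[nH]ccc2c1", "OMe", 6.1),
    ("c1ccncc1",         "Cl",  5.4),
]

matrix = defaultdict(dict)
for core, rgroup, pki in compounds:
    matrix[core][rgroup] = pki

for core, row in matrix.items():
    print(core, row)  # each row reads like a line of a classical SAR table
```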
The extraction and integration framework: a two-process account of statistical learning.
Thiessen, Erik D; Kronstein, Alexandra T; Hufnagle, Daniel G
2013-07-01
The term statistical learning in infancy research originally referred to sensitivity to transitional probabilities. Subsequent research has demonstrated that statistical learning contributes to infant development in a wide array of domains. The range of statistical learning phenomena necessitates a broader view of the processes underlying statistical learning. Learners are sensitive to a much wider range of statistical information than the conditional relations indexed by transitional probabilities, including distributional and cue-based statistics. We propose a novel framework that unifies learning about all of these kinds of statistical structure. From our perspective, learning about conditional relations outputs discrete representations (such as words). Integration across these discrete representations yields sensitivity to cues and distributional information. To achieve sensitivity to all of these kinds of statistical structure, our framework combines processes that extract segments of the input with processes that compare across these extracted items. In this framework, the items extracted from the input serve as exemplars in long-term memory. The similarity structure of those exemplars in long-term memory leads to the discovery of cues and categorical structure, which guides subsequent extraction. The extraction and integration framework provides a way to explain sensitivity to both conditional statistical structure (such as transitional probabilities) and distributional statistical structure (such as item frequency and variability), and also a framework for thinking about how these different aspects of statistical learning influence each other.
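As a worked example of the conditional statistic this account builds on, the transitional probability TP(x→y) = count(xy)/count(x) can be computed over a toy syllable stream; the stream below repeats three invented "words" (badi, kupa, tigo).

```python
from collections import Counter

stream = "ba di ku pa ti go ba di ti go ku pa".split()
pairs = Counter(zip(stream, stream[1:]))   # counts of adjacent syllable pairs
singles = Counter(stream[:-1])             # counts of pair-initial syllables

tp = {(x, y): n / singles[x] for (x, y), n in pairs.items()}
print(tp[("ba", "di")])  # within a word: 1.0
print(tp[("di", "ku")])  # across a word boundary: 0.5
# Extraction can cut the stream where TP dips, recovering word candidates.
```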
Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S
2016-10-01
Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High-quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform developed as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata than existing NLP pipelines such as MetaMap.
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.
Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude
2011-07-01
The detection of functional motifs is an important step in the determination of protein functions. We present here a new web server, SA-Mot (Structural Alphabet Motif), for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs; it extracts recurrent and conserved structural motifs involved in the structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamilies, together with sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, users can easily locate loop regions that are important for protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
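A hedged sketch of the "structural word" idea (recurrent fixed-length substrings over a structural alphabet as motif candidates); the four-letter alphabet and the encoded loops below are invented for illustration.

```python
from collections import Counter

loops = ["ABBDA", "CABBD", "ABBDC", "DCABB"]  # loops as structural-letter strings
WORD_LEN = 4

# Slide a window over each encoded loop and count recurring structural words.
words = Counter(
    loop[i:i + WORD_LEN]
    for loop in loops
    for i in range(len(loop) - WORD_LEN + 1)
)
print(words.most_common(2))  # 'ABBD' recurs across loops -> motif candidate
```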
Knowledge representation and management: transforming textual information into useful knowledge.
Rassinoux, A-M
2010-01-01
To summarize current outstanding research in the field of knowledge representation and management. Synopsis of the articles selected for the IMIA Yearbook 2010. Four interesting papers dealing with structured knowledge have been selected for the section on knowledge representation and management. Combining the newest techniques in computational linguistics and natural language processing with the latest methods in statistical data analysis, machine learning, and text mining has proved to be efficient for turning unstructured textual information into meaningful knowledge. Three of the four selected papers corroborate this approach and depict various experiments conducted to extract meaningful knowledge from unstructured free texts, such as extracting cancer disease characteristics from pathology reports, extracting protein-protein interactions from biomedical papers, and extracting knowledge to support hypothesis generation in molecular biology from the Medline literature. Finally, the last paper addresses the level of formally representing and structuring information within clinical terminologies in order to render such information easily available and shareable among the health informatics community. Delivering common powerful tools able to automatically extract meaningful information from the huge amount of electronically available unstructured free text is an essential step towards promoting sharing and reusability across applications, domains, and institutions, thus contributing to building capacities worldwide.
Zheng, Shuai; Jabbour, Salma K; O'Reilly, Shannon E; Lu, James J; Dong, Lihua; Ding, Lijuan; Xiao, Ying; Yue, Ning; Wang, Fusheng; Zou, Wei
2018-02-01
In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis and to develop evidence-based radiation therapy paradigms. These data are available mainly as narrative texts or in table formats with heterogeneous vocabularies. Manual extraction of the related information from these data can be time consuming and labor intensive, which is not ideal for large studies. The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non-small cell lung cancer (NSCLC). We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. Adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data and generate structured output. In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall of more than 93% and 83%, respectively. For toxicity information extraction, which is important for studying patients' posttreatment side effects and quality of life, the precision and recall reached 95.7% and 94.5%, respectively. The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies.
Fine-grained information extraction from German transthoracic echocardiography reports.
Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank
2015-11-12
Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant information about patients accessible to semantic applications. These methods typically rely on standardized terminologies that guide the process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially where detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that enables the recognition of almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component was mapped to the central elements of a standardized terminology and evaluated on documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restrained concept extraction, the system also obtained high precision on unstructured or exceptionally short documents and on documents with uncommon layouts. The developed terminology and the proposed information extraction system allow fine-grained information to be extracted from German semi-structured transthoracic echocardiography reports with very high precision and high recall for the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
Text Mining for Protein Docking
Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.
2015-01-01
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
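A minimal sketch of the abstract-filtering step (bag-of-words features with a Support Vector Machine), with invented training snippets and labels standing in for the curated Dockground-derived data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

abstracts = [  # invented training snippets
    "Mutation of residue R45 abolished binding to the partner protein.",
    "Alanine scanning identified interface hot spots W12 and Y88.",
    "The gene is broadly expressed across embryonic tissues.",
    "We review the clinical epidemiology of the disease.",
]
relevant = [1, 1, 0, 0]  # 1 = carries docking-relevant residue information

model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(abstracts, relevant)
print(model.predict(["Substitution at residue D102 disrupted complex formation."]))
```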
DBpedia and the Live Extraction of Structured Data from Wikipedia
ERIC Educational Resources Information Center
Morsey, Mohamed; Lehmann, Jens; Auer, Soren; Stadler, Claus; Hellmann, Sebastian
2012-01-01
Purpose: DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on several months old data. DBpedia-Live solves this problem by providing a live…
Analysis of Technique to Extract Data from the Web for Improved Performance
NASA Astrophysics Data System (ADS)
Gupta, Neena; Singh, Manish
2010-11-01
The World Wide Web is rapidly guiding the world into an amazing electronic world, where everyone can publish anything in electronic form and extract almost any information. Extraction of information from semi-structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts records from HTML files automatically. Ontologies can achieve a high degree of accuracy in data extraction. We analyze a method for data extraction, OBDE (Ontology-Based Data Extraction), which automatically extracts query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.
NASA Astrophysics Data System (ADS)
Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang
2005-04-01
Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and the current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient's medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and, ultimately, outcomes.
Automated extraction of chemical structure information from digital raster images
Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro
2009-01-01
Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper aims to provide critical reviews of these systems and also reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface (and the algorithm parameters can be readily changed) to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating chemical databases with links to scientific research articles. PMID:19196483
ECG Identification System Using Neural Network with Global and Local Features
ERIC Educational Resources Information Center
Tseng, Kuo-Kun; Lee, Dachao; Chen, Charles
2016-01-01
This paper proposes a human identification system via extracted electrocardiogram (ECG) signals. Two hierarchical classification structures based on a global shape feature and a local statistical feature are used to classify ECG signals. The global shape feature represents the outline information of ECG signals, and the local statistical feature extracts the…
Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei
2016-01-01
Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in object-oriented information extraction from high-resolution remote sensing images. The accuracy of remote sensing subject-specific information depends on this extraction. On the basis of WorldView-2 high-resolution data, this study examined an optimal-segmentation-parameter method for object-oriented image segmentation and high-resolution image information extraction, as follows. Firstly, the best combination of bands and weights was determined for information extraction from the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed with the use of control variables and a combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method whereby expert judgment can be explained by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme. PMID:27362762
Words and possible words in early language acquisition.
Marchetto, Erika; Bonatti, Luca L
2013-11-01
In order to acquire language, infants must extract its building blocks (words) and master the rules governing their legal combinations from speech. These two problems are not independent, however: words also have internal structure. Thus, infants must extract two kinds of information from the same speech input. They must find the actual words of their language. Furthermore, they must identify its possible words, that is, the sequences of sounds that, being morphologically well formed, could be words. Here, we show that infants' sensitivity to possible words appears to be more primitive and fundamental than their ability to find actual words. We expose 12- and 18-month-old infants to an artificial language containing a conflict between statistically coherent and structurally coherent items. We show that 18-month-olds can extract possible words when the familiarization stream contains marks of segmentation, but cannot do so when the stream is continuous. Yet, they can find actual words in a continuous stream by computing statistical relationships among syllables. By contrast, 12-month-olds can find possible words when familiarized with a segmented stream, but seem unable to extract statistically coherent items from a continuous stream that contains minimal conflicts between statistical and structural information. These results suggest that sensitivity to word structure is in place earlier than the ability to analyze distributional information. The ability to compute nontrivial statistical relationships becomes fully effective relatively late in development, when infants have already acquired a considerable amount of linguistic knowledge. Thus, mechanisms for structure extraction that do not rely on extensive sampling of the input are likely to have a much larger role in language acquisition than general-purpose statistical abilities.
The use of experimental structures to model protein dynamics.
Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L
2015-01-01
The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of the protein's different conformational states. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique for classifying and visualizing high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of both. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure using a set of 329 PDB structures of this protein. We describe, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs, or references to software tools, to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
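A minimal sketch of the PCA step, applied to random data standing in for an aligned coordinate ensemble such as the 329 HIV-1 protease structures; each structure is flattened to a 3N-vector, and the top eigenvectors of the covariance matrix are the principal motions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_structs, n_atoms = 50, 100
coords = rng.normal(size=(n_structs, n_atoms, 3))   # stand-in for an aligned ensemble

X = coords.reshape(n_structs, -1)                   # shape (50, 300)
X -= X.mean(axis=0)                                 # remove the mean structure
cov = (X.T @ X) / (n_structs - 1)
evals, evecs = np.linalg.eigh(cov)                  # ascending eigenvalues

pc1 = evecs[:, -1]                                  # top principal motion
print("variance captured by PC1:", evals[-1] / evals.sum())
print("projection of first structures onto PC1:", (X @ pc1)[:3])
```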
Support patient search on pathology reports with interactive online learning based data extraction.
Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng
2015-01-01
Structured reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on the extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic markers, and procedure. The system achieves F1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
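The annotate-review-update loop described above maps naturally onto incremental classifiers; the sketch below uses scikit-learn's partial_fit as a stand-in, and the feature set, label set, and toy reviewer are all illustrative assumptions, not IDEAL-X's actual engine.

from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)   # stateless, suits streaming input
model = SGDClassifier(loss="log_loss")
CLASSES = ["diagnosis", "procedure", "other"]      # illustrative label set

def process_report(snippets, user_review):
    """Pre-annotate candidate snippets, then learn from the user's corrections."""
    X = vectorizer.transform(snippets)
    try:
        suggested = model.predict(X)
    except NotFittedError:                          # nothing learned yet
        suggested = ["other"] * len(snippets)
    corrected = user_review(snippets, suggested)    # human fixes the annotations
    model.partial_fit(X, corrected, classes=CLASSES)  # online model update
    return corrected

def oracle(snippets, suggested):                    # toy reviewer for the demo
    return ["diagnosis" if "carcinoma" in s else "other" for s in snippets]

print(process_report(["infiltrating ductal carcinoma", "see addendum"], oracle))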
Gstruct: a system for extracting schemas from GML documents
NASA Astrophysics Data System (ADS)
Chen, Hui; Zhu, Fubao; Guan, Jihong; Zhou, Shuigeng
2008-10-01
Geography Markup Language (GML) has become the de facto standard for representing geographic information on the internet. A GML schema provides a way to define the structure, content, and semantics of GML documents. It contains useful structural information about GML documents and plays an important role in storing, querying, and analyzing GML data. However, a GML schema is not mandatory, and it is common for a GML document to contain no schema. In this paper, we present Gstruct, a tool for GML schema extraction. Gstruct finds the features in the input GML documents, identifies geometry datatypes as well as simple datatypes, and then integrates all these features and eliminates improper components to output an optimal schema. Experiments demonstrate that Gstruct is effective in extracting semantically meaningful schemas from GML documents.
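A toy rendition of the feature-discovery stage: accumulate, for every element tag seen across documents, the set of child tags it contains. Gstruct's geometry-datatype identification and schema optimization go well beyond this, so treat the snippet as an assumption-laden sketch.

import xml.etree.ElementTree as ET
from collections import defaultdict

def collect_structure(gml_text, child_tags):
    """Accumulate the observed child-tag set for each element tag."""
    root = ET.fromstring(gml_text)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            child_tags[node.tag].add(child.tag)
            stack.append(child)

child_tags = defaultdict(set)
doc = "<FeatureCollection><member><Road><name>A1</name>" \
      "<geometry><LineString/></geometry></Road></member></FeatureCollection>"
collect_structure(doc, child_tags)
for tag, children in child_tags.items():
    print(tag, "->", sorted(children))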
Extracting the information of coastline shape and its multiple representations
NASA Astrophysics Data System (ADS)
Liu, Ying; Li, Shujun; Tian, Zhen; Chen, Huirong
2007-06-01
Based on a study of coastlines, this paper puts forward a new approach to their multiple representation: it simulates the way humans think when generalizing, builds an appropriate mathematical model, describes the coastline graphically, and extracts the various kinds of coastline shape information. Coastline generalization is then carried out automatically on the basis of knowledge rules and arithmetic operators. Representing the coastline shape by building a Douglas binary tree over the curve reveals the shape character of the coastline both microscopically and macroscopically. The extracted coastline information includes the local characteristic points and their orientation, the curve structure, and the topological traits; the curve structure is divided into single curves and curve clusters. By establishing the knowledge rules of coastline generalization, the generalization scale, and the shape parameters, the automatic coastline generalization model is finally established. The multi-scale representation method proposed here has several strong points: it follows the human mode of thinking and preserves the natural character of the original curve, and the binary tree structure controls the similarity of the coastline, avoids self-intersection, and maintains consistent topological relationships.
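The Douglas binary tree builds on the classic Douglas-Peucker recursion, which splits a curve at its point of maximum offset from the chord; a minimal sketch of that recursion follows (the paper's knowledge rules, binary-tree bookkeeping, and topology checks are omitted, and the sample coordinates are made up).

import math

def point_line_dist(p, a, b):
    """Perpendicular distance from point p to the chord ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    den = math.hypot(bx - ax, by - ay) or 1e-12
    return num / den

def douglas_peucker(points, tol):
    """Recursively keep the farthest point; drop points within tolerance."""
    if len(points) < 3:
        return points
    dists = [point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] < tol:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:i + 1], tol)
    return left[:-1] + douglas_peucker(points[i:], tol)

coast = [(0, 0), (1, 0.9), (2, -0.2), (3, 1.4), (4, 0)]
print(douglas_peucker(coast, 1.0))   # drops (1, 0.9), keeps the salient bends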
NASA Astrophysics Data System (ADS)
Meng, Qier; Kitasaka, Takayuki; Oda, Masahiro; Mori, Kensaku
2017-03-01
Airway segmentation is an important step in analyzing chest CT volumes for computerized lung cancer detection, emphysema diagnosis, asthma diagnosis, and pre- and intra-operative bronchoscope navigation. However, obtaining an integrated 3-D airway tree structure from a CT volume is quite a challenging task. This paper presents a novel airway segmentation method based on intensity structure analysis and bronchi shape structure analysis in volumes of interest (VOIs). The method segments the bronchial regions by applying a cavity enhancement filter (CEF) to trace the bronchial tree structure from the trachea. It uses the CEF in each VOI to segment each branch and to predict the positions of the VOIs that envelope the bronchial regions at the next level. At the same time, leakage detection is performed to avoid leakage by analyzing the pixel information and the shape information of the airway candidate regions extracted in the VOI. The bronchial regions are finally obtained by unifying the extracted airway regions. The experimental results showed that the proposed method can extract most of the bronchial regions in each VOI and yields good airway segmentation results.
The methodology of semantic analysis for extracting physical effects
NASA Astrophysics Data System (ADS)
Fomenkova, M. A.; Kamaev, V. A.; Korobkin, D. M.; Fomenkov, S. A.
2017-01-01
The paper presents a new methodology of semantic analysis for extracting physical effects. The methodology is based on the Tuzov ontology, which formally describes the Russian language. Semantic patterns are described for extracting structural physical information in the form of physical effects, and a new text analysis algorithm is described.
Extracting the Textual and Temporal Structure of Supercomputing Logs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, S; Singh, I; Chandra, A
2009-05-26
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
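The syntactic-structure discovery can be caricatured by masking variable fields and grouping messages on the resulting template; the regexes and log lines below are illustrative assumptions, and the paper's online clustering algorithm is considerably more sophisticated.

import re
from collections import defaultdict

def template(msg):
    """Mask hex and numeric fields so messages share a syntactic form."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", msg)   # hex addresses first
    msg = re.sub(r"\d+", "<NUM>", msg)               # then plain numbers
    return msg

logs = [
    "node 17 fan speed 4200 rpm",
    "node 5 fan speed 3900 rpm",
    "ECC error at 0xdeadbeef on node 17",
]
groups = defaultdict(list)
for line in logs:
    groups[template(line)].append(line)
for form, members in groups.items():
    print(form, "->", len(members))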
Populating the Semantic Web by Macro-reading Internet Text
NASA Astrophysics Data System (ADS)
Mitchell, Tom M.; Betteridge, Justin; Carlson, Andrew; Hruschka, Estevam; Wang, Richard
A key question regarding the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text present on the web. We also describe preliminary results demonstrating that machine learning algorithms can learn to extract tens of thousands of facts to populate a diverse ontology, with imperfect but reasonably good accuracy.
Predicting nucleic acid binding interfaces from structural models of proteins
Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael
2011-01-01
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces than patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767
Zheng, Shuai; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A
2017-01-01
Background Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Objective Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. Methods A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Results Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. Conclusions IDEAL-X adopts a unique online machine learning–based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. PMID:28487265
Automatic information extraction from unstructured mammography reports using distributed semantics.
Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L
2018-02-01
To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and, therefore, require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines a dependency-based parse tree with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.
Information Extraction from Unstructured Text for the Biodefense Knowledge Center
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samatova, N F; Park, B; Krishnamurthy, R
2005-04-29
The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow early detection of emerging biological threats to homeland security. It requires highly structured information extracted from a variety of data sources. However, the quantity of new and vital information available from everyday sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are much anticipated. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas the two key phrase extraction technologies developed during the course of the project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. Also, we are enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. In order to address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.
SPECTRa-T: machine-based data extraction and semantic searching of chemistry e-theses.
Downing, Jim; Harvey, Matt J; Morgan, Peter B; Murray-Rust, Peter; Rzepa, Henry S; Stewart, Diana C; Tonge, Alan P; Townsend, Joe A
2010-02-22
The SPECTRa-T project has developed text-mining tools to extract named chemical entities (NCEs), such as chemical names and terms, and chemical objects (COs), e.g., experimental spectral assignments and physical chemistry properties, from electronic theses (e-theses). Although NCEs were readily identified within the two major document formats studied, only the use of structured documents enabled identification of chemical objects and their association with the relevant chemical entity (e.g., systematic chemical name). A corpus of theses was analyzed, and it is shown that a high degree of semantic information can be extracted from structured documents. This integrated information has been deposited in a persistent Resource Description Framework (RDF) triple-store that allows users to conduct semantic searches. The strengths and weaknesses of several document formats are reviewed.
Social network extraction based on Web: 3. the integrated superficial method
NASA Astrophysics Data System (ADS)
Nasution, M. K. M.; Sitompul, O. S.; Noah, S. A.
2018-03-01
The Web as a source of information has become part of the information on social behavior. Even though it involves only the limited information disclosed by search engines in the form of hit counts, snippets, and URL addresses of web pages, the integrated extraction method produces a social network that is not only trusted but also enriched. Unintegrated extraction methods may produce social networks without explanation, resulting in poor supplemental information, or social networks laden with surmise and consequently unrepresentative social structures. The integrated superficial method, in addition to generating the core social network, also generates an expanded network so as to reach the scope of relation clues, with the number of edges computationally approaching n(n - 1)/2 for n social actors.
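A common superficial building block is turning hit counts from singleton and joint queries into an edge strength; the sketch below uses a Jaccard-style overlap with made-up counts (the paper's integrated method also draws on snippets and URL addresses), and enumerates the n(n - 1)/2 candidate edges the abstract mentions.

from itertools import combinations

# hypothetical hit counts: hits[a] for a single name, hits[(a, b)] for a joint query
hits = {"alice": 900, "bob": 1200, "carol": 400,
        ("alice", "bob"): 150, ("alice", "carol"): 10, ("bob", "carol"): 90}

def jaccard(a, b):
    """Overlap of result sets estimated from hit counts alone."""
    joint = hits.get((a, b)) or hits.get((b, a), 0)
    return joint / (hits[a] + hits[b] - joint)

actors = ["alice", "bob", "carol"]
edges = [(a, b, round(jaccard(a, b), 3)) for a, b in combinations(actors, 2)]
print(edges)   # at most n(n - 1)/2 candidate edges for n actors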
Classification of the Gabon SAR Mosaic Using a Wavelet Based Rule Classifier
NASA Technical Reports Server (NTRS)
Simard, Marc; Saatchi, Sasan; DeGrandi, Gianfranco
2000-01-01
A method is developed for semi-automated classification of SAR images of the tropical forest. Information is extracted using the wavelet transform (WT). The transform allows for extraction of structural information in the image as a function of scale. In order to classify the SAR image, a Decision Tree Classifier is used. Pruning is used to optimize classification rate versus tree size. The results give explicit insight into the type of information useful for a given class.
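A schematic of that pipeline, assuming PyWavelets for the scale-wise features and scikit-learn's cost-complexity pruning in place of the paper's pruning procedure; the random patches and labels are stand-ins for labeled SAR data.

import numpy as np
import pywt
from sklearn.tree import DecisionTreeClassifier

def wavelet_features(patch, wavelet="haar", levels=2):
    """Energy of the detail coefficients at each scale: texture as a function of scale."""
    coeffs = pywt.wavedec2(patch, wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:                      # (cH, cV, cD) per decomposition level
        feats.extend(float(np.mean(c ** 2)) for c in detail)
    return feats

rng = np.random.default_rng(1)
patches = rng.normal(size=(40, 16, 16))            # stand-in for SAR image patches
labels = rng.integers(0, 2, size=40)               # stand-in class labels
X = np.array([wavelet_features(p) for p in patches])

tree = DecisionTreeClassifier(ccp_alpha=0.01)      # pruning trades tree size vs. accuracy
tree.fit(X, labels)
print(tree.tree_.node_count)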
Development of a full-text information retrieval system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keizo Oyama; AKira Miyazawa, Atsuhiro Takasu; Kouji Shibano
The authors have executed a project to realize a full-text information retrieval system. The system is designed to deal with a document database comprising the full text of a large number of documents, such as academic papers. The document structures are utilized in searching and in extracting appropriate information. The concept of structure handling and the configuration of the system are described in this paper.
Phenomenological analysis of medical time series with regular and stochastic components
NASA Astrophysics Data System (ADS)
Timashev, Serge F.; Polyakov, Yuriy S.
2007-06-01
Flicker-Noise Spectroscopy (FNS), a general approach to the extraction and parameterization of resonant and stochastic components contained in medical time series, is presented. The basic idea of FNS is to treat the correlation links present in sequences of different irregularities, such as spikes, "jumps", and discontinuities in derivatives of different orders, on all levels of the spatiotemporal hierarchy of the system under study, as the main information carriers. The tools to extract and analyze the information are power spectra and difference moments (structural functions), which complement each other's information. The structural function stochastic component is formed exclusively by "jumps" of the dynamic variable, while the power spectrum stochastic component is formed by both spikes and "jumps" on every level of the hierarchy. The information "passport" characteristics that are determined by fitting the derived expressions to the experimental variations for the stochastic components of power spectra and structural functions are interpreted as the correlation times and parameters that describe the rate of "memory loss" on these correlation time intervals for different irregularities. The number of extracted parameters is determined by the requirements of the problem under study. Application of this approach to the analysis of tremor velocity signals for a Parkinsonian patient is discussed.
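Numerically, the two complementary FNS tools reduce to the second-order difference moment Phi(tau) = <[V(t + tau) - V(t)]^2> and the power spectrum; a bare-bones sketch on a synthetic signal follows (fitting the FNS interpolation expressions to obtain the "passport" parameters is omitted).

import numpy as np

def structure_function(v, max_lag):
    """Second-order difference moment Phi(tau) = <[v(t+tau) - v(t)]^2>."""
    return np.array([np.mean((v[lag:] - v[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])

def power_spectrum(v):
    """One-sided power spectrum of the mean-removed signal."""
    v = v - v.mean()
    return np.abs(np.fft.rfft(v)) ** 2 / len(v)

t = np.arange(4096)
signal = np.sin(2 * np.pi * t / 64) + np.random.default_rng(2).normal(size=t.size)
print(structure_function(signal, 5))   # growth with lag reflects "jumps"
print(power_spectrum(signal)[:5])      # low-frequency content plus noise floor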
Predicting nucleic acid binding interfaces from structural models of proteins.
Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael
2012-02-01
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.
Recognition techniques for extracting information from semistructured documents
NASA Astrophysics Data System (ADS)
Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna
2000-12-01
Archives of optical documents are increasingly employed, with demand driven also by the new norms sanctioning the legal value of digital documents, provided they are stored on supports that are physically unalterable. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those for magnetic memories. The remaining bottleneck in these systems is indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine-supported to a large degree, with evident advantages both in the organization of the work and in extracting information, providing data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields of sender, addressee, subject, date, and body of the document.
Comparison of Three Information Sources for Smoking Information in Electronic Health Records
Wang, Liwei; Ruan, Xiaoyang; Yang, Ping; Liu, Hongfang
2016-01-01
OBJECTIVE The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage. PMID:27980387
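A toy rendition of the pattern-plus-heuristics idea for smoking status; the regexes, precedence order, and notes below are illustrative assumptions, not the study's actual extraction module.

import re

NEGATION = re.compile(r"\b(denies|never|no history of|non)[\s-]*smok", re.I)
FORMER = re.compile(r"\b(former|quit|ex)[\s-]*smok", re.I)
CURRENT = re.compile(r"\b(current|active)?\s*smoker|\bsmokes\b", re.I)

def smoking_status(note):
    """Order matters: negation and former-smoker cues override a bare match."""
    if NEGATION.search(note):
        return "never"
    if FORMER.search(note):
        return "former"
    if CURRENT.search(note):
        return "current"
    return "unknown"

for note in ["Patient denies smoking.", "Ex-smoker, quit 2005.",
             "He smokes 1 ppd.", "No relevant history."]:
    print(note, "->", smoking_status(note))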
Zheng, Shuai; Lu, James J; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A; Wang, Fusheng
2017-05-09
Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. ©Shuai Zheng, James J Lu, Nima Ghasemzadeh, Salim S Hayek, Arshed A Quyyumi, Fusheng Wang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 09.05.2017.
Research on improved edge extraction algorithm of rectangular piece
NASA Astrophysics Data System (ADS)
He, Yi-Bin; Zeng, Ya-Jun; Chen, Han-Xin; Xiao, San-Xia; Wang, Yan-Wei; Huang, Si-Yu
Traditional edge detection operators such as the Prewitt operator, the LOG operator, and the Canny operator cannot meet the requirements of modern industrial measurement. This paper proposes an image edge detection algorithm based on an improved morphological gradient. It detects the image using structuring elements, which deal directly with the characteristic information of the image. By choosing structuring elements of different shapes and sizes and using them together, the ideal image edge information can be detected. The experimental results show that the algorithm can extract edges well from noisy images, yielding clearer and more detailed edges than previous edge detection algorithms.
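The core operation, a morphological gradient (dilation minus erosion) computed with structuring elements of several shapes and sizes and then fused, can be sketched in a few OpenCV calls; the max-fusion rule and the synthetic image are assumptions, since the paper does not spell out its combination step here.

import cv2
import numpy as np

img = np.zeros((64, 64), np.uint8)
cv2.rectangle(img, (16, 16), (48, 40), 255, -1)      # synthetic rectangular piece
img = cv2.GaussianBlur(img, (5, 5), 0)               # mimic sensor softness

# morphological gradient = dilation - erosion, once per structuring element
shapes = [cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)),
          cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)),
          cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))]
grads = [cv2.morphologyEx(img, cv2.MORPH_GRADIENT, k) for k in shapes]

edges = np.max(np.stack(grads), axis=0)              # fuse the multi-shape responses
print(edges.max(), int((edges > 0).sum()))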
Considering context: reliable entity networks through contextual relationship extraction
NASA Astrophysics Data System (ADS)
David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.
2016-05-01
Existing information extraction techniques can only partially address the problem of exploiting unreadably large amounts of text. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on what is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables Warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) more precise event and relationship interpretation, 2) more detailed information about extracted events and relationships, and 3) more reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.
Information extraction from multi-institutional radiology reports.
Hassanpour, Saeed; Langlotz, Curtis P
2016-01-01
The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
Jorge-Botana, Guillermo; Olmos, Ricardo; León, José Antonio
2009-11-01
There is currently widespread interest in indexing and extracting taxonomic information from large text collections. An example is the automatic categorization of informally written medical or psychological diagnoses, followed by the extraction of epidemiological information or even terms and structures needed to formulate guiding questions as a heuristic tool for helping doctors. Vector space models have been successfully used to this end (Lee, Cimino, Zhu, Sable, Shanker, Ely & Yu, 2006; Pakhomov, Buntrock & Chute, 2006). In this study we use a computational model known as Latent Semantic Analysis (LSA) on a diagnostic corpus with the aim of retrieving definitions (in the form of lists of semantic neighbors) of common structures it contains (e.g. "storm phobia", "dog phobia") or less common structures that might be formed by logical combinations of categories and diagnostic symptoms (e.g. "gun personality" or "germ personality"). In the quest to bring definitions into line with the meaning of structures and make them in some way representative, various problems commonly arise while recovering content using vector space models. We propose some approaches that bypass these problems, such as Kintsch's (2001) predication algorithm and some corrections to the way lists of neighbors are obtained, which have already been tested on semantic spaces in a non-specific domain (Jorge-Botana, León, Olmos & Hassan-Montero, under review). The results support the idea that the predication algorithm may also be useful for extracting more precise meanings of certain structures from scientific corpora, and that the introduction of some corrections based on vector length may increase its efficiency on non-representative terms.
Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials
Federer, Callie; Yoo, Minjae
2016-01-01
Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AEs could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers to discover drug-AE relationships for developing, repositioning, and repurposing drugs. PMID:27631620
Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials.
Federer, Callie; Yoo, Minjae; Tan, Aik Choon
2016-12-01
Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AEs could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers to discover drug-AE relationships for developing, repositioning, and repurposing drugs.
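A scaled-down illustration of the "python scripts plus regular expressions plus a drug dictionary into a relational database" step; the patterns, dictionary, and table layout are invented for the example and are not the authors' schema.

import re
import sqlite3

drug_dict = {"aspirin", "metformin"}                 # tiny stand-in drug dictionary
ae_pattern = re.compile(r"(headache|nausea|dizziness)", re.I)

records = [
    ("NCT00000001", "Aspirin arm reported headache and nausea."),
    ("NCT00000002", "Metformin group noted dizziness."),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drug_ae (trial TEXT, drug TEXT, ae TEXT)")
for trial, text in records:
    drugs = [d for d in drug_dict if d in text.lower()]   # dictionary lookup
    for drug in drugs:
        for ae in ae_pattern.findall(text):               # regex AE extraction
            conn.execute("INSERT INTO drug_ae VALUES (?, ?, ?)",
                         (trial, drug, ae.lower()))
for row in conn.execute("SELECT * FROM drug_ae"):
    print(row)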
Rule Extracting based on MCG with its Application in Helicopter Power Train Fault Diagnosis
NASA Astrophysics Data System (ADS)
Wang, M.; Hu, N. Q.; Qin, G. J.
2011-07-01
In order to extract decision rules for fault diagnosis from incomplete historical test records for knowledge-based damage assessment of helicopter power train structures, a method that can directly extract optimal generalized decision rules from incomplete information, based on granular computing (GrC), was proposed. Based on semantic analysis of unknown attribute values, the granule was extended to handle incomplete information. The maximum characteristic granule (MCG) was defined based on the characteristic relation and was used to construct the resolution function matrix. The optimal generalized decision rule was introduced, and with the basic equivalent forms of propositional logic, the rules were extracted and reduced from the incomplete information table. Combined with a fault diagnosis example of a power train, the application approach of the method was presented, and the validity of this method in knowledge acquisition was proved.
Extracting laboratory test information from biomedical text
Kang, Yanna Shen; Kayaalp, Mehmet
2013-01-01
Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058
Structuring and extracting knowledge for the support of hypothesis generation in molecular biology
Roos, Marco; Marshall, M Scott; Gibson, Andrew P; Schuemie, Martijn; Meij, Edgar; Katrenko, Sophia; van Hage, Willem Robert; Krommydas, Konstantinos; Adriaans, Pieter W
2009-01-01
Background Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. Results We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence. Conclusion We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation. PMID:19796406
Simultaneous parameter optimization of x-ray and neutron reflectivity data using genetic algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Surendra, E-mail: surendra@barc.gov.in; Basu, Saibal
2016-05-23
X-ray and neutron reflectivity are two nondestructive techniques which provide a wealth of information on thickness, structure and interfacial properties at the nanometer length scale. The combination of X-ray and neutron reflectivity is well suited for obtaining physical parameters of nanostructured thin films and superlattices. Neutrons provide a different contrast between the elements than X-rays and are also sensitive to the magnetization depth profile in thin films and superlattices. The real space information is extracted by fitting a model for the structure of the thin film sample in reflectometry experiments. We have applied a Genetic Algorithms technique to extract depth-dependent structural and magnetic information in thin film and multilayer systems by simultaneously fitting X-ray and neutron reflectivity data.
The selectivity and the ability to obtain structural information from detection schemes used in arsenic speciation research are growing analytical requirements, driven by the increasing number of arsenicals extracted from natural products and the need to minimize misidentification in...
Ziatdinov, Maxim; Dyck, Ondrej; Maksov, Artem; Li, Xufan; Sang, Xiahan; Xiao, Kai; Unocic, Raymond R; Vasudevan, Rama; Jesse, Stephen; Kalinin, Sergei V
2017-12-26
Recent advances in scanning transmission electron and scanning probe microscopies have opened exciting opportunities for probing the structural parameters and various functional properties of materials in real space with angstrom-level precision. This progress has been accompanied by an exponential increase in the size and quality of data sets produced by microscopic and spectroscopic experimental techniques. These developments necessitate adequate methods for extracting relevant physical and chemical information from the large data sets, for which a priori information on the structures of various atomic configurations and lattice defects is limited or absent. Here we demonstrate an application of deep neural networks to extract information from atomically resolved images, including the location of the atomic species and the type of defects. We develop a "weakly supervised" approach that uses information on the coordinates of all atomic species in the image, extracted via a deep neural network, to identify a rich variety of defects that are not part of the initial training set. We further apply our approach to interpret complex atomic and defect transformations, including switching between different coordinations of silicon dopants in graphene as a function of time, the formation of a peculiar silicon dimer with mixed 3-fold and 4-fold coordination, and the motion of a molecular "rotor". This deep learning-based approach resembles the logic of a human operator but can be scaled up, leading to a significant shift in the way information is extracted and analyzed from raw experimental data.
[Study on infrared spectrum change of Ganoderma lucidum and its extracts].
Chen, Zao-Xin; Xu, Yong-Qun; Chen, Xiao-Kang; Huang, Dong-Lan; Lu, Wen-Guan
2013-05-01
From the determination of the infrared spectra of four substances (original Ganoderma lucidum and its water extract, 95% ethanol extract, and petroleum ether extract), it was found that the infrared spectrum carries systematic chemical information and basically reflects the distribution of each component of the analyte. Ganoderma lucidum and its extracts can be distinguished according to the ratios of the absorption peak areas at 3 416-3 279, 1 541 and 723 cm(-1) to that at 2 935-2 852 cm(-1). A method of calculating the information entropy of a sample set using Euclidean distance is proposed, the relationship between the information entropy and the amount of chemical information carried by the sample set is discussed, and it is concluded that the sample set of original Ganoderma lucidum carries the most abundant chemical information. In hierarchical cluster analysis of the four sample sets, the infrared spectrum set of original Ganoderma lucidum gives the best clustering of Ganoderma atrum, cyan Ganoderma, Ganoderma multiplicatum and Ganoderma lucidum. The results show that the infrared spectrum carries chemical information about the material structure and is closely related to the chemical composition of the system. The higher the information entropy, the richer the chemical information and the greater the benefit for pattern recognition. This study provides guidance for constructing sample sets in pattern recognition.
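The abstract does not give the entropy formula explicitly; one plausible reading, normalizing the pairwise Euclidean distances within a sample set into a probability distribution and taking its Shannon entropy, is sketched below purely as an assumption.

import numpy as np

def set_entropy(spectra):
    """Shannon entropy of the normalized pairwise Euclidean distances."""
    n = len(spectra)
    dists = [np.linalg.norm(spectra[i] - spectra[j])
             for i in range(n) for j in range(i + 1, n)]
    p = np.asarray(dists, dtype=float)
    p = p / p.sum()                      # distances -> probability distribution
    p = p[p > 0]                         # drop zero-distance (identical) pairs
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(3)
diverse = rng.normal(size=(6, 100))      # stand-in for 6 varied IR spectra
clustered = np.vstack([np.zeros((5, 100)), np.ones((1, 100))])  # nearly identical set
print(set_entropy(diverse), set_entropy(clustered))  # diverse set scores higher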
Automated extraction of family history information from clinical notes.
Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S; Winden, Tamara J; Carter, Elizabeth W; Melton, Genevieve B
2014-01-01
Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication ("indicator phrases"), the latter of which was used to establish relationships between observations and family member. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications.
Automated Extraction of Family History Information from Clinical Notes
Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S.; Winden, Tamara J.; Carter, Elizabeth W.; Melton, Genevieve B.
2014-01-01
Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication (“indicator phrases”), the latter of which was used to establish relationships between observations and family member. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications. PMID:25954443
[Study on Information Extraction of Clinic Expert Information from Hospital Portals].
Zhang, Yuanpeng; Dong, Jiancheng; Qian, Danmin; Geng, Xingyun; Wu, Huiqun; Wang, Li
2015-12-01
Clinic expert information provides important references for residents in need of hospital care. Usually, such information is hidden in the deep web and cannot be directly indexed by search engines. To extract clinic expert information from the deep web, the first challenge is to make a judgment on forms. This paper proposes a novel method based on a domain model, which is a tree structure constructed from the attributes of search interfaces. With this model, search interfaces can be classified to a domain and filled in with domain keywords. Another challenge is to extract information from the returned web pages indexed by search interfaces. To filter the noise information on a web page, a block importance model is proposed. The experimental results indicated that the domain model yielded a precision 10.83% higher than that of the rule-based method, whereas the block importance model yielded an F₁ measure 10.5% higher than that of the XPath method.
Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation
Audeh, Bissan; Beigbeder, Michel; Zimmermann, Antoine; Jaillon, Philippe; Bousquet, Cédric
2017-01-01
The extraction of information from social media is an essential yet complicated step for data analysis in multiple domains. In this paper, we present Vigi4Med Scraper, a generic open source framework for extracting structured data from web forums. Our framework is highly configurable; using a configuration file, the user can freely choose the data to extract from any web forum. The extracted data are anonymized and represented in a semantic structure using Resource Description Framework (RDF) graphs. This representation enables efficient manipulation by data analysis algorithms and allows the collected data to be directly linked to any existing semantic resource. To avoid server overload, an integrated proxy with caching functionality imposes a minimal delay between sequential requests. Vigi4Med Scraper represents the first step of Vigi4Med, a project to detect adverse drug reactions (ADRs) from social networks funded by the French drug safety agency Agence Nationale de Sécurité du Médicament (ANSM). Vigi4Med Scraper has successfully extracted more than 200 gigabytes of data from the web forums of over 20 different websites. PMID:28122056
Extracting heading and temporal range from optic flow: Human performance issues
NASA Technical Reports Server (NTRS)
Kaiser, Mary K.; Perrone, John A.; Stone, Leland; Banks, Martin S.; Crowell, James A.
1993-01-01
Pilots are able to extract information about their vehicle motion and environmental structure from dynamic transformations in the out-the-window scene. In this presentation, we focus on the information in the optic flow which specifies vehicle heading and distance to objects in the environment, scaled to a temporal metric. In particular, we are concerned with modeling how the human operators extract the necessary information, and what factors impact their ability to utilize the critical information. In general, the psychophysical data suggest that the human visual system is fairly robust to degradations in the visual display, e.g., reduced contrast and resolution or restricted field of view. However, extraneous motion flow, i.e., introduced by sensor rotation, greatly compromises human performance. The implications of these models and data for enhanced/synthetic vision systems are discussed.
Natural Language Processing in Radiology: A Systematic Review.
Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A
2016-05-01
Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. (©) RSNA, 2016 Online supplemental material is available for this article.
Event-based text mining for biology and functional genomics
Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.
2015-01-01
The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365
Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar
2017-01-01
Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as linear kernels, tree kernels, graph kernels, and combinations of multiple kernels, have achieved promising results in the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction, the Distributed Smoothed Tree kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees carrying syntactic information along with distributional semantic vectors representing the semantic information of the sentences or phrases. To generate a robust machine learning model, a feature-based kernel and the DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on all five corpora compared with other state-of-the-art systems. PMID:29099838
ERIC Educational Resources Information Center
Chen, Hsinchun
2003-01-01
Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)
Zhou, Li; Plasek, Joseph M; Mahoney, Lisa M; Karipineni, Neelima; Chang, Frank; Yan, Xuemin; Chang, Fenny; Dimaggio, Dana; Goldman, Debora S.; Rocha, Roberto A.
2011-01-01
Clinical information is often coded using different terminologies, and therefore is not interoperable. Our goal is to develop a general natural language processing (NLP) system, called the Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular, pipeline approach flowing from a preprocessor, semantic tagger, terminology mapper, context analyzer, and parser to structure input clinical notes. Evaluators manually reviewed MTERMS output for 30 free-text and 10 structured outpatient clinical notes. MTERMS achieved an overall F-measure of 90.6 for free-text notes and 94.0 for structured notes on medication and temporal information. The local medication terminology had 83.0% coverage, compared with RxNorm's 98.0% coverage, for free-text notes. 61.6% of mappings between the terminologies were exact matches. Capture of duration was significantly improved (91.7% vs. 52.5%) compared with systems in the third i2b2 challenge. PMID:22195230
Statistical learning and language acquisition
Romberg, Alexa R.; Saffran, Jenny R.
2011-01-01
Human learners, including infants, are highly sensitive to structure in their environment. Statistical learning refers to the process of extracting this structure. A major question in language acquisition in the past few decades has been the extent to which infants use statistical learning mechanisms to acquire their native language. There have been many demonstrations showing infants’ ability to extract structures in linguistic input, such as the transitional probability between adjacent elements. This paper reviews current research on how statistical learning contributes to language acquisition. Current research is extending the initial findings of infants’ sensitivity to basic statistical information in many different directions, including investigating how infants represent regularities, learn about different levels of language, and integrate information across situations. These current directions emphasize studying statistical language learning in context: within language, within the infant learner, and within the environment as a whole. PMID:21666883
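The transitional-probability statistic mentioned above is simple to make concrete. The sketch below estimates TP(next | current) = count(current, next) / count(current) over a syllable stream, in the spirit of classic segmentation studies; the stream and its "words" are invented for illustration.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate TP(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# A stream built from the invented "words" bi-da-ku and pa-do-ti:
# within-word transitions recur, across-word transitions vary.
stream = "bi da ku pa do ti bi da ku pa do ti pa do ti bi da ku".split()
for pair, tp in sorted(transitional_probabilities(stream).items()):
    print(pair, round(tp, 2))
```

Within-word transitions (e.g., bi followed by da) come out at 1.0, while transitions that cross a word boundary are lower; this dip in transitional probability is exactly the statistical cue infants are thought to exploit for segmentation.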
Scene text recognition in mobile applications by character descriptor and structure configuration.
Yi, Chucai; Tian, Yingli
2014-07-01
Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and variant background interferences. This paper proposes a method of scene text recognition from detected text regions. In text detection, our previously proposed algorithms are applied to obtain text regions from a scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure at each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction in smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides us with some insight into algorithm design and performance improvement of scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme of text recognition is comparable with the best existing methods.
NASA Astrophysics Data System (ADS)
Liu, X.; Zhang, J. X.; Zhao, Z.; Ma, A. D.
2015-06-01
Synthetic aperture radar (SAR) is being applied more and more widely in remote sensing because of its all-time, all-weather operation, and feature extraction from high-resolution SAR images has become a research topic of wide concern. In particular, with the continuous improvement of airborne SAR image resolution, image texture information has become more abundant, which is of great significance for classification and extraction. In this paper, a novel method for built-up area extraction using both statistical and structural features is proposed according to the texture characteristics of built-up areas. First, statistical texture features and structural features are extracted by the classical gray-level co-occurrence matrix method and the variogram-function method, respectively, with direction information taken into account in this process. Next, feature weights are calculated according to the Bhattacharyya distance. Then, all features are fused by weighting. Finally, the fused image is classified with the K-means method and the built-up areas are extracted after a post-classification process. The proposed method has been tested on domestic airborne P-band polarimetric SAR images; at the same time, two groups of experiments based on statistical texture alone and on structural texture alone were carried out for comparison. On the basis of qualitative analysis, a quantitative analysis based on manually selected built-up areas was performed: in the relatively simple experimental area the detection rate exceeds 90%, and in the relatively complex experimental area the detection rate is also higher than that of the other two methods. The results show that this method can effectively and accurately extract built-up areas in high-resolution airborne SAR imagery.
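To make the weighting-and-fusion step concrete, here is a minimal sketch under a Gaussian assumption for each feature: the Bhattacharyya distance between built-up and background samples sets each feature's weight before K-means clustering. The feature images, the labelled mask, and all numbers are synthetic placeholders, not the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans

def bhattacharyya_gauss(x, y):
    """Bhattacharyya distance between two 1-D samples, Gaussian assumption."""
    m1, m2 = x.mean(), y.mean()
    v1, v2 = x.var() + 1e-12, y.var() + 1e-12
    return (0.25 * np.log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))

def weighted_fusion(features, builtup_mask):
    """features: (n_pixels, n_features); weights from class separability."""
    d = np.array([bhattacharyya_gauss(f[builtup_mask], f[~builtup_mask])
                  for f in features.T])
    w = d / d.sum()            # larger separability -> larger weight
    return features * w        # weighted features for clustering

# Hypothetical case: 2 texture features over 1000 pixels, with a small
# labelled sample (builtup_mask) used only to derive the weights.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 2))
mask = rng.random(1000) < 0.3
fused = weighted_fusion(feats, mask)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(fused)
```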
KneeTex: an ontology-driven system for information extraction from MRI reports.
Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate
2015-01-01
In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F-measure of 97.81 %, the values of which are in line with human-like performance. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
Rare tradition of the folk medicinal use of Aconitum spp. is kept alive in Solčavsko, Slovenia.
Povšnar, Marija; Koželj, Gordana; Kreft, Samo; Lumpert, Mateja
2017-08-08
Aconitum species are poisonous plants that have been used in Western medicine for centuries. In the nineteenth century, these plants were part of official and folk medicine in the Slovenian territory. According to current ethnobotanical studies, folk use of Aconitum species is rarely reported in Europe. The purpose of this study was to research the folk medicinal use of Aconitum species in Solčavsko, Slovenia; to collect recipes for the preparation of Aconitum spp., indications for use, and dosing; and to investigate whether the folk use of aconite was connected to poisoning incidents. In Solčavsko, a remote alpine area in northern Slovenia, we performed semi-structured interviews with 19 informants in Solčavsko, 3 informants in Luče, and two retired physicians who worked in that area. Three samples of homemade ethanolic extracts were obtained from informants, and the concentration of aconitine was measured. In addition, four extracts were prepared according to reported recipes. All 22 informants knew of Aconitum spp. and their therapeutic use, and 5 of them provided a detailed description of the preparation and use of "voukuc", an ethanolic extract made from aconite roots. Seven informants were unable to describe the preparation in detail, since they knew of the extract only from the narration of others or they remembered it from childhood. Most likely, the roots of Aconitum tauricum and Aconitum napellus were used for the preparation of the extract, and the solvent was homemade spirits. Four informants kept the extract at home; two extracts were prepared recently (1998 and 2015). Three extracts were analyzed, and 2 contained aconitine. Informants reported many indications for the use of the extract; it was used internally and, in some cases, externally as well. The extract was also used in animals. The extract was measured in drops, but the number of drops differed among the informants. The informants reported nine poisonings with Aconitum spp., but none of them occurred as a result of medicinal use of the extract. In this study, we determined that folk knowledge of the medicinal use of Aconitum spp. is still present in Solčavsko, but Aconitum preparations are used only infrequently.
Smart Extraction and Analysis System for Clinical Research.
Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung
2017-05-01
With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issue of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for a descriptive and predictive analysis to compile the survival statistics and predict the future chance of survivability. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS is evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system for semistructured text is recorded at 99%, while that for unstructured text is 97%. Furthermore, the automated, unstructured information extraction has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients are found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one of the lifestyle risk factors was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic extraction of information and survival prediction methods. SEAS has reduced the time and energy of human resources spent unnecessarily on manual tasks.
Clinic expert information extraction based on domain model and block importance model.
Zhang, Yuanpeng; Wang, Li; Qian, Danmin; Geng, Xingyun; Yao, Dengfu; Dong, Jiancheng
2015-11-01
To extract expert clinic information from the Deep Web, there are two challenges to face. The first is to make judgments on forms. A novel method based on a domain model, which is a tree structure constructed from the attributes of query interfaces, is proposed. With this model, query interfaces can be classified to a domain and filled in with domain keywords. The second challenge is to extract information from the response Web pages indexed by the query interfaces. To filter the noisy information on a Web page, a block importance model is proposed; both content and spatial features are taken into account in this model. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
System for definition of the central-chest vasculature
NASA Astrophysics Data System (ADS)
Taeprasartsit, Pinyo; Higgins, William E.
2009-02-01
Accurate definition of the central-chest vasculature from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. For instance, the aorta and pulmonary artery help in automatic definition of the Mountain lymph-node stations for lung-cancer staging. This work presents a system for defining major vascular structures in the central chest. The system provides automatic methods for extracting the aorta and pulmonary artery and semi-automatic methods for extracting the other major central chest arteries/veins, such as the superior vena cava and azygos vein. Automatic aorta and pulmonary artery extraction are performed by model fitting and selection. The system also extracts certain vascular structure information to validate outputs. A semi-automatic method extracts vasculature by finding the medial axes between provided important sites. Results of the system are applied to lymph-node station definition and guidance of bronchoscopic biopsy.
Quantification of network structural dissimilarities.
Schieber, Tiago A; Carpi, Laura; Díaz-Guilera, Albert; Pardalos, Panos M; Masoller, Cristina; Ravetti, Martín G
2017-01-09
Identifying and quantifying dissimilarities among graphs is a fundamental and challenging problem of practical importance in many fields of science. Current methods of network comparison are limited to extracting only partial information or are computationally very demanding. Here we propose an efficient and precise measure for network comparison, which is based on quantifying differences among distance probability distributions extracted from the networks. Extensive experiments on synthetic and real-world networks show that this measure returns non-zero values only when the graphs are non-isomorphic. Most importantly, the measure proposed here can identify and quantify structural topological differences that have a practical impact on the information flow through the network, such as the presence or absence of critical links that connect or disconnect connected components.
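The core ingredient of the measure, comparing distance probability distributions, can be sketched as follows. This simplified version scores two graphs by the Jensen-Shannon distance between their shortest-path-length distributions; the published measure combines further terms, so treat this only as an illustration of the idea, on placeholder random graphs.

```python
import networkx as nx
import numpy as np
from scipy.spatial.distance import jensenshannon

def distance_distribution(g):
    """Fraction of reachable node pairs at each shortest-path distance."""
    n = g.number_of_nodes()
    hist = np.zeros(n)  # distances are at most n - 1
    for _, lengths in nx.shortest_path_length(g):
        for d in lengths.values():
            if d >= 1:
                hist[d - 1] += 1
    return hist / hist.sum()

# Two synthetic 100-node graphs with different topologies
g1 = nx.erdos_renyi_graph(100, 0.05, seed=1)
g2 = nx.barabasi_albert_graph(100, 3, seed=1)
p, q = distance_distribution(g1), distance_distribution(g2)
print("JS distance between distance distributions:", jensenshannon(p, q))
```

Isomorphic graphs necessarily share the same distance distribution, so the score is zero for them; the converse guarantee reported in the abstract is what the authors' fuller measure adds on top of this basic comparison.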
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ziatdinov, Maxim; Dyck, Ondrej; Maksov, Artem; ...
2017-12-07
Recent advances in scanning transmission electron and scanning probe microscopies have opened unprecedented opportunities in probing materials' structural parameters and various functional properties in real space with angstrom-level precision. This progress has been accompanied by an exponential increase in the size and quality of datasets produced by microscopic and spectroscopic experimental techniques. These developments necessitate adequate methods for extracting relevant physical and chemical information from large datasets, for which a priori information on the structures of various atomic configurations and lattice defects is limited or absent. Here we demonstrate an application of deep neural networks to extracting information from atomically resolved images, including the location of atomic species and the type of defects. We develop a "weakly supervised" approach that uses information on the coordinates of all atomic species in the image, extracted via a deep neural network, to identify a rich variety of defects that are not part of an initial training set. We further apply our approach to interpret complex atomic and defect transformations, including switching between different coordinations of silicon dopants in graphene as a function of time, the formation of a peculiar silicon dimer with mixed 3-fold and 4-fold coordination, and the motion of a molecular "rotor". In conclusion, this deep learning based approach resembles the logic of a human operator, but can be scaled, leading to a significant shift in the way information is extracted and analyzed from raw experimental data.
TRENCADIS--a WSRF grid MiddleWare for managing DICOM structured reporting objects.
Blanquer, Ignacio; Hernandez, Vicente; Segrelles, Damià
2006-01-01
The adoption of digital processing of medical data, especially in radiology, has led to the availability of millions of records (images and reports). However, this information is mainly used at the patient level and is organised according to administrative criteria, which makes the extraction of knowledge difficult. Moreover, legal constraints make the direct integration of information systems complex or even impossible. On the other side, the widespread adoption of the DICOM format has led to the inclusion of information beyond radiological images alone. The possibility of coding radiology reports in a structured form, adding semantic information about the data contained in the DICOM objects, eases the process of structuring images according to content. DICOM Structured Reporting (DICOM-SR) is a specification of tags and sections to code and integrate radiology reports, with seamless references to findings and regions of interest of the associated images, movies, waveforms, signals, etc. The work presented in this paper aims at developing a framework to efficiently and securely share medical images and radiology reports, as well as to provide high-throughput processing services. This system is based on an architecture previously developed in the framework of the TRENCADIS project, and uses other components such as the security system and the Grid processing service developed in previous activities. The work presented here introduces a semantic structuring and an ontology framework to organise medical images considering standard terminology and disease coding formats (SNOMED, ICD9, LOINC...).
Extracting information from multiplex networks
NASA Astrophysics Data System (ADS)
Iacovacci, Jacopo; Bianconi, Ginestra
2016-06-01
Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous, since they describe social, financial, engineering, and biological networks alike. Extending our ability to analyze complex networks to multiplex network structures greatly increases the level of information that can be extracted from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function Θ̃^S for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming from social science.
Geologic and mineral and water resources investigations in western Colorado using ERTS-1 data
NASA Technical Reports Server (NTRS)
Knepper, D. H. (Principal Investigator)
1974-01-01
The author has identified the following significant results. Most of the geologic information in ERTS-1 imagery can be extracted from bulk processed black and white transparencies by a skilled interpreter using standard photogeologic techniques. In central and western Colorado, the detectability of lithologic contacts on ERTS-1 imagery is closely related to the time of year the imagery was acquired. Geologic structures are the most readily extractable type of geologic information contained in ERTS images. Major tectonic features and associated minor structures can be rapidly mapped, allowing the geologic setting of a large region to be quickly assessed. Trends of geologic structures in younger sedimentary rocks appear to strongly parallel linear trends in older metamorphic and igneous basement terrain. Linears and color anomalies mapped from ERTS imagery are closely related to loci of known mineralization in the Colorado mineral belt.
Predicate Argument Structure Frames for Modeling Information in Operative Notes
Wang, Yan; Pakhomov, Serguei; Melton, Genevieve B.
2015-01-01
The rich information about surgical procedures contained in operative notes is a valuable data source for improving the clinical evidence base and clinical research. In this study, we propose a set of Predicate Argument Structure (PAS) frames for surgical action verbs to assist in the creation of an information extraction (IE) system to automatically extract details about the techniques, equipment, and operative steps from operative notes. We created PropBank style PAS frames for the 30 top surgical action verbs based on examination of randomly selected sample sentences from 3,000 Laparoscopic Cholecystectomy notes. To assess completeness of the PAS frames to represent usage of same action verbs, we evaluated the PAS frames created on sample sentences from operative notes of 6 other gastrointestinal surgical procedures. Our results showed that the PAS frames created with one type of surgery can successfully denote the usage of the same verbs in operative notes of broader surgical categories. PMID:23920664
Nuclear surface diffuseness revealed in nucleon-nucleus diffraction
NASA Astrophysics Data System (ADS)
Hatakeyama, S.; Horiuchi, W.; Kohama, A.
2018-05-01
The nuclear surface provides useful information on the nuclear radius and nuclear structure, as well as on properties of nuclear matter. We discuss the relationship between the nuclear surface diffuseness and the elastic scattering differential cross section at the first diffraction peak of high-energy nucleon-nucleus scattering, as an efficient tool to extract nuclear surface information from the limited experimental data available for short-lived unstable nuclei. The high-energy reaction is described by a reliable microscopic reaction theory, the Glauber model. Extending the idea of the black sphere model, we find a one-to-one correspondence between the nuclear bulk structure information and the proton-nucleus elastic scattering diffraction peak. This implies that we can extract both the nuclear radius and the diffuseness simultaneously, using the position and magnitude of the first diffraction peak of the elastic scattering differential cross section. We confirm the reliability of this approach by using realistic density distributions obtained by a mean-field model.
Bayesian learning of visual chunks by human observers
Orbán, Gergő; Fiser, József; Aslin, Richard N.; Lengyel, Máté
2008-01-01
Efficient and versatile processing of any hierarchically structured information requires a learning mechanism that combines lower-level features into higher-level chunks. We investigated this chunking mechanism in humans with a visual pattern-learning paradigm. We developed an ideal learner based on Bayesian model comparison that extracts and stores only those chunks of information that are minimally sufficient to encode a set of visual scenes. Our ideal Bayesian chunk learner not only reproduced the results of a large set of previous empirical findings in the domain of human pattern learning but also made a key prediction that we confirmed experimentally. In accordance with Bayesian learning but contrary to associative learning, human performance was well above chance when pair-wise statistics in the exemplars contained no relevant information. Thus, humans extract chunks from complex visual patterns by generating accurate yet economical representations and not by encoding the full correlational structure of the input. PMID:18268353
Extraction of urban vegetation with Pleiades multiangular images
NASA Astrophysics Data System (ADS)
Lefebvre, Antoine; Nabucet, Jean; Corpetti, Thomas; Courty, Nicolas; Hubert-Moy, Laurence
2016-10-01
Vegetation is essential in urban environments since it provides significant services in terms of health, heat mitigation, property value, and ecology. As part of the European Union Biodiversity Strategy Plan for 2020, the protection and development of green infrastructures is being strengthened in urban areas. In order to evaluate and monitor the quality of green infrastructures, this article investigates the contribution of Pléiades multi-angular images to extracting and characterizing low and high urban vegetation. From such images, one can extract both spectral and elevation information. Our method is composed of three main steps: (1) computation of a normalized Digital Surface Model from the multi-angular images; (2) extraction of spectral and contextual features; (3) classification of the vegetation classes (tree and grass) with a random forest classifier. Results for the city of Rennes, France, show the ability of multi-angular images to extract a DSM in urban areas despite building height. They also highlight the importance of elevation information and its complementarity with contextual information for extracting urban vegetation.
[Construction of chemical information database based on optical structure recognition technique].
Lv, C Y; Li, M N; Zhang, L R; Liu, Z M
2018-04-18
To create a protocol that could be used to construct a chemical information database from the scientific literature quickly and automatically. Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as the fundamental dataset. Chemical structures were transformed from published documents and images into machine-readable data by using name conversion technology and the optical structure recognition tool CLiDE. In the process of molecular structure information extraction, Markush structures were enumerated into well-defined monomer molecules by means of the QueryTools in the molecule editor ChemDraw. The document management software EndNote X8 was applied to acquire bibliographical references covering the title, authors, journal and year of publication. The text mining toolkit ChemDataExtractor was adopted to retrieve information from figures, tables, and textual paragraphs that could be used to populate a structured chemical database. After this step, detailed manual revision and annotation were conducted in order to ensure the accuracy and completeness of the data. In addition to the literature data, the computing simulation platform Pipeline Pilot 7.5 was utilized to calculate physical and chemical properties and predict molecular attributes. Furthermore, the open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated, including the molecular structure data file, molecular information, bibliographical references, predicted attributes and known bioactivities. With canonical simplified molecular-input line-entry specification (SMILES) strings as the primary key, the metadata files were associated through common key nodes, including the molecule number and PDF number, to construct an integrated chemical information database. A reasonable construction protocol for chemical information databases was created successfully. A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 were collected as the essential data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol, containing 3,262 molecules and 19,821 records. This data aggregation protocol is of great help for chemical information database construction in terms of accuracy, comprehensiveness and efficiency based on original documents. The structured chemical information database can facilitate access to medical intelligence and accelerate the transformation of scientific research achievements.
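The role of the canonical SMILES primary key is easy to demonstrate. The sketch below uses RDKit, one common open-source option (the paper's own tooling was ChemDraw and Pipeline Pilot), to collapse two differently written aspirin SMILES strings into a single canonical key suitable for joining metadata files.

```python
from rdkit import Chem

def canonical_key(smiles):
    """Canonical SMILES as a primary key for joining metadata files."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol else None

# Two different notations of aspirin collapse to one key, so records
# from separate metadata files can be joined reliably.
a = canonical_key("CC(=O)Oc1ccccc1C(=O)O")
b = canonical_key("OC(=O)c1ccccc1OC(C)=O")
print(a == b, a)
```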
MULTISCALE TENSOR ANISOTROPIC FILTERING OF FLUORESCENCE MICROSCOPY FOR DENOISING MICROVASCULATURE.
Prasath, V B S; Pelapur, R; Glinskii, O V; Glinsky, V V; Huxley, V H; Palaniappan, K
2015-04-01
Fluorescence microscopy images are contaminated by noise, and improving image quality by filtering without blurring vascular structures is an important step in automatic image analysis. The application of interest here is to automatically and accurately extract the structural components of the microvascular system from images acquired by fluorescence microscopy. A robust denoising process is necessary in order to extract accurate vascular morphology information. For this purpose, we propose a multiscale tensor anisotropic diffusion model which progressively and adaptively updates the amount of smoothing while preserving vessel boundaries accurately. Based on a coherency-enhancing flow with a planar confidence measure and fused 3D structure information, our method integrates multiple scales for microvasculature preservation and noise removal in membrane structures. Experimental results on simulated synthetic images and epifluorescence images show the advantage of our improvement over other related diffusion filters. We further show that the proposed multiscale integration approach improves the denoising accuracy of different tensor diffusion methods to obtain better microvasculature segmentation.
Carroll, John A; Smith, Helen E; Scott, Donia; Cassell, Jackie A
2016-01-01
Background Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall). PMID:26911811
FEX: A Knowledge-Based System For Planimetric Feature Extraction
NASA Astrophysics Data System (ADS)
Zelek, John S.
1988-10-01
Topographical planimetric features include natural surfaces (rivers, lakes) and man-made surfaces (roads, railways, bridges). In conventional planimetric feature extraction, a photointerpreter manually interprets and extracts features from imagery on a stereoplotter. Visual planimetric feature extraction is a very labour intensive operation. The advantages of automating feature extraction include: time and labour savings; accuracy improvements; and planimetric data consistency. FEX (Feature EXtraction) combines techniques from image processing, remote sensing and artificial intelligence for automatic feature extraction. The feature extraction process co-ordinates the information and knowledge in a hierarchical data structure. The system simulates the reasoning of a photointerpreter in determining the planimetric features. Present efforts have concentrated on the extraction of road-like features in SPOT imagery. Keywords: Remote Sensing, Artificial Intelligence (AI), SPOT, image understanding, knowledge base, apars.
Structural health monitoring feature design by genetic programming
NASA Astrophysics Data System (ADS)
Harvey, Dustin Y.; Todd, Michael D.
2014-09-01
Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and other high-capital or life-safety critical structures. Conventional data processing involves pre-processing and extraction of low-dimensional features from in situ time series measurements. The features are then input to a statistical pattern recognition algorithm to perform the relevant classification or regression task necessary to facilitate decisions by the SHM system. Traditional design of signal processing and feature extraction algorithms can be an expensive and time-consuming process requiring extensive system knowledge and domain expertise. Genetic programming, a heuristic program search method from evolutionary computation, was recently adapted by the authors to perform automated, data-driven design of signal processing and feature extraction algorithms for statistical pattern recognition applications. The proposed method, called Autofead, is particularly suitable to handle the challenges inherent in algorithm design for SHM problems where the manifestation of damage in structural response measurements is often unclear or unknown. Autofead mines a training database of response measurements to discover information-rich features specific to the problem at hand. This study provides experimental validation on three SHM applications including ultrasonic damage detection, bearing damage classification for rotating machinery, and vibration-based structural health monitoring. Performance comparisons with common feature choices for each problem area are provided demonstrating the versatility of Autofead to produce significant algorithm improvements on a wide range of problems.
A semi-automatic method for extracting thin line structures in images as rooted tree network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brazzini, Jacopo; Dillard, Scott; Soille, Pierre
2010-01-01
This paper addresses the problem of semi-automatic extraction of line networks in digital images, e.g., road or hydrographic networks in satellite images, or blood vessels in medical images. For that purpose, we improve a generic method derived from morphological and hydrological concepts, consisting of minimum cost path estimation and flow simulation. While this approach fully exploits the local contrast and shape of the network, as well as its arborescent nature, we further incorporate local directional information about the structures in the image. Namely, an appropriate anisotropic metric is designed by using both the characteristic features of the target network and the eigen-decomposition of the gradient structure tensor of the image. Following this, the geodesic propagation from a given seed with this metric is combined with hydrological operators for overland flow simulation to extract the line network. The algorithm is demonstrated for the extraction of blood vessels in a retina image and of a river network in a satellite image.
Dal Palù, Alessandro; Pontelli, Enrico; He, Jing; Lu, Yonggang
2007-01-01
The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpensive alternative to determine the exact tertiary structure of unknown proteins.
Characterization of Structural and Configurational Properties of DNA by Atomic Force Microscopy.
Meroni, Alice; Lazzaro, Federico; Muzi-Falconi, Marco; Podestà, Alessandro
2018-01-01
We describe a method to extract quantitative information on DNA structural and configurational properties from high-resolution topographic maps recorded by atomic force microscopy (AFM). DNA molecules are deposited on mica surfaces from an aqueous solution, carefully dehydrated, and imaged in air in Tapping Mode. Upon extraction of the spatial coordinates of the DNA backbones from AFM images, several parameters characterizing DNA structure and configuration can be calculated. Here, we explain how to obtain the distribution of contour lengths, end-to-end distances, and gyration radii. This modular protocol can be also used to characterize other statistical parameters from AFM topographies.
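For readers who want to compute these parameters, here is a minimal numpy sketch, assuming the backbone has already been traced from the AFM topograph as an ordered list of (x, y) coordinates; the example backbone is synthetic.

```python
import numpy as np

def dna_shape_parameters(xy):
    """xy: (N, 2) array of ordered backbone coordinates from an AFM trace (nm)."""
    seg = np.diff(xy, axis=0)
    contour_length = np.linalg.norm(seg, axis=1).sum()  # sum of segment lengths
    end_to_end = np.linalg.norm(xy[-1] - xy[0])         # distance between ends
    centroid = xy.mean(axis=0)
    r_gyration = np.sqrt(((xy - centroid) ** 2).sum(axis=1).mean())
    return contour_length, end_to_end, r_gyration

# Hypothetical traced backbone: a gentle 200-point arc of radius 100 nm
t = np.linspace(0, np.pi / 2, 200)
backbone = np.column_stack([100 * np.cos(t), 100 * np.sin(t)])
print(dna_shape_parameters(backbone))
```

Collecting these values over many traced molecules yields the distributions of contour lengths, end-to-end distances, and gyration radii discussed above.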
Multi-scale statistical analysis of coronal solar activity
Gamborino, Diana; del-Castillo-Negrete, Diego; Martinell, Julio J.
2016-07-08
Multi-filter images from the solar corona are used to obtain temperature maps that are analyzed using techniques based on proper orthogonal decomposition (POD) in order to extract dynamical and structural information at various scales. Exploring active regions before and after a solar flare and comparing them with quiet regions, we show that the multi-scale behavior presents distinct statistical properties for each case that can be used to characterize the level of activity in a region. Information about the nature of heat transport can also be extracted from the analysis.
Structural analysis of pyrolytic lignins isolated from switchgrass fast pyrolysis oil
USDA-ARS?s Scientific Manuscript database
Structural characterization of lignin extracted from the bio-oil produced by fast pyrolysis of switchgrass (Panicum virgatum) is reported. This new information is important to understanding the utility of lignin as a chemical feedstock in a pyrolysis based biorefinery. Pyrolysis induces a variety of...
A Semantic Graph Query Language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaplan, I L
2006-10-16
Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.
Tool Wear Feature Extraction Based on Hilbert Marginal Spectrum
NASA Astrophysics Data System (ADS)
Guan, Shan; Song, Weijie; Pang, Hongyang
2017-09-01
In the metal cutting process, the signal contains a wealth of tool wear state information. An analysis and feature extraction method for tool wear signals based on the Hilbert marginal spectrum is proposed. First, the tool wear signal was decomposed by the empirical mode decomposition algorithm, and the intrinsic mode functions containing the main information were screened out by the correlation coefficient and the variance contribution rate. Second, the Hilbert transform was performed on the main intrinsic mode functions, yielding the Hilbert time-frequency spectrum and the Hilbert marginal spectrum. Finally, amplitude-domain indexes were extracted on the basis of the Hilbert marginal spectrum and used to construct the recognition feature vector of the tool wear state. The research results show that the extracted features can effectively characterize the different wear states of the tool, which provides a basis for monitoring tool wear condition.
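A sketch of the marginal-spectrum step is shown below, assuming the empirical mode decomposition has already been carried out (for example with a dedicated EMD library) and one dominant intrinsic mode function is at hand; the signal, sampling rate, and bin count are illustrative.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import kurtosis

def marginal_spectrum(imf, fs, n_bins=128):
    """Hilbert marginal spectrum of one intrinsic mode function:
    instantaneous amplitude accumulated per instantaneous-frequency bin."""
    analytic = hilbert(imf)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    freqs = np.linspace(0, fs / 2, n_bins + 1)
    spectrum, _ = np.histogram(inst_freq, bins=freqs, weights=amplitude[:-1])
    return freqs[:-1], spectrum

# Hypothetical IMF: a 150 Hz tone with mild amplitude modulation
fs = 2000
t = np.arange(0, 1, 1 / fs)
imf = (1 + 0.3 * np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 150 * t)
f, h = marginal_spectrum(imf, fs)
print("peak near:", f[np.argmax(h)], "Hz")  # expected around 150 Hz

# Amplitude-domain indexes over the marginal spectrum (illustrative choice)
features = [h.mean(), np.sqrt(np.mean(h ** 2)), kurtosis(h)]
```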
Hollister, Brittany M; Restrepo, Nicole A; Farber-Eger, Eric; Crawford, Dana C; Aldrich, Melinda C; Non, Amy
2017-01-01
Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
The future of structural fieldwork - UAV assisted aerial photogrammetry
NASA Astrophysics Data System (ADS)
Vollgger, Stefan; Cruden, Alexander
2015-04-01
Unmanned aerial vehicles (UAVs), commonly referred to as drones, are opening new and low-cost possibilities to acquire high-resolution aerial images and digital surface models (DSM) for applications in structural geology. UAVs can be programmed to fly autonomously along a user-defined grid to systematically capture high-resolution photographs, even in difficult-to-access areas. The photographs are subsequently processed using software that employs SIFT (scale invariant feature transform) and SFM (structure from motion) algorithms. These photogrammetric routines allow the extraction of spatial information (3D point clouds, digital elevation models, 3D meshes, orthophotos) from 2D images. Depending on flight altitude and camera setup, sub-centimeter spatial resolutions can be achieved. By "digitally mapping" georeferenced 3D models and images, orientation data can be extracted directly and used to analyse the structural framework of the mapped object or area. We present UAV-assisted aerial mapping results from a coastal platform near Cape Liptrap (Victoria, Australia), where deformed metasediments of the Palaeozoic Lachlan Fold Belt are exposed. We also show how orientation and spatial information of brittle and ductile structures extracted from the photogrammetric model can be linked to the progressive development of folds and faults in the region. Even though there are both technical and legislative limitations, which might prohibit the use of UAVs without prior commercial licensing and training, the benefits that arise from the resulting high-resolution, photorealistic models can substantially contribute to the collection of new data and insights for applications in structural geology.
PDF text classification to leverage information extraction from publication reports.
Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha
2016-06-01
Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved the performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents.
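The multi-pass sieve idea is straightforward to illustrate: each pass applies one high-precision rule and defers everything else to the next pass, with the final pass assigning the default category. The rules, thresholds, and heuristics below are invented stand-ins; the paper's actual sieves are more elaborate.

```python
import re

def classify_snippet(text, page_no, is_first_on_page):
    """Illustrative multi-pass sieve: each sieve either assigns a
    category or defers to the next; the last sieve is the default."""
    t = text.strip()
    if re.search(r"doi:|vol\.\s*\d|pp\.\s*\d|copyright|downloaded from", t, re.I):
        return "METADATA"                     # sieve 1: publication metadata
    if re.match(r"abstract\b", t, re.I):
        return "ABSTRACT"                     # sieve 2: abstract heading
    if page_no == 1 and is_first_on_page and len(t.split()) < 25:
        return "TITLE"                        # sieve 3: short leading text
    if len(t.split()) < 8:
        return "SEMISTRUCTURE"                # sieve 4: table/figure fragments
    return "BODYTEXT"                         # sieve 5: default

print(classify_snippet("Copyright 2014 Elsevier. Downloaded from ...", 3, False))
print(classify_snippet("Abstract: We assessed outcomes in 120 patients.", 1, False))
```

Ordering the sieves from most to least precise is what lets a rule cascade like this compete with statistical classifiers on layout-heavy input.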
Bearing diagnostics: A method based on differential geometry
NASA Astrophysics Data System (ADS)
Tian, Ye; Wang, Zili; Lu, Chen; Wang, Zhipeng
2016-12-01
The structures around bearings are complex, and the working environment is variable. These conditions cause the collected vibration signals to exhibit nonlinear, non-stationary, and chaotic characteristics that make noise reduction, feature extraction, fault diagnosis, and health assessment significantly challenging. Thus, a set of differential-geometry-based methods, with advantages in nonlinear analysis, is presented in this study. For noise reduction, the Local Projection method is modified by both selecting the neighborhood radius based on empirical mode decomposition and determining the noise subspace constrained by neighborhood distribution information. For feature extraction, Hessian locally linear embedding is introduced to acquire manifold features from the manifold topological structures; singular values of eigenmatrices, as well as several specific frequency amplitudes in spectrograms, are subsequently extracted to reduce the complexity of the manifold features. For fault diagnosis, an information-geometry-based support vector machine is applied to classify the fault states. For health assessment, the manifold distance is employed to represent the health information, and a Gaussian mixture model is utilized to calculate confidence values, which directly reflect the health status. Case studies on Lorenz signals and vibration datasets of bearings demonstrate the effectiveness of the proposed methods.
Laser-based structural sensing and surface damage detection
NASA Astrophysics Data System (ADS)
Guldur, Burcu
Damage due to age or accumulated damage from hazards on existing structures poses a worldwide problem. In order to evaluate the current status of aging, deteriorating and damaged structures, it is vital to accurately assess the present conditions. It is possible to capture the in situ condition of structures by using laser scanners that create dense three-dimensional point clouds. This research investigates the use of high resolution three-dimensional terrestrial laser scanners with image capturing abilities as tools to capture geometric range data of complex scenes for structural engineering applications. Laser scanning technology is continuously improving, with commonly available scanners now capturing over 1,000,000 texture-mapped points per second with an accuracy of ~2 mm. However, automatically extracting meaningful information from point clouds remains a challenge, and the current state-of-the-art requires significant user interaction. The first objective of this research is to use widely accepted point cloud processing steps such as registration, feature extraction, segmentation, surface fitting and object detection to divide laser scanner data into meaningful object clusters and then apply several damage detection methods to these clusters. This required establishing a process for extracting important information from raw laser-scanned data sets such as the location, orientation and size of objects in a scanned region, and the location of damaged regions on a structure. For this purpose, first a methodology for processing range data to identify objects in a scene is presented and then, once the objects from the model library are correctly detected and fitted into the captured point cloud, these fitted objects are compared with the as-is point cloud of the investigated object to locate defects on the structure. The algorithms are demonstrated on synthetic scenes and validated on range data collected from test specimens and test-bed bridges. The second objective of this research is to combine useful information extracted from laser scanner data with color information, which provides information in the fourth dimension that enables detection of damage types such as cracks, corrosion, and related surface defects that are generally difficult to detect using only laser scanner data; moreover, the color information also helps to track volumetric changes on structures such as spalling. Although using images with varying resolution to detect cracks is an extensively researched topic, damage detection using laser scanners with and without color images is a new research area that holds many opportunities for enhancing the current practice of visual inspections. The aim is to combine the best features of laser scans and images to create an automatic and effective surface damage detection method, which will reduce the need for skilled labor during visual inspections and allow automatic documentation of related information. This work enables developing surface damage detection strategies that integrate existing condition rating criteria for a wide range of damage types that are collected under three main categories: small deformations already existing on the structure (cracks); damage types that induce larger deformations, but where the initial topology of the structure has not changed appreciably (e.g., bent members); and large deformations where localized changes in the topology of the structure have occurred (e.g., rupture, discontinuities and spalling).
The effectiveness of the developed damage detection algorithms is validated by comparing the detection results with measurements taken from test specimens and test-bed bridges.
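As a rough illustration of the surface-fitting and defect-localization step described above, the sketch below fits a plane to a synthetic point cloud with RANSAC and flags points that deviate from the fitted surface as damage candidates. It is a minimal NumPy-only stand-in for the dissertation's pipeline, not the actual implementation; the tolerance and iteration count are illustrative assumptions.

```python
import numpy as np

def ransac_plane(points, n_iter=500, tol=0.005, rng=np.random.default_rng(0)):
    """Fit a plane with RANSAC; return ((normal, d), inlier mask)."""
    best_inliers, best_model = None, None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model, best_inliers

# Synthetic, nearly planar patch; points off the fitted surface are damage candidates.
cloud = np.random.default_rng(1).normal(size=(2000, 3)) * [1.0, 1.0, 0.002]
model, inliers = ransac_plane(cloud)
damage_candidates = cloud[~inliers]
```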
Despeckling PolSAR Images Based on Relative Total Variation Model
NASA Astrophysics Data System (ADS)
Jiang, C.; He, X. F.; Yang, L. J.; Jiang, J.; Wang, D. Y.; Yuan, Y.
2018-04-01
The relative total variation (RTV) algorithm, which can effectively separate structure from texture in an image, is widely employed to extract an image's main structures. However, applying RTV directly to polarimetric SAR (PolSAR) image filtering does not preserve polarimetric information. A new RTV approach based on the complex Wishart distribution is therefore proposed that accounts for the polarimetric properties of PolSAR data. The proposed polarimetric RTV (PolRTV) algorithm can be used for PolSAR image filtering. The L-band Airborne SAR (AIRSAR) San Francisco dataset is used to demonstrate the effectiveness of the proposed algorithm in speckle suppression, structural information preservation, and polarimetric property preservation.
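For intuition about the structure-texture decomposition idea, the sketch below applies plain total-variation denoising (not RTV, and without the Wishart-based polarimetric extension) to a synthetic speckled channel, assuming scikit-image is available; the multiplicative speckle model and the smoothing weight are illustrative.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)
clean = np.zeros((128, 128))
clean[32:96, 32:96] = 1.0                                # a block "structure"
speckled = clean * rng.gamma(shape=4.0, scale=0.25, size=clean.shape)

# TV smoothing keeps the block edges while flattening the speckle texture.
structure = denoise_tv_chambolle(speckled, weight=0.2)
texture = speckled - structure                           # residual "texture" layer
```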
Uncovering the essential links in online commercial networks
NASA Astrophysics Data System (ADS)
Zeng, Wei; Fang, Meiling; Shao, Junming; Shang, Mingsheng
2016-09-01
Recommender systems are designed to effectively support individuals' decision-making process on various web sites. Such a system can be naturally represented by a user-object bipartite network, where a link indicates that a user has collected an object. Recently, the information backbone, a sub-network with fewer nodes and links that nevertheless carries most of the relevant information, has attracted research interest. With the backbone, a system can generate satisfactory recommendations while saving considerable computing resources. In this paper, we propose an enhanced topology-aware method to extract the information backbone of the bipartite network, based mainly on information about neighboring users and objects. Our backbone extraction method enables recommender systems to achieve more than 90% of the accuracy of the top-L recommendation while consuming only 20% of the links. The experimental results show that our method outperforms alternative backbone extraction methods. Moreover, the structure of the information backbone is studied in detail. Finally, we highlight that the information backbone is one of the most important properties of the bipartite network, with which one can significantly improve the efficiency of a recommender system.
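The abstract does not spell out the link-scoring rule, so the sketch below shows one plausible topology-aware scheme under stated assumptions: each user-object link is scored by the overlap between the user's collection and the collections of the object's other collectors, and only the top fraction of links is kept. This is an illustration of the general idea, not the authors' method.

```python
from collections import defaultdict

def backbone(links, keep_ratio=0.2):
    """Keep the top-scoring fraction of user-object links, scored by a
    simple common-neighbour heuristic over the bipartite network."""
    users, objs = defaultdict(set), defaultdict(set)
    for u, o in links:
        users[u].add(o)        # user -> objects collected
        objs[o].add(u)         # object -> users who collected it

    def score(u, o):
        others = objs[o] - {u}
        if not others:
            return 0.0
        return sum(len(users[u] & users[v]) for v in others) / len(others)

    ranked = sorted(links, key=lambda e: score(*e), reverse=True)
    return ranked[: max(1, int(keep_ratio * len(links)))]

links = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "a")]
print(backbone(links, keep_ratio=0.4))
```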
Pathak, Jyotishman; Bailey, Kent R; Beebe, Calvin E; Bethard, Steven; Carrell, David S; Chen, Pei J; Dligach, Dmitriy; Endle, Cory M; Hart, Lacey A; Haug, Peter J; Huff, Stanley M; Kaggal, Vinod C; Li, Dingcheng; Liu, Hongfang; Marchant, Kyle; Masanz, James; Miller, Timothy; Oniki, Thomas A; Palmer, Martha; Peterson, Kevin J; Rea, Susan; Savova, Guergana K; Stancl, Craig R; Sohn, Sunghwan; Solbrig, Harold R; Suesse, Dale B; Tao, Cui; Taylor, David P; Westberg, Les; Wu, Stephen; Zhuo, Ning; Chute, Christopher G
2013-01-01
Research objective: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction. Materials and methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems—Mayo Clinic and Intermountain Healthcare—were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine. Results: Using CEMs and open-source natural language processing and terminology services engines—namely, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2)—we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria. Conclusions: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts. PMID:24190931
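The executed quality measure reduces to a simple cohort computation once the EHR facts are normalized. The toy Python sketch below, with hypothetical field names and dates, shows the denominator/numerator logic (diabetics aged 18-75; most recent LDL during the measurement year < 100 mg/dL); the real platform derives these facts from CEM-normalized structured and unstructured EHR data.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    has_diabetes: bool
    ldl_results: list  # (ISO date, value) pairs within the measurement year

def quality_measure(patients):
    """Denominator: diabetics aged 18-75; numerator: latest LDL < 100 mg/dL."""
    denom = [p for p in patients if p.has_diabetes and 18 <= p.age <= 75]
    # ISO date strings sort lexicographically, so max() picks the latest result.
    numer = [p for p in denom if p.ldl_results and max(p.ldl_results)[1] < 100]
    return len(denom), len(numer)

cohort = [Patient(60, True, [("2012-03-01", 130), ("2012-09-15", 95)]),
          Patient(45, True, [("2012-05-20", 110)]),
          Patient(80, True, [("2012-02-11", 90)])]
print(quality_measure(cohort))  # -> (2, 1)
```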
A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories.
Yang, Wei; Ai, Tinghua; Lu, Wei
2018-04-19
Crowdsourcing trajectory data is an important approach for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourced vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and to interpolate the optimized segments adaptively, ensuring there are enough tracking points. Second, the DT and the Voronoi diagram are constructed within the interpolated tracking lines to calculate road boundary descriptors using the area of the Voronoi cell and the length of the triangle edge; the road boundary detection model is then established by integrating the boundary descriptors and trajectory movement features (e.g., direction). Third, the boundary detection model is used to detect the road boundary from the DT constructed over the trajectory lines, and a region-growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting the road boundary from low-frequency GPS traces, multi-type road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information proved to be of higher quality.
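The boundary descriptors come from the Delaunay triangulation itself. The sketch below builds a DT over toy 2-D points with SciPy and flags unusually long triangle edges, which in sparse regions tend to span the empty space beyond the road boundary; the percentile cut-off is an illustrative assumption, and the Voronoi-cell-area descriptor is omitted for brevity.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))          # stand-in for interpolated GPS points
tri = Delaunay(pts)

# Collect unique triangle edges and their lengths.
edges = set()
for simplex in tri.simplices:
    for i in range(3):
        a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
        edges.add((a, b))
lengths = {e: np.linalg.norm(pts[e[0]] - pts[e[1]]) for e in edges}

threshold = np.percentile(list(lengths.values()), 90)   # hypothetical cut-off
boundary_candidates = [e for e, length in lengths.items() if length > threshold]
```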
Code of Federal Regulations, 2012 CFR
2012-07-01
... you are a fixed facility and your cooling water intake structure is located in an estuary or tidal... waterbody flow information. If your cooling water intake structure is located in an estuary or tidal river...
Code of Federal Regulations, 2013 CFR
2013-07-01
... you are a fixed facility and your cooling water intake structure is located in an estuary or tidal... waterbody flow information. If your cooling water intake structure is located in an estuary or tidal river...
Code of Federal Regulations, 2014 CFR
2014-07-01
... you are a fixed facility and your cooling water intake structure is located in an estuary or tidal... waterbody flow information. If your cooling water intake structure is located in an estuary or tidal river...
NASA Astrophysics Data System (ADS)
Zhang, Han; Chen, Xuefeng; Du, Zhaohui; Li, Xiang; Yan, Ruqiang
2016-04-01
Fault information of aero-engine bearings presents two particular phenomena, i.e., waveform distortion and impulsive feature frequency band dispersion, which lead to a challenging problem for current bearing fault diagnosis techniques. Moreover, although sparse representation theory has made much progress in the feature extraction of fault information, it also confronts inevitable performance degradation because relatively weak fault information does not have a sufficiently prominent and sparse representation. Therefore, a novel nonlocal sparse model (coined NLSM) and its algorithmic framework are proposed in this paper, which go beyond simple sparsity by introducing more intrinsic structures of the feature information. This work exploits the underlying prior that feature information exhibits nonlocal self-similarity, by clustering similar signal fragments and stacking them together into groups. Within this framework, the prior information is transformed into a regularization term, and a sparse optimization problem is formulated that can be solved through the block coordinate descent (BCD) method. Additionally, an adaptive structural-clustering sparse dictionary learning technique, which utilizes k-nearest-neighbor (kNN) clustering and principal component analysis (PCA) learning, is adopted to further ensure sufficient sparsity of the feature information. Moreover, the selection rule for the regularization parameter and the computational complexity are described in detail. The performance of the proposed framework is evaluated through numerical experiments, and its superiority with respect to the state-of-the-art method in the field is demonstrated on vibration signals from an experimental rig of aircraft engine bearings.
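The nonlocal grouping step can be pictured as follows: overlapping fragments are cut from the signal, each fragment's nearest neighbors form a group, and a small PCA basis learned per group acts as the group-adaptive dictionary. The sketch below mirrors that clustering-plus-PCA idea with scikit-learn; the fragment length, step, group size and atom count are illustrative assumptions, and the paper's full BCD optimization is not reproduced.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA

def nonlocal_groups(signal, frag_len=64, step=16, group_size=8, n_atoms=4):
    """Cluster similar signal fragments and learn a small PCA basis per group."""
    frags = np.stack([signal[i:i + frag_len]
                      for i in range(0, len(signal) - frag_len + 1, step)])
    nn = NearestNeighbors(n_neighbors=group_size).fit(frags)
    _, idx = nn.kneighbors(frags)
    bases = []
    for neighbors in idx:                 # one group per reference fragment
        group = frags[neighbors]
        basis = PCA(n_components=n_atoms).fit(group)  # group-adaptive dictionary
        bases.append(basis.components_)
    return frags, bases

t = np.linspace(0, 1, 4096)
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.default_rng(0).normal(size=t.size)
frags, bases = nonlocal_groups(sig)
```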
NASA Astrophysics Data System (ADS)
Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua
2017-04-01
Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and the WPD energy spectrum is extracted to comprehensively characterize the properties of the machinery running states. Then, the mixed-domain feature set is fed into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information from both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated into the dimension reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are fed into a pattern recognition algorithm to achieve running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification.
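scikit-learn ships a plain, unsupervised LTSA that illustrates the manifold-learning core of this step; it does not reproduce the discriminant, semi-supervised extensions described above. A minimal sketch on synthetic mixed-domain features:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic stand-in for the mixed-domain feature matrix (rows = state samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 24))

# Plain LTSA: unsupervised, so labels are ignored, unlike in DSS-LTSA.
ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=3, method="ltsa")
fused = ltsa.fit_transform(X)   # fusion features fed to a downstream classifier
```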
[Technologies for Complex Intelligent Clinical Data Analysis].
Baranov, A A; Namazova-Baranova, L S; Smirnov, I V; Devyatkin, D A; Shelmanov, A O; Vishneva, E A; Antonova, E V; Smirnov, V I
2016-01-01
The paper presents a system for the intelligent analysis of clinical information. The authors describe methods implemented in the system for clinical information retrieval, intelligent diagnostics of chronic diseases, assessing the importance of patient features, and detecting hidden dependencies between features. Results of the experimental evaluation of these methods are also presented. Healthcare facilities generate a large flow of both structured and unstructured data which contain important information about patients. Test results are usually retained as structured data, but some data are retained in the form of natural-language texts (medical history, the results of physical examination, and the results of other examinations, such as ultrasound, ECG or X-ray studies). Many tasks arising in clinical practice can be automated by applying methods for the intelligent analysis of the accumulated structured and unstructured data, which leads to improvement of healthcare quality. The aim of this work is the creation of a complex system for intelligent data analysis in a multi-disciplinary pediatric center. The authors propose methods for information extraction from clinical texts in Russian, carried out on the basis of deep linguistic analysis. The methods retrieve terms for diseases, symptoms, areas of the body and drugs, and can recognize additional attributes such as "negation" (indicating that the disease is absent), "no patient" (indicating that the disease refers to the patient's family member, but not to the patient), "severity of illness", "disease course", and "body region to which the disease refers". The authors use a set of hand-crafted templates and various techniques based on machine learning to retrieve information using a medical thesaurus. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine learning method for the classification of patients with similar nosology and a method for determining the most informative patient features are also proposed. The authors processed anonymized health records from the pediatric center to evaluate the proposed methods. The results show the applicability of the information extracted from the texts for solving practical problems. The records of patients with allergic, glomerular and rheumatic diseases were used for the experimental assessment of the automatic diagnosis method. The authors also determined the most appropriate machine learning methods for the classification of patients in each group of diseases, as well as the most informative disease signs. It was found that using additional information extracted from clinical texts together with structured data helps to improve the quality of diagnosis of chronic diseases. The authors also obtained characteristic combinations of disease signs. The proposed methods have been implemented in the intelligent data processing system of a multidisciplinary pediatric center. The experimental results show the ability of the system to improve the quality of pediatric healthcare.
Robust X-ray angular correlations for the study of meso-structures
Lhermitte, Julien R.; Tian, Cheng; Stein, Aaron; ...
2017-05-08
As self-assembling nanomaterials become more sophisticated, it is becoming increasingly important to measure the structural order of finite-sized assemblies of nano-objects. These mesoscale clusters represent an acute challenge to conventional structural probes, owing to the range of implicated size scales (10 nm to several micrometres), the weak scattering signal and the dynamic nature of meso-clusters in native solution environments. The high X-ray flux and coherence of modern synchrotrons present an opportunity to extract structural information from these challenging systems, but conventional ensemble X-ray scattering averages out crucial information about local particle configurations. Conversely, a single meso-cluster scatters too weakly to recover the full diffraction pattern. Using X-ray angular cross-correlation analysis, it is possible to combine multiple noisy measurements to obtain robust structural information. This paper explores the key theoretical limits and experimental challenges that constrain the application of these methods to probing structural order in real nanomaterials. A metric is presented to quantify the signal-to-noise ratio of angular correlations, and it is used to identify several experimental artifacts that arise. In particular, it is found that background scattering, data masking and inter-cluster interference profoundly affect the quality of correlation analyses. A robust workflow is demonstrated for mitigating these effects and extracting reliable angular correlations from realistic experimental data.
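The core correlation step is standard enough to sketch: for intensities sampled around one ring of constant q, the angular autocorrelation follows from the Wiener-Khinchin relation, and averaging it over many noisy shots recovers the symmetry of the scatterers. The NumPy toy below uses a synthetic six-fold ring; the noise level and shot count are illustrative, not the paper's settings.

```python
import numpy as np

def angular_autocorrelation(ring):
    """C(dphi) for intensities I(phi) on one q-ring, via the Wiener-Khinchin
    relation C = IFFT(|FFT(I)|^2), computed on the mean-subtracted signal."""
    i = ring - ring.mean()
    spec = np.fft.fft(i)
    c = np.fft.ifft(spec * np.conj(spec)).real / ring.size
    return c / c[0]                     # normalise so C(0) = 1

phi = np.linspace(0, 2 * np.pi, 360, endpoint=False)
ring = 1 + 0.3 * np.cos(6 * phi)        # six-fold symmetry of a model cluster
rng = np.random.default_rng(0)
shots = [ring + 0.5 * rng.normal(size=phi.size) for _ in range(100)]
c_avg = np.mean([angular_autocorrelation(s) for s in shots], axis=0)
# c_avg shows peaks every 60 degrees; a single noisy shot generally does not.
```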
Ad Hoc Information Extraction for Clinical Data Warehouses.
Dietrich, Georg; Krebs, Jonathan; Fette, Georg; Ertl, Maximilian; Kaspar, Mathias; Störk, Stefan; Puppe, Frank
2018-05-01
Clinical Data Warehouses (CDW) reuse Electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data such as discharge letters. It can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW. The goal of our work is to provide an ad hoc IE service that allows users to query text data ad hoc, in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g. how many patients exist with "heart failure", including textual synonyms, or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) to recognize and exclude negations and their scopes, (2) to extract concepts, i.e. Boolean values, and (3) to extract numerical values. We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g. context-sensitive queries and regex queries, and an extraction mode for computing the frequencies of Boolean and numerical values. Evaluations on chest X-ray reports and discharge letters showed high F1-scores for the three subtasks: detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99; and of numerical values in chest X-ray reports and discharge letters also around 0.99, with the exception of the concept age. The advantages of ad hoc IE over standard IE are the low development effort (just entering the concept with its variants), the promptness of the results and the adaptability by the user to his or her particular question. Disadvantages are usually lower accuracy and confidence. This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts which include affirmed findings. Our approach combines negation detection and the extraction of concepts, but the extraction does not take place during preprocessing: it happens at runtime. That provides an ad hoc, dynamic, interactive and adjustable information extraction of arbitrary concepts and even their values on the fly at runtime. We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision, based on a pipeline that detects and removes negations and their scope in clinical texts. Schattauer GmbH.
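A drastically simplified, NegEx-flavored sketch of subtask (1): negation triggers open a scope that is cut off at the sentence boundary. The real system uses a much richer German trigger lexicon and scope rules; the toy trigger list and five-token window below are illustrative assumptions.

```python
import re

TRIGGERS = {"no", "denies", "without", "kein", "keine", "ohne"}  # toy EN/DE mix
SCOPE = 5  # tokens after a trigger that fall inside its negation scope

def negated_concepts(text, concepts):
    """Return {concept: True if every mention is negated} using a crude
    trigger-plus-window scope that stops at sentence boundaries."""
    tokens, negated_idx = [], set()
    for sentence in re.split(r"[.!?]", text):
        sent_tokens = re.findall(r"\w+", sentence.lower())
        base = len(tokens)
        for i, tok in enumerate(sent_tokens):
            if tok in TRIGGERS:
                stop = min(i + 1 + SCOPE, len(sent_tokens))
                negated_idx.update(base + j for j in range(i + 1, stop))
        tokens.extend(sent_tokens)
    out = {}
    for concept in concepts:
        hits = [i for i, t in enumerate(tokens) if t == concept.lower()]
        out[concept] = bool(hits) and all(i in negated_idx for i in hits)
    return out

print(negated_concepts("Patient denies chest pain. Heart failure present.",
                       ["pain", "failure"]))
# -> {'pain': True, 'failure': False}
```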
NASA Astrophysics Data System (ADS)
Brook, A.; Cristofani, E.; Vandewal, M.; Matheis, C.; Jonuscheit, J.; Beigang, R.
2012-05-01
The present study proposes a fully integrated, semi-automatic image processing methodology, operating in near real-time, developed for Frequency-Modulated Continuous-Wave (FMCW) THz images with center frequencies around 100 GHz and 300 GHz. The quality control of aeronautic composite multi-layered materials and structures using Non-Destructive Testing is the main focus of this work. Image processing is applied to the 3-D images to extract useful information. The data are processed by extracting areas of interest. The detected areas are subjected to image analysis for more detailed investigation, managed by a spatial model. Finally, the post-processing stage examines and evaluates the spatial accuracy of the extracted information.
Code of Federal Regulations, 2012 CFR
2012-07-01
... intake structure; or (ii) Based on information submitted by any fishery management agency(ies) or other...) Based on information submitted by any fishery management agency(ies) or other relevant information, that...) and (5) of this section, would still contribute unacceptable stress to the protected species, critical...
Code of Federal Regulations, 2013 CFR
2013-07-01
... intake structure; or (ii) Based on information submitted by any fishery management agency(ies) or other...) Based on information submitted by any fishery management agency(ies) or other relevant information, that...) and (5) of this section, would still contribute unacceptable stress to the protected species, critical...
Code of Federal Regulations, 2014 CFR
2014-07-01
... intake structure; or (ii) Based on information submitted by any fishery management agency(ies) or other...) Based on information submitted by any fishery management agency(ies) or other relevant information, that...) and (5) of this section, would still contribute unacceptable stress to the protected species, critical...
Uncovering the spatial structure of mobility networks
NASA Astrophysics Data System (ADS)
Louail, Thomas; Lenormand, Maxime; Picornell, Miguel; García Cantú, Oliva; Herranz, Ricardo; Frias-Martinez, Enrique; Ramasco, José J.; Barthelemy, Marc
2015-01-01
The extraction of a clear and simple footprint of the structure of large, weighted and directed networks is a general problem that is relevant for many applications. An important example is origin-destination matrices, which contain the complete information on commuting flows but are difficult to analyze and compare. We propose here a versatile method that extracts a coarse-grained signature of mobility networks in the form of a 2 × 2 matrix that separates the flows into four categories. We apply this method to origin-destination matrices extracted from mobile phone data recorded in 31 Spanish cities. We show that these cities essentially differ in their proportions of two types of flows: integrated flows (between residential and employment hotspots) and random flows, whose importance increases with city size. Finally, the method allows the determination of categories of networks and, in the mobility case, the classification of cities according to their commuting structure.
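Once hotspots are known, the 2 × 2 signature reduces to a simple aggregation. The toy classifier below splits origin-destination flows by whether the origin lies in a residential hotspot and the destination in an employment hotspot; the paper's four named flow categories and its hotspot-detection step are not reproduced, and the data are invented.

```python
def classify_flows(flows, residential_hotspots, employment_hotspots):
    """Aggregate origin-destination flows into a 2x2 signature:
    rows = origin in a residential hotspot (0) or not (1),
    cols = destination in an employment hotspot (0) or not (1)."""
    m = [[0, 0], [0, 0]]
    for origin, dest, weight in flows:
        r = 0 if origin in residential_hotspots else 1
        c = 0 if dest in employment_hotspots else 1
        m[r][c] += weight
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m]

flows = [("A", "B", 120), ("A", "C", 30), ("D", "B", 50), ("D", "E", 10)]
sig = classify_flows(flows, residential_hotspots={"A"}, employment_hotspots={"B"})
print(sig)  # sig[0][0] is the 'integrated' share (home hotspot -> work hotspot)
```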
A semi-supervised learning framework for biomedical event extraction based on hidden topics.
Zhou, Deyu; Zhong, Dayou
2015-05-01
Scientists have devoted decades of effort to understanding the interactions between proteins and RNA production. This information could enrich current knowledge of drug reactions and the development of certain diseases. Nevertheless, due to its lack of explicit structure, the literature in the life sciences, one of the most important sources of this information, remains inaccessible to computer-based systems. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has recently attracted community-wide efforts. Most approaches are based on statistical models and require large-scale annotated corpora to precisely estimate the models' parameters; however, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning is a feasible solution for biomedical event extraction and is attracting growing interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences but also the hidden topics embedded in the sentences are used to describe the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold-standard corpus. Experimental results show that the proposed framework achieves an improvement of more than 2.2% in F-score on biomedical event extraction when compared to the state-of-the-art approach. The results suggest that, by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system, and that the similarity between sentences can be precisely described by the hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
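To make the topic-distance idea concrete, the toy sketch below embeds sentences as LDA topic distributions with scikit-learn and assigns each unlabeled sentence the event label of its nearest labeled neighbor in topic space. The sentences, labels and topic count are invented for illustration, and the paper's structural (parse-based) distance component is omitted.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

labeled = ["protein A binds protein B", "gene X regulates gene Y"]
labels = ["binding", "regulation"]
unlabeled = ["protein C binds receptor D", "factor Z regulates gene W"]

# Topic distributions stand in for the 'hidden topics' used to measure distance.
vec = CountVectorizer()
X = vec.fit_transform(labeled + unlabeled)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X)

lab_t, unlab_t = topics[: len(labeled)], topics[len(labeled):]
for sent, t in zip(unlabeled, unlab_t):
    dists = np.linalg.norm(lab_t - t, axis=1)   # distance in topic space
    print(sent, "->", labels[int(dists.argmin())])
```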
Anibal, Paula Cristina; Peixoto, Iza Teixeira Alves; Foglio, Mary Ann; Höfling, José Francisco
2013-01-01
Ethanolic crude extracts prepared from the arils and seeds, pericarp, peels and whole fruit of Punica granatum, known as pomegranate, had their antifungal activity tested against Candida spp. The ethanolic crude extracts were analyzed by mass spectrometry and yielded many compounds, such as punicalagin and galladydilacton. The extracts from the pericarp and peel showed activity against Candida spp., with MICs of 125 μg/mL. The effects of the pericarp and peel extracts on the morphology and structure of C. albicans and C. krusei were examined by scanning and transmission electron microscopy, with visualization of an irregular membrane and hyphae, formation of vacuoles and thickening of the cell wall. The data obtained revealed potential antimicrobial activity against yeast cells of the Candida genus, and the bioactive compounds could be responsible for the changes in cell morphology and structure. These findings open new perspectives for future research continuing this study, where information such as determination of the compounds' site of action could contribute to an alternative therapy against these organisms. PMID:24516425
Brain vascular image segmentation based on fuzzy local information C-means clustering
NASA Astrophysics Data System (ADS)
Hu, Chaoen; Liu, Xia; Liang, Xiao; Hui, Hui; Yang, Xin; Tian, Jie
2017-02-01
Light sheet fluorescence microscopy (LSFM) is a powerful optical-resolution fluorescence microscopy technique that enables observation of the mouse brain vascular network at cellular resolution. However, micro-vessel structures show intensity inhomogeneity in LSFM images, which makes extracting line structures difficult. In this work, we developed a vascular image segmentation method that enhances vessel details, which should be useful for estimating statistics like micro-vessel density. Since the eigenvalues of the Hessian matrix and their signs describe different geometric structures in images, they can be used to construct a vascular similarity function and enhance line signals; the main idea of our method is to cluster the pixel values of the enhanced image. Our method contains three steps: 1) calculate the multiscale gradients and the differences between the eigenvalues of the Hessian matrix; 2) to generate the enhanced micro-vessel structures, train a feed-forward neural network on 2.26 million pixels to model the correlations between the multiscale gradients and the eigenvalue differences; 3) use fuzzy local information c-means clustering (FLICM) to cluster the pixel values of the enhanced image. To verify the feasibility and effectiveness of this method, mouse brain vascular images were acquired with a commercial light-sheet microscope in our lab. The segmentation experiment showed that the Dice similarity coefficient can reach up to 85%. The results illustrate that our approach to extracting the line structures of blood vessels dramatically improves the vascular image and enables accurate extraction of blood vessels in LSFM images.
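The Hessian-eigenvalue enhancement step has a standard off-the-shelf counterpart in the Frangi vesselness filter, sketched below with scikit-image on a synthetic bright vessel. The paper instead learns the enhancement with a neural network and clusters with FLICM; the simple threshold here is an illustrative stand-in for that clustering.

```python
import numpy as np
from skimage.filters import frangi

# Synthetic slice: a bright, intensity-inhomogeneous 'vessel' on a dark background.
img = np.zeros((128, 128))
img[60:64, :] = np.linspace(0.4, 1.0, 128)   # intensity varies along the vessel

# Hessian-eigenvalue line enhancement; black_ridges=False targets bright vessels.
vesselness = frangi(img, sigmas=range(1, 5), black_ridges=False)
mask = vesselness > vesselness.mean() + 2 * vesselness.std()  # crude threshold
```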
ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files.
Karthikeyan, Muthukumarasamy; Vyas, Renu
2016-01-01
Digital access to chemical journals has resulted in a vast array of molecular information that is now available in supplementary material files in PDF format. However, extracting this molecular information, generally from a PDF document, is a daunting task. Here we present an approach to harvest 3D molecular data from the supporting information of scientific research articles that are normally available from publishers' resources. In order to demonstrate the feasibility of extracting truly computable molecules from PDF file formats in a fast and efficient manner, we have developed a Java-based application, namely ChemEngine. This program recognizes textual patterns in the supplementary data and generates standard molecular structure data (bond matrix, atomic coordinates) that can be subjected to a multitude of computational processes automatically. The methodology has been demonstrated via several case studies on different formats of coordinate data stored in supplementary information files, wherein ChemEngine selectively harvested the atomic coordinates and interpreted them as molecules with high accuracy. The reusability of the extracted molecular coordinate data was demonstrated by computing Single Point Energies that were in close agreement with the original computed data provided with the articles. It is envisaged that the methodology will enable large-scale conversion of molecular information from supplementary files available in PDF format into a collection of ready-to-compute molecular data, creating an automated workflow for advanced computational processes. Software along with source code and instructions is available at https://sourceforge.net/projects/chemengine/files/?source=navbar.
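Textual pattern recognition of coordinate blocks can be as simple as a line-oriented regular expression. The sketch below makes assumptions about one common supporting-information layout (element symbol followed by three decimals); it is not ChemEngine's actual rule set.

```python
import re

# One common layout in supporting information: "C  0.0000  1.3940  0.0000".
COORD = re.compile(
    r"^\s*([A-Z][a-z]?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$")

def extract_xyz(text):
    """Pull (element, x, y, z) rows out of plain text dumped from a PDF."""
    atoms = []
    for line in text.splitlines():
        m = COORD.match(line)
        if m:
            el, x, y, z = m.group(1), *map(float, m.groups()[1:])
            atoms.append((el, x, y, z))
    return atoms

text = """Optimized geometry (Angstrom)
C    0.0000   1.3940   0.0000
H    0.0000   2.4810   0.0000"""
for atom in extract_xyz(text):
    print(atom)
```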
Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard
2018-01-01
Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical product information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. On the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated the system's ability to extract and encode Adverse Event terms from Structured Product Labels, which may potentially support multiple pharmacoepidemiological tasks.
How extractive industries affect health: Political economy underpinnings and pathways.
Schrecker, Ted; Birn, Anne-Emanuelle; Aguilera, Mariajosé
2018-06-07
A systematic and theoretically informed analysis of how extractive industries affect health outcomes and health inequities is overdue. Informed by the work of Saskia Sassen on "logics of extraction," we adopt an expansive definition of extractive industries to include (for example) large-scale foreign acquisitions of agricultural land for export production. To ground our analysis in concrete place-based evidence, we begin with a brief review of four case examples of major extractive activities. We then analyze the political economy of extractivism, focusing on the societal structures, processes, and relationships of power that drive and enable extraction. Next, we examine how this global order shapes and interacts with politics, institutions, and policies at the state/national level contextualizing extractive activity. Having provided the necessary context, we posit a set of pathways that link the global political economy and the national politics and institutional practices surrounding extraction to health outcomes and their distribution. These pathways involve both direct health effects, such as toxic work and environmental exposures and the assassination of activists, and indirect effects, including sustained impoverishment, water insecurity, and stress-related ailments. We conclude with some reflections on the need for future research on the health and health equity implications of the global extractive order. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
NASA Earth Resources Survey Symposium. Volume 1-B: Geology, Information Systems and Services
NASA Technical Reports Server (NTRS)
1975-01-01
A symposium was conducted on the practical applications of earth resources survey technology including utilization and results of data from programs involving LANDSAT, the Skylab earth resources experiment package, and aircraft. Topics discussed include geological structure, landform surveys, energy and extractive resources, and information systems and services.
Guo, Yufan; Silins, Ilona; Stenius, Ulla; Korhonen, Anna
2013-06-01
Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of biomedical abstracts. However, is it realistic for full articles, given their high linguistic and informational complexity? We introduce and release a novel corpus of 50 biomedical articles annotated according to the Argumentative Zoning (AZ) scheme, and investigate active learning with one of the most widely used ML models, Support Vector Machines (SVM), on this corpus. Additionally, we introduce two novel applications that use AZ to support real-life literature review in biomedicine via question answering and summarization. We show that active learning with an SVM trained on 500 labeled sentences (6% of the corpus) performs surprisingly well, with an accuracy of 82%, just 2% lower than fully supervised learning. In our question answering task, biomedical researchers find relevant information significantly faster from AZ-annotated than from unannotated articles. In the summarization task, sentences extracted from particular zones are significantly more similar to gold standard summaries than those extracted from particular sections of full articles. These results demonstrate that active learning of full articles' information structure is indeed realistic and that the accuracy is high enough to support real-life literature review in biomedicine. The annotated corpus, our AZ classifier and the two novel applications are available at http://www.cl.cam.ac.uk/yg244/12bioinfo.html
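The abstract does not specify the query strategy, so the sketch below assumes a generic margin-based uncertainty sampler around scikit-learn's SVC: seed the SVM with a small labeled set, then repeatedly ask annotators for the pool sentences with the smallest decision margins. Data shapes, labels and the batch size are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def uncertainty_sample(clf, pool_X, batch=10):
    """Margin sampling: pick pool items where the SVM's top two class
    scores are closest (binary case: smallest |decision value|)."""
    scores = clf.decision_function(pool_X)
    if scores.ndim == 1:                      # binary: distance to hyperplane
        margin = np.abs(scores)
    else:                                     # multi-class one-vs-rest scores
        part = np.sort(scores, axis=1)
        margin = part[:, -1] - part[:, -2]    # gap between the best two classes
    return np.argsort(margin)[:batch]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 20)), rng.integers(0, 3, size=500)
clf = SVC(kernel="linear").fit(X[:50], y[:50])   # seed set of labeled sentences
query_idx = uncertainty_sample(clf, X[50:])       # next sentences to annotate
```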
Extracting nursing practice patterns from structured labor and delivery data sets.
Hall, Eric S; Thornton, Sidney N
2007-10-11
This study was designed to demonstrate the feasibility of a computerized care process model that provides real-time case profiling and outcome forecasting. A methodology was defined for extracting nursing practice patterns from structured point-of-care data collected using the labor and delivery information system at Intermountain Healthcare. Data collected during January 2006 were retrieved from Intermountain Healthcare's enterprise data warehouse for use in the study. The knowledge discovery in databases process provided a framework for data analysis including data selection, preprocessing, data-mining, and evaluation. Development of an interactive data-mining tool and construction of a data model for stratification of patient records into profiles supported the goals of the study. Five benefits of the practice pattern extraction capability, which extend to other clinical domains, are listed with supporting examples.
D'Antonio, Matteo; Masseroli, Marco
2009-01-01
Background: Alternative splicing has been demonstrated to affect most human genes; different isoforms from the same gene encode proteins that differ in a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, the Ensembl databank and the Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results: A database has been developed to store the considered primary data and the results of their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a further database has been created to manage user access to the PASS Web application and store users' data and searches. Conclusion: PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available, well-known bioinformatics tools in order to generate structural information for isoform pairs. Further analysis of this valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075
Context Oriented Information Integration
NASA Astrophysics Data System (ADS)
Mohania, Mukesh; Bhide, Manish; Roy, Prasan; Chakaravarthy, Venkatesan T.; Gupta, Himanshu
Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. Academics have studied this problem, but many obstacles to its widespread use in practice remain. One of the key problems is the absence of schema in unstructured text. In this paper we present a new paradigm for integrating information that overcomes this problem: Context Oriented Information Integration. The goal is to integrate unstructured data with the structured data present in the enterprise and use the extracted information to generate actionable insights for the enterprise. We present two techniques that enable context oriented information integration and show how they can be used to solve real-world problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anna Johnston, SNL 9215
2002-09-01
PDB to AMPL Conversion was written to convert Protein Data Bank files to AMPL files. The protein databases on the internet contain a wealth of information about the structure and makeup of proteins. Each file contains information derived from one or more experiments: how the experiment was performed, the amino acid building blocks of each chain, and often the three-dimensional structure of the protein extracted from the experiments. The way a protein folds determines much about its function; thus, studying the three-dimensional structure of the protein is of great interest. Analysing contact maps is one way to examine the structure. A contact map is a graph which has a linear backbone of amino acids for nodes (i.e., adjacent amino acids are always connected) and edges between non-adjacent nodes if they are close enough to be considered in contact. If the graphs are similar, then the folds of the proteins and their functions should also be similar. This software extracts the contact maps from a Protein Data Bank file and puts them into AMPL data format. This format is designed for use in AMPL, a programming language for simplifying linear programming formulations.
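A contact map of the kind described can be computed directly from a PDB file's fixed-column ATOM records. The sketch below keeps only CA atoms and marks residue pairs within a cut-off distance as contacts; the 8 Å threshold is a common convention assumed here, not taken from the software's documentation.

```python
import numpy as np

def ca_contact_map(pdb_text, cutoff=8.0):
    """Parse CA atoms from ATOM records (fixed PDB columns) and build a
    contact map: True where two residues' CA atoms lie within `cutoff` A."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append([float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])])
    xyz = np.array(coords)
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return d < cutoff

pdb_text = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      3  CA  GLY A   2      14.321   7.500  -4.900  1.00  0.00           C"""
# Contacts between non-adjacent residues are the extra graph edges that
# get serialized into the AMPL data format.
print(ca_contact_map(pdb_text).astype(int))
```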
The Cadmio XML healthcare record.
Barbera, Francesco; Ferri, Fernando; Ricci, Fabrizio L; Sottile, Pier Angelo
2002-01-01
The management of clinical data is a complex task. Patient-related information reported in patient folders is a set of heterogeneous and structured data accessed by different users having different goals (in local or geographical networks). The XML language provides a mechanism for describing, manipulating, and visualising structured data in web-based applications. XML ensures that the structured data are managed in a uniform and transparent manner, independently of the applications and their providers, guaranteeing some interoperability. Extracting data from the healthcare record and structuring them according to XML makes the data available through browsers. This paper describes the MIC/MIE model (Medical Information Category/Medical Information Elements), which allows the definition and management of healthcare records and is used in CADMIO, a HISA-based project, using XML to allow the data to be visualised through web browsers.
Dawes, Martin; Pluye, Pierre; Shea, Laura; Grad, Roland; Greenberg, Arlene; Nie, Jian-Yun
2007-01-01
Information retrieval in primary care is becoming more difficult as the volume of medical information held in electronic databases expands. The lexical structure of this information might permit automatic indexing and improved retrieval. To determine the possibility of identifying the key elements of clinical studies, namely Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results (PECODR), from abstracts of medical journals, we used a convenience sample of 20 synopses from the journal Evidence-Based Medicine (EBM) and their matching original journal article abstracts obtained from PubMed. Three independent primary care professionals identified PECODR-related extracts of text. Rules were developed to define each PECODR element and the selection process for characters, words, phrases and sentences. From the extracts of text related to PECODR elements, potential lexical patterns that might help identify those elements were proposed and assessed using NVivo software. A total of 835 PECODR-related text extracts containing 41,263 individual text characters were identified from the 20 EBM journal synopses. There were 759 extracts in the corresponding PubMed abstracts, containing 31,947 characters. PECODR elements were found in nearly all abstracts and synopses, with the exception of duration. There was agreement on 86.6% of the extracts from the 20 EBM synopses and 85.0% of those from the corresponding PubMed abstracts. After consensus this rose to 98.4% and 96.9%, respectively. We found potential text patterns in the Comparison, Outcome and Results elements of both EBM synopses and PubMed abstracts. Some phrases and words are used frequently and are specific to these elements in both synopses and abstracts. The results suggest that a PECODR-related structure exists in medical abstracts and that there might be lexical patterns specific to these elements. More sophisticated computer-assisted lexical-semantic analysis might refine these results, pave the way to automated PECODR indexing, and improve information retrieval in primary care.
Amplitude interpretation and visualization of three-dimensional reflection data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enachescu, M.E.
1994-07-01
Digital recording and processing of modern three-dimensional surveys allow for relatively good preservation and correct spatial positioning of seismic reflection amplitude. A four-dimensional seismic reflection field matrix R (x,y,t,A), which can be computer visualized (i.e., real-time interactively rendered, edited, and animated), is now available to the interpreter. The amplitude contains encoded geological information indirectly related to lithologies and reservoir properties. The magnitude of the amplitude depends not only on the acoustic impedance contrast across a boundary, but is also strongly affected by the shape of the reflective boundary. This allows the interpreter to image subtle tectonic and structural elements not obvious on time-structure maps. The use of modern workstations allows for appropriate color coding of the total available amplitude range, routine on-screen time/amplitude extraction, and late display of horizon amplitude maps (horizon slices) or complex amplitude-structure spatial visualization. Stratigraphic, structural, tectonic, fluid distribution, and paleogeographic information are commonly obtained by displaying the amplitude variation A = A(x,y,t) associated with a particular reflective surface or seismic interval. As illustrated with several case histories, traditional structural and stratigraphic interpretation combined with a detailed amplitude study generally greatly enhances the extraction of subsurface geological information from a reflection data volume. In the context of three-dimensional seismic surveys, the horizon amplitude map (horizon slice), amplitude attachment to structure and "bright clouds" displays are very powerful tools available to the interpreter.
BioRAT: extracting biological information from full-length papers.
Corney, David P A; Buxton, Bernard F; Langdon, William B; Jones, David T
2004-11-22
Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of the data. Abstracts are generally available through central collections with easy direct access (e.g., PubMed). Full-text papers contain more information, but are distributed across many locations (e.g., publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool specifically designed to perform biomedical IE and able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE. We show, first, that BioRAT performs as well as existing systems when applied to abstracts, and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.
Information extraction with object based support vector machines and vegetation indices
NASA Astrophysics Data System (ADS)
Ustuner, Mustafa; Abdikan, Saygin; Balik Sanli, Fusun
2016-07-01
Information extraction from remote sensing data is important for policy and decision makers, as the extracted information provides base layers for many real-world applications. Classification of remotely sensed data is one of the most common methods of extracting information; however, it is still a challenging issue because several factors affect the accuracy of the classification. The resolution of the imagery, the number and homogeneity of land cover classes, the purity of the training data and the characteristics of the adopted classifiers are just some of these factors. Object-based image classification has some advantages over pixel-based classification for high-resolution images, since it uses geometry and structure information besides spectral information. Vegetation indices are also commonly used in the classification process, since they provide additional spectral information for vegetation, forestry and agricultural areas. In this study, the impacts of the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Red Edge Index (NDRE) on the classification accuracy of RapidEye imagery were investigated. Object-based Support Vector Machines were implemented for the classification of crop types in the study area, located in the Aegean region of Turkey. Results demonstrate that the incorporation of NDRE increased the overall classification accuracy from 79.96% to 86.80%, whereas NDVI decreased it from 79.96% to 78.90%. Moreover, it is shown that object-based classification with RapidEye data gives promising results for crop type mapping and analysis.
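Both indices are one-line band ratios. The sketch below computes them with NumPy on toy reflectance rasters; the band roles follow RapidEye's published layout (red, red edge, NIR), while the data and the small epsilon guarding division by zero are illustrative.

```python
import numpy as np

def ndvi(nir, red):
    """(NIR - Red) / (NIR + Red), the classic vegetation index."""
    return (nir - red) / (nir + red + 1e-12)

def ndre(nir, red_edge):
    """(NIR - RedEdge) / (NIR + RedEdge), exploiting the red-edge band."""
    return (nir - red_edge) / (nir + red_edge + 1e-12)

# Toy reflectance rasters standing in for the red, red-edge and NIR bands.
rng = np.random.default_rng(0)
red, red_edge, nir = (rng.uniform(0.05, 0.6, (100, 100)) for _ in range(3))

# Augmented feature cube of the kind fed to an (object-based) classifier.
stack = np.dstack([red, red_edge, nir, ndvi(nir, red), ndre(nir, red_edge)])
```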
Structured prediction models for RNN based sequence labeling in clinical text.
Jagannatha, Abhyuday N; Yu, Hong
2016-11-01
Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves the extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
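In a linear-chain CRF, the pairwise potentials form a label-transition matrix scored jointly with the per-token (RNN) emissions, and exact decoding is Viterbi. The NumPy sketch below shows that decoding step on an invented three-label medication-tagging example; it does not implement the paper's skip-chain approximation.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Exact decoding for a linear-chain CRF.
    emissions: (T, K) per-token label scores (e.g. from an RNN);
    transitions: (K, K) pairwise potentials, transitions[i, j] = score(i -> j)."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)     # best previous label for each label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):         # backtrack through the pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = ["O", "B-MED", "I-MED"]
emissions = np.log(np.array([[0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]]))
transitions = np.log(np.array([[0.80, 0.15, 0.05],   # discourage O -> I-MED
                               [0.30, 0.20, 0.50],
                               [0.40, 0.10, 0.50]]))
print([labels[i] for i in viterbi(emissions, transitions)])
```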
ECO: A Framework for Entity Co-Occurrence Exploration with Faceted Navigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halliday, K. D.
2010-08-20
Even as highly structured databases and semantic knowledge bases become more prevalent, a substantial amount of human knowledge is reported as written prose. Typical textual reports, such as news articles, contain information about entities (people, organizations, and locations) and their relationships. Automatically extracting such relationships from large text corpora is a key component of corporate and government knowledge bases. The primary goal of the ECO project is to develop a scalable framework for extracting and presenting these relationships for exploration using an easily navigable faceted user interface. ECO uses entity co-occurrence relationships to identify related entities. The system aggregates and indexes information on each entity pair, allowing the user to rapidly discover and mine relational information.
Orientation selectivity based structure for texture classification
NASA Astrophysics Data System (ADS)
Wu, Jinjian; Lin, Weisi; Shi, Guangming; Zhang, Yazhong; Lu, Liu
2014-10-01
Local structure, e.g., the local binary pattern (LBP), is widely used in texture classification. However, LBP is too sensitive to disturbance. In this paper, we introduce a novel structure for texture classification. Research in cognitive neuroscience indicates that the primary visual cortex presents remarkable orientation selectivity for visual information extraction. Inspired by this, we investigate the orientation similarities among neighboring pixels, and propose an orientation-selectivity-based pattern for local structure description. Experimental results on texture classification demonstrate that the proposed structure descriptor is quite robust to disturbance.
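For reference, the baseline the paper builds on is easy to reproduce: scikit-image's uniform LBP pooled into a histogram gives the standard descriptor against which an orientation-selective pattern would be compared. A minimal sketch (the P, R settings and random texture are illustrative):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(image, P=8, R=1.0):
    """Uniform LBP codes pooled into a normalized histogram descriptor."""
    codes = local_binary_pattern(image, P, R, method="uniform")
    # 'uniform' yields P + 2 distinct code values: 0 .. P + 1.
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

texture = np.random.default_rng(0).uniform(size=(64, 64))
print(lbp_histogram(texture))
```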
Zhou, Jiyun; Wang, Hongpeng; Zhao, Zhishan; Xu, Ruifeng; Lu, Qin
2018-05-08
Protein secondary structure is the three-dimensional form of local segments of proteins, and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. We present a novel deep-learning-based model, referred to as CNNH_PSS, which uses a multi-scale CNN with highway connections. In CNNH_PSS, any two neighboring convolutional layers have a highway that delivers information from the current layer to the output of the next one, preserving local contexts. As lower layers extract local context while higher layers extract long-range interdependencies, the highways between neighboring layers allow CNNH_PSS to extract both local contexts and long-range interdependencies. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highways by at least 0.010 Q8 accuracy, and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN, which need extra complex models to extract long-range interdependencies. This demonstrates that CNNH_PSS not only costs fewer computational resources but also achieves better predictive performance. CNNH_PSS is able to extract both local contexts and long-range interdependencies by combining a multi-scale CNN with highway networks. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is a useful and efficient tool for protein secondary structure prediction.
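The highway connection itself is a small, well-known gating construct: a transform gate T mixes the convolved features with the untransformed input, y = T * H(x) + (1 - T) * x. A minimal PyTorch sketch of one such gated 1-D convolutional layer follows; the channel count and kernel size are illustrative, and this is not the authors' released code.

```python
import torch
import torch.nn as nn

class HighwayConv1d(nn.Module):
    """One gated convolutional layer: y = T(x) * H(x) + (1 - T(x)) * x,
    so local context can bypass the transformation and reach deeper layers."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.h = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.t = nn.Conv1d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        gate = torch.sigmoid(self.t(x))   # transform gate T in (0, 1)
        return gate * torch.relu(self.h(x)) + (1 - gate) * x

x = torch.randn(2, 64, 100)               # (batch, channels, sequence length)
print(HighwayConv1d(64)(x).shape)          # torch.Size([2, 64, 100])
```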
Hierarchical structures of amorphous solids characterized by persistent homology
Hiraoka, Yasuaki; Nakamura, Takenobu; Hirata, Akihiko; Escolar, Emerson G.; Matsue, Kaname; Nishiura, Yasumasa
2016-01-01
This article proposes a topological method that extracts hierarchical structures of various amorphous solids. The method is based on the persistence diagram (PD), a mathematical tool for capturing shapes of multiscale data. The input to the PDs is given by an atomic configuration and the output is expressed as 2D histograms. Then, specific distributions such as curves and islands in the PDs identify meaningful shape characteristics of the atomic configuration. Although the method can be applied to a wide variety of disordered systems, it is applied here to silica glass, the Lennard-Jones system, and Cu-Zr metallic glass as standard examples of continuous random network and random packing structures. In silica glass, the method classified the atomic rings as short-range and medium-range orders and unveiled hierarchical ring structures among them. These detailed geometric characterizations clarified a real space origin of the first sharp diffraction peak and also indicated that PDs contain information on elastic response. Even in the Lennard-Jones system and Cu-Zr metallic glass, the hierarchical structures in the atomic configurations were derived in a similar way using PDs, although the glass structures and properties substantially differ from silica glass. These results suggest that the PDs provide a unified method that extracts greater depth of geometric information in amorphous solids than conventional methods. PMID:27298351
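Persistence diagrams of the kind described can be computed with off-the-shelf topology libraries. The sketch below assumes the GUDHI package is available: it builds a Rips complex on a noisy ring of points (a toy stand-in for an atomic configuration) and lists the most persistent 1-dimensional features, which correspond to ring structures.

```python
import numpy as np
import gudhi

# Toy 'atomic configuration': points on a noisy ring, so one long-lived
# 1-dimensional hole should dominate the persistence diagram.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 60)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(60, 2))

rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)
diagram = st.persistence()                    # list of (dimension, (birth, death))

loops = [p for p in diagram if p[0] == 1]     # H1 features ~ ring structures
loops.sort(key=lambda p: p[1][1] - p[1][0], reverse=True)
print(loops[:3])                              # most persistent loops first
```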
Structured reporting of MRI of the shoulder - improvement of report quality?
Gassenmaier, Sebastian; Armbruster, Marco; Haasters, Florian; Helfen, Tobias; Henzler, Thomas; Alibek, Sedat; Pförringer, Dominik; Sommer, Wieland H; Sommer, Nora N
2017-10-01
To evaluate the effect of structured reports (SRs) compared with non-structured narrative free-text reports (NRs) of shoulder MRI, and the potential effects of both types of reporting on completeness, readability, linguistic quality and referring surgeons' satisfaction. Thirty patients after trauma or with suspected degenerative changes of the shoulder were included in this study (2012-2015). All patients underwent shoulder MRI for further assessment and possible surgical planning. NRs were generated during clinical routine. Corresponding SRs were created using a dedicated template. All 60 reports were evaluated by two experienced orthopaedic shoulder surgeons using a questionnaire that included eight questions. Eighty per cent of the SRs were fully complete without any missing key features, whereas only 45% of the NRs were fully complete (p < 0.001). The extraction of information was regarded as easy in 92% of the SRs and 63% of the NRs. The overall quality of the SRs was rated better than that of the NRs (p < 0.001). Structured reporting of shoulder MRI improves the readability as well as the linguistic quality of radiological reports, and potentially leads to higher satisfaction among referring physicians. • Structured MRI reports of the shoulder improve readability. • Structured reporting facilitates information extraction. • Referring physicians prefer structured reports to narrative free-text reports. • Structured MRI reports of the shoulder can reduce radiologist re-consultations.
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
Prior knowledge of a protein's structural class provides useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One key requirement for computational methods is an accurate representation of protein samples. Here, based on the concept of Chou's pseudo-amino acid composition (AAC; Chou, Proteins: Structure, Function, and Genetics, 43:246-255, 2001), a novel feature extraction method that combines the continuous wavelet transform (CWT) with principal component analysis (PCA) is introduced for the prediction of protein structural classes. First, a digital signal is obtained by mapping each amino acid according to various physicochemical properties. Second, CWT is utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains abundant sequence-order information in both the frequency and time domains, and PCA is then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector is formed to represent the primary sequence by coupling the AAC vector with the new WPS feature vector in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicate that prediction quality is improved, and the current approach to protein representation may serve as a useful complementary vehicle for classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.
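A hedged sketch of this CWT-plus-PCA pipeline is given below. The physicochemical mapping, the wavelet choice ('morl') and the scale range are illustrative assumptions; the abstract does not specify them:

```python
# Sketch of the CWT + PCA feature pipeline. The physicochemical mapping,
# wavelet ('morl') and scale range are illustrative assumptions.
import numpy as np
import pywt
from sklearn.decomposition import PCA

hydropathy = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5}  # truncated Kyte-Doolittle table

def wavelet_power_features(sequence, scales=np.arange(1, 17)):
    signal = np.array([hydropathy.get(aa, 0.0) for aa in sequence])
    coeffs, _ = pywt.cwt(signal, scales, "morl")   # continuous wavelet transform
    power = np.abs(coeffs) ** 2                    # wavelet power spectrum (WPS)
    return power.mean(axis=1)                      # one summary feature per scale

seqs = ["ARNDARNDARNDARNDARNDARND", "RDNARDNARDNARDNARDNARDNA"]
X = np.array([wavelet_power_features(s) for s in seqs])
X_reduced = PCA(n_components=2).fit_transform(X)   # decorrelate and compress
```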
3D electron tomography of pretreated biomass informs atomic modeling of cellulose microfibrils.
Ciesielski, Peter N; Matthews, James F; Tucker, Melvin P; Beckham, Gregg T; Crowley, Michael F; Himmel, Michael E; Donohoe, Bryon S
2013-09-24
Fundamental insights into the macromolecular architecture of plant cell walls will elucidate new structure-property relationships and facilitate optimization of catalytic processes that produce fuels and chemicals from biomass. Here we introduce computational methodology to extract nanoscale geometry of cellulose microfibrils within thermochemically treated biomass directly from electron tomographic data sets. We quantitatively compare the cell wall nanostructure in corn stover following two leading pretreatment strategies: dilute acid with iron sulfate co-catalyst and ammonia fiber expansion (AFEX). Computational analysis of the tomographic data is used to extract mathematical descriptions for longitudinal axes of cellulose microfibrils from which we calculate their nanoscale curvature. These nanostructural measurements are used to inform the construction of atomistic models that exhibit features of cellulose within real, process-relevant biomass. By computational evaluation of these atomic models, we propose relationships between the crystal structure of cellulose Iβ and the nanoscale geometry of cellulose microfibrils.
Automated extraction and semantic analysis of mutation impacts from the biomedical literature
2012-01-01
Background Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing the mutational information and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. Results We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguation of genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data and end-user and developer documentation is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner. Conclusion We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions. PMID:22759648
NASA Astrophysics Data System (ADS)
Tene, Yair; Tene, Noam; Tene, G.
1993-08-01
An interactive data fusion methodology combining video, audio, and nonlinear structural dynamic analysis, with potential application in forensic engineering, is presented. The methodology was developed and successfully demonstrated in the analysis of the collapse of a heavy transportable bridge during preparation for testing. Multiple bridge element failures were identified after the collapse, including fracture, cracking and rupture of high-performance structural materials. A videotape recording from a hand-held camcorder was the only source of information about the collapse sequence. The interactive data fusion methodology extracted the relevant information from the videotape and from dynamic nonlinear structural analysis, leading to a full account of the sequence of events during the bridge collapse.
Fabrication and Properties of Multilayer Structures
1982-08-01
The relative voltage supported in each semiconductor is V_b1/V_b2 = N_A2 c_2 / (N_D1 c_1), where V = V_1 + V_2. It is apparent that Eqs. 5-7 will... has more difficulty because the interface state capacitance must be extracted from the measured capacitance. When a voltage is applied, the interface... contain identical information about interface states. However, as Nicollian and Goetzberger (6) have shown, greater inaccuracies arise in extracting
NASA Astrophysics Data System (ADS)
Ren, B.; Wen, Q.; Zhou, H.; Guan, F.; Li, L.; Yu, H.; Wang, Z.
2018-04-01
The purpose of this paper is to provide decision support for the adjustment and optimization of the crop planting structure in Jingxian County. An object-oriented information extraction method is used to extract corn and cotton in Jingxian County of Hengshui City, Hebei Province, based on multi-period GF-1 16-meter images. The best time for data extraction was selected by analyzing the spectral characteristics of corn and cotton at different growth stages, based on the multi-period GF-1 16-meter images, phenological data, and field survey data. The results show that the total classification accuracy for corn and cotton was up to 95.7%, the producer accuracies were 96% and 94%, respectively, and the user accuracies were 95.05% and 95.9%, respectively, which satisfies the demands of crop monitoring applications. Therefore, combining multi-period high-resolution images with object-oriented classification can effectively extract the large-scale distribution of crops, providing a convenient and effective technical means for crop monitoring.
Accurate airway centerline extraction based on topological thinning using graph-theoretic analysis.
Bian, Zijian; Tan, Wenjun; Yang, Jinzhu; Liu, Jiren; Zhao, Dazhe
2014-01-01
The quantitative analysis of the airway tree is of critical importance in the CT-based diagnosis and treatment of common pulmonary diseases. The extraction of the airway centerline is a precursor to identifying the airway's hierarchical structure, measuring geometrical parameters, and guiding visual detection. Traditional methods suffer from extra branches and circles due to incomplete segmentation results, which induce false analyses in applications. This paper proposes an automatic and robust centerline extraction method for the airway tree. First, the centerline is located based on the topological thinning method: border voxels are deleted symmetrically and iteratively to preserve topological and geometrical properties. Second, the structural information is generated using graph-theoretic analysis. Then, inaccurate circles are removed with a distance-weighting strategy, and extra branches are pruned according to clinical anatomic knowledge. The centerline region without false appendices is eventually determined after the described phases. Experimental results show that the proposed method identifies more than 96% of branches, keeps consistency across different cases, and achieves a superior circle-free structure and centrality.
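A minimal 2D analogue of the thinning-plus-graph-analysis idea can be sketched with scikit-image and NetworkX. The paper's symmetric 3D border-voxel deletion, distance weighting and anatomic pruning rules are not reproduced; the fragment below only illustrates skeletonization, graph construction and spur pruning:

```python
# 2D sketch of thinning followed by graph-based pruning of short spurs.
# Assumes scikit-image and networkx; the paper works on 3D airway masks.
import numpy as np
import networkx as nx
from skimage.morphology import skeletonize

def skeleton_graph(mask):
    """Thin a binary mask and connect neighboring skeleton pixels into a graph."""
    skel = skeletonize(mask.astype(bool))
    G = nx.Graph()
    ys, xs = np.nonzero(skel)
    pixels = set(zip(ys.tolist(), xs.tolist()))
    for y, x in pixels:                    # 8-connectivity between skeleton pixels
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (dy or dx) and (y + dy, x + dx) in pixels:
                    G.add_edge((y, x), (y + dy, x + dx))
    return G

def prune_spurs(G, min_len=5):
    """Iteratively delete leaf chains shorter than min_len (extra-branch removal)."""
    changed = True
    while changed:
        changed = False
        for leaf in [n for n in G if G.degree(n) == 1]:
            if leaf not in G:              # may already have been pruned this pass
                continue
            path = [leaf]
            while G.degree(path[-1]) <= 2:
                nbrs = [n for n in G[path[-1]] if n not in path]
                if not nbrs:
                    break
                path.append(nbrs[0])
            if len(path) < min_len:
                G.remove_nodes_from(path[:-1])   # keep the junction node
                changed = True
    return G
```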
Automated software system for checking the structure and format of ACM SIG documents
NASA Astrophysics Data System (ADS)
Mirza, Arsalan Rahman; Sah, Melike
2017-04-01
Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents, representing the structure and format of these documents using OWL (Web Ontology Language). Then, the metadata is extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, a system evaluation and user study evaluations.
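The ADFCS ontology and rule layers are not shown in this abstract; as an illustration of the underlying OOXML access, paragraph styles can be read from a .docx container with the Python standard library (a tooling assumption, not the authors' implementation):

```python
# Sketch: reading OOXML structure (paragraph styles) from a .docx file.
# Uses only the standard library; ADFCS's ontology and rule layers are not shown.
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraph_styles(docx_path):
    with zipfile.ZipFile(docx_path) as z:        # a .docx file is a zip archive
        root = ET.fromstring(z.read("word/document.xml"))
    styles = []
    for p in root.iter(W + "p"):                 # every paragraph element
        style = p.find(f"{W}pPr/{W}pStyle")
        styles.append(style.get(W + "val") if style is not None else "Normal")
    return styles                                # e.g. ['Title', 'Heading1', ...]
```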
Liu, Bo; Wu, Huayi; Wang, Yandong; Liu, Wenming
2015-01-01
Main road features extracted from remotely sensed imagery play an important role in many civilian and military applications, such as updating Geographic Information System (GIS) databases, urban structure analysis, spatial data matching and road navigation. Current methods for road feature extraction from high-resolution imagery are typically based on threshold-value segmentation. It is difficult, however, to completely separate road features from the background. We present a new method for extracting main roads from high-resolution grayscale imagery based on directional mathematical morphology and prior knowledge obtained from the Volunteered Geographic Information found in OpenStreetMap. The two salient steps in this strategy are: (1) using directional mathematical morphology to enhance the contrast between roads and non-roads; (2) using OpenStreetMap roads as prior knowledge to segment the remotely sensed imagery. Experiments were conducted on two ZiYuan-3 images and one QuickBird high-resolution grayscale image to compare the proposed method with other commonly used techniques for road feature extraction. The results demonstrated the validity and better performance of the proposed method for urban main road feature extraction. PMID:26397832
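A sketch of the directional-morphology step alone is shown below, using OpenCV. The kernel length and angle set are assumptions, and the OpenStreetMap-prior segmentation step is omitted:

```python
# Sketch of directional mathematical morphology for road enhancement.
# Kernel length and angle set are illustrative; the OSM-prior step is omitted.
import cv2
import numpy as np

def line_kernel(length, angle_deg):
    """Binary line structuring element of a given length and orientation."""
    k = np.zeros((length, length), np.uint8)
    c = length // 2
    dx = int(round(np.cos(np.radians(angle_deg)) * c))
    dy = int(round(np.sin(np.radians(angle_deg)) * c))
    cv2.line(k, (c - dx, c - dy), (c + dx, c + dy), 1, 1)
    return k

def directional_opening_max(gray, length=21, angles=range(0, 180, 15)):
    # Opening with a line element keeps bright elongated structures (roads)
    # aligned with the element; the pixelwise max pools over all orientations.
    # `gray` is a single-channel uint8 image.
    openings = [cv2.morphologyEx(gray, cv2.MORPH_OPEN, line_kernel(length, a))
                for a in angles]
    return np.max(openings, axis=0)
```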
Depth-tunable three-dimensional display with interactive light field control
NASA Astrophysics Data System (ADS)
Xie, Songlin; Wang, Peng; Sang, Xinzhu; Li, Chenyu; Dou, Wenhua; Xiao, Liquan
2016-07-01
A software-defined depth-tunable three-dimensional (3D) display with interactive 3D depth control is presented. With the proposed post-processing system, the disparity of multi-view media can be freely adjusted. Benefiting from the wealth of information inherently contained in dense multi-view images captured with a parallel camera array, the 3D light field is built, and the light field structure is manipulated to adjust the disparity without additionally acquired depth information, since the light field structure itself contains depth information. A statistical analysis based on least squares is carried out to extract the depth information inherent in the light field structure, and the accurate depth information can be used to re-parameterize light fields for the autostereoscopic display, guaranteeing smooth motion parallax. Experimental results show that the system is convenient and effective for adjusting 3D scene performance on a 3D display.
NASA Astrophysics Data System (ADS)
Pereira, Dolores; Pereira, Alcides; Neves, Luis
2015-04-01
The study of radioactivity in natural stones is a subject of great interest from different points of view: scientific, social and economic. Several previous studies have demonstrated that the radioactivity is dependent, not only on the uranium content, but also on the structures, textures, minerals containing the uranium and degree of weathering of the natural stone. Villavieja granite is extracted in a village where uranium mining was an important activity during the 20th century. Today the mine is closed but the granite is still extracted. Incorrect information about natural radioactivity given to natural stone users, policy makers, construction managers and the general public has caused turmoil in the media for many years. This paper considers problems associated with the communication of reliable information, as well as uncertainties, on natural radioactivity to these audiences.
2013-01-01
Background A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indication as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs for new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from published literature. Results In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, therefore may have high potential in drug discovery. Conclusions We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base that is resultant of our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks. PMID:23742147
NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.
Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio
2017-03-01
The exponential growth in the number of experimentally determined three-dimensional protein structures provides new and relevant knowledge about the conformation of amino acids in proteins. Only a few probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool for extracting information about the conformational preferences of amino acid residues and secondary structures in experimentally determined protein templates. This information is useful, for example, to characterize folds and local motifs in proteins and molecular folding, and can help in the solution of complex problems such as protein structure prediction and protein design, among others. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias .
Ye, Cui-Ping; Feng, Jie; Li, Wen-Ying
2012-07-01
Coal structure, especially the macromolecular aromatic skeleton structure, has a strong influence on coke reactivity and coal gasification, so grasping the macromolecular aromatic skeleton structure of coal is key to its rational, high-efficiency utilization. However, it is difficult to acquire this information due to the complex composition and structure of coal. It has been found that the macromolecular aromatic network structure of coal is best isolated if the small molecules in coal are first extracted. The macromolecular aromatic skeleton structure can then be clearly analyzed by instruments such as X-ray diffraction (XRD), fluorescence spectroscopy in synchronous mode (Syn-F), and gel permeation chromatography (GPC). Based on previous results, and following stepwise fractional liquid extraction, two typical Chinese power coals, PS and HDG, were extracted using silica gel as the stationary phase and acetonitrile, tetrahydrofuran (THF), pyridine and 1-methyl-2-pyrrolidinone (NMP) as a solvent group for sequential elution. GPC, Syn-F and XRD were applied to investigate the molecular mass distribution, condensed aromatic structure and crystal characteristics. The results showed that the size of the aromatic layers (La) is small (3-3.95 nm) and the stacking heights (Lc) are 0.8-1.2 nm. The molecular mass distribution of the macromolecular aromatic network structure is between 400 and 1130 amu, with condensed aromatic ring numbers of 3-7 in the structural units.
Acquiring geographical data with web harvesting
NASA Astrophysics Data System (ADS)
Dramowicz, K.
2016-04-01
Many websites contain attractive and up-to-date geographical information. This information can be extracted, stored, analyzed and mapped using web harvesting techniques. Web harvesting transforms poorly organized data from websites into a more structured format, which can be stored in a database and analyzed. Almost 25% of web traffic is related to web harvesting, mostly through search engines. This paper presents how to harvest geographic information from web documents using the free tool Beautiful Soup, one of the most commonly used Python libraries for pulling data from HTML and XML files. Processing one static HTML table is a relatively easy task. The more challenging task is to extract and save information from tables located in multiple, poorly organized websites. Legal and ethical aspects of web harvesting are discussed as well. The paper demonstrates two case studies. The first shows how to extract various types of information about the Good Country Index from multiple web pages, load it into one attribute table and map the results. The second case study shows how scripting tools and GIS can be used to extract information from one hundred and thirty-six websites about Nova Scotia wines. In a little more than three minutes, a database containing one hundred and six liquor stores selling these wines is created. Then the availability and spatial distribution of various types of wines (by grape type, by winery, and by liquor store) are mapped and analyzed.
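A minimal harvesting fragment in the spirit of the first case study might look as follows; the URL and table layout are hypothetical placeholders, not the actual Good Country Index source:

```python
# Minimal web-harvesting sketch with requests + Beautiful Soup.
# The URL and table layout are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.org/table-page.html", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

rows = []
table = soup.find("table")                # first HTML table on the page (assumed present)
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
    if cells:
        rows.append(cells)
# rows[0] is typically the header; the remaining rows can be loaded into a database
```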
Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach
2012-01-01
Background Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. Methods We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. Results We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. Conclusions We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data. PMID:22759462
Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach.
Ratkovic, Zorana; Golik, Wiktoria; Warnier, Pierre
2012-06-26
Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data.
Automated Data Cleansing in Data Harvesting and Data Migration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Mark; Vowell, Lance; King, Ian
2011-03-16
In the proposal for this project, we noted how the explosion of digitized information available through corporate databases, data stores and online search systems has resulted in the knowledge worker being bombarded by information. Knowledge workers typically spend more than 20-30% of their time seeking and sorting information, and find what they need only 50-60% of the time. This information exists as unstructured, semi-structured and structured data. The problem of information overload is compounded by the production of duplicate or near-duplicate information. In addition, near-duplicate items frequently have different origins, creating a situation in which each item may have unique information of value, but their differences are not significant enough to justify maintaining them as separate entities. Effective tools can be provided to eliminate duplicate and near-duplicate information. The proposed approach was to extract unique information from data sets and consolidate that information into a single comprehensive file.
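The project summary does not give its algorithms; one standard way to flag near-duplicates, shown here purely as an assumed technique, is word-shingle Jaccard similarity:

```python
# Sketch: near-duplicate detection via word shingles and Jaccard similarity.
# This is a standard technique, not necessarily the project's exact method.
def shingles(text, k=5):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0

def near_duplicates(docs, threshold=0.8):
    """Return index pairs of documents whose shingle sets overlap heavily."""
    sigs = [shingles(d) for d in docs]
    return [(i, j)
            for i in range(len(docs))
            for j in range(i + 1, len(docs))
            if jaccard(sigs[i], sigs[j]) >= threshold]
```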
EELS from organic crystalline materials
NASA Astrophysics Data System (ADS)
Brydson, R.; Eddleston, M. D.; Jones, W.; Seabourne, C. R.; Hondow, N.
2014-06-01
We report the use of electron energy loss spectroscopy (EELS) to provide light-element chemical composition information from organic, crystalline pharmaceutical materials, including theophylline and paracetamol, and discuss how this type of data can complement transmission electron microscopy (TEM) imaging and electron diffraction when investigating polymorphism. We also discuss the potential for extracting bonding information using energy-loss near-edge structure (ELNES).
Medical-Information-Management System
NASA Technical Reports Server (NTRS)
Alterescu, Sidney; Friedman, Carl A.; Frankowski, James W.
1989-01-01
Medical Information Management System (MIMS) computer program interactive, general-purpose software system for storage and retrieval of information. Offers immediate assistance where manipulation of large data bases required. User quickly and efficiently extracts, displays, and analyzes data. Used in management of medical data and handling all aspects of data related to care of patients. Other applications include management of data on occupational safety in public and private sectors, handling judicial information, systemizing purchasing and procurement systems, and analyses of cost structures of organizations. Written in Microsoft FORTRAN 77.
NASA Astrophysics Data System (ADS)
Dogon-yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.
2016-10-01
Timely and accurate acquisition of information on the condition and structural changes of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building strategies for sustainable development. The conventional techniques used for extracting tree features include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints such as labour-intensive field work, high cost, and the influence of weather conditions and topographic cover, which can be overcome by means of integrated airborne LiDAR and very-high-resolution digital image datasets. This study presents a semi-automated approach for extracting urban trees from integrated airborne LiDAR and multispectral digital image datasets over the city of Istanbul, Turkey. The scheme includes detection and extraction of shadow-free vegetation features based on the spectral properties of the digital images, using shadow index and NDVI techniques, and automated extraction of 3D information about vegetation features from the integrated processing of the shadow-free vegetation image and the LiDAR point cloud. The developed algorithms show promising results as an automated and cost-effective approach to estimating and delineating 3D information about urban trees. The research also shows that integrated datasets are a suitable technology and a viable source of information for city managers to use in urban tree management.
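A sketch of the spectral screening step is given below; the band arithmetic is the standard NDVI, while the thresholds and the simple brightness-based shadow screen are illustrative assumptions (the paper's shadow index formulation is not given in this abstract):

```python
# Sketch of NDVI-based vegetation screening. Thresholds are illustrative,
# and the paper's exact shadow-index formulation is not reproduced.
import numpy as np

def vegetation_mask(nir, red, brightness, ndvi_thresh=0.3, shadow_thresh=0.1):
    """Boolean mask of shadow-free vegetation pixels from band arrays."""
    ndvi = (nir - red) / (nir + red + 1e-9)    # normalized difference vegetation index
    not_shadow = brightness > shadow_thresh    # crude shadow screen (assumption)
    return (ndvi > ndvi_thresh) & not_shadow
```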
Kellman, Philip J; Massey, Christine M; Son, Ji Y
2010-04-01
Learning in educational settings emphasizes declarative and procedural knowledge. Studies of expertise, however, point to other crucial components of learning, especially improvements produced by experience in the extraction of information: perceptual learning (PL). We suggest that such improvements characterize both simple sensory and complex cognitive, even symbolic, tasks through common processes of discovery and selection. We apply these ideas in the form of perceptual learning modules (PLMs) to mathematics learning. We tested three PLMs, each emphasizing different aspects of complex task performance, in middle and high school mathematics. In the MultiRep PLM, practice in matching function information across multiple representations improved students' abilities to generate correct graphs and equations from word problems. In the Algebraic Transformations PLM, practice in seeing equation structure across transformations (but not solving equations) led to dramatic improvements in the speed of equation solving. In the Linear Measurement PLM, interactive trials involving extraction of information about units and lengths produced successful transfer to novel measurement problems and fraction problem solving. Taken together, these results suggest (a) that PL techniques have the potential to address crucial, neglected dimensions of learning, including discovery and fluent processing of relations; (b) PL effects apply even to complex tasks that involve symbolic processing; and (c) appropriately designed PL technology can produce rapid and enduring advances in learning. Copyright © 2009 Cognitive Science Society, Inc.
An automated procedure for detection of IDP's dwellings using VHR satellite imagery
NASA Astrophysics Data System (ADS)
Jenerowicz, Malgorzata; Kemper, Thomas; Soille, Pierre
2011-11-01
This paper presents results for the estimation of dwelling structures in Al Salam IDP Camp, Southern Darfur, based on very high resolution multispectral satellite images and obtained through mathematical morphology analysis. A series of image processing procedures, feature extraction methods and textural analyses were applied in order to provide reliable information about dwelling structures. One issue in this context is the similarity of the spectral response of thatched dwelling roofs and their surroundings in IDP camps, where the exploitation of multispectral information is crucial. This study shows the advantage of an automatic extraction approach and highlights the importance of detailed spatial and spectral information analysis based on a multi-temporal dataset. The additional fusion of the high-resolution panchromatic band with the lower-resolution multispectral bands of the WorldView-2 satellite has a positive influence on the results and can thereby be useful for humanitarian aid agencies, supporting decisions and population estimates, especially in situations where frequent revisits by space imaging systems are the only possibility for continued monitoring.
NASA Astrophysics Data System (ADS)
Hosseini-Golgoo, S. M.; Bozorgi, H.; Saberkari, A.
2015-06-01
The performances of three neural networks (a multi-layer perceptron, a radial basis function network, and a neuro-fuzzy network trained with the local linear model tree algorithm) in modeling and extracting discriminative features from the response patterns of a temperature-modulated resistive gas sensor are quantitatively compared. For response pattern recording, a voltage staircase containing five steps, each with a 20 s plateau, is applied to the micro-heater of the sensor in the presence of 12 different target gases, each at 11 concentration levels. In each test, the hidden-layer neuron weights are taken as the discriminatory feature vector of the target gas. These vectors are then mapped to a 3D feature space using linear discriminant analysis. The discriminative information content of the feature vectors is determined by calculating the Fisher's discriminant ratio, affording quantitative comparison among the success rates achieved by the different neural network structures. The results demonstrate a superior discrimination ratio for features extracted from the local linear neuro-fuzzy and radial basis function networks, with recognition rates of 96.27% and 90.74%, respectively.
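The Fisher's discriminant ratio used to score the discriminative content of the feature spaces can be computed directly; a minimal two-class numpy sketch (the multi-class case needed for 12 gases is omitted):

```python
# Two-class Fisher discriminant ratio per feature dimension (numpy sketch).
import numpy as np

def fisher_ratio(X1, X2):
    """FDR_d = (mu1_d - mu2_d)^2 / (var1_d + var2_d) for each dimension d."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0), X2.var(axis=0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)   # higher = more discriminative
```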
Identifying local structural states in atomic imaging by computer vision
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laanait, Nouamane; Ziatdinov, Maxim; He, Qian
The availability of atomically resolved imaging modalities enables an unprecedented view into the local structural states of materials, which manifest themselves by deviations from the fundamental assumptions of periodicity and symmetry. Consequently, approaches that aim to extract these local structural states from atomic imaging data with minimal assumptions regarding the average crystallographic configuration of a material are indispensable to advances in structural and chemical investigations of materials. Here, we present an approach to identify and classify local structural states that is rooted in computer vision. This approach introduces a definition of a structural state that is composed of both local and non-local information extracted from atomically resolved images, and is wholly untethered from the familiar concepts of symmetry and periodicity. Instead, this approach relies on computer vision techniques such as feature detection, and concepts such as scale-invariance. We present the fundamental aspects of local structural state extraction and classification by application to simulated scanning transmission electron microscopy images, and analyze the robustness of this approach in the presence of common instrumental factors such as noise, limited spatial resolution, and weak contrast. Finally, we apply this computer vision-based approach for the unsupervised detection and classification of local structural states in an experimental electron micrograph of a complex oxides interface, and a scanning tunneling micrograph of a defect engineered multilayer graphene surface.
Identifying local structural states in atomic imaging by computer vision
Laanait, Nouamane; Ziatdinov, Maxim; He, Qian; ...
2016-11-02
The availability of atomically resolved imaging modalities enables an unprecedented view into the local structural states of materials, which manifest themselves by deviations from the fundamental assumptions of periodicity and symmetry. Consequently, approaches that aim to extract these local structural states from atomic imaging data with minimal assumptions regarding the average crystallographic configuration of a material are indispensable to advances in structural and chemical investigations of materials. Here, we present an approach to identify and classify local structural states that is rooted in computer vision. This approach introduces a definition of a structural state that is composed of both local and non-local information extracted from atomically resolved images, and is wholly untethered from the familiar concepts of symmetry and periodicity. Instead, this approach relies on computer vision techniques such as feature detection, and concepts such as scale-invariance. We present the fundamental aspects of local structural state extraction and classification by application to simulated scanning transmission electron microscopy images, and analyze the robustness of this approach in the presence of common instrumental factors such as noise, limited spatial resolution, and weak contrast. Finally, we apply this computer vision-based approach for the unsupervised detection and classification of local structural states in an experimental electron micrograph of a complex oxides interface, and a scanning tunneling micrograph of a defect engineered multilayer graphene surface.
Discovering H-bonding rules in crystals with inductive logic programming.
Ando, Howard Y; Dehaspe, Luc; Luyten, Walter; Van Craenenbroeck, Elke; Vandecasteele, Henk; Van Meervelt, Luc
2006-01-01
In the domain of crystal engineering, various schemes have been proposed for the classification of hydrogen bonding (H-bonding) patterns observed in 3D crystal structures. In this study, the aim is to complement these schemes with rules that predict H-bonding in crystals from 2D structural information only. Modern computational power and the advances in inductive logic programming (ILP) can now provide computational chemistry with the opportunity for extracting structure-specific rules from large databases that can be incorporated into expert systems. ILP technology is here applied to H-bonding in crystals to develop a self-extracting expert system utilizing data in the Cambridge Structural Database of small molecule crystal structures. A clear increase in performance was observed when the ILP system DMax was allowed to refer to the local structural environment of the possible H-bond donor/acceptor pairs. This ability distinguishes ILP from more traditional approaches that build rules on the basis of global molecular properties.
Polyphenolic reductants in cane sugar
USDA-ARS?s Scientific Manuscript database
Limited information is available to understand the chemical structure of cane sugar extracts responsible for the redox reactivity. This study employed Fremy’s salt to test the hypothesis that hydroquinone/catechol-semiquinone-quinone redox cycle is responsible for the antioxidant activity of sugarc...
Antifungal cyclic peptides from the marine sponge Microscleroderma herdmani
USDA-ARS?s Scientific Manuscript database
Screening natural product extracts from National Cancer Institute Open Repository for antifungal discovery afforded hits for bioassay-guided fractionation. Upon LC-MS analysis of column fractions with antifungal activities to generate information on chemical structure, two new cyclic hexapeptides, m...
Cocco, Simona; Monasson, Remi; Weigt, Martin
2013-01-01
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764
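The eigenmode computation that both PCA and the Hopfield-Potts patterns build on can be sketched as follows; the categorical (Potts) encoding and pseudocount corrections of the actual method are simplified away:

```python
# Sketch: eigenmodes of a residue-residue correlation matrix (numpy).
# The Potts encoding and pseudocount details of the actual method are omitted.
import numpy as np

def correlation_modes(msa_features):
    """msa_features: (n_sequences, n_features) numeric encoding of an MSA."""
    C = np.corrcoef(msa_features, rowvar=False)   # feature-feature correlations
    evals, evecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    # Large-eigenvalue modes correspond to PCA patterns; the paper finds that
    # low-eigenvalue, highly localized modes also carry contact information.
    return evals, evecs
```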
Landmark-based deep multi-instance learning for brain disease diagnosis.
Liu, Mingxia; Zhang, Jun; Adeli, Ehsan; Shen, Dinggang
2018-01-01
In conventional Magnetic Resonance (MR) image based methods, two stages are often involved to capture brain structural information for disease diagnosis, i.e., 1) manually partitioning each MR image into a number of regions-of-interest (ROIs), and 2) extracting pre-defined features from each ROI for diagnosis with a certain classifier. However, these pre-defined features often limit the performance of the diagnosis, due to challenges in 1) defining the ROIs and 2) extracting effective disease-related features. In this paper, we propose a landmark-based deep multi-instance learning (LDMIL) framework for brain disease diagnosis. Specifically, we first adopt a data-driven learning approach to discover disease-related anatomical landmarks in the brain MR images, along with their nearby image patches. Then, our LDMIL framework learns an end-to-end MR image classifier for capturing both the local structural information conveyed by image patches located by landmarks and the global structural information derived from all detected landmarks. We have evaluated our proposed framework on 1526 subjects from three public datasets (i.e., ADNI-1, ADNI-2, and MIRIAD), and the experimental results show that our framework can achieve superior performance over state-of-the-art approaches. Copyright © 2017 Elsevier B.V. All rights reserved.
Feature extraction via KPCA for classification of gait patterns.
Wu, Jianning; Wang, Jue; Liu, Li
2007-06-01
Automated recognition of gait pattern change is important in medical diagnostics as well as in the early identification of at-risk gait in the elderly. We evaluated the use of Kernel-based Principal Component Analysis (KPCA) to extract more gait features (i.e., to obtain more significant amounts of information about human movement) and thus to improve the classification of gait patterns. 3D gait data of 24 young and 24 elderly participants were acquired using an OPTOTRAK 3020 motion analysis system during normal walking, and a total of 36 gait spatio-temporal and kinematic variables were extracted from the recorded data. KPCA was used first for nonlinear feature extraction to then evaluate its effect on a subsequent classification in combination with learning algorithms such as support vector machines (SVMs). Cross-validation test results indicated that the proposed technique could allow spreading the information about the gait's kinematic structure into more nonlinear principal components, thus providing additional discriminatory information for the improvement of gait classification performance. The feature extraction ability of KPCA was affected slightly with different kernel functions as polynomial and radial basis function. The combination of KPCA and SVM could identify young-elderly gait patterns with 91% accuracy, resulting in a markedly improved performance compared to the combination of PCA and SVM. These results suggest that nonlinear feature extraction by KPCA improves the classification of young-elderly gait patterns, and holds considerable potential for future applications in direct dimensionality reduction and interpretation of multiple gait signals.
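The KPCA-plus-SVM pipeline maps directly onto scikit-learn; the compact sketch below uses illustrative kernels, hyperparameters and placeholder data, not the study's tuned values or measurements:

```python
# KPCA feature extraction followed by SVM classification (scikit-learn).
# Kernel choices, hyperparameters and data are illustrative placeholders.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 36))      # stand-in for 48 subjects x 36 gait variables
y = np.repeat([0, 1], 24)          # young (0) vs elderly (1), placeholder labels

pipe = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf", gamma=0.1),  # nonlinear feature extraction
    SVC(kernel="rbf", C=1.0),                             # subsequent classification
)
scores = cross_val_score(pipe, X, y, cv=5)                # cross-validated accuracy
```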
Zhou, Li; Friedman, Carol; Parsons, Simon; Hripcsak, George
2005-01-01
Exploring temporal information in narrative Electronic Medical Records (EMRs) is essential and challenging. We propose an architecture for an integrated approach to process temporal information in clinical narrative reports. The goal is to initiate and build a foundation that supports applications which assist healthcare practice and research by including the ability to determine the time of clinical events (e.g., past vs. present). Key components include: (1) a temporal constraint structure for temporal expressions and the development of an associated tagger; (2) a Natural Language Processing (NLP) system for encoding and extracting medical events and associating them with formalized temporal data; (3) a post-processor, with a knowledge-based subsystem to help discover implicit information, that resolves temporal expressions and deals with issues such as granularity and vagueness; and (4) a reasoning mechanism which models clinical reports as Simple Temporal Problems (STPs). PMID:16779164
A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature.
Tchoua, Roselyne B; Chard, Kyle; Audus, Debra; Qin, Jian; de Pablo, Juan; Foster, Ian
2016-01-01
A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.
[HPLC-ESI-MS(n) analysis of the water soluble extracts of Fructus Choerospondiatis].
Shi, Run-ju; Dai, Yun; Fang, Min-feng; Zhao, Xin; Zheng, Jian-bin; Zheng, Xiao-hui
2007-03-01
To establish an HPLC-ESI-MS(n) method for analyzing the chemical ingredients in the water-soluble extracts of Fructus Choerospondiatis. Water-soluble extracts of Fructus Choerospondiatis were obtained by heated reflux extraction. The multiple reaction monitoring (MRM) mode of HPLC-ESI-MS(n) was used to determine the content of gallic acid, and MS(n) was used to obtain information on characteristic multistage fragment ions in order to identify the chemical structures of the peaks in the total ion current spectrum. Eleven compounds were identified, one of which is a new, previously unknown ingredient. The method, which has high recovery and specificity, offers experimental evidence for further research on the chemical ingredients extracted from Fructus Choerospondiatis.
NASA Astrophysics Data System (ADS)
Dragos, Kosmas; Smarsly, Kay
2016-04-01
System identification has been employed in numerous structural health monitoring (SHM) applications. Traditional system identification methods usually rely on centralized processing of structural response data to extract information on structural parameters. However, in wireless SHM systems the centralized processing of structural response data introduces a significant communication bottleneck. Exploiting the merits of decentralization and on-board processing power of wireless SHM systems, many system identification methods have been successfully implemented in wireless sensor networks. While several system identification approaches for wireless SHM systems have been proposed, little attention has been paid to obtaining information on the physical parameters (e.g. stiffness, damping) of the monitored structure. This paper presents a hybrid system identification methodology suitable for wireless sensor networks based on the principles of component mode synthesis (dynamic substructuring). A numerical model of the monitored structure is embedded into the wireless sensor nodes in a distributed manner, i.e. the entire model is segmented into sub-models, each embedded into one sensor node corresponding to the substructure the sensor node is assigned to. The parameters of each sub-model are estimated by extracting local mode shapes and by applying the equations of the Craig-Bampton method on dynamic substructuring. The proposed methodology is validated in a laboratory test conducted on a four-story frame structure to demonstrate the ability of the methodology to yield accurate estimates of stiffness parameters. Finally, the test results are discussed and an outlook on future research directions is provided.
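For reference, the Craig-Bampton reduction that the methodology distributes across sensor nodes can be stated compactly in its standard textbook form (not necessarily the paper's notation):

```latex
% Craig-Bampton (fixed-interface) component mode synthesis, standard form.
% DOFs are partitioned into boundary (b) and interior (i) sets:
%   u = [u_b; u_i],  with  u_i \approx \Psi u_b + \Phi q.
\begin{aligned}
\Psi &= -K_{ii}^{-1} K_{ib} && \text{(static constraint modes)} \\
\Phi &: \; K_{ii}\,\phi_j = \omega_j^2\, M_{ii}\,\phi_j && \text{(fixed-interface normal modes)} \\
T &= \begin{bmatrix} I & 0 \\ \Psi & \Phi \end{bmatrix}, \qquad
\hat{M} = T^{\mathsf{T}} M T, \quad \hat{K} = T^{\mathsf{T}} K T
\end{aligned}
```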
Nondeterministic data base for computerized visual perception
NASA Technical Reports Server (NTRS)
Yakimovsky, Y.
1976-01-01
A description is given of the knowledge representation data base in the perception subsystem of the Mars robot vehicle prototype. Two types of information are stored. The first is generic information that represents general rules that are conformed to by structures in the expected environments. The second kind of information is a specific description of a structure, i.e., the properties and relations of objects in the specific case being analyzed. The generic knowledge is represented so that it can be applied to extract and infer the description of specific structures. The generic model of the rules is substantially a Bayesian representation of the statistics of the environment, which means it is geared to representation of nondeterministic rules relating properties of, and relations between, objects. The description of a specific structure is also nondeterministic in the sense that all properties and relations may take a range of values with an associated probability distribution.
NASA Astrophysics Data System (ADS)
Buscema, Massimo; Asadi-Zeydabadi, Masoud; Lodwick, Weldon; Breda, Marco
2016-04-01
Significant applications, such as the analysis of Alzheimer's disease as differentiated from dementia, data mining of social media, or extracting information on drug cartel structural composition, are often modeled as graphs. The structural or topological complexity, or lack of it, in a graph is quite often useful in understanding and, more importantly, resolving the problem. We propose a new index, which we call the H0 function, to measure the structural/topological complexity of a graph. To do this, we introduce the concept of graph pruning and its associated algorithm, which is used in the development of our measure. We illustrate the behavior of our measure, the H0 function, through different examples found in the appendix. These examples indicate that the H0 function captures useful and important characteristics of a graph. Here, we restrict ourselves to undirected graphs.
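The exact H0 definition is developed in the paper and its appendix; the fragment below is only an assumption-labeled illustration of the underlying pruning idea (iteratively stripping weakly connected vertices and tracking what survives), using NetworkX:

```python
# Illustrative graph-pruning sketch (NetworkX). The paper's exact H0
# definition is given in its appendix and is not reproduced here.
import networkx as nx

def pruning_profile(G):
    """Iteratively remove degree<=1 nodes; return the node count after each pass."""
    G = G.copy()
    profile = [G.number_of_nodes()]
    while True:
        leaves = [n for n in G if G.degree(n) <= 1]
        if not leaves:
            break                          # only cycles / denser cores remain
        G.remove_nodes_from(leaves)
        profile.append(G.number_of_nodes())
    return profile                         # longer profiles ~ more tree-like structure
```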
Ting, Valeska P; Henry, Paul F; Schmidtmann, Marc; Wilson, Chick C; Weller, Mark T
2012-05-21
We demonstrate the extent to which modern detector technology, coupled with a high flux constant wavelength neutron source, can be used to obtain high quality diffraction data from short data collections, allowing the refinement of the full structures (including hydrogen positions) of hydrous compounds from in situ neutron powder diffraction measurements. The in situ thermodiffractometry and controlled humidity studies reported here reveal that important information on the reorientations of structural water molecules with changing conditions can be easily extracted, providing insight into the effects of hydrogen bonding on bulk physical properties. Using crystalline BaCl2·2H2O as an example system, we analyse the structural changes in the compound and its dehydration intermediates with changing temperature and humidity levels to demonstrate the quality of the dynamic structural information on the hydrogen atoms and associated hydrogen bonding that can be obtained without resorting to sample deuteration.
Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S
2012-01-01
The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free text medical information such as seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and described the inherent complexities of the task.
Extracting knowledge from the World Wide Web
Henzinger, Monika; Lawrence, Steve
2004-01-01
The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, such as the distribution of web pages over domains, the distribution of interest in different areas, communities related to different topics, the nature of competition in different categories of sites, and the degree of communication between different communities or countries. PMID:14745041
Evaluating Health Information Systems Using Ontologies
Anderberg, Peter; Larsson, Tobias C; Fricker, Samuel A; Berglund, Johan
2016-01-01
Background There are several frameworks that attempt to address the challenges of evaluation of health information systems by offering models, methods, and guidelines about what to evaluate, how to evaluate, and how to report the evaluation results. Model-based evaluation frameworks usually suggest universally applicable evaluation aspects but do not consider case-specific aspects. On the other hand, evaluation frameworks that are case specific, by eliciting user requirements, limit their output to the evaluation aspects suggested by the users in the early phases of system development. In addition, these case-specific approaches extract different sets of evaluation aspects from each case, making it challenging to collectively compare, unify, or aggregate the evaluation of a set of heterogeneous health information systems. Objectives The aim of this paper is to find a method capable of suggesting evaluation aspects for a set of one or more health information systems—whether similar or heterogeneous—by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework. Methods On the basis of the available literature in semantic networks and ontologies, a method (called Unified eValuation using Ontology; UVON) was developed that can organize, unify, and aggregate the quality attributes of several health information systems into a tree-style ontology structure. The method was extended to integrate its generated ontology with the evaluation aspects suggested by model-based evaluation frameworks. An approach was developed to extract evaluation aspects from the ontology that also considers evaluation case practicalities such as the maximum number of evaluation aspects to be measured or their required degree of specificity. The method was applied and tested in Future Internet Social and Technological Alignment Research (FI-STAR), a project of 7 cloud-based eHealth applications that were developed and deployed across European Union countries. Results The relevance of the evaluation aspects created by the UVON method for the FI-STAR project was validated by the corresponding stakeholders of each case. These evaluation aspects were extracted from a UVON-generated ontology structure that reflects both the internally declared required quality attributes in the 7 eHealth applications of the FI-STAR project and the evaluation aspects recommended by the Model for ASsessment of Telemedicine applications (MAST) evaluation framework. The extracted evaluation aspects were used to create questionnaires (for the corresponding patients and health professionals) to evaluate each individual case and the whole of the FI-STAR project. Conclusions The UVON method can provide a relevant set of evaluation aspects for a heterogeneous set of health information systems by organizing, unifying, and aggregating the quality attributes through ontological structures. Those quality attributes can be either suggested by evaluation models or elicited from the stakeholders of those systems in the form of system requirements. The method continues to be systematic, context sensitive, and relevant across a heterogeneous set of health information systems. PMID:27311735
E&V (Evaluation and Validation) Reference Manual, Version 1.1
1988-10-20
E&V. This model will allow the user to arrive at E&V techniques through many different paths, and provides a means to extract useful information...electronically (preferred) to szymansk@ajpo.sei.cmu.edu or by regular mail to Mr. Raymond Szymanski, AFWAL/AAAF, Wright Patterson AFB, OH 45433-6543. ES-2 E&V...1, 1-3 illustrate the types of information to be extracted from each document. Chapter 2 provides a more detailed description of the structure and
NASA Astrophysics Data System (ADS)
Ohar, Orest P.; Lizotte, Todd E.
2009-08-01
Over the years law enforcement has become increasingly complex, driving a need for a better level of organization of knowledge within policing. The use of COMPSTAT or other Geospatial Information Systems (GIS) for crime mapping and analysis has provided opportunities for careful analysis of crime trends. By identifying hotspots within communities, data collected and entered into these systems can be analyzed to determine how, when and where law enforcement assets can be deployed efficiently. This paper will introduce, in detail, a powerful new law enforcement and forensic investigative technology called Intentional Firearm Microstamping (IFM). Once embedded and deployed into firearms, IFM will provide data for identifying and tracking the sources of illegally trafficked firearms within the borders of the United States and across the border with Mexico. Intentional Firearm Microstamping is a micro code technology that leverages a laser-based micromachining process to form optimally located, microscopic "intentional structures and marks" on components within a firearm. Thus, when the firearm is fired, these IFM structures transfer an identifying tracking code onto the expended cartridge that is ejected from the firearm. Intentional Firearm Microstamped structures are laser-micromachined alphanumeric and encoded geometric tracking numbers, linked to the serial number of the firearm. IFM codes can be extracted quickly and used without the need to recover the firearm. Furthermore, through the process of extraction, IFM codes can be quantitatively verified to a higher level of certainty compared with traditional forensic matching techniques. IFM provides critical intelligence capable of identifying straw purchasers, trafficking routes and networks across state borders, and can be used on firearms illegally exported across international borders. This paper will outline IFM applications for supporting intelligence-led policing initiatives and IFM implementation strategies, describe how IFM overcomes a firearm's stochastic properties, explain the code extraction technologies that can be used by forensic investigators, and discuss the applications where the extracted data will benefit geospatial information systems for forensic intelligence.
Oil, Earth mass and gravitational force.
Moustafa, Khaled
2016-04-01
Fossil fuels are intensively extracted from around the world faster than they are renewed. Regardless of the direct and indirect effects of such extraction on climate change and the biosphere, another issue relating to the Earth's internal structure and Earth mass should receive at least some interest. According to the Energy Information Administration (EIA), about 34 billion barrels of oil (~4.7 billion metric tons) and 9 billion tons of coal were extracted in 2014 worldwide. Converting the amounts of oil and coal extracted over the last 3 decades, and their respective reserves intended to be extracted in the future, into mass values suggests that about 355 billion tons, or ~5.86×10⁻⁹% (~0.0000000058%) of the Earth mass, would be 'lost'. Although this is a tiny percentage, modeling the potential loss of Earth mass may help figure out a critical threshold of mass loss that should not be exceeded. Here, I briefly discuss whether such loss would have any potential consequences for the Earth's internal structure and its gravitational force, based on Newton's law of gravitation, which links the attraction force between planets to their respective masses and the distance that separates them. Copyright © 2016 Elsevier B.V. All rights reserved.
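As a quick sanity check of the quoted fraction, the arithmetic can be reproduced in a few lines (a sketch assuming the standard Earth mass of ~5.97×10²¹ metric tons and the abstract's ~355 billion tons):

```python
# Back-of-envelope check of the quoted mass fraction (a sketch; the
# 355-billion-ton figure is taken from the abstract, and the Earth mass is
# the standard value of ~5.97e21 metric tons).
earth_mass_t = 5.97e21          # Earth mass in metric tons
extracted_t = 3.55e11           # ~355 billion tons of oil and coal
fraction = extracted_t / earth_mass_t
print(f"fraction of Earth mass: {fraction:.2e} ({fraction * 100:.10f} %)")
# -> roughly 5.9e-11, i.e. ~5.9e-9 percent, matching the abstract's estimate
```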
Automatic facial animation parameters extraction in MPEG-4 visual communication
NASA Astrophysics Data System (ADS)
Yang, Chenggen; Gong, Wanwei; Yu, Lu
2002-01-01
Facial Animation Parameters (FAPs) are defined in MPEG-4 to animate a facial object. The algorithm proposed in this paper to extract these FAPs is applied to very low bit-rate video communication, in which the scene is composed of a head-and-shoulder object with a complex background. This paper addresses an algorithm to automatically extract all FAPs needed to animate a generic facial model and to estimate the 3D head motion from corresponding points. The proposed algorithm extracts the human facial region by color segmentation and intra-frame and inter-frame edge detection. Facial structure and the edge distribution of facial features, such as vertical and horizontal gradient histograms, are used to locate the facial feature regions. Parabola and circle deformable templates are employed to fit facial features and extract part of the FAPs. A special data structure is proposed to describe the deformable templates, reducing the time needed to compute the energy functions. The remaining FAPs, the 3D rigid head motion vectors, are estimated by the corresponding-points method. A 3D head wire-frame model provides facial semantic information for the selection of proper corresponding points, which helps to increase the accuracy of the 3D rigid object motion estimation.
Neutron Polarization Analysis for Biphasic Solvent Extraction Systems
Motokawa, Ryuhei; Endo, Hitoshi; Nagao, Michihiro; ...
2016-06-16
Here we performed neutron polarization analysis (NPA) of extracted organic phases containing complexes, comprised of Zr(NO3)4 and tri-n-butyl phosphate, which enabled decomposition of the intensity distribution of small-angle neutron scattering (SANS) into the coherent and incoherent scattering components. The coherent scattering intensity, containing structural information, and the incoherent scattering compete over a wide range of magnitudes of the scattering vector, q, specifically when q is larger than q* ≈ 1/Rg, where Rg is the radius of gyration of the scatterer. Therefore, it is important to determine the incoherent scattering intensity exactly to perform an accurate structural analysis from SANS data when Rg is small, as for the aforementioned extracted coordination species. Although NPA is the best method for evaluating the incoherent scattering component for accurately determining the coherent scattering in SANS, this method is not used frequently in SANS data analysis because it is technically challenging. In this study, we successfully demonstrated that experimental determination of the incoherent scattering using NPA is suitable for sample systems containing a small scatterer with a weak coherent scattering intensity, such as extracted complexes in biphasic solvent extraction systems.
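For readers unfamiliar with the decomposition, the textbook polarization-analysis relations can be sketched as follows (an idealized illustration assuming perfect polarization and a purely spin-incoherent background, not the authors' reduction code; the measured intensities are placeholders):

```python
import numpy as np

# Textbook polarization-analysis split: nuclear coherent scattering is
# non-spin-flip, while spin-incoherent scattering is 2/3 spin-flip and
# 1/3 non-spin-flip. Measuring both channels therefore separates the two.
# I_sf and I_nsf below are hypothetical measured intensities.
q = np.linspace(0.01, 0.5, 50)                     # scattering vector values
I_sf = np.full_like(q, 0.20)                       # spin-flip channel (flat)
I_nsf = np.exp(-(q * 30.0) ** 2 / 3) + 0.10        # non-spin-flip, Rg ~ 30

I_incoherent = 1.5 * I_sf                          # total spin-incoherent level
I_coherent = I_nsf - 0.5 * I_sf                    # structural (coherent) signal
```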
NASA Astrophysics Data System (ADS)
Davenport, Jack H.
2016-05-01
Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.
Ad-Hoc Queries over Document Collections - A Case Study
NASA Astrophysics Data System (ADS)
Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker
We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" and our system GOOLAP.info are examples of such systems. They execute information extraction methods over one or several document collections at query time and integrate the extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which cannot be joined into a record answering the query. Our focus is the identification of join-reordering heuristics that maximize the number of complete records answering a structured query. With respect to given document extraction costs, we propose two novel join operations: the multi-way CJ operator joins records from multiple relationships extracted from a single document, and the two-way join operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size and record density, and lower execution costs.
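A minimal sketch of the two join flavors, over hypothetical extracted records (the field names and record layout are invented for illustration):

```python
# Sketch of the two join flavors described above, over hypothetical extracted
# records (dicts). CJ joins relationships extracted from the *same* document;
# DJ keeps result density by dropping records with missing attributes.

def cj(extractions, keys):
    """Multi-way join of per-document extractions into one record."""
    out = []
    for doc_id, partials in extractions.items():
        record = {"doc": doc_id}
        for p in partials:
            record.update(p)
        if all(k in record for k in keys):
            out.append(record)
    return out

def dj(records, required):
    """Density join: discard incomplete records before further joining."""
    return [r for r in records if all(r.get(k) is not None for k in required)]

extractions = {
    "d1": [{"company": "Acme", "ceo": "A. Smith"}, {"company": "Acme", "hq": "Berlin"}],
    "d2": [{"company": "Foo"}],  # incomplete: no CEO or HQ extracted
}
complete = cj(extractions, keys=("company", "ceo", "hq"))
print(dj(complete, required=("company", "ceo", "hq")))
```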
Human listening studies reveal insights into object features extracted by echolocating dolphins
NASA Astrophysics Data System (ADS)
Delong, Caroline M.; Au, Whitlow W. L.; Roitblat, Herbert L.
2004-05-01
Echolocating dolphins extract object feature information from the acoustic parameters of object echoes. However, little is known about which object features are salient to dolphins or how they extract those features. To gain insight into how dolphins might be extracting feature information, human listeners were presented with echoes from objects used in a dolphin echoic-visual cross-modal matching task. Human participants performed a task similar to the one the dolphin had performed; however, echoic samples consisting of 23-echo trains were presented via headphones. The participants listened to the echoic sample and then visually selected the correct object from among three alternatives. The participants performed as well as or better than the dolphin (M=88.0% correct), and reported using a combination of acoustic cues to extract object features (e.g., loudness, pitch, timbre). Participants frequently reported using the pattern of aural changes in the echoes across the echo train to identify the shape and structure of the objects (e.g., peaks in loudness or pitch). It is likely that dolphins also attend to the pattern of changes across echoes as objects are echolocated from different angles.
From Deuterium to Free Neutrons - Recent Experimental Results
NASA Astrophysics Data System (ADS)
Kuhn, Sebastian
2009-05-01
Lepton scattering has long been used to gather data on the internal structure of both protons and neutrons. Assuming isospin symmetry, these data can be used to pin down the contributions of both u and d quarks to the spatial and momentum-spin structure of the nucleon and its excitations. In this context, information on the neutron is crucial and is typically obtained from experiments on few-body nuclear targets (predominantly ^3He and deuterium). However, the need to account for binding effects complicates the interpretation of these experiments. On the other hand, detailed studies of the reaction mechanism can yield important new information on the structure of few-body nuclei and the interplay of nuclear and quark degrees of freedom. Recent theoretical and experimental advances have allowed us to make significant progress on both fronts -- a cleaner extraction of neutron properties from nuclear data and a better understanding of nuclear modifications of the bound neutron structure. I will concentrate on recent results on the deuteron. I will present a new extraction of neutron spin structure functions in the resonance and large-x region (from the EG1 experiment with CLAS at Jefferson Lab). The same data can also be used for a detailed comparison with modern calculations of quasi-elastic spin-dependent scattering on the deuteron. A second experimental program with CLAS uses the technique of ``spectator tagging'' to extract the unpolarized structure functions of the neutron with minimal uncertainties from nuclear effects. By mapping out the dependence of the cross section on the ``spectator'' momentum, we can learn about final state interactions between the struck nucleon and the spectator, as well as modifications of the neutron structure due to nuclear binding. I will present preliminary results from the ``BoNuS'' experiment which pushed the detection limit of the spectator proton down to momenta of 70 MeV/c, where nuclear corrections should become small.
A structural SVM approach for reference parsing.
Zhang, Xiaoli; Zou, Jie; Le, Daniel X; Thoma, George R
2011-06-09
Automated extraction of bibliographic data, such as article titles, author names, abstracts, and references is essential to the affordable creation of large citation databases. References, typically appearing at the end of journal articles, can also provide valuable information for extracting other bibliographic data. Therefore, parsing individual references to extract author, title, journal, year, etc. is sometimes a necessary preprocessing step in building citation-indexing systems. The regular structure of references enables us to treat reference parsing as a sequence learning problem and to study the structural Support Vector Machine (structural SVM), a newly developed structured learning algorithm, for parsing references. In this study, we implemented structural SVM and used two types of contextual features to compare structural SVM with conventional SVM. Both methods achieve above 98% token classification accuracy and above 95% overall chunk-level accuracy for reference parsing. We also compared SVM and structural SVM to the Conditional Random Field (CRF). The experimental results show that structural SVM and CRF achieve similar accuracies at the token and chunk levels. When only basic observation features are used for each token, structural SVM achieves higher performance than SVM since it utilizes the contextual label features. However, when the contextual observation features from neighboring tokens are combined, SVM performance improves greatly, and is close to that of structural SVM after adding the second-order contextual observation features. The comparison of these two methods with CRF using the same set of binary features shows that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.
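The distinction between basic and contextual observation features can be sketched roughly as follows (the feature names are illustrative; after vectorization, a plain linear classifier would consume these dictionaries as a stand-in for the structural SVM):

```python
# Sketch of the feature design discussed above: basic observation features
# for each token, optionally augmented with the same features copied from
# neighboring tokens (second-order context). Feature names are hypothetical.
def basic_features(tok):
    return {
        "lower=" + tok.lower(): 1,
        "isdigit": int(tok.isdigit()),
        "iscap": int(tok[:1].isupper()),
        "hasperiod": int("." in tok),
    }

def contextual_features(tokens, i, order=2):
    feats = dict(basic_features(tokens[i]))
    for off in range(-order, order + 1):
        if off and 0 <= i + off < len(tokens):
            for k, v in basic_features(tokens[i + off]).items():
                feats[f"{off:+d}:{k}"] = v       # prefix marks the offset
    return feats

tokens = "Smith , J. (2009) Parsing references . J. Inf. Sci.".split()
X = [contextual_features(tokens, i) for i in range(len(tokens))]
print(X[0])
```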
NASA Astrophysics Data System (ADS)
Timoshenko, Janis; Anspoks, Andris; Cintins, Arturs; Kuzmin, Alexei; Purans, Juris; Frenkel, Anatoly I.
2018-06-01
The knowledge of the coordination environment around various atomic species in many functional materials provides a key for explaining their properties and working mechanisms. Many structural motifs and their transformations are difficult to detect and quantify in the process of work (operando conditions), due to their local nature, small changes, low dimensionality of the material, and/or extreme conditions. Here we use an artificial neural network approach to extract the information on the local structure and its in situ changes directly from the x-ray absorption fine structure spectra. We illustrate this capability by extracting the radial distribution function (RDF) of atoms in ferritic and austenitic phases of bulk iron across the temperature-induced transition. Integration of RDFs allows us to quantify the changes in the iron coordination and material density, and to observe the transition from a body-centered to a face-centered cubic arrangement of iron atoms. This method is attractive for a broad range of materials and experimental conditions.
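The supervised setup described here can be sketched with an off-the-shelf regressor (random placeholder data stand in for the simulated EXAFS spectra; the real work trains on theory-generated spectrum/RDF pairs):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch of the supervised mapping described above: a network trained on
# (spectrum -> discretized RDF) pairs from simulated structures, then applied
# to a measured spectrum. All arrays below are random placeholders.
rng = np.random.default_rng(0)
n_train, n_energy, n_rbins = 500, 100, 40
X = rng.normal(size=(n_train, n_energy))          # simulated spectra
Y = np.abs(rng.normal(size=(n_train, n_rbins)))   # matching RDF histograms

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
net.fit(X, Y)
rdf_pred = net.predict(rng.normal(size=(1, n_energy)))  # "measured" spectrum
# In practice, integrating the predicted RDF (with the proper radial
# weighting) yields coordination numbers; here we just sum the bins.
print(rdf_pred.sum())
```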
The Design of Case Products’ Shape Form Information Database Based on NURBS Surface
NASA Astrophysics Data System (ADS)
Liu, Xing; Liu, Guo-zhong; Xu, Nuo-qi; Zhang, Wei-she
2017-07-01
In order to improve the computer-aided design of product shapes, applying Non-Uniform Rational B-Spline (NURBS) curves and surfaces to the representation of product shape helps designers design products effectively. On the basis of contour extraction from typical product images and the use of Pro/Engineer (Pro/E) to extract the geometric features of scanned molds, and in order to structure an information database system of value points, control points, and knot vector parameters, this paper puts forward a unified method that uses NURBS curves and surfaces to describe the geometric shape of products and MATLAB to simulate them when products have the same or similar function. A case study of an electric vehicle's front cover illustrates the process of accessing the geometric shape information of a case product. This method can not only greatly reduce the volume of stored information, but also improve the effectiveness of computer-aided geometric innovation modeling.
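As an illustration of the data such a database stores (control points, weights, and a knot vector), a minimal NURBS curve evaluator based on the Cox-de Boor recursion might look like this (the curve is an arbitrary example, not taken from the paper):

```python
import numpy as np

# Minimal NURBS curve evaluator (Cox-de Boor recursion), illustrating the
# "value point / control point / knot vector" data the database would store.
# Control points, weights, and knots below are arbitrary examples.
def bspline_basis(i, p, u, knots):
    if p == 0:
        # half-open convention; evaluate u strictly inside the knot span
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = ((u - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, u, knots))
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, u, knots))
    return left + right

def nurbs_point(u, ctrl, weights, knots, degree=3):
    N = np.array([bspline_basis(i, degree, u, knots) for i in range(len(ctrl))])
    wN = weights * N
    return (wN[:, None] * ctrl).sum(axis=0) / wN.sum()

ctrl = np.array([[0, 0], [1, 2], [3, 2], [4, 0]], float)
weights = np.array([1.0, 0.8, 0.8, 1.0])
knots = [0, 0, 0, 0, 1, 1, 1, 1]          # clamped cubic curve
print(nurbs_point(0.5, ctrl, weights, knots))
```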
Automatic generation of Web mining environments
NASA Astrophysics Data System (ADS)
Cibelli, Maurizio; Costagliola, Gennaro
1999-02-01
The main problem related to the retrieval of information from the World Wide Web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general-purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine; a data source pre-processor (DSP), which processes HTML layout cues in order to collect and qualify page segments; and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
A Two-Level Cache for Distributed Information Retrieval in Search Engines
Zhang, Weizhe; He, Hui; Ye, Jianwei
2013-01-01
To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache. PMID:24363621
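A minimal sketch of such a two-level structure (the query log, sizes, and fetch callback are invented; the published system's distribution strategy across servers is not reproduced here):

```python
from collections import OrderedDict

# Sketch of the two-level cache described above: a static cache frozen from
# the most popular queries in the users' logs, backed by a small dynamic LRU
# cache for the remaining traffic. `fetch` stands in for a (hypothetical)
# call into the distributed retrieval backend.
class TwoLevelCache:
    def __init__(self, log_counts, static_size, dynamic_size):
        top = sorted(log_counts, key=log_counts.get, reverse=True)
        self.static = {q: None for q in top[:static_size]}
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size

    def get(self, query, fetch):
        if query in self.static:                 # level 1: popular queries
            if self.static[query] is None:       # filled lazily, kept forever
                self.static[query] = fetch(query)
            return self.static[query]
        if query in self.dynamic:                # level 2: LRU of the rest
            self.dynamic.move_to_end(query)
            return self.dynamic[query]
        result = self.dynamic[query] = fetch(query)
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)     # evict least recently used
        return result

cache = TwoLevelCache({"news": 90, "maps": 70, "rare": 1}, 2, 100)
print(cache.get("news", fetch=lambda q: f"results for {q}"))
```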
SPHERE: SPherical Harmonic Elastic REgistration of HARDI Data
Yap, Pew-Thian; Chen, Yasheng; An, Hongyu; Yang, Yang; Gilmore, John H.; Lin, Weili
2010-01-01
In contrast to the more common Diffusion Tensor Imaging (DTI), High Angular Resolution Diffusion Imaging (HARDI) allows superior delineation of angular microstructures of brain white matter, and makes possible multiple-fiber modeling of each voxel for better characterization of brain connectivity. However, the complex orientation information afforded by HARDI makes registration of HARDI images more complicated than scalar images. In particular, the question of how much orientation information is needed for satisfactory alignment has not been sufficiently addressed. Low order orientation representation is generally more robust than high order representation, although the latter provides more information for correct alignment of fiber pathways. However, high order representation, when naïvely utilized, might not necessarily be conducive to improving registration accuracy since similar structures with significant orientation differences prior to proper alignment might be mistakenly taken as non-matching structures. We present in this paper a HARDI registration algorithm, called SPherical Harmonic Elastic REgistration (SPHERE), which in a principled means hierarchically extracts orientation information from HARDI data for structural alignment. The image volumes are first registered using robust, relatively direction invariant features derived from the Orientation Distribution Function (ODF), and the alignment is then further refined using spherical harmonic (SH) representation with gradually increasing orders. This progression from non-directional, single-directional to multi-directional representation provides a systematic means of extracting directional information given by diffusion-weighted imaging. Coupled with a template-subject-consistent soft-correspondence-matching scheme, this approach allows robust and accurate alignment of HARDI data. Experimental results show marked increase in accuracy over a state-of-the-art DTI registration algorithm. PMID:21147231
Biomass Increases Go under Cover: Woody Vegetation Dynamics in South African Rangelands
Mograbi, Penelope J.; Knapp, David E.; Martin, Roberta E.; Main, Russell
2015-01-01
Woody biomass dynamics are an expression of ecosystem function, yet biomass estimates do not provide information on the spatial distribution of woody vegetation within the vertical vegetation subcanopy. We demonstrate the ability of airborne light detection and ranging (LiDAR) to measure aboveground biomass and subcanopy structure, as an explanatory tool to unravel vegetation dynamics in structurally heterogeneous landscapes. We sampled three communal rangelands in Bushbuckridge, South Africa, utilised by rural communities for fuelwood harvesting. Woody biomass estimates ranged from 9 Mg ha⁻¹ on gabbro geology sites to 27 Mg ha⁻¹ on granitic geology sites. Despite predictions of woodland depletion due to unsustainable fuelwood extraction in previous studies, biomass in all the communal rangelands increased between 2008 and 2012. Annual biomass productivity estimates (10–14% p.a.) were higher than previous estimates of 4% and likely a significant contributor to the previous underestimations of modelled biomass supply. We show that the biomass increases are attributable to growth of vegetation <5 m in height, and that, in the high wood-extraction rangeland, 79% of the changes in the vertical vegetation subcanopy are gains in the 1–3 m height class. The higher the wood extraction pressure on the rangelands, the greater the biomass increases in the low height classes within the subcanopy, likely a strong resprouting response to intensive harvesting. Yet fuelwood shortages are still occurring, as evidenced by the losses in the tall tree height class in the high extraction rangeland. Loss of large trees and gain in subcanopy shrubs could result in a structurally simple landscape with reduced functional capacity. This research demonstrates that intensive harvesting can, paradoxically, increase biomass, and this has implications for the sustainability of ecosystem service provision. The structural implications of biomass increases in communal rangelands could be misinterpreted as woodland recovery in the absence of three-dimensional, subcanopy information. PMID:25969985
Extraction of membrane structure in eyeball from MR volumes
NASA Astrophysics Data System (ADS)
Oda, Masahiro; Kin, Taichi; Mori, Kensaku
2017-03-01
This paper presents an accurate extraction method for spherical shaped membrane structures in the eyeball from MR volumes. In ophthalmic surgery, the operation field is limited to a small region. Patient-specific surgical simulation is useful to reduce complications, and an understanding of the tissue structure in the eyeball of a patient is required to achieve such simulations. Previous extraction methods of tissue structure in the eyeball use optical coherence tomography (OCT) images. Although OCT images have high resolution, their imaging regions are limited to very small areas, making global structure extraction of the eyeball difficult. We propose an extraction method for spherical shaped membrane structures including the sclerotic coat, choroid, and retina. This method is applied to a T2-weighted MR volume of the head region. MR volumes can capture the tissue structure of the whole eyeball; because we use MR volumes, our method extracts the whole membrane structures in the eyeball. We roughly extract membrane structures by applying a sheet structure enhancement filter. The rough extraction result includes parts of the membrane structures. Then, we apply the Hough transform to find a sphere structure in the voxel set of the rough extraction result. An experimental result using a T2-weighted MR volume of the head region showed that the proposed method can extract spherical shaped membrane structures accurately.
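As a lightweight stand-in for the sphere-detection step, an algebraic least-squares sphere fit to the filtered voxels can be sketched as follows (the paper uses a Hough transform; this simplification assumes one dominant sphere and little clutter):

```python
import numpy as np

# Algebraic least-squares sphere fit to the voxels kept by the
# sheet-enhancement filter (a simplified stand-in for the Hough step).
# Using |x|^2 = 2 c.x + (r^2 - |c|^2), the problem is linear in c and d.
def fit_sphere(pts):
    A = np.c_[2 * pts, np.ones(len(pts))]
    b = (pts ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:3], sol[3]
    radius = np.sqrt(d + center @ center)
    return center, radius

rng = np.random.default_rng(1)
dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
pts = np.array([40.0, 50.0, 60.0]) + 12.0 * dirs   # synthetic membrane voxels
print(fit_sphere(pts + rng.normal(scale=0.2, size=pts.shape)))
```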
Line segment extraction for large scale unorganized point clouds
NASA Astrophysics Data System (ADS)
Lin, Yangbin; Wang, Cheng; Cheng, Jun; Chen, Bili; Jia, Fukai; Chen, Zhonggui; Li, Jonathan
2015-04-01
Line segment detection in images is already a well-investigated topic, although it has received considerably less attention in 3D point clouds. Benefiting from current LiDAR devices, large-scale point clouds are becoming increasingly common. Most human-made objects have flat surfaces. Line segments that occur where pairs of planes intersect give important information regarding the geometric content of point clouds, which is especially useful for automatic building reconstruction and segmentation. This paper proposes a novel method that is capable of accurately extracting plane intersection line segments from large-scale raw scan points. The 3D line-support region, namely, a point set near a straight linear structure, is extracted simultaneously. The 3D line-support region is fitted by our Line-Segment-Half-Planes (LSHP) structure, which provides a geometric constraint for a line segment, making the line segment more reliable and accurate. We demonstrate our method on the point clouds of large-scale, complex, real-world scenes acquired by LiDAR devices. We also demonstrate the application of 3D line-support regions and their LSHP structures on urban scene abstraction.
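The geometric core, intersecting two fitted planes to obtain the supporting line of a candidate segment, reduces to a small linear solve (a sketch; the plane parameters are examples):

```python
import numpy as np

# Intersection line of two planes given as n.x = d: its direction is the
# cross product of the normals, and a point on the line solves both plane
# equations plus one gauge constraint (zero component along the direction).
def plane_intersection(n1, d1, n2, d2):
    direction = np.cross(n1, n2)
    if np.linalg.norm(direction) < 1e-9:
        return None                       # planes are (near-)parallel
    A = np.vstack([n1, n2, direction])
    p0 = np.linalg.solve(A, [d1, d2, 0.0])
    return p0, direction / np.linalg.norm(direction)

print(plane_intersection(np.array([0., 0., 1.]), 1.0,
                         np.array([0., 1., 0.]), 2.0))
# -> a point (0, 2, 1) and direction (-1, 0, 0): the line z=1, y=2
```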
Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo
2017-12-01
Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit features derived from alignments of the query protein against templates. These approaches have been shown to be successful for fold recognition at the family level, but usually fail at the superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at the superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated the similarity between the query protein and templates, and then assigned the query protein the fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale for applying DCNN to fold recognition is that contact likelihood maps are essentially analogous to images, as both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that, even using the extracted fold-specific features alone, our approach achieves a success rate comparable to the state-of-the-art approaches. When these features are further combined with traditional alignment-related features, the success rate of our approach increases to 92.3%, 82.5% and 78.8% at the family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at the fold level, 6% higher at the superfamily level and 1% higher at the family level. An independent assessment on the SCOP_TEST dataset showed consistent performance improvement, indicating the robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with the fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features, and that combining them could greatly facilitate fold recognition at the superfamily/fold levels and template-based prediction of protein structures. Source code of DeepFR is freely available through https://github.com/zhujianwei31415/deepfr, and a web server is available through http://protein.ict.ac.cn/deepfr. Contact: zheng@itp.ac.cn or dbu@ict.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
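The contacts-as-images idea can be sketched with a toy convolutional encoder (the architecture and sizes are invented and far smaller than DeepFR's; an adaptive pooling layer absorbs the variable protein length L):

```python
import torch
import torch.nn as nn

# Sketch of treating a predicted contact map like an image: a small
# convolutional network turns an LxL contact-likelihood map into a
# fixed-size feature vector that can be compared across proteins.
# This toy encoder is untrained; DeepFR's real architecture differs.
class ContactMapEncoder(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),       # handles variable protein lengths
        )
        self.fc = nn.Linear(32 * 8 * 8, feat_dim)

    def forward(self, contact_map):        # (batch, 1, L, L), values in [0, 1]
        h = self.conv(contact_map)
        return self.fc(h.flatten(1))

enc = ContactMapEncoder()
query = torch.rand(1, 1, 120, 120)         # hypothetical contact likelihoods
template = torch.rand(1, 1, 150, 150)
sim = nn.functional.cosine_similarity(enc(query), enc(template))
print(sim)
```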
MIMS - MEDICAL INFORMATION MANAGEMENT SYSTEM
NASA Technical Reports Server (NTRS)
Frankowski, J. W.
1994-01-01
MIMS, Medical Information Management System is an interactive, general purpose information storage and retrieval system. It was first designed to be used in medical data management, and can be used to handle all aspects of data related to patient care. Other areas of application for MIMS include: managing occupational safety data in the public and private sectors; handling judicial information where speed and accuracy are high priorities; systemizing purchasing and procurement systems; and analyzing organizational cost structures. Because of its free format design, MIMS can offer immediate assistance where manipulation of large data bases is required. File structures, data categories, field lengths and formats, including alphabetic and/or numeric, are all user defined. The user can quickly and efficiently extract, display, and analyze the data. Three means of extracting data are provided: certain short items of information, such as social security numbers, can be used to uniquely identify each record for quick access; records can be selected which match conditions defined by the user; and specific categories of data can be selected. Data may be displayed and analyzed in several ways which include: generating tabular information assembled from comparison of all the records on the system; generating statistical information on numeric data such as means, standard deviations and standard errors; and displaying formatted listings of output data. The MIMS program is written in Microsoft FORTRAN-77. It was designed to operate on IBM Personal Computers and compatibles running under PC or MS DOS 2.00 or higher. MIMS was developed in 1987.
Recent progress in automatically extracting information from the pharmacogenomic literature
Garten, Yael; Coulet, Adrien; Altman, Russ B
2011-01-01
The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses, we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to the pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206
Imaging nanoscale lattice variations by machine learning of x-ray diffraction microscopy data
Laanait, Nouamane; Zhang, Zhan; Schlepütz, Christian M.
2016-08-09
In this paper, we present a novel methodology based on machine learning to extract lattice variations in crystalline materials, at the nanoscale, from an x-ray Bragg diffraction-based imaging technique. By employing a full-field microscopy setup, we capture real space images of materials, with imaging contrast determined solely by the x-ray diffracted signal. The data sets that emanate from this imaging technique are a hybrid of real space information (image spatial support) and reciprocal lattice space information (image contrast), and are intrinsically multidimensional (5D). By a judicious application of established unsupervised machine learning techniques and multivariate analysis to this multidimensional data cube, we show how to extract features that can be ascribed physical interpretations in terms of common structural distortions, such as lattice tilts and dislocation arrays. Finally, we demonstrate this 'big data' approach to x-ray diffraction microscopy by identifying structural defects present in an epitaxial ferroelectric thin-film of lead zirconate titanate.
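One plausible reading of the unsupervised step, sketched with PCA on a randomly filled placeholder cube (the authors' exact pipeline and decomposition choices are not reproduced here):

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of the unsupervised step: reshape the multidimensional data cube so
# that each real-space pixel is a sample and each reciprocal-space sampling
# point a feature, then let PCA pull out the dominant modes of variation
# (e.g., lattice tilts). The cube below is a random placeholder.
ny, nx, nq = 64, 64, 25                      # image grid x diffraction samples
cube = np.random.rand(ny, nx, nq)
X = cube.reshape(-1, nq)                     # (pixels, reciprocal-space points)

pca = PCA(n_components=4)
scores = pca.fit_transform(X)                # per-pixel component weights
component_maps = scores.reshape(ny, nx, -1)  # spatial map of each component
print(component_maps.shape)
```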
Learning to classify wakes from local sensory information
NASA Astrophysics Data System (ADS)
Alsalman, Mohamad; Colvert, Brendan; Kanso, Eva; Kanso Team
2017-11-01
Aquatic organisms exhibit remarkable abilities to sense local flow signals contained in their fluid environment and to surmise the origins of these flows. For example, fish can discern the information contained in various flow structures and utilize this information for obstacle avoidance and prey tracking. Flow structures created by flapping and swimming bodies are well characterized in the fluid dynamics literature; however, such characterization relies on classical methods that use an external observer to reconstruct global flow fields. The reconstructed flows, or wakes, are then classified according to the unsteady vortex patterns. Here, we propose a new approach for wake identification: we classify the wakes resulting from a flapping airfoil by applying machine learning algorithms to local flow information. In particular, we simulate the wakes of an oscillating airfoil in an incoming flow, extract the downstream vorticity information, and train a classifier to learn the different flow structures and classify new ones. This data-driven approach provides a promising framework for underwater navigation and detection in application to autonomous bio-inspired vehicles.
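A rough sketch of such a data-driven classifier (synthetic placeholder signals; wake labels such as 2S/2P are only examples of the vortex-pattern classes one might use):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch of the pipeline: local vorticity traces sampled downstream of the
# airfoil, labeled by the wake type that produced them. Signals and labels
# below are random placeholders, so the score is at chance level.
rng = np.random.default_rng(0)
n, t = 600, 200                                  # sensor traces x time steps
X = rng.normal(size=(n, t))                      # local vorticity time series
y = rng.integers(0, 3, size=n)                   # wake classes, e.g. 2S/2P/P+S

clf = RandomForestClassifier(n_estimators=200).fit(X[:500], y[:500])
print("held-out accuracy:", clf.score(X[500:], y[500:]))
```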
Text mining and its potential applications in systems biology.
Ananiadou, Sophia; Kell, Douglas B; Tsujii, Jun-ichi
2006-12-01
With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.
Jagannathan, V; Mullett, Charles J; Arbogast, James G; Halbritter, Kevin A; Yellapragada, Deepthi; Regulapati, Sushmitha; Bandaru, Pavani
2009-04-01
We assessed the current state of commercial natural language processing (NLP) engines for their ability to extract medication information from textual clinical documents. Two thousand de-identified discharge summaries and family practice notes were submitted to four commercial NLP engines with the request to extract all medication information. The four sets of returned results were combined to create a comparison standard which was validated against a manual, physician-derived gold standard created from a subset of 100 reports. Once validated, the individual vendor results for medication names, strengths, route, and frequency were compared against this automated standard with precision, recall, and F measures calculated. Compared with the manual, physician-derived gold standard, the automated standard was successful at accurately capturing medication names (F measure=93.2%), but performed less well with strength (85.3%) and route (80.3%), and relatively poorly with dosing frequency (48.3%). Moderate variability was seen in the strengths of the four vendors. The vendors performed better with the structured discharge summaries than with the clinic notes in an analysis comparing the two document types. Although automated extraction may serve as the foundation for a manual review process, it is not ready to automate medication lists without human intervention.
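The scoring itself is standard; for instance, precision, recall, and F-measure over sets of extracted medication strings can be computed as follows (the gold and engine outputs are invented):

```python
# Precision/recall/F-measure of one engine's extracted medication mentions
# against a gold standard, both as sets of normalized strings (examples
# are invented, not from the study data).
def prf(predicted, gold):
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

gold = {"metformin 500 mg po bid", "lisinopril 10 mg po daily"}
engine = {"metformin 500 mg po bid", "aspirin 81 mg po daily"}
print(prf(engine, gold))   # -> (0.5, 0.5, 0.5)
```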
Experiences in extraction of contact parameters from process-evaluation test-structures
NASA Technical Reports Server (NTRS)
Lieneweg, Udo
1988-01-01
Six-terminal-contact test structures are introduced for characterizing ohmic contacts between a metal and a heavily doped semiconductor layer. Specifically, the six-terminal test structure supplies the additional information needed in order to calculate the transmission length and eventual corrections to the characteristic resistance per unit width due to finite contact length. The essential feature of this test structure is a square contact with four taps in the lower (semiconductor) layer. Every other one of these taps is used for current injection ('front'). From the voltage drop at the opposite tap and the side taps, the 'end' resistance and the 'side' resistances are calculated. The test structures are shown to give valuable information complementary to the common front resistance measurements. The interfacial resistivity is obtained directly after proper correction for flange effects.
Medem, Anna V; Seidling, Hanna M; Eichler, Hans-Georg; Kaltschmidt, Jens; Metzner, Michael; Hubert, Carina M; Czock, David; Haefeli, Walter E
2017-05-01
Electronic clinical decision support systems (CDSS) require drug information that can be processed by computers. The goal of this project was to determine and evaluate a compilation of variables that comprehensively capture the information contained in the summary of product characteristic (SmPC) and unequivocally describe the drug, its dosage options, and clinical pharmacokinetics. An expert panel defined and structured a set of variables and drafted a guideline to extract and enter information on dosage and clinical pharmacokinetics from textual SmPCs as published by the European Medicines Agency (EMA). The set of variables was iteratively revised and evaluated by data extraction and variable allocation of roughly 7% of all centrally approved drugs. The information contained in the SmPC was allocated to three information clusters consisting of 260 variables. The cluster "drug characterization" specifies the nature of the drug. The cluster "dosage" provides information on approved drug dosages and defines corresponding specific conditions. The cluster "clinical pharmacokinetics" includes pharmacokinetic parameters of relevance for dosing in clinical practice. A first evaluation demonstrated that, despite the complexity of the current free text SmPCs, dosage and pharmacokinetic information can be reliably extracted from the SmPCs and comprehensively described by a limited set of variables. By proposing a compilation of variables well describing drug dosage and clinical pharmacokinetics, the project represents a step forward towards the development of a comprehensive database system serving as information source for sophisticated CDSS.
A novel 3D shape descriptor for automatic retrieval of anatomical structures from medical images
NASA Astrophysics Data System (ADS)
Nunes, Fátima L. S.; Bergamasco, Leila C. C.; Delmondes, Pedro H.; Valverde, Miguel A. G.; Jackowski, Marcel P.
2017-03-01
Content-based image retrieval (CBIR) aims at retrieving from a database objects that are similar to an object provided by a query, by taking into consideration a set of extracted features. While CBIR has been widely applied in the two-dimensional image domain, the retrieval of 3D objects from medical image datasets using CBIR remains to be explored. In this context, the development of descriptors that can capture information specific to organs or structures is desirable. In this work, we focus on the retrieval of two anatomical structures commonly imaged by Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) techniques, the left ventricle of the heart and blood vessels. Towards this aim, we developed the Area-Distance Local Descriptor (ADLD), a novel 3D local shape descriptor that employs mesh geometry information, namely facet area and distance from centroid to surface, to identify shape changes. Because ADLD only considers surface meshes extracted from volumetric medical images, it substantially diminishes the amount of data to be analyzed. A 90% precision rate was obtained when retrieving both convex (left ventricle) and non-convex structures (blood vessels), allowing for detection of abnormalities associated with changes in shape. Thus, ADLD has the potential to aid in the diagnosis of a wide range of vascular and cardiac diseases.
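A simplified flavor of the descriptor, per-facet areas and centroid-to-facet distances binned into a histogram signature, can be sketched as follows (a toy tetrahedron stands in for an extracted ventricle or vessel mesh; this is not the full ADLD):

```python
import numpy as np

# Sketch of the two mesh quantities the descriptor builds on: per-facet area
# and centroid-to-facet distance, binned into a histogram signature that can
# be compared across meshes (e.g., by Euclidean distance) for retrieval.
def mesh_signature(vertices, faces, bins=8):
    v = vertices[faces]                          # (n_faces, 3 corners, 3 xyz)
    areas = 0.5 * np.linalg.norm(
        np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0]), axis=1)
    centroid = vertices.mean(axis=0)
    dists = np.linalg.norm(v.mean(axis=1) - centroid, axis=1)
    ha, _ = np.histogram(areas, bins=bins, density=True)
    hd, _ = np.histogram(dists, bins=bins, density=True)
    return np.concatenate([ha, hd])              # shape signature

verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
print(mesh_signature(verts, faces))
```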
Staccini, Pascal; Joubert, Michel; Quaranta, Jean-François; Fieschi, Marius
2005-03-01
Today, the economic and regulatory environment, involving activity-based and prospective payment systems, healthcare quality and risk analysis, traceability of the acts performed and evaluation of care practices, accounts for the current interest in clinical and hospital information systems. The structured gathering of information relative to users' needs and system requirements is fundamental when installing such systems. This stage takes time and is generally misconstrued by caregivers and is of limited efficacy to analysts. We used a modelling technique designed for manufacturing processes (IDEF0/SADT). We enhanced the basic model of an activity with descriptors extracted from the Ishikawa cause-and-effect diagram (methods, men, materials, machines, and environment). We proposed an object data model of a process and its components, and programmed a web-based tool in an object-oriented environment. This tool makes it possible to extract the data dictionary of a given process from the description of its elements and to locate documents (procedures, recommendations, instructions) according to each activity or role. Aimed at structuring needs and storing information provided by directly involved teams regarding the workings of an institution (or at least part of it), the process-mapping approach has an important contribution to make in the analysis of clinical information systems.
What Can Interfacial Water Molecules Tell Us About Solute Structure?
NASA Astrophysics Data System (ADS)
Willard, Adam
The molecular structure of bulk liquid water reflects a molecular tendency to engage in tetrahedrally coordinated hydrogen bonding. At a solute interface, water's preferred three-dimensional hydrogen bonding network must conform to a locally anisotropic interfacial environment. Interfacial water molecules adopt configurations that balance water-solute and water-water interactions. The arrangements of interfacial water molecules therefore encode information about the effective solute-water interactions. This solute-specific information is difficult to extract, however, because interfacial structure also reflects water's collective response to an anisotropic hydrogen bonding environment. Here I present a methodology for characterizing the molecular-level structure of liquid water interfaces from simulation data. This method can be used to explore water's static and/or dynamic response to a wide range of chemically and topologically heterogeneous solutes, such as proteins.
Extracting local information from crowds through betting markets
NASA Astrophysics Data System (ADS)
Weijs, Steven
2015-04-01
In this research, a set-up is considered in which users can bet against a forecasting agency to challenge their probabilistic forecasts. From an information theory standpoint, a reward structure is considered that either provides the forecasting agency with better information, paying the successful providers of information for their winning bets, or funds excellent forecasting agencies through users that think they know better. Especially for local forecasts, the approach may help to diagnose model biases and to identify local predictive information that can be incorporated in the models. The challenges and opportunities for implementing such a system in practice are also discussed.
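One concrete reward structure consistent with this information-theoretic view pays bets by the log-likelihood ratio of the two forecasts (a sketch with invented probabilities, not necessarily the exact set-up analyzed in the study):

```python
import numpy as np

# Log-ratio payout: a user who bets against the agency's probabilistic
# forecast is paid in proportion to how much better their probability was
# on the realized outcome (a proper-scoring setup; numbers are invented).
def bet_payout(p_user, p_agency, outcome, stake=1.0):
    return stake * np.log2(p_user[outcome] / p_agency[outcome])

p_agency = {"rain": 0.6, "dry": 0.4}     # agency's local forecast
p_user = {"rain": 0.9, "dry": 0.1}       # user claims better local knowledge
print(bet_payout(p_user, p_agency, "rain"))   # positive: the user knew better
# Averaged over outcomes drawn from the true distribution, the expected
# payout equals the gap between the two forecasts' Kullback-Leibler
# divergences from the truth.
```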
Sulci segmentation using geometric active contours
NASA Astrophysics Data System (ADS)
Torkaman, Mahsa; Zhu, Liangjia; Karasev, Peter; Tannenbaum, Allen
2017-02-01
Sulci are groove-like regions lying in the depth of the cerebral cortex between gyri, which, together, form the folded appearance of human and mammalian brains. Sulci play an important role in the structural analysis of the brain, morphometry (i.e., the measurement of brain structures), anatomical labeling and landmark-based registration [1]. Moreover, sulcal morphological changes are related to cortical thickness, whose measurement may provide useful information for studying a variety of psychiatric disorders. Manually extracting sulci requires complying with complex protocols, which makes the procedure both tedious and error prone [2]. In this paper, we describe an automatic procedure, employing geometric active contours, which extracts the sulci. Sulcal boundaries are obtained by minimizing a certain energy functional whose minimum is attained at the boundary of the given sulci.
PDB Editor: a user-friendly Java-based Protein Data Bank file editor with a GUI.
Lee, Jonas; Kim, Sung Hou
2009-04-01
The Protein Data Bank file format is the format most widely used by protein crystallographers and biologists to disseminate and manipulate protein structures. Despite this, there are few user-friendly software packages available to efficiently edit and extract raw information from PDB files. This limitation often leads to many protein crystallographers wasting significant time manually editing PDB files. PDB Editor, written in Java Swing GUI, allows the user to selectively search, select, extract and edit information in parallel. Furthermore, the program is a stand-alone application written in Java which frees users from the hassles associated with platform/operating system-dependent installation and usage. PDB Editor can be downloaded from http://sourceforge.net/projects/pdbeditorjl/.
Biologically active extracts with kidney affections applications
NASA Astrophysics Data System (ADS)
Pascu (Neagu), Mihaela; Pascu, Daniela-Elena; Cozea, Andreea; Bunaciu, Andrei A.; Miron, Alexandra Raluca; Nechifor, Cristina Aurelia
2015-12-01
This paper is aimed at selecting plant materials rich in bioflavonoid compounds, made from herbs known for their performance in the prevention and therapy of renal diseases, namely kidney stones and urinary infections (renal lithiasis, nephritis, urethritis, cystitis, etc.). It presents a comparative study of the composition of medicinal plant extracts belonging to the Ericaceae family: cranberry (fruit and leaves), Vaccinium vitis-idaea L., and bilberry (fruit), Vaccinium myrtillus L. The concentrated extracts obtained from the medicinal plants used in this work were analyzed from structural, morphological and compositional points of view using different techniques: chromatographic methods (HPLC), scanning electron microscopy, infrared and UV spectrophotometry, and a kinetic model. Liquid chromatography identified arbutoside, a compound specific to the Ericaceae family and present in all three extracts, as well as components specific to each species, mostly from the class of polyphenols. The identification and quantitative determination of the active ingredients in these extracts can give information related to their therapeutic effects.
Topological properties of flat electroencephalography's state space
NASA Astrophysics Data System (ADS)
Ken, Tan Lit; Ahmad, Tahir bin; Mohd, Mohd Sham bin; Ngien, Su Kong; Suwa, Tohru; Meng, Ong Sie
2016-02-01
Neuroinverse problems are often associated with complex neuronal activity: they involve locating the problematic cells, which is highly challenging. While epileptic foci can be localized with the aid of EEG signals, doing so relies greatly on the ability to extract hidden information or patterns within the EEG signals. Flat EEG, an enhancement of EEG, is a way of viewing the electroencephalogram on the real plane. From the perspective of dynamical systems, Flat EEG is equivalent to an epileptic seizure, making it a great platform to study seizures. Throughout the years, various mathematical tools have been applied to Flat EEG to extract hidden information that is hardly noticeable by traditional visual inspection. While these tools have given worthy results, a complete understanding of the seizure process has yet to be achieved. Since the underlying structure of Flat EEG is dynamic and is deemed to contain rich information about the seizure process, it is appealing to explore its structures in depth. To better understand the complex seizure process, this paper studies the event of epileptic seizure via Flat EEG in a more general framework by means of topology, particularly on the state space where the event of Flat EEG lies.
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
NASA Astrophysics Data System (ADS)
Wang, Zhao; Yang, Shan; Wang, Shuguang; Shen, Yan
2017-10-01
The assessment of dynamic urban structure has long been hampered by a lack of timely and accurate spatial information, which has hindered measurements of structural continuity at the macroscale. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS) nighttime light (NTL) data provide an ideal source for urban information detection, with a long time span, short time interval, and wide coverage. In this study, we extracted the physical boundaries of urban clusters from corrected NTL images and quantitatively analyzed the structure of the urban cluster system based on rank-size distribution, spatial metrics, and the Mann-Kendall trend test. Two levels of urban cluster systems in the Yangtze River Delta region (YRDR) were examined. We found that (1) in the entire YRDR, the urban cluster system showed a periodic process, with a significant trend towards even distribution before 2007 but an unequal growth pattern after 2007, and (2) at the metropolitan level, vast disparities exist among the four metropolitan areas in the fluctuations of the Pareto exponent, the speed of cluster expansion, and the dominance of the core cluster. The results suggest that the urban cluster information extracted from NTL data effectively reflects the evolving nature of regional urbanization, which in turn can aid in the planning of cities and help achieve more sustainable regional development.
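For the rank-size analysis, a minimal sketch of estimating the Pareto exponent from extracted cluster sizes (the input areas are invented and the paper's exact estimator may differ): the exponent is the negative slope of log(rank) against log(size).

```python
import numpy as np

def pareto_exponent(sizes):
    """Fit log(rank) = c - q * log(size); return the exponent q."""
    sizes = np.sort(np.asarray(sizes, dtype=float))[::-1]
    ranks = np.arange(1, len(sizes) + 1)
    slope, intercept = np.polyfit(np.log(sizes), np.log(ranks), 1)
    return -slope

areas = [5800.0, 2100.0, 1500.0, 900.0, 400.0, 250.0]  # hypothetical km^2
print(f"Pareto exponent: {pareto_exponent(areas):.2f}")
```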
Modelling spatiotemporal change using multidimensional arrays
NASA Astrophysics Data System (ADS)
Lu, Meng; Appel, Marius; Pebesma, Edzer
2017-04-01
The large variety of remote sensors, model simulations, and in-situ records provides great opportunities to model environmental change. The massive amount of high-dimensional data calls for methods to integrate data from various sources and to analyse spatiotemporal and thematic information jointly. An array is a collection of elements ordered and indexed in arbitrary dimensions, which naturally represents spatiotemporal phenomena identified by geographic location and recording time. In addition, array regridding (e.g., resampling, down-/up-scaling), dimension reduction, and spatiotemporal statistical algorithms are readily applicable to arrays. However, the role of arrays in big geoscientific data analysis has not been systematically studied: How can arrays discretise continuous spatiotemporal phenomena? How can arrays facilitate the extraction of multidimensional information? How can arrays provide a clean, scalable and reproducible change-modelling process that is communicable between mathematicians, computer scientists, Earth system scientists and stakeholders? This study emphasises detecting spatiotemporal change using satellite image time series. Current change detection methods for satellite image time series commonly analyse data in separate steps: 1) forming a vegetation index, 2) conducting time series analysis on each pixel, and 3) post-processing and mapping the time series analysis results; this does not consider spatiotemporal correlations and ignores much of the spectral information. Multidimensional information can be better extracted by jointly considering spatial, spectral, and temporal information. To approach this goal, we use principal component analysis to extract multispectral information and spatial autoregressive models to account for spatial correlation in residual-based time series structural change modelling. We also discuss the potential of multivariate non-parametric time series structural change methods, hierarchical modelling, and extreme event detection methods to model spatiotemporal change. We show how array operations can facilitate expressing these methods, and how the open-source array data management and analytics software SciDB and R can be used to scale the process and make it easily reproducible.
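As a toy illustration of the band-compression step (a minimal numpy sketch, not the authors' SciDB/R pipeline; all array shapes are invented): an image time series is held as a 4-D array and PCA reduces the band dimension before per-pixel time series modelling.

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.normal(size=(24, 6, 50, 50))          # (time, band, y, x)

# Reshape so each observation is one pixel at one date; features are bands.
t, b, ny, nx = cube.shape
X = cube.transpose(0, 2, 3, 1).reshape(-1, b)    # (time*y*x, band)
X -= X.mean(axis=0)

# PCA via the covariance eigendecomposition; keep the first component.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = (X @ eigvecs[:, -1]).reshape(t, ny, nx)    # one time series per pixel
```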
Text Extraction from Scene Images by Character Appearance and Structure Modeling
Yi, Chucai; Tian, Yingli
2012-01-01
In this paper, we propose a novel algorithm to detect text information from natural scene images. Scene text classification and detection are still open research topics. Our proposed algorithm is able to model both character appearance and structure to generate representative and discriminative text descriptors. The contributions of this paper include three aspects: 1) a new character appearance model by a structure correlation algorithm which extracts discriminative appearance features from detected interest points of character samples; 2) a new text descriptor based on structons and correlatons, which model character structure by structure differences among character samples and structure component co-occurrence; and 3) a new text region localization method by combining color decomposition, character contour refinement, and string line alignment to localize character candidates and refine detected text regions. We perform three groups of experiments to evaluate the effectiveness of our proposed algorithm, including text classification, text detection, and character identification. The evaluation results on benchmark datasets demonstrate that our algorithm achieves the state-of-the-art performance on scene text classification and detection, and significantly outperforms the existing algorithms for character identification. PMID:23316111
Fan, Ming; Zheng, Bin; Li, Lihua
2015-10-01
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although many efforts have been made, predicting the structural class of a protein solely from its sequence remains a challenging problem. Feature extraction and classification are the main problems in prediction. In this research, we extended our earlier work on these two aspects. For protein feature extraction, we proposed a scheme that calculates word frequency and word position from sequences of amino acids, reduced amino acids, and secondary structure. For accurate classification of protein structural class, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating features of a multi-agent system into the Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method using four benchmark datasets of low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of existing methods. The source code and dataset are available on request.
Qian, Jianjun; Yang, Jian; Xu, Yong
2013-09-01
This paper presents a robust yet simple image feature extraction method, called image decomposition based on local structure (IDLS). It is assumed that, in a local window of an image, the macro-pixel (patch) of the central pixel and those of its neighbors are locally linear. IDLS captures local structural information by describing the relationship between the central macro-pixel and its neighbors. This relationship is represented by linear representation coefficients determined using ridge regression. One image is thereby decomposed into a series of sub-images (also called structure images) according to a local structure feature vector. All the structure images, after being down-sampled for dimensionality reduction, are concatenated into one super-vector. Fisher linear discriminant analysis is then used to provide a low-dimensional, compact, and discriminative representation for each super-vector. The proposed method is applied to face recognition and examined using our real-world face image database, NUST-RWFR, and five popular, publicly available, benchmark face image databases (AR, Extended Yale B, PIE, FERET, and LFW). Experimental results show the performance advantages of IDLS over state-of-the-art algorithms.
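A minimal sketch of the core IDLS step as described above: represent the central macro-pixel as a ridge-regression combination of its neighbours' patches (patch count, patch size and the regularization weight are illustrative).

```python
import numpy as np

def local_structure_coeffs(patches, lam=0.1):
    """patches: (k+1, p) array; row 0 is the central patch, rows 1..k its
    neighbours. Returns the k ridge-regression coefficients."""
    b = patches[0]                 # central macro-pixel, flattened
    A = patches[1:].T              # (p, k) neighbour patches as columns
    k = A.shape[1]
    # Closed-form ridge solution: (A^T A + lam I)^-1 A^T b
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ b)

rng = np.random.default_rng(1)
coeffs = local_structure_coeffs(rng.normal(size=(9, 25)))  # 8 neighbours, 5x5 patches
```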
Structural study of complexes formed by acidic and neutral organophosphorus reagents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Braatz, Alexander D.; Antonio, Mark R.; Nilsson, Mikael
2016-12-23
The coordination of the trivalent 4f ions, Ln = La3+, Dy3+, and Lu3+, with neutral and acidic organophosphorus reagents, both individually and combined, was studied by use of X-ray absorption spectroscopy. These studies provide metrical information about the interatomic interactions between these cations and the ligands tri-n-butyl phosphate (TBP) and di-n-butyl phosphoric acid (HDBP), whose behavior is of practical importance to chemical separation processes that are currently used on an industrial scale. Previous studies have suggested the existence of complexes involving a mixture of ligands, accounting for extraction synergy. Through systematic variation of the aqueous phase acidity and extractant concentration and combination, we have found that complexes with Ln and TBP : HDBP at any mixture and HDBP alone involve direct Ln–O interactions involving 6 oxygen atoms and distant Ln–P interactions involving on average 3–5 phosphorus atoms per Ln ion. It was also found that Ln complexes formed by TBP alone seem to favor eight oxygen coordination, though we were unable to obtain metrical results regarding the distant Ln–P interactions due to the low signal attributed to a lower concentration of Ln ions in the organic phases. Our study does not support the existence of mixed Ln–TBP–HDBP complexes but, rather, indicates that the lanthanides are extracted as either Ln–HDBP complexes or Ln–TBP complexes and that these complexes exist in different ratios depending on the conditions of the extraction system. Furthermore, this fundamental structural information offers insight into the solvent extraction processes that are taking place and is of particular importance to issues arising from the separation and disposal of radioactive materials from used nuclear fuel.
Chen, Guijie; Yuan, Qingxia; Saeeduddin, Muhammad; Ou, Shiyi; Zeng, Xiaoxiong; Ye, Hong
2016-11-20
Tea has a long history of medicinal and dietary use. Tea polysaccharide (TPS) is regarded as one of the main bioactive constituents of tea and is beneficial to health. Over the last decades, considerable effort has been devoted to studies of TPS: its extraction, structural features and bioactivity. However, it has received much less attention than tea polyphenols. In order to provide new insight for the further development of TPS in functional foods, in the present review we summarize the recent literature, update the information and put forward future perspectives on TPS, covering its extraction, purification and quantitative determination techniques as well as its physicochemical characterization and bioactivities. Copyright © 2016 Elsevier Ltd. All rights reserved.
Extracting Inter-business Relationship from World Wide Web
NASA Astrophysics Data System (ADS)
Jin, Yingzi; Matsuo, Yutaka; Ishizuka, Mitsuru
Social relations play an important role in real communities. Interaction patterns reveal relations among actors (such as persons, groups, and companies), which can be merged into valuable information as a network structure. In this paper, we propose a new approach to extract inter-business relationships from the Web. Extraction of the relation between a pair of companies is realized by using a search engine and text processing. Since names of companies may co-appear on the Web only coincidentally, we propose an advanced algorithm characterized by the addition of keywords (which we call relation words) to a query. The relation words are obtained from either an annotated corpus or the Web. We show some examples and comprehensive evaluations of our approach.
Building a diabetes screening population data repository using electronic medical records.
Tuan, Wen-Jan; Sheehy, Ann M; Smith, Maureen A
2011-05-01
There has been a rapid advancement of information technology in the area of clinical and population health data management since 2000. However, with the fast growth of electronic medical records (EMRs) and the increasing complexity of information systems, it has become challenging for researchers to effectively access, locate, extract, and analyze information critical to their research. This article introduces an outpatient encounter data framework designed to construct an EMR-based population data repository for diabetes screening research. The outpatient encounter data framework is developed on a hybrid data structure of entity-attribute-value models, dimensional models, and relational models. This design preserves a small number of subject-specific tables essential to key clinical constructs in the data repository. It enables atomic information to be maintained in a transparent and meaningful way to researchers and health care practitioners who need to access data and still achieve the same performance level as conventional data warehouse models. A six-layer information processing strategy is developed to extract and transform EMRs to the research data repository. The data structure also complies with both Health Insurance Portability and Accountability Act regulations and the institutional review board's requirements. Although developed for diabetes screening research, the design of the outpatient encounter data framework is suitable for other types of health service research. It may also provide organizations a tool to improve health care quality and efficiency, consistent with the "meaningful use" objectives of the Health Information Technology for Economic and Clinical Health Act. © 2011 Diabetes Technology Society.
Knowledge discovery in spectral data by means of complex networks.
Zanin, Massimiliano; Papo, David; Solís, José Luis González; Espinosa, Juan Carlos Martínez; Frausto-Reyes, Claudio; Anda, Pascual Palomares; Sevilla-Escoboza, Ricardo; Jaimes-Reategui, Rider; Boccaletti, Stefano; Menasalvas, Ernestina; Sousa, Pedro
2013-03-11
In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease. PMID:24957895
Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boiteau, Rene M.; Hoyt, David W.; Nicora, Carrie D.; ...
2018-01-17
Here, we introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters; accurate mass, MS/MS (MS2), and NMR in a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature, and then combines the experimental MS/MS and NMR information of the unknown to effectively filter the false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture and then we identify an uncatalogued secondary metabolite in Arabidopsis thaliana. The NMR/MS2 approach is well suited for discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases.
Combined non-parametric and parametric approach for identification of time-variant systems
NASA Astrophysics Data System (ADS)
Dziedziech, Kajetan; Czop, Piotr; Staszewski, Wieslaw J.; Uhl, Tadeusz
2018-03-01
Identification of systems, structures and machines with variable physical parameters is a challenging task, especially when time-varying vibration modes are involved. The paper proposes a new combined, two-step (non-parametric and parametric) modelling approach to determine time-varying vibration modes from input-output measurements. In the first step, single-degree-of-freedom (SDOF) vibration modes are extracted from a multi-degree-of-freedom (MDOF) non-parametric system representation using time-frequency wavelet-based filters. The second step involves a time-varying parametric representation of the extracted modes using recursive linear autoregressive-moving-average with exogenous inputs (ARMAX) models. The combined approach is demonstrated through system identification of an experimental mass-varying MDOF frame-like structure subjected to random excitation. The results show that the proposed combined method correctly captures the dynamics of the analysed structure using minimal a priori information on the model.
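A minimal sketch of the flavor of step two (not the authors' exact ARMAX estimator): recursive least squares with a forgetting factor tracks time-varying ARX parameters; the moving-average noise model is omitted for brevity, and all orders and constants are illustrative.

```python
import numpy as np

def rls_arx(u, y, na=2, nb=2, lam=0.98):
    """Track parameters of y[t] = -a1*y[t-1] - ... + b1*u[t-1] + ... over time."""
    n = na + nb
    theta = np.zeros(n)
    P = np.eye(n) * 1e3
    history = []
    for t in range(max(na, nb), len(y)):
        phi = np.concatenate([-y[t - na:t][::-1], u[t - nb:t][::-1]])
        k = P @ phi / (lam + phi @ P @ phi)       # gain
        theta = theta + k * (y[t] - phi @ theta)  # parameter update
        P = (P - np.outer(k, phi @ P)) / lam      # covariance with forgetting
        history.append(theta.copy())
    return np.array(history)

rng = np.random.default_rng(9)
u = rng.normal(size=500)
y = np.convolve(u, [0.0, 0.5, 0.3])[:500]   # toy input-output data
theta_t = rls_arx(u, y)                      # parameter trajectories over time
```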
Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction
Hoyt, David W.; Nicora, Carrie D.; Kinmonth-Schultz, Hannah A.; Ward, Joy K.
2018-01-01
We introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters; accurate mass, MS/MS (MS2), and NMR into a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature, and then combines the experimental MS/MS and NMR information of the unknown to effectively filter out the false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture, and then we identify an uncatalogued secondary metabolite in Arabidopsis thaliana. The NMR/MS2 approach is well suited to the discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases. PMID:29342073
1993-09-15
[Garbled record: the abstract text is two-column extraction residue. Recoverable fragments mention a Lagrangian formulation giving an extremum principle, extraction of information for any speed, augmentation of the DNLSE to treat arbitrary initial conditions, and citations to Dueholm and N.F. Pedersen, J. Appl. Phys. 60 (1986) 1447, and to N.F. Pedersen in SQUID 80, eds. H. Hahlbohm and H. ...]
Diverse structural approaches to haem appropriation by pathogenic bacteria.
Hare, Stephen A
2017-04-01
The critical need for iron presents a challenge for pathogenic bacteria that must survive in an environment bereft of accessible iron, due to a naturally low bioavailability and their host's nutritional immunity. Appropriating haem, either directly from host haemoproteins or by secreting haem-scavenging haemophores, is one way pathogenic bacteria can overcome this challenge. After capturing their target, haem appropriation systems must remove haem from a high-affinity binding site (on the host haemoprotein or bacterial haemophore) and transfer it to a binding site of lower affinity on a bacterial receptor. Structural information is now available showing how, using a combination of induced structural changes and steric clashes, bacteria are able to extract haem from haemophores, haemopexin and haemoglobin. This review focuses on structural descriptions of these bacterial haem acquisition systems, summarising how they bind haem and their target haemoproteins, with particular emphasis on the mechanism of haem extraction. Copyright © 2017 The Author. Published by Elsevier B.V. All rights reserved.
Isolating contour information from arbitrary images
NASA Technical Reports Server (NTRS)
Jobson, Daniel J.
1989-01-01
Aspects of natural vision (physiological and perceptual) serve as the basis for developing a general processing scheme for contour extraction. Contour information is assumed to be central to visual recognition skills. While the scheme must be regarded as highly preliminary, initial results compare favorably with the visual perception of structure. The scheme pays special attention to the construction of a smallest-scale circular difference-of-Gaussian (DOG) convolution, calibration of multiscale edge detection thresholds against the visual perception of grayscale boundaries, and contour/texture discrimination methods derived from fundamental assumptions of connectivity and the characteristics of printed text. Contour information is required to fall between a minimum connectivity limit and a maximum regional spatial density limit at each scale. Results support the idea that contour information, in images possessing good image quality, is concentrated in channels centered at about 10 cyc/deg and 30 cyc/deg. Further, lower spatial frequency channels appear to play a major role only in contour extraction from images with serious global image defects.
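A minimal sketch of a circular DoG convolution of the kind described above (sigma values are illustrative, with the common ~1.6 ratio between the two Gaussians):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma=1.0, ratio=1.6):
    """Band-pass the image by subtracting two Gaussian blurs."""
    return gaussian_filter(image, sigma) - gaussian_filter(image, sigma * ratio)

rng = np.random.default_rng(2)
edges = difference_of_gaussians(rng.normal(size=(128, 128)))
candidates = np.abs(edges) > 3 * edges.std()   # crude edge threshold
```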
Super resolution reconstruction of μ-CT image of rock sample using neighbour embedding algorithm
NASA Astrophysics Data System (ADS)
Wang, Yuzhu; Rahman, Sheik S.; Arns, Christoph H.
2018-03-01
X-ray computed micro-tomography (μ-CT) is considered the most effective way to obtain the inner structure of a rock sample without destruction. However, its limited resolution hampers its ability to probe sub-micron structures, which are critical for flow transport in rock samples. In this study, we propose an innovative methodology to improve the resolution of μ-CT images using a neighbour embedding algorithm, where low-frequency information is provided by the μ-CT image itself while high-frequency information is supplemented by a high-resolution scanning electron microscopy (SEM) image. To obtain a prior for reconstruction, a large number of image patch pairs, containing high- and low-resolution image patches, are extracted from the Gaussian image pyramid generated from the SEM image. These patch pairs contain abundant information about the tomographic evolution of local porous structures across resolution spaces. Relying on the assumption of self-similarity of the porous structure, this prior information can effectively supervise the reconstruction of the high-resolution μ-CT image. The experimental results show that the proposed method achieves state-of-the-art performance.
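A minimal sketch of the neighbour-embedding idea (patch dictionaries are assumed given; sizes and constants are invented): each low-resolution patch is expressed as a locally linear combination of its nearest low-resolution training patches, and the same weights reconstruct the high-resolution patch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ne_super_resolve(lr_patch, lr_dict, hr_dict, k=5):
    nn = NearestNeighbors(n_neighbors=k).fit(lr_dict)
    _, idx = nn.kneighbors(lr_patch[None, :])
    neigh = lr_dict[idx[0]]                       # (k, p_lr) neighbours
    # LLE-style least-squares weights, normalised to sum to one.
    G = (neigh - lr_patch) @ (neigh - lr_patch).T
    w = np.linalg.solve(G + 1e-6 * np.eye(k), np.ones(k))
    w /= w.sum()
    return w @ hr_dict[idx[0]]                    # (p_hr,) HR patch estimate

rng = np.random.default_rng(3)
lr_dict, hr_dict = rng.normal(size=(200, 16)), rng.normal(size=(200, 64))
hr_patch = ne_super_resolve(rng.normal(size=16), lr_dict, hr_dict)
```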
Surface studies of solids using integral X-ray-induced photoemission yield
Stoupin, Stanislav; Zhernenkov, Mikhail; Shi, Bing
2016-11-22
X-ray induced photoemission yield contains structural information complementary to that provided by X-ray Fresnel reflectivity, which presents an advantage to a wide variety of surface studies if this information is made easily accessible. Photoemission in materials research is commonly acknowledged as a method with a probing depth limited by the escape depth of the photoelectrons. Here we show that the integral hard-X-ray-induced photoemission yield is modulated by the Fresnel reflectivity of a multilayer structure and carries structural information that extends well beyond the photoelectron escape depth. A simple electric self-detection of the integral photoemission yield and Fourier data analysis permit extraction of the thicknesses of individual layers. The approach does not require detection of the reflected radiation and can be considered as a framework for non-invasive evaluation of buried layers with hard X-rays under grazing incidence. PMID:27874041
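A toy sketch of the Fourier step on synthetic data (not the authors' processing; the layer thickness, q range and fringe amplitude are invented): thickness oscillations in a yield-versus-q curve appear as a peak in its Fourier transform at d = 2π/period.

```python
import numpy as np

q = np.linspace(0.1, 3.0, 900)                   # wavevector transfer, 1/nm
d_true = 20.0                                     # hypothetical layer, nm
signal = 1.0 + 0.3 * np.cos(q * d_true)           # fringe-modulated yield
signal -= signal.mean()

spectrum = np.abs(np.fft.rfft(signal))
d_axis = 2 * np.pi * np.fft.rfftfreq(q.size, d=q[1] - q[0])
print(f"recovered thickness ~ {d_axis[spectrum.argmax()]:.1f} nm")
```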
DOE Office of Scientific and Technical Information (OSTI.GOV)
Okunishi, M.; Pruemper, G.; Shimada, K.
We have measured two-dimensional photoelectron momentum spectra of Ne, Ar, and Xe generated by 800-nm, 100-fs laser pulses and succeeded in identifying the spectral ridge region (back-rescattered ridges), which marks the location of returning electrons that have been backscattered at their maximum kinetic energies. We demonstrate that structural information, in particular the differential elastic scattering cross sections of the target ion for free electrons, can be accurately extracted from the intensity distributions of photoelectrons on the ridges, thus effecting a first step toward laser-induced self-imaging of the target with unprecedented spatial and temporal resolutions.
Kamali, Tschackad; Považay, Boris; Kumar, Sunil; Silberberg, Yaron; Hermann, Boris; Werkmeister, René; Drexler, Wolfgang; Unterhuber, Angelika
2014-10-01
We demonstrate a multimodal optical coherence tomography (OCT) and online Fourier transform coherent anti-Stokes Raman scattering (FTCARS) platform using a single sub-12 femtosecond (fs) Ti:sapphire laser, enabling simultaneous extraction of structural and chemical ("morphomolecular") information from biological samples. Spectral domain OCT prescreens the specimen, providing a fast, ultrahigh-resolution (4×12 μm, axial and transverse) wide-field morphologic overview. Additional complementary intrinsic molecular information is obtained by zooming into regions of interest for fast label-free chemical mapping with online FTCARS spectroscopy. Background-free CARS is based on a Michelson interferometer in combination with a highly linear piezo stage, which allows for quick point-to-point extraction of CARS spectra in the fingerprint region in less than 125 ms with a resolution better than 4 cm⁻¹, without the need for averaging. OCT morphology and CARS spectral maps indicating phosphate and carbonate bond vibrations from human bone samples are extracted to demonstrate the performance of this hybrid imaging platform.
Technical design and system implementation of region-line primitive association framework
NASA Astrophysics Data System (ADS)
Wang, Min; Xing, Jinjin; Wang, Jie; Lv, Guonian
2017-08-01
Apart from regions, image edge lines are an important information source, and they deserve more attention in object-based image analysis (OBIA) than they currently receive. In the region-line primitive association framework (RLPAF), we promote straight-edge lines to line primitives to achieve more powerful OBIA. Along with regions, straight lines become basic units for the subsequent extraction and analysis of OBIA features. This study develops a new software system called remote-sensing knowledge finder (RSFinder) to implement RLPAF for engineering applications. This paper introduces the extended technical framework, a comprehensively designed feature set, key technologies, and the software implementation. To our knowledge, RSFinder is the world's first OBIA system based on two types of primitives, namely regions and lines. It is fundamentally different from well-known region-only OBIA systems such as eCognition and the ENVI feature extraction module. This paper has important reference value for the development of similarly structured OBIA systems and line-involved remote sensing information extraction algorithms.
NASA Astrophysics Data System (ADS)
Darlow, Luke Nicholas; Connan, James
2015-11-01
Surface fingerprint scanners are limited to a two-dimensional representation of the fingerprint topography and are thus vulnerable to fingerprint damage, distortion, and counterfeiting. Optical coherence tomography (OCT) scanners are able to image, in three dimensions, the internal structure of the fingertip skin, and techniques for obtaining the internal fingerprint from OCT scans have since been developed. This research presents an internal fingerprint extraction algorithm designed to extract high-quality internal fingerprints from touchless OCT fingertip scans; it also serves as a correlation study between surface and internal fingerprints. Provided the scanned region contains sufficient fingerprint information, correlation with the surface topography is shown to be good (74% have true matches). The cross-correlation of internal fingerprints (96% have true matches) is high enough that internal fingerprints can constitute a fingerprint database. The internal fingerprints' performance was also compared with that of cropped surface counterparts, to eliminate bias owing to the level of information present, showing that the internal fingerprints' performance is superior 63.6% of the time.
Sadeh, Talya; Maril, Anat; Bitan, Tali; Goshen-Gottstein, Yonatan
2012-03-01
A remarkable act of memory entails binding different forms of information. We focus on the timeless question of how the bound engram is accessed such that its component features, item and context, are extracted. To shed light on this question, we investigate the dynamics between brain structures that together mediate the binding and extraction of item and context. Converging evidence has implicated the parahippocampal cortex (PHc) in contextual processing, the perirhinal cortex (PRc) in item processing, and the hippocampus in item-context binding. Effective connectivity analysis was conducted on fMRI data gathered during retrieval on tests that differ with regard to the to-be-extracted information. Results revealed that recall is initiated by context-related PHc activity, followed by hippocampal item-context engram activation, and completed with retrieval of the study item by the PRc. The reverse path was found for recognition. We thus provide novel evidence for dissociative patterns of item-context unbinding during retrieval. Copyright © 2011 Elsevier Inc. All rights reserved.
Versatile and efficient pore network extraction method using marker-based watershed segmentation
NASA Astrophysics Data System (ADS)
Gostick, Jeff T.
2017-08-01
Obtaining structural information from tomographic images of porous materials is a critical component of porous media research. Extracting pore networks is particularly valuable, since it enables pore network modeling simulations that are useful for a host of tasks, from predicting transport properties to simulating the performance of entire devices. This work reports an efficient algorithm for extracting networks using only standard image analysis techniques. The algorithm was applied to several standard porous materials, ranging from sandstone to fibrous mats, and in all cases agreed very well with established or known values for pore and throat sizes, capillary pressure curves, and permeability. In the case of sandstone, the present algorithm was compared with the network obtained using the current state-of-the-art algorithm, and very good agreement was achieved. Most importantly, the network extracted from an image of fibrous media correctly predicted the anisotropic permeability tensor, demonstrating the critical ability to detect key structural features. The highly efficient algorithm allows extraction on fairly large images of 500³ voxels in just over 200 s. The ability of one algorithm to match materials as varied as sandstone with 20% porosity and fibrous media with 75% porosity is a significant advancement. The source code for this algorithm is provided.
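A bare-bones sketch of marker-based watershed pore partitioning in the spirit of the approach above (the published algorithm adds peak-filtering refinements that this skeleton omits; the toy pore space is synthetic):

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

rng = np.random.default_rng(4)
im = ndi.gaussian_filter(rng.normal(size=(64, 64, 64)), 3) > 0  # toy pore space

dt = ndi.distance_transform_edt(im)                  # distance to solid phase
peaks = peak_local_max(dt, min_distance=5, labels=im)
markers = np.zeros_like(dt, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

regions = watershed(-dt, markers, mask=im)           # one label per pore
print(f"{regions.max()} pores found")
```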
Change Detection in High-Resolution Remote Sensing Images Using Levene-Test and Fuzzy Evaluation
NASA Astrophysics Data System (ADS)
Wang, G. H.; Wang, H. B.; Fan, W. F.; Liu, Y.; Liu, H. J.
2018-04-01
High-resolution remote sensing images possess complex spatial structure and rich texture information. Accordingly, this paper presents a new change detection method based on the Levene test and fuzzy evaluation. Map-spots are first obtained by segmenting two pretreated, overlapping images, and features such as spectrum and texture are extracted. The change information of all map-spots is then screened by the Levene test to obtain candidate changed regions; hue information (the H component) is extracted through the IHS transform and subjected to change vector analysis combined with the texture information. Finally, the threshold is determined by an iteration method, the membership degrees of the candidate changed regions are calculated, and the final changed regions are determined. Experimental results on multi-temporal ZY-3 high-resolution images of an area in Jiangsu Province show that, by extracting map-spots of larger difference as the candidate changed regions, the Levene test decreases the computing load, improves the precision of change detection, and shows better fault tolerance for unchanged regions with relatively large differences. The combination of hue-texture features and the fuzzy evaluation method can effectively decrease omissions and false detections and improve the precision of change detection.
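A minimal sketch of how a Levene test can screen a map-spot (scipy's implementation; whether the paper applies it in exactly this per-spot form is an assumption, and the samples here are synthetic):

```python
import numpy as np
from scipy.stats import levene

def is_candidate(pixels_t1, pixels_t2, alpha=0.05):
    """pixels_t*: 1-D arrays of one map-spot's values at the two dates."""
    stat, p = levene(pixels_t1, pixels_t2, center="median")
    return p < alpha   # variances differ => candidate changed region

rng = np.random.default_rng(5)
print(is_candidate(rng.normal(0, 1, 200), rng.normal(0, 3, 200)))  # True
```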
Development of pair distribution function analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vondreele, R.; Billinge, S.; Kwei, G.
1996-09-01
This is the final report of a 3-year LDRD project at LANL. It has become more and more evident that structural coherence in the CuO₂ planes of high-Tc superconducting materials over some intermediate length scale (nm range) is important to superconductivity. In recent years, the pair distribution function (PDF) analysis of powder diffraction data has been developed for extracting structural information on these length scales. This project sought to expand and develop this technique, use it to analyze neutron powder diffraction data, and apply it to problems. In particular, interest is in the area of high-Tc superconductors, although we planned to extend the study to the closely related perovskite ferroelectric materials and other materials where the local structure affects the properties and where detailed knowledge of the local and intermediate-range structure is important. In addition, we planned to carry out single crystal experiments to look for diffuse scattering. This information augments the information from the PDF.
Image processing of metal surface with structured light
NASA Astrophysics Data System (ADS)
Luo, Cong; Feng, Chang; Wang, Congzheng
2014-09-01
In a structured light vision measurement system, the ideal image of the structured light stripe contains, besides the black background, only the gray-level information at the position of the stripe. The actual image, however, contains image noise, complex background and other content that does not belong to the stripe, which interferes with the useful information. To accurately extract the stripe center on a metal surface, a new processing method is presented. Adaptive median filtering preliminarily removes the noise, and the noise introduced by the CCD camera and the measurement environment is further removed with a difference image method. To highlight fine details and enhance the blurred regions between the stripe and noise, a sharpening algorithm is used that combines the best features of the Laplacian and Sobel operators. Morphological opening and closing operations are then used to compensate for the loss of information. Experimental results show that this method is effective: it not only suppresses the noise but also heightens contrast, which is beneficial for subsequent processing.
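A minimal OpenCV sketch of the pipeline described above (file names and all parameters are illustrative; a plain median filter stands in for the adaptive version):

```python
import cv2
import numpy as np

stripe = cv2.imread("stripe.png", cv2.IMREAD_GRAYSCALE)          # hypothetical
background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)  # hypothetical

denoised = cv2.medianBlur(stripe, 5)
diff = cv2.absdiff(denoised, background)          # remove static scene content

# Sharpening that blends Laplacian (fine detail) and Sobel (edge) responses.
lap = cv2.Laplacian(diff, cv2.CV_16S, ksize=3)
sob = cv2.Sobel(diff, cv2.CV_16S, 1, 0, ksize=3)
sharp = cv2.convertScaleAbs(diff.astype(np.int16) + lap // 2 + sob // 4)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
opened = cv2.morphologyEx(sharp, cv2.MORPH_OPEN, kernel)
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```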
Mode extraction on wind turbine blades via phase-based video motion estimation
NASA Astrophysics Data System (ADS)
Sarrafi, Aral; Poozesh, Peyman; Niezrecki, Christopher; Mao, Zhu
2017-04-01
In recent years, image processing techniques have increasingly been applied to structural dynamics identification, characterization, and structural health monitoring. As a non-contact, full-field measurement method, image processing still has some way to go to outperform conventional sensing instruments (i.e., accelerometers, strain gauges, laser vibrometers, etc.). However, the technologies associated with image processing are developing rapidly and gaining attention in a variety of engineering applications, including structural dynamics identification and modal analysis. Among the numerous motion estimation and image processing methods, phase-based video motion estimation is considered one of the most efficient in terms of computational cost and noise robustness. In this paper, phase-based video motion estimation is adopted for structural dynamics characterization of a 2.3-meter-long Skystream wind turbine blade, and the modal parameters (natural frequencies and operating deflection shapes) are extracted. Phase-based video processing provides reliable full-field 2-D motion information, which is beneficial for manufacturing certification and model updating at the design stage. The approach is demonstrated by processing data on a full-scale commercial structure (i.e., a wind turbine blade) with complex geometry and properties, and the results obtained correlate well with the modal parameters extracted from accelerometer measurements, especially for the first four bending modes, which are of significant importance in blade characterization.
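Full phase-based motion estimation uses complex steerable pyramids; as a much simpler illustration of recovering motion from spectral phase (not the method used in the paper), the sketch below estimates the integer shift between two frames by phase correlation.

```python
import numpy as np

def phase_correlation_shift(frame_a, frame_b):
    """Estimate the (row, col) shift taking frame_b to frame_a."""
    F, G = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
    cross = F * np.conj(G)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12))
    peak = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    # Wrap indices above N/2 to negative shifts.
    return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(6)
a = rng.normal(size=(64, 64))
b = np.roll(a, (3, -2), axis=(0, 1))
print(phase_correlation_shift(b, a))   # (3, -2)
```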
Photo-CIDNP NMR spectroscopy of amino acids and proteins.
Kuhn, Lars T
2013-01-01
Photo-chemically induced dynamic nuclear polarization (CIDNP) is a nuclear magnetic resonance (NMR) phenomenon which, among other things, is exploited to extract information on biomolecular structure via probing solvent-accessibilities of tryptophan (Trp), tyrosine (Tyr), and histidine (His) amino acid side chains both in polypeptides and proteins in solution. The effect, normally triggered by a (laser) light-induced photochemical reaction in situ, yields both positive and/or negative signal enhancements in the resulting NMR spectra which reflect the solvent exposure of these residues both in equilibrium and during structural transformations in "real time". As such, the method can offer - qualitatively and, to a certain extent, quantitatively - residue-specific structural and kinetic information on both the native and, in particular, the non-native states of proteins which, often, is not readily available from more routine NMR techniques. In this review, basic experimental procedures of the photo-CIDNP technique as applied to amino acids and proteins are discussed, recent improvements to the method highlighted, and future perspectives presented. First, the basic principles of the phenomenon based on the theory of the radical pair mechanism (RPM) are outlined. Second, a description of standard photo-CIDNP applications is given and it is shown how the effect can be exploited to extract residue-specific structural information on the conformational space sampled by unfolded or partially folded proteins on their "path" to the natively folded form. Last, recent methodological advances in the field are highlighted, modern applications of photo-CIDNP in the context of biological NMR evaluated, and an outlook into future perspectives of the method is given.
NASA Astrophysics Data System (ADS)
Villéger, Alice; Ouchchane, Lemlih; Lemaire, Jean-Jacques; Boire, Jean-Yves
2007-03-01
Symptoms of neurodegenerative pathologies such as Parkinson's disease can be relieved through deep brain stimulation. This neurosurgical technique relies on high-precision positioning of electrodes in specific areas of the basal ganglia and the thalamus. These subcortical anatomical targets must be located at the pre-operative stage from a set of MRI acquired under stereotactic conditions. In order to assist surgical planning, we designed a semi-automated image analysis process for extracting anatomical areas of interest. Complementary information, provided by both the patient's data and expert knowledge, is represented as fuzzy membership maps, which are then fused by means of suitable possibilistic operators in order to achieve the segmentation of targets. More specifically, theoretical prior knowledge of brain anatomy is modelled within a 'virtual atlas' organised as a spatial graph: a list of vertices linked by edges, where each vertex represents an anatomical structure of interest and contains relevant information such as tissue composition, and each edge represents a spatial relationship between two structures, such as their relative directions. The model is built using heterogeneous sources of information, such as qualitative descriptions from the expert or quantitative information from pre-labelled images. For each patient, tissue membership maps are extracted from the MR data through a classification step. The prior model and the patient's data are then matched using a search algorithm (or 'strategy') which simultaneously computes an estimate of the location of every structure. The method was tested on 10 clinical images, with promising results. Location and segmentation results were statistically assessed, opening perspectives for enhancement.
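A minimal sketch of fusing two fuzzy membership maps with a possibilistic operator (the Dubois-Prade-style adaptive rule shown here is one standard choice, not necessarily the operator used by the authors; the maps are synthetic):

```python
import numpy as np

def adaptive_fusion(mu_data, mu_atlas):
    """Conjunctive min where sources agree, disjunctive fallback under conflict."""
    conj = np.minimum(mu_data, mu_atlas)           # agreement
    disj = np.maximum(mu_data, mu_atlas)           # fallback under conflict
    h = conj.max()                                 # global agreement measure
    return np.maximum(conj / max(h, 1e-9), np.minimum(disj, 1 - h))

rng = np.random.default_rng(7)
fused = adaptive_fusion(rng.random((32, 32)), rng.random((32, 32)))
```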
NASA Astrophysics Data System (ADS)
Li, Xuan; Liu, Zhiping; Jiang, Xiaoli; Lodewijks, Gabrol
2018-01-01
Eddy current pulsed thermography (ECPT) is well established for the non-destructive testing of electrically conductive materials, featuring the advantages of contactless operation, intuitive detection and efficient heating. The concept of divergence characterization of the damage rate of carbon fibre-reinforced plastic (CFRP)-steel structures can be extended to ECPT thermal pattern characterization. This study found that applying ECPT to CFRP-steel structures generates a sizeable amount of valuable information for comprehensive material diagnostics. The relationship between divergence and transient thermal patterns can be identified and analysed by deploying mathematical models to analyse information about fibre-texture-like orientations, gaps and undulations in these multi-layered materials. The developed algorithm enables the removal of fibre texture information and the extraction of damage features. A model of damaged CFRP-glue-steel structures was established using COMSOL Multiphysics® software, and quantitative non-destructive damage evaluation was derived from the ECPT image areas. The results of the proposed method illustrate that damaged areas are strongly affected by the available fibre texture information. The proposed approach can be applied to the detection of impact-induced damage and the quantitative evaluation of CFRP structures.
Medical Image Fusion Based on Feature Extraction and Sparse Representation
Wei, Gao; Zongxi, Song
2017-01-01
As a novel multiscale geometric analysis tool, sparse representation has shown many advantages over conventional image representation methods. However, standard sparse representation takes neither intrinsic structure nor time complexity into consideration. In this paper, a new fusion mechanism for multimodal medical images based on sparse representation and decision maps is proposed to deal with both problems simultaneously. Three decision maps are designed, namely the structure information map (SM), the energy information map (EM), and the combined structure and energy map (SEM), to make the results preserve more energy and edge information. SM contains the local structure feature captured by the Laplacian of a Gaussian (LOG), and EM contains the energy and energy-distribution feature detected by the mean square deviation. The decision map is added to the normal sparse-representation-based method to improve the speed of the algorithm. The proposed approach also improves the quality of the fused results by enhancing the contrast and preserving more structure and energy information from the source images. The experimental results on 36 groups of CT/MR, MR-T1/MR-T2, and CT/PET images demonstrate that the method based on SR and SEM outperforms five state-of-the-art methods. PMID:28321246
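A minimal sketch of the two decision maps described above, with a naive pixel-wise SEM selection (window size, sigma and the combination rule are illustrative, not the paper's exact formulation):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, uniform_filter

def structure_map(img, sigma=1.5):
    """SM: local structure via the Laplacian of a Gaussian."""
    return np.abs(gaussian_laplace(img, sigma))

def energy_map(img, size=7):
    """EM: local mean-square deviation (variance) in a sliding window."""
    mean = uniform_filter(img, size)
    return uniform_filter(img * img, size) - mean * mean

rng = np.random.default_rng(8)
a, b = rng.random((64, 64)), rng.random((64, 64))   # two source images
choose_a = (structure_map(a) + energy_map(a)) >= (structure_map(b) + energy_map(b))
fused = np.where(choose_a, a, b)                     # pixel-wise selection by SEM
```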
NASA Astrophysics Data System (ADS)
Wurm, Michael; Taubenböck, Hannes; Dech, Stefan
2010-10-01
The dynamics of urban environments are a challenge for sustainable development. Urban areas promise wealth, realization of individual dreams, and power; hence, many cities are characterized by population growth as well as physical development. Traditional visual mapping and updating of urban structure information is a very laborious and cost-intensive task, especially for large urban areas. For this purpose, we developed a workflow for extracting the relevant information by means of object-based image classification. In this manner, multisensor remote sensing data were analyzed: very high resolution optical satellite imagery together with height information from a digital surface model, to retrieve a detailed 3D city model with the relevant land-use/land-cover information. This information was aggregated at the level of the building block to describe the urban structure by physical indicators. A comparison between the indicators derived by the classification and a reference classification was carried out to show the correlation between the individual indicators and a reference classification of urban structure types. The indicators were then used in a cluster analysis to group the individual blocks into similar clusters.
NASA Astrophysics Data System (ADS)
Beyer, Hans Georg
2016-04-01
With the increasing availability of satellite-derived irradiance information, this type of data set is more and more in use for the design and operation of solar energy systems, most notably PV and CSP systems. This reduces the need for data measured on site. However, due to basic limitations of satellite-derived data, several requirements of the intended applications cannot be met by this data type directly. Raw satellite information has to be enhanced in both spatial and temporal resolution by additional information to be fully applicable to all aspects of modelling solar energy systems. To address this problem, several individual and collaborative projects have been performed in recent years or are ongoing. Approaches are based, on the one hand, on pasting synthesized high-resolution data into the low-resolution original sets. A prerequisite is an appropriate model, validated against real-world data. For irradiance data, these models can be extracted either directly from ground-measured data sets, from data on the cloud situation as gained from sky camera images, or from Monte Carlo-initialized physical models. The current models refer to the spatial structure of the cloud fields. Dynamics are imposed by moving the cloud structures according to a large-scale cloud motion vector, either extracted from the dynamics inferred from consecutive satellite images or taken from a mesoscale meteorological model. Dynamic irradiance information is then derived from the cloud field structure and the cloud motion vector. This contribution, which is linked to Subtask A (Solar Resource Applications for High Penetration of Solar Technologies) of IEA SHC Task 46, presents the different approaches and discusses examples in view of validation, the need for auxiliary information, and general applicability.
Wang, Zhifei; Xie, Yanming; Wang, Yongyan
2011-10-01
Computerized extraction of information from Chinese medicine literature is more convenient than hand searching: it can simplify the search process and improve accuracy. Among the many computerized extraction methods increasingly in use, regular expressions are particularly efficient for extracting useful information in research. This article focuses on applying regular expressions to extract information from Chinese medicine literature. Two practical examples are reported, using regular expressions to extract "case number" (non-terminology) and "efficacy rate" (subgroups for related information identification), exploring how to extract such information from Chinese medicine literature by this method.
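A minimal sketch of the idea (the patterns and the example sentence are illustrative, not the authors' exact expressions): regular expressions pull "case number" and "efficacy rate" style fields out of abstract text.

```python
import re

text = "共纳入病例120例, 总有效率为91.7%。"   # hypothetical sentence
case_number = re.search(r"病例\s*(\d+)\s*例", text)       # "... 120 cases"
efficacy = re.search(r"总有效率为?\s*([\d.]+)\s*%", text)  # "... rate 91.7%"

print(case_number.group(1), efficacy.group(1))   # 120 91.7
```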
Motamed, Cyrus; Bourgain, Jean Louis
2016-06-01
Anaesthesia Information Management Systems (AIMS) generate large amounts of data, which might be useful for quality assurance programs. This study was designed to highlight the multiple contributions of our AIMS system in extracting quality indicators over a period of 10 years. The study was conducted from 2002 to 2011. Two methods were used to extract anaesthesia indicators: manual extraction of individual files for monitoring neuromuscular relaxation, and structured query language (SQL) extraction for the other indicators, which were postoperative nausea and vomiting (PONV), pain and sedation scores, pain-related medications, and postoperative hypothermia. For each indicator, a program of information/meetings and adaptation/suggestions for operating room and PACU personnel was initiated to improve quality assurance, while data were extracted each year. The study included 77,573 patients. The mean overall completeness of data for the initial years ranged from 55 to 85%, depending on the indicator, and then improved to 95% completeness for the last 5 years. The incidence of neuromuscular monitoring was initially 67% and then increased to 95% (P<0.05). The rate of pharmacological reversal remained around 53% throughout the study. Regarding the SQL data, an improvement in severe postoperative pain and PONV scores was observed throughout the study, while mild postoperative hypothermia remained a challenge despite efforts at improvement. The AIMS system permitted the follow-up of certain indicators through manual sampling, and of many more via SQL extraction, in a sustained and non-time-consuming way across the years. However, it requires competent and dedicated resources to handle the database. Copyright © 2016 Société française d'anesthésie et de réanimation (Sfar). Published by Elsevier Masson SAS. All rights reserved.
Information Retrieval and Text Mining Technologies for Chemistry.
Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso
2017-06-28
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
NASA Astrophysics Data System (ADS)
Sarrafi, Aral; Mao, Zhu; Niezrecki, Christopher; Poozesh, Peyman
2018-05-01
Vibration-based Structural Health Monitoring (SHM) techniques are among the most common approaches for structural damage identification. The presence of damage in structures may be identified by monitoring the changes in dynamic behavior subject to external loading, and is typically performed by using experimental modal analysis (EMA) or operational modal analysis (OMA). These tools for SHM normally require a limited number of physically attached transducers (e.g. accelerometers) in order to record the response of the structure for further analysis. Signal conditioners, wires, wireless receivers and a data acquisition system (DAQ) are also typical components of traditional sensing systems used in vibration-based SHM. However, instrumentation of lightweight structures with contact sensors such as accelerometers may induce mass-loading effects, and for large-scale structures, the instrumentation is labor intensive and time consuming. Achieving high spatial measurement resolution for a large-scale structure is not always feasible while working with traditional contact sensors, and there is also the potential for a lack of reliability associated with fixed contact sensors in outliving the life-span of the host structure. Among the state-of-the-art non-contact measurements, digital video cameras are able to rapidly collect high-density spatial information from structures remotely. In this paper, the subtle motions from recorded video (i.e. a sequence of images) are extracted by means of Phase-based Motion Estimation (PME) and the extracted information is used to conduct damage identification on a 2.3-m long Skystream® wind turbine blade (WTB). The PME and phase-based motion magnification approach estimates the structural motion from the captured sequence of images for both baseline and damaged test cases on a wind turbine blade. Operational deflection shapes of the test articles are also quantified and compared for the baseline and damaged states. In addition, proper lighting can be an issue when working with high-speed cameras; therefore, image enhancement and contrast manipulation have also been performed on the raw images. Ultimately, the extracted resonant frequencies and operational deflection shapes are used to detect the presence of damage, demonstrating the feasibility of implementing non-contact video measurements to perform realistic structural damage detection.
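The following Python sketch illustrates the core of phase-based motion estimation under simplifying assumptions: the phase of a complex Gabor filter response at one probe pixel is tracked across frames, and the spectrum of that phase signal yields a resonant frequency. The synthetic frames and all parameters are illustrative; the paper's full pipeline (motion magnification, operational deflection shapes) is considerably more elaborate.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel

# Track the local phase of a complex Gabor response at one probe pixel across
# frames; the spectrum of that phase signal reveals the resonant frequency.
def resonant_frequency(frames, fps, pixel, freq=0.095, theta=0.0):
    kernel = gabor_kernel(frequency=freq, theta=theta)   # complex-valued kernel
    r, c = pixel
    phases = [np.angle(fftconvolve(f, kernel, mode="same")[r, c]) for f in frames]
    phi = np.unwrap(np.array(phases))                    # subtle motion ~ phase change
    phi -= phi.mean()
    spectrum = np.abs(np.fft.rfft(phi))
    freqs = np.fft.rfftfreq(len(phi), d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]            # skip the DC bin

# Synthetic demo: a striped pattern vibrating horizontally at 12 Hz, 240 fps.
t = np.arange(480) / 240.0
base = np.outer(np.ones(64), np.sin(np.arange(64) * 0.6))   # ~0.095 cycles/pixel
frames = [np.roll(base, int(round(2 * np.sin(2 * np.pi * 12 * tt))), axis=1)
          for tt in t]
print(resonant_frequency(frames, fps=240, pixel=(32, 32)))  # ~12 Hz
```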
Roi Detection and Vessel Segmentation in Retinal Image
NASA Astrophysics Data System (ADS)
Sabaz, F.; Atila, U.
2017-11-01
Diabetes affects the structure of the eye and can eventually lead to loss of vision. Depending on the stage of the disease, called diabetic retinopathy, sudden vision loss and blurred vision may occur. Automated detection of vessels in retinal images is useful for diagnosing eye diseases, disease classification and other clinical trials. The shape and structure of the vessels give information about the severity and the stage of the disease. Automatic and fast detection of vessels allows a quick diagnosis of the disease and lets the treatment process start shortly thereafter. ROI detection and vessel extraction methods for retinal images are described in this study. It is shown that the Frangi filter used in image processing can be successfully applied to the detection and extraction of vessels.
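A minimal sketch of the Frangi-filter step, using scikit-image's frangi implementation on a bundled sample fundus photograph; the thresholding is a crude placeholder, and a real pipeline would add ROI masking and cleanup.

```python
from skimage import color, data
from skimage.filters import frangi

# Frangi vesselness on scikit-image's sample retina photograph (downloaded on
# first use in recent scikit-image versions). The threshold is a placeholder.
image = color.rgb2gray(data.retina())
vesselness = frangi(image, sigmas=range(1, 10, 2))    # multi-scale tubular filter
mask = vesselness > vesselness.mean() + 2 * vesselness.std()
print(f"vessel-like pixels: {mask.sum()} of {mask.size}")
```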
Oxygen octahedra picker: A software tool to extract quantitative information from STEM images.
Wang, Yi; Salzberger, Ute; Sigle, Wilfried; Eren Suyolcu, Y; van Aken, Peter A
2016-09-01
In perovskite oxide based materials and hetero-structures there are often strong correlations between oxygen octahedral distortions and functionality. Thus, atomistic understanding of the octahedral distortion, which requires accurate measurements of atomic column positions, will greatly help to engineer their properties. Here, we report the development of a software tool to extract quantitative information of the lattice and of BO6 octahedral distortions from STEM images. Center-of-mass and 2D Gaussian fitting methods are implemented to locate positions of individual atom columns. The precision of atomic column distance measurements is evaluated on both simulated and experimental images. The application of the software tool is demonstrated using practical examples. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
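A minimal sketch of the 2D Gaussian fitting idea, one of the two localization methods the tool implements (alongside center-of-mass), using scipy's curve_fit on a synthetic peak that stands in for a patch cropped around one atomic column.

```python
import numpy as np
from scipy.optimize import curve_fit

# Sub-pixel atom-column localization by fitting a 2D Gaussian to an intensity
# patch. The synthetic peak below stands in for a cropped STEM image patch.
def gauss2d(xy, amp, x0, y0, sx, sy, offset):
    x, y = xy
    return (amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2)
                           + (y - y0) ** 2 / (2 * sy ** 2))) + offset).ravel()

yy, xx = np.mgrid[0:15, 0:15].astype(float)
truth = (1000.0, 7.3, 6.8, 1.5, 1.7, 50.0)           # amp, x0, y0, sx, sy, offset
patch = gauss2d((xx, yy), *truth).reshape(15, 15)
patch += np.random.default_rng(0).normal(0, 10, patch.shape)  # counting noise

p0 = (patch.max() - patch.min(), 7, 7, 2, 2, patch.min())     # initial guess
popt, _ = curve_fit(gauss2d, (xx, yy), patch.ravel(), p0=p0)
print(f"column position: x = {popt[1]:.3f}, y = {popt[2]:.3f} (true 7.3, 6.8)")
```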
Extraction of quantitative surface characteristics from AIRSAR data for Death Valley, California
NASA Technical Reports Server (NTRS)
Kierein-Young, K. S.; Kruse, F. A.
1992-01-01
Polarimetric Airborne Synthetic Aperture Radar (AIRSAR) data were collected for the Geologic Remote Sensing Field Experiment (GRSFE) over Death Valley, California, USA, in Sep. 1989. AIRSAR is a four-look, quad-polarization, three-frequency instrument. It collects measurements at C-band (5.66 cm), L-band (23.98 cm), and P-band (68.13 cm), and has a GIFOV of 10 meters and a swath width of 12 kilometers. Because the radar measures at three wavelengths, different scales of surface roughness are measured. Also, dielectric constants can be calculated from the data. The AIRSAR data were calibrated using in-scene trihedral corner reflectors to remove cross-talk and to calibrate the phase, amplitude, and co-channel gain imbalance. The calibration allows for the extraction of accurate values of rms surface roughness, dielectric constants, sigma(sub 0) backscatter, and polarization information. The radar data sets allow quantitative characterization of the small-scale surface structure of geologic units, providing information about the physical and chemical processes that control the surface morphology. Combining the quantitative information extracted from the radar data with other remotely sensed data sets allows discrimination, identification and mapping of geologic units that may be difficult to discern using conventional techniques.
Tuinman, Albert A; Lewis, Linda A; Lewis, Samuel A
2003-06-01
The application of electrospray ionization mass spectrometry (ESI-MS) to trace-fiber color analysis is explored using acidic dyes commonly employed to color nylon-based fibers, as well as extracts from dyed nylon fibers. Qualitative information about constituent dyes and quantitative information about the relative amounts of those dyes present on a single fiber become readily available using this technique. Sample requirements for establishing the color identity of different samples (i.e., comparative trace-fiber analysis) are shown to be submillimeter. Absolute verification of dye mixture identity (beyond the comparison of molecular weights derived from ESI-MS) can be obtained by expanding the technique to include tandem mass spectrometry (ESI-MS/MS). For dyes of unknown origin, the ESI-MS/MS analyses may offer insights into the chemical structure of the compound-information not available from chromatographic techniques alone. This research demonstrates that ESI-MS is viable as a sensitive technique for distinguishing dye constituents extracted from a minute amount of trace-fiber evidence. A protocol is suggested to establish/refute the proposition that two fibers--one of which is available in minute quantity only--are of the same origin.
MolTalk – a programming library for protein structures and structure analysis
Diemand, Alexander V; Scheib, Holger
2004-01-01
Background Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn. Results We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page (http://www.moltalk.org) following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code. Conclusion MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications: 1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot. 2) To quickly retrieve information for (a limited number of) macro-molecular structures, i.e. H-bonds, salt bridges, contacts between amino acids and ligands or at the interface between two chains. 3) To programme more complex structural bioinformatics software and to implement demanding algorithms through its portability to Objective-C, e.g. iMolTalk. 4) To be used as a front end to databases, e.g. PDBChainSaw. PMID:15096277
Using automatically extracted information from mammography reports for decision-support
Bozkurt, Selen; Gimenez, Francisco; Burnside, Elizabeth S.; Gulkesen, Kemal H.; Rubin, Daniel L.
2016-01-01
Objective To evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision-support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. Materials and methods We built a system that uses an NLP information extraction system (which extracts BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict diagnosis of breast cancer from radiology text reports and evaluated it with a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we compared the difference between using perfect (“reference standard”) structured inputs to the DSS (“RS-DSS”) vs NLP-derived inputs (“NLP-DSS”) on the output of the DSS using the concordance correlation coefficient. We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. Results The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004 ± 0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS to predict the correct BI-RADS final assessment category was 97.58%. Conclusion The accuracy of the information extracted from mammography reports using the NLP system was sufficient to provide accurate DSS results. We believe our system could ultimately reduce the variation in practice in mammography related to assessment of malignant lesions and improve management decisions. PMID:27388877
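The agreement statistic used above is Lin's concordance correlation coefficient; a small self-contained Python version follows, with toy paired probabilities rather than study data.

```python
import numpy as np

# Lin's concordance correlation coefficient, the agreement measure used to
# compare NLP-derived DSS probabilities with reference-standard ones.
def concordance_cc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))   # population covariance
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Toy paired malignancy probabilities (illustrative, not study data).
rs_dss  = [0.02, 0.10, 0.45, 0.80, 0.95]
nlp_dss = [0.03, 0.09, 0.50, 0.78, 0.96]
print(round(concordance_cc(rs_dss, nlp_dss), 3))     # close to 1 -> high agreement
```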
NASA Astrophysics Data System (ADS)
Chen, Tian-Yu; Chen, Yang; Yang, Hu-Jiang; Xiao, Jing-Hua; Hu, Gang
2018-03-01
Nowadays, massive amounts of data have accumulated in a wide range of fields, and analyzing existing data to extract as much useful information as possible from it has become one of the central issues in interdisciplinary research. Often the output data of a system are measurable while the dynamic structures producing these data are hidden; thus studies that reveal system structures by analyzing available data, i.e., reconstructions of systems, have become one of the most important tasks of information extraction. In the past, most work in this respect was based on theoretical analyses and numerical verifications; direct analyses of experimental data are very rare. In physical science, most analyses of experimental setups have been based on the first principles of physics laws, i.e., so-called top-down analyses. In this paper, we conducted an experiment on the “Boer resonant instrument for forced vibration” (BRIFV) and inferred the dynamic structure of the experimental setup purely from an analysis of the measurable experimental data, i.e., by applying a bottom-up strategy. The dynamics of the experimental setup are strongly nonlinear and chaotic, and subject to inevitable noise. We propose using high-order correlation computations to treat the nonlinear dynamics, and two-time correlations to treat noise effects. By applying these approaches, we successfully reconstructed the structure of the experimental setup, and the dynamic system reconstructed from the measured data reproduces the experimental results well over a wide range of parameters.
An Experimental Investigation of Complexity in Database Query Formulation Tasks
ERIC Educational Resources Information Center
Casterella, Gretchen Irwin; Vijayasarathy, Leo
2013-01-01
Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…
Pierson, Jean T; Monteith, Gregory R; Roberts-Thomson, Sarah J; Dietzgen, Ralf G; Gidley, Michael J; Shaw, Paul N
2014-04-15
In this study we determined the qualitative composition and distribution of phytochemicals in peel and flesh of fruits from four different varieties of mango using mass spectrometry profiling following fractionation of methanol extracts by preparative HPLC. Gallic acid substituted compounds, of diverse core structure, were characteristic of the phytochemicals extracted using this approach. Other principal compounds identified were from the quercetin family, the hydrolysable tannins and fatty acids and their derivatives. This work provides additional information regarding mango fruit phytochemical composition and its potential contribution to human health and nutrition. Compounds present in mango peel and flesh are likely subject to genetic control and this will be the subject of future studies. Copyright © 2013 Elsevier Ltd. All rights reserved.
Pometti, Carolina L; Bessega, Cecilia F; Saidman, Beatriz O; Vilardi, Juan C
2014-03-01
Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other.
Research on public participant urban infrastructure safety monitoring system using smartphone
NASA Astrophysics Data System (ADS)
Zhao, Xuefeng; Wang, Niannian; Ou, Jinping; Yu, Yan; Li, Mingchu
2017-04-01
Currently, more and more people are concerned about the safety of major public infrastructure. Public participant urban infrastructure safety monitoring and investigation has become a trend in the era of big data. In this paper, a public participant urban infrastructure safety protection system based on smartphones is proposed. The system makes possible public participant disaster data collection, monitoring, and emergency evaluation in the field of disaster prevention and mitigation. The functions of the system are to monitor structural acceleration, angle, and other vibration information, to extract structural deformation, and to implement disaster emergency communication via smartphone without a network connection. The monitoring data are uploaded to a website to create an urban safety information database. The system then supports big-data analysis and processing, structural safety assessment, and early warning for urban safety.
Reactive immunization on complex networks
NASA Astrophysics Data System (ADS)
Alfinito, Eleonora; Beccaria, Matteo; Fachechi, Alberto; Macorini, Guido
2017-01-01
Epidemic spreading on complex networks depends on the topological structure as well as on the dynamical properties of the infection itself. Generally speaking, highly connected individuals play the role of hubs and are crucial to channel information across the network. On the other hand, static topological quantities measuring the connectivity structure are independent of the dynamical mechanisms of the infection. A natural question is therefore how to improve the topological analysis by some kind of dynamical information that may be extracted from the ongoing infection itself. In this spirit, we propose a novel vaccination scheme that exploits information from the details of the infection pattern at the moment when the vaccination strategy is applied. Numerical simulations of the infection process show that the proposed immunization strategy is effective and robust on a wide class of complex networks.
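A minimal sketch of the general idea (not the paper's specific scheme): in an SIR simulation on a scale-free network, when vaccine becomes available mid-outbreak, immunize the susceptible nodes currently most exposed to infected neighbors. All parameters are illustrative.

```python
import random
import networkx as nx

# Sketch of "reactive" immunization: vaccinate the susceptible nodes that are
# currently most exposed to infected neighbors. Illustrative of the concept
# discussed above, not the paper's exact strategy; all parameters are made up.
random.seed(1)
G = nx.barabasi_albert_graph(2000, 3)
infected = set(random.sample(list(G), 20))
recovered = set()                      # holds both recovered and vaccinated nodes
beta, gamma = 0.08, 0.05               # per-contact infection / recovery probability

for step in range(60):
    if step == 10:                     # vaccination moment: exploit infection pattern
        blocked = infected | recovered
        exposure = {n: sum(v in infected for v in G[n])
                    for n in G if n not in blocked}
        doses = sorted(exposure, key=exposure.get, reverse=True)[:200]
        recovered.update(doses)        # vaccinated nodes are removed from dynamics
    blocked = infected | recovered
    new_inf = {v for u in infected for v in G[u]
               if v not in blocked and random.random() < beta}
    recovered.update(u for u in infected if random.random() < gamma)
    infected = (infected | new_inf) - recovered

print("final epidemic size:", len(recovered))
```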
Visualization of JPEG Metadata
NASA Astrophysics Data System (ADS)
Malik Mohamad, Kamaruddin; Deris, Mustafa Mat
There is a lot more information embedded in a JPEG image than just the graphics. Visualization of its metadata would benefit digital forensic investigators by letting them view embedded data, including corrupted images where no graphics can be displayed, in order to assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools are already available, but they mostly focus on visualizing the attribute information of the JPEG Exif. However, none consolidates the markers summary, header structure, Huffman table and quantization table in a single program. In this paper, metadata visualization is done by developing a program able to summarize all existing markers, the header structure, the Huffman table and the quantization table in a JPEG. The result shows that visualization of metadata makes it easier to view the hidden information within a JPEG.
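A minimal Python sketch of the marker walk such a program performs: list each JPEG segment (APPn/Exif, DQT quantization tables, DHT Huffman tables, SOF header, SOS) with its offset and length; "image.jpg" is a placeholder path.

```python
import struct

# Walk the JPEG marker structure: each segment starts with 0xFF + marker byte,
# followed (except for SOI/EOI) by a 2-byte big-endian length that includes
# the length bytes themselves. Entropy-coded data follows SOS, so we stop there.
def list_markers(path):
    with open(path, "rb") as f:
        data = f.read()
    assert data[:2] == b"\xff\xd8", "not a JPEG (missing SOI)"
    pos = 2
    while pos < len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xD9:                         # EOI, no payload
            print(f"{pos:6d}  FFD9 EOI")
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        name = {0xDB: "DQT", 0xC4: "DHT", 0xC0: "SOF0",
                0xDA: "SOS", 0xE0: "APP0", 0xE1: "APP1/Exif"}.get(marker, "?")
        print(f"{pos:6d}  FF{marker:02X} {name:9s} length={length}")
        if marker == 0xDA:                         # compressed image data follows
            break
        pos += 2 + length

list_markers("image.jpg")
```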
NASA Astrophysics Data System (ADS)
Schroth, M. H.; Kleikemper, J.; Pombo, S. A.; Zeyer, J.
2002-12-01
In the past, studies on microbial communities in natural environments have typically focused on either their structure or on their metabolic function. However, linking structure and function is important for understanding microbial community dynamics, in particular in contaminated environments. We will present results of a novel combination of a hydrogeological field method (push-pull tests) with molecular tools and stable isotope analysis, which was employed to quantify anaerobic activities and associated microbial diversity in a petroleum-contaminated aquifer in Studen, Switzerland. Push-pull tests consisted of the injection of test solution containing a conservative tracer and reactants (electron acceptors, 13C-labeled carbon sources) into the aquifer anoxic zone. Following an incubation period, the test solution/groundwater mixture was extracted from the same location. Metabolic activities were computed from solute concentrations measured during extraction. Simultaneously, microbial diversity in sediment and groundwater was characterized by using fluorescence in situ hybridization (FISH), denaturing gradient gel electrophoresis (DGGE), as well as phospholipids fatty acid (PLFA) analysis in combination with 13C isotopic measurements. Results from DGGE analyses provided information on the general community structure before, during and after the tests, while FISH yielded information on active populations. Moreover, using 13C-labeling of microbial PLFA we were able to directly link carbon source assimilation in an aquifer to indigenous microorganisms while providing quantitative information on respective carbon source consumption.
Hayat, Maqsood; Khan, Asifullah
2013-05-01
Membrane protein is a prime constituent of a cell, which performs a role of mediator between intra- and extracellular processes. The prediction of a transmembrane (TM) helix and its topology provides essential information regarding the function and structure of membrane proteins. However, prediction of TM helices and their topology is a challenging issue in bioinformatics and computational biology due to experimental complexities and the lack of established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using a compositional index and physicochemical properties. The redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as the classification approach. We have used two benchmark datasets, including low- and high-resolution datasets. Tenfold cross-validation is employed to assess the performance of the WRF-TMH model at different levels, including per protein, per segment, and per residue. The success rates of the WRF-TMH model are quite promising and are the best reported so far on the same datasets. It is observed that the WRF-TMH model might play a substantial role, and will provide essential information for further structural and functional studies on membrane proteins. The accompanied web predictor is accessible at http://111.68.99.218/WRF-TMH/ .
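A hedged sketch of the pipeline shape in scikit-learn terms: SVD-based dimensionality reduction followed by a class-weighted random forest. The random features stand in for fused compositional/physicochemical descriptors, and class_weight="balanced" is only an approximation of the paper's weighted random forest.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Placeholder data: random features stand in for fused sequence descriptors;
# labels mark TM-helix (1) vs. non-helix (0) residues. Not real protein data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 120))
y = rng.integers(0, 2, size=1000)

model = make_pipeline(
    TruncatedSVD(n_components=30, random_state=0),          # drop redundant dims
    RandomForestClassifier(n_estimators=200, class_weight="balanced",
                           random_state=0),                 # "weighted" forest
)
model.fit(X[:800], y[:800])
print("held-out accuracy:", model.score(X[800:], y[800:]))
```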
Shen, Yang; Bax, Ad
2015-01-01
Chemical shifts are obtained at the first stage of any protein structural study by NMR spectroscopy. Chemical shifts are known to be impacted by a wide range of structural factors and the artificial neural network based TALOS-N program has been trained to extract backbone and sidechain torsion angles from 1H, 15N and 13C shifts. The program is quite robust, and typically yields backbone torsion angles for more than 90% of the residues, and sidechain χ1 rotamer information for about half of these, in addition to reliably predicting secondary structure. The use of TALOS-N is illustrated for the protein DinI, and torsion angles obtained by TALOS-N analysis from the measured chemical shifts of its backbone and 13Cβ nuclei are compared to those seen in a prior, experimentally determined structure. The program is also particularly useful for generating torsion angle restraints, which then can be used during standard NMR protein structure calculations. PMID:25502373
a Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information
NASA Astrophysics Data System (ADS)
Lian, Shizhong; Chen, Jiangping; Luo, Minghai
2016-06-01
Water information cannot be accurately extracted from TM images in which true information is lost because of blocking clouds and missing data stripes. Since water is continuously distributed under natural conditions, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different disturbances from clouds and missing data stripes are simulated. Water information is extracted using global histogram matching, local histogram matching, and the probability-based statistical method on the simulated images. Experiments show that a smaller Areal Error and higher Boundary Recall can be obtained using this method compared with the conventional methods.
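For reference, a minimal sketch of global histogram matching, one of the baseline methods the proposed probability-based approach is compared against; the arrays are synthetic stand-ins for TM bands.

```python
import numpy as np
from skimage.exposure import match_histograms

# Match the intensity distribution of a gap-affected band to a clean reference
# band. Synthetic arrays stand in for real TM imagery.
rng = np.random.default_rng(0)
reference = rng.normal(100, 20, (256, 256))   # clean band (placeholder)
damaged = rng.normal(80, 35, (256, 256))      # band with different statistics
damaged[::16, :] = 0                          # simulated missing data stripes

matched = match_histograms(damaged, reference)
print("means:", reference.mean().round(1), damaged.mean().round(1),
      matched.mean().round(1))  # matched band approaches the reference statistics
```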
Esquivel, Rodolfo O; Molina-Espíritu, Moyocoyani; López-Rosa, Sheila; Soriano-Correa, Catalina; Barrientos-Salcedo, Carolina; Kohout, Miroslav; Dehesa, Jesús S
2015-08-24
In this work we undertake a pioneering information-theoretical analysis of 18 selected amino acids extracted from a natural protein, bacteriorhodopsin (1C3W). The conformational structures of each amino acid are analyzed by use of various quantum chemistry methodologies at high levels of theory: HF, M062X and CISD(Full). The Shannon entropy, Fisher information and disequilibrium are determined to grasp the spatial spreading features of delocalizability, order and uniformity of the optimized structures. These three entropic measures uniquely characterize all amino acids through a predominant information-theoretic quality scheme (PIQS), which gathers all chemical families by means of three major spreading features: delocalization, narrowness and uniformity. This scheme recognizes four major chemical families: aliphatic (delocalized), aromatic (delocalized), electro-attractive (narrowed) and tiny (uniform). All chemical families recognized by the existing energy-based classifications are embraced by this entropic scheme. Finally, novel chemical patterns are shown in the information planes associated with the PIQS entropic measures. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
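The three spreading measures have standard integral forms; a minimal numerical sketch on a discretized 1D Gaussian density (a stand-in for a molecular electron density) reproduces the known analytic values.

```python
import numpy as np

# Shannon entropy (delocalization), disequilibrium (uniformity) and Fisher
# information (order/gradient content) of a normalized 1D density on a grid.
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]
rho = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)            # unit Gaussian density

shannon = -np.sum(rho * np.log(rho) * dx)               # S = -integral rho ln rho
diseq = np.sum(rho**2 * dx)                             # D = integral rho^2
fisher = np.sum(np.gradient(rho, dx)**2 / rho * dx)     # I = integral (rho')^2/rho
print(shannon, diseq, fisher)  # ~1.4189, ~0.2821, ~1.0 for a unit Gaussian
```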
Southan, Christopher; Várkonyi, Péter; Muresan, Sorel
2009-07-06
Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability of downloads and facility to select informative subsets. Where they could be calculated, extracted compounds-per-journal-article were in the range of 12 to 19, but compound-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a big increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public), but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem, 1.2 million did not. On the basis of chemical structure content per se, public sources have covered an increasing proportion of commercial databases over the last two years. However, commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that both commercial and public sources are complementary for its exploration.
Tošić, Tamara; Sellers, Kristin K; Fröhlich, Flavio; Fedotenkova, Mariia; Beim Graben, Peter; Hutt, Axel
2015-01-01
For decades, research in neuroscience has supported the hypothesis that brain dynamics exhibits recurrent metastable states connected by transients, which together encode fundamental neural information processing. To understand the system's dynamics it is important to detect such recurrence domains, but it is challenging to extract them from experimental neuroscience datasets due to the large trial-to-trial variability. The proposed methodology extracts recurrent metastable states in univariate time series by transforming datasets into their time-frequency representations and computing recurrence plots based on instantaneous spectral power values in various frequency bands. Additionally, a new statistical inference analysis compares different trial recurrence plots with corresponding surrogates to obtain statistically significant recurrent structures. This combination of methods is validated by applying it to two artificial datasets. In a final study of visually-evoked Local Field Potentials in partially anesthetized ferrets, the methodology is able to reveal recurrence structures of neural responses with trial-to-trial variability. Focusing on different frequency bands, the δ-band activity is much less recurrent than α-band activity. Moreover, α-activity is susceptible to pre-stimuli, while δ-activity is much less sensitive to pre-stimuli. This difference in recurrence structures in different frequency bands indicates diverse underlying information processing steps in the brain.
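A minimal sketch of the construction described above, under simplifying assumptions: a synthetic two-state signal is mapped to instantaneous band power via a spectrogram, and time points with similar power patterns are marked as recurrent. The signal and thresholds are illustrative; the paper adds surrogate-based statistical inference on top of this.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic signal alternating between an alpha-like (10 Hz) and a delta-like
# (3 Hz) metastable state, plus noise.
rng = np.random.default_rng(0)
fs = 250
t = np.arange(0, 8, 1 / fs)
state = (t % 4) < 2
signal = np.where(state, np.sin(2 * np.pi * 10 * t),
                  np.sin(2 * np.pi * 3 * t)) + 0.3 * rng.normal(size=t.size)

# Time-frequency representation -> instantaneous band-power features.
f, tt, Sxx = spectrogram(signal, fs=fs, nperseg=128, noverlap=96)

def band(lo, hi):
    return Sxx[(f >= lo) & (f < hi)].mean(axis=0)

features = np.column_stack([band(1, 4), band(8, 13)])   # delta and alpha power

# Recurrence matrix: time points with similar spectral power are "recurrent".
dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
R = dist < 0.2 * dist.max()
print(R.shape, "recurrence rate:", R.mean().round(2))
```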
Analysis of atomic force microscopy data for surface characterization using fuzzy logic
DOE Office of Scientific and Technical Information (OSTI.GOV)
Al-Mousa, Amjed, E-mail: aalmousa@vt.edu; Niemann, Darrell L.; Niemann, Devin J.
2011-07-15
In this paper we present a methodology to characterize surface nanostructures of thin films. The methodology identifies and isolates nanostructures using Atomic Force Microscopy (AFM) data and extracts quantitative information, such as their size and shape. The fuzzy-logic-based methodology relies on a Fuzzy Inference Engine (FIE) to classify the data points as being top, bottom, uphill, or downhill. The resulting data sets are then further processed to extract quantitative information about the nanostructures. In the present work we introduce a mechanism which can consistently distinguish crowded surfaces from those with sparsely distributed structures and present an omni-directional search technique to improve the structural recognition accuracy. In order to demonstrate the effectiveness of our approach we present a case study which uses our approach to quantitatively identify particle sizes of two specimens, each with a unique gold nanoparticle size distribution. Research highlights: (1) a fuzzy logic analysis technique capable of characterizing AFM images of thin films; (2) the technique is applicable to different surfaces regardless of their densities; (3) the fuzzy logic technique does not require manual adjustment of the algorithm parameters; (4) the technique can quantitatively capture differences between surfaces; (5) this technique yields more realistic structure boundaries compared to other methods.
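A minimal sketch of the fuzzy classification step on a synthetic 1D height profile: each point receives membership grades for top/bottom/uphill/downhill from its normalized height and local slope, and the strongest grade wins. Membership shapes and thresholds are illustrative, not the paper's tuned rule base.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-12),
                              (c - x) / (c - b + 1e-12)), 0, 1)

profile = np.sin(np.linspace(0, 4 * np.pi, 200)) * 5   # synthetic height line (nm)
h = (profile - profile.min()) / np.ptp(profile)        # normalized height, 0..1
s = np.gradient(profile)                               # local slope

# Grades per linguistic class; products combine height and flatness conditions.
grades = {
    "bottom":   tri(h, -0.2, 0.0, 0.35) * tri(s, -0.5, 0.0, 0.5),
    "top":      tri(h, 0.65, 1.0, 1.2) * tri(s, -0.5, 0.0, 0.5),
    "uphill":   tri(s, 0.05, 0.4, 2.0),
    "downhill": tri(s, -2.0, -0.4, -0.05),
}
labels = np.array(list(grades))[np.argmax(np.vstack(list(grades.values())), axis=0)]
print(dict(zip(*np.unique(labels, return_counts=True))))
```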
Measuring nuclear reaction cross sections to extract information on neutrinoless double beta decay
NASA Astrophysics Data System (ADS)
Cavallaro, M.; Cappuzzello, F.; Agodi, C.; Acosta, L.; Auerbach, N.; Bellone, J.; Bijker, R.; Bonanno, D.; Bongiovanni, D.; Borello-Lewin, T.; Boztosun, I.; Branchina, V.; Bussa, M. P.; Calabrese, S.; Calabretta, L.; Calanna, A.; Calvo, D.; Carbone, D.; Chávez Lomelí, E. R.; Coban, A.; Colonna, M.; D'Agostino, G.; De Geronimo, G.; Delaunay, F.; Deshmukh, N.; de Faria, P. N.; Ferraresi, C.; Ferreira, J. L.; Finocchiaro, P.; Fisichella, M.; Foti, A.; Gallo, G.; Garcia, U.; Giraudo, G.; Greco, V.; Hacisalihoglu, A.; Kotila, J.; Iazzi, F.; Introzzi, R.; Lanzalone, G.; Lavagno, A.; La Via, F.; Lay, J. A.; Lenske, H.; Linares, R.; Litrico, G.; Longhitano, F.; Lo Presti, D.; Lubian, J.; Medina, N.; Mendes, D. R.; Muoio, A.; Oliveira, J. R. B.; Pakou, A.; Pandola, L.; Petrascu, H.; Pinna, F.; Reito, S.; Rifuggiato, D.; Rodrigues, M. R. D.; Russo, A. D.; Russo, G.; Santagati, G.; Santopinto, E.; Sgouros, O.; Solakci, S. O.; Souliotis, G.; Soukeras, V.; Spatafora, A.; Torresi, D.; Tudisco, S.; Vsevolodovna, R. I. M.; Wheadon, R. J.; Yildirin, A.; Zagatto, V. A. B.
2018-02-01
Neutrinoless double beta decay (0vββ) is considered the best potential resource to access the absolute neutrino mass scale. Moreover, if observed, it will signal that neutrinos are their own anti-particles (Majorana particles). Presently, this physics case is one of the most important research topics “beyond the Standard Model” and might guide the way towards a Grand Unified Theory of fundamental interactions. Since the 0vββ decay process involves nuclei, its analysis necessarily implies nuclear structure issues. In the NURE project, supported by a Starting Grant of the European Research Council (ERC), nuclear reactions of double charge-exchange (DCE) are used as a tool to extract information on the 0vββ Nuclear Matrix Elements. In DCE reactions and ββ decay the initial and final nuclear states are indeed the same, and the transition operators have similar structure. Thus the measurement of DCE absolute cross-sections can give crucial information on ββ matrix elements. In a wider view, the NUMEN international collaboration plans a major upgrade of the INFN-LNS facilities in the coming years in order to increase the experimental production of nuclei by at least two orders of magnitude, thus making feasible a systematic study of all the cases of interest as candidates for 0vββ.
Real-Time Digital Signal Processing Based on FPGAs for Electronic Skin Implementation †
Ibrahim, Ali; Gastaldo, Paolo; Chible, Hussein; Valle, Maurizio
2017-01-01
Enabling touch-sensing capability would help appliances understand interaction behaviors with their surroundings. Many recent studies are focusing on the development of electronic skin because of its necessity in various application domains, namely autonomous artificial intelligence (e.g., robots), biomedical instrumentation, and replacement prosthetic devices. An essential task of the electronic skin system is to locally process the tactile data and send structured information either to mimic human skin or to respond to the application demands. The electronic skin must be fabricated together with an embedded electronic system which has the role of acquiring the tactile data, processing, and extracting structured information. On the other hand, processing tactile data requires efficient methods to extract meaningful information from raw sensor data. Machine learning represents an effective method for data analysis in many domains: it has recently demonstrated its effectiveness in processing tactile sensor data. In this framework, this paper presents the implementation of digital signal processing based on FPGAs for tactile data processing. It provides the implementation of a tensorial kernel function for a machine learning approach. Implementation results are assessed by highlighting the FPGA resource utilization and power consumption. Results demonstrate the feasibility of the proposed implementation when real-time classification of input touch modalities are targeted. PMID:28287448
Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James
2014-05-13
An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations as well as relationships and events from text documents are described herein.
Wszelaki, Natalia; Paradowska, Katarzyna; Jamróz, Marta K; Granica, Sebastian; Kiss, Anna K
2011-09-14
Isolation and identification of the inhibitors of butyrylcholinesterase (BChE), obtained from extracts of the roots and fruits of Angelica archangelica L., are reported. Our results confirmed the weak inhibitory effect of Angelica roots on acetylcholinesterase activity. BChE inhibition was much more pronounced for the hexane extracts at a concentration of 100 μg/mL, exceeding 50%. TLC bioautography-guided fractionation and spectroscopic analysis led to the isolation and identification of imperatorin from the fruit hexane extract and of heraclenol-2'-O-angelate from the root hexane extract. Both compounds showed significant BChE inhibition activity, with IC50 = 14.4 ± 3.2 μM and IC50 = 7.5 ± 1.8 μM, respectively. Only C8-substituted and C5-unsubstituted furanocoumarins were active, which could supply information about the initial structures of specific BChE inhibitors.
Soltanipour, Asieh; Sadri, Saeed; Rabbani, Hossein; Akhlaghi, Mohammad Reza
2015-01-01
This paper presents a new procedure for automatic extraction of the blood vessels and optic disk (OD) in fundus fluorescein angiograms (FFA). In order to extract blood vessel centerlines, the algorithm of vessel extraction starts with the analysis of directional images resulting from sub-bands of the fast discrete curvelet transform (FDCT) in similar directions and different scales. For this purpose, each directional image is processed by using information from the first-order derivative and eigenvalues obtained from the Hessian matrix. The final vessel segmentation is obtained using a simple region growing algorithm iteratively, which merges centerline images with the contents of images resulting from a modified top-hat transform followed by bit plane slicing. After extracting blood vessels from the FFA image, candidate regions for the OD are enhanced by removing blood vessels from the FFA image, using multi-structure-element morphology and modification of FDCT coefficients. Then, a Canny edge detector and Hough transform are applied to the reconstructed image to extract the boundary of candidate regions. At the next step, the information of the main arc of the retinal vessels surrounding the OD region is used to extract the actual location of the OD. Finally, the OD boundary is detected by applying distance regularized level set evolution. The proposed method was tested on FFA images from the angiography unit of Isfahan Feiz Hospital, containing 70 FFA images from different diabetic retinopathy stages. The experimental results show an accuracy of more than 93% for vessel segmentation and more than 87% for OD boundary extraction.
A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.
Segura-Bedmar, Isabel; Martínez, Paloma; de Pablo-Sánchez, César
2011-03-29
A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up-to-date with all published studies on DDI. This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched with the generated sentences in order to extract DDIs. We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%) but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no previous approach has been proposed to extract DDI from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDI from biomedical texts.
Topological entanglement Rényi entropy and reduced density matrix structure.
Flammia, Steven T; Hamma, Alioscia; Hughes, Taylor L; Wen, Xiao-Gang
2009-12-31
We generalize the topological entanglement entropy to a family of topological Rényi entropies parametrized by a parameter alpha, in an attempt to find new invariants for distinguishing topologically ordered phases. We show that, surprisingly, all topological Rényi entropies are the same, independent of alpha for all nonchiral topological phases. This independence shows that topologically ordered ground-state wave functions have reduced density matrices with a certain simple structure, and no additional universal information can be extracted from the entanglement spectrum.
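For reference, the standard definitions behind the abstract (textbook forms, not text from the paper): the Rényi entanglement entropy of a reduced density matrix, and the Kitaev-Preskill-type subtraction that isolates its topological part.

```latex
% Renyi entanglement entropy of a reduced density matrix \rho_A; the
% \alpha -> 1 limit recovers the von Neumann entropy.
\begin{equation}
  S_\alpha(\rho_A) = \frac{1}{1-\alpha}\,\ln \operatorname{Tr}\rho_A^{\alpha},
  \qquad
  \lim_{\alpha \to 1} S_\alpha(\rho_A) = -\operatorname{Tr}\,\rho_A \ln \rho_A .
\end{equation}
% Kitaev--Preskill-type combination over adjacent regions A, B, C that cancels
% the boundary-law terms and isolates the topological contribution, here
% applied with S replaced by S_\alpha:
\begin{equation}
  S_\alpha^{\mathrm{topo}}
    = S_\alpha(A) + S_\alpha(B) + S_\alpha(C)
    - S_\alpha(AB) - S_\alpha(BC) - S_\alpha(AC) + S_\alpha(ABC).
\end{equation}
```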
An Overview of the Production Quality Compiler-Compiler Project
1979-02-01
process. A parse tree is assumed, and there is a set of primitives for extracting information from it and for "walking" it: using its structure to... not adequate for, and even preclude, techniques that involve multiple phases, or non-trivial auxiliary data structures. In recent years there have... VALUE field of node 23 would indicate that the type of the value field was integer. As with "union mode" or "variant record" features in many
Coustaty, M; Bertet, K; Visani, M; Ogier, J
2011-08-01
In this paper, we propose a new approach for symbol recognition using structural signatures and a Galois lattice as a classifier. The structural signatures are based on topological graphs computed from segments which are extracted from the symbol images by using an adapted Hough transform. These structural signatures, which can be seen as dynamic paths carrying high-level information, are robust to various transformations. They are classified by using a Galois lattice as a classifier. The performance of the proposed approach is evaluated on the GREC'03 symbol database, and the experimental results we obtain are encouraging.
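A minimal sketch of the segment-extraction front end, using scikit-image's probabilistic Hough transform on a synthetic square "symbol"; the paper uses an adapted Hough transform, and the test image and parameters here are illustrative.

```python
import numpy as np
from skimage.draw import line
from skimage.transform import probabilistic_hough_line

# Detect straight segments in a synthetic square "symbol"; such segments are
# the raw material from which topological-graph signatures would be built.
img = np.zeros((100, 100), dtype=bool)
for (r0, c0), (r1, c1) in [((20, 20), (20, 80)), ((20, 80), (80, 80)),
                           ((80, 80), (80, 20)), ((80, 20), (20, 20))]:
    rr, cc = line(r0, c0, r1, c1)
    img[rr, cc] = True

segments = probabilistic_hough_line(img, threshold=10, line_length=30, line_gap=3)
for (x0, y0), (x1, y1) in segments:
    print(f"segment from ({x0},{y0}) to ({x1},{y1})")
```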
Validation and extraction of molecular-geometry information from small-molecule databases.
Long, Fei; Nicholls, Robert A; Emsley, Paul; Gražulis, Saulius; Merkys, Andrius; Vaitkus, Antanas; Murshudov, Garib N
2017-02-01
A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.
Misra, Dharitri; Chen, Siyuan; Thoma, George R
2009-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.
Jenke, Dennis; Carlson, Tage
2014-01-01
Demonstrating suitability for intended use is necessary to register packaging, delivery/administration, or manufacturing systems for pharmaceutical products. During their use, such systems may interact with the pharmaceutical product, potentially adding extraneous entities to those products. These extraneous entities, termed leachables, have the potential to affect the product's performance and/or safety. To establish the potential safety impact, drug products and their packaging, delivery, or manufacturing systems are tested for leachables or extractables, respectively. This generally involves testing a sample (either the extract or the drug product) by a means that produces a test method response and then correlating the test method response with the identity and concentration of the entity causing the response. Oftentimes, analytical tests produce responses that cannot readily establish the associated entity's identity. Entities associated with uninterpretable responses are termed unknowns. Scientifically justifiable thresholds are used to establish those individual unknowns that represent an acceptable patient safety risk, and thus do not require further identification, and, conversely, those unknowns whose potential safety impact requires that they be identified. Such thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article documents toxicological information for over 540 extractables identified in laboratory testing of polymeric materials used in pharmaceutical applications. Relevant toxicological endpoints, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others, were collated for these extractables or their structurally similar surrogates and were systematically assessed to produce a risk index, which represents a daily intake value for life-long intravenous administration. This systematic approach uses four uncertainty factors, each assigned a factor of 10, which consider the quality and relevance of the data, differences in route of administration, non-human species to human extrapolations, and inter-individual variation among humans. In addition to the risk index values, all extractables and most of their surrogates were classified for structural safety alerts using Cramer rules and for mutagenicity alerts using an in silico approach (Benigni/Bossa rule base for mutagenicity via Toxtree). Lastly, in vitro mutagenicity data (Ames Salmonella typhimurium and Mouse Lymphoma tests) were collected from available databases (Chemical Carcinogenesis Research Information and Carcinogenic Potency Database). The frequency distributions of the resulting data were established; in general, risk index values were normally distributed around a band ranging from 5 to 20 mg/day. The risk index associated with the 95% level of the cumulative distribution plot was approximately 0.1 mg/day. Thirteen extractables in the dataset had individual risk index values less than 0.1 mg/day, although four of these had additional risk indices, based on multiple different toxicological endpoints, above 0.1 mg/day. Additionally, approximately 50% of the extractables were classified in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (no basis to assume safety). Lastly, roughly 20% of the extractables triggered either an in vitro or in silico alert for mutagenicity.
When the Cramer classifications and mutagenicity alerts were compared to the risk indices, extractables with safety alerts generally had lower risk index values, although the differences between the risk index distributions of extractables with and without alerts were small and subtle. Leachables from packaging systems, manufacturing systems, or delivery devices can accumulate in drug products and potentially affect the drug product. Although drug products can be analyzed for leachables (and material extracts can be analyzed for extractables), not all leachables or extractables can be fully identified. Safety thresholds can be used to establish whether the unidentified substances can be deemed safe or whether additional analytical efforts need to be made to secure their identities. These thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article contains safety data for over 500 extractables that were identified in laboratory characterizations of polymers used in pharmaceutical applications. The safety data consist of structural toxicity classifications of the extractables as well as calculated risk indices, where the risk indices were obtained by subjecting toxicological safety data, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others, to a systematic evaluation process using appropriate uncertainty factors. Thus the risk index values represent daily exposures for the lifetime intravenous administration of drugs. The frequency distributions of the risk indices and Cramer classifications were examined. The risk index values were normally distributed around a range of 5 to 20 mg/day, and the risk index associated with the 95% level of the cumulative frequency plot was 0.1 mg/day. Approximately 50% of the extractables were in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (high risk of toxicity). Approximately 20% of the extractables produced an in vitro or in silico mutagenicity alert. In general, the distribution of risk index values was not strongly correlated with either the extractables' Cramer classification or their mutagenicity alerts. However, extractables with either in vitro or in silico alerts were somewhat more likely to have low risk index values. © PDA, Inc. 2014.
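Because the derivation of the risk index reduces to dividing a toxicological endpoint by four uncertainty factors of 10, the arithmetic can be shown in a few lines. This is a hedged sketch: the endpoint is assumed to be expressed already as a daily dose in mg/day, and the published procedure may apply additional scaling (e.g., body weight) not shown here.

```python
# Risk index = endpoint / (UF1 * UF2 * UF3 * UF4), with each uncertainty
# factor set to 10 (data quality/relevance, route of administration,
# interspecies extrapolation, inter-individual variation).

def risk_index(endpoint_mg_per_day, factors=(10, 10, 10, 10)):
    divisor = 1
    for f in factors:
        divisor *= f
    return endpoint_mg_per_day / divisor

# A NOAEL equivalent to 100,000 mg/day yields a risk index of 10 mg/day,
# inside the reported 5-20 mg/day band where most values clustered.
print(risk_index(100_000))  # 10.0
```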
Dörrstein, Jörg; Scholz, Ronja; Schwarz, Dominik; Schieder, Doris; Sieber, Volker; Walther, Frank; Zollfrank, Cordt
2018-04-01
This article presents experimental data on organosolv lignin from Poaceae grass and its structural changes after compounding and injection molding, as presented in the research article "Effects of high-lignin-loading on thermal, mechanical, and morphological properties of bioplastic composites" [1]. It supplements the article with morphological (SEM), spectroscopic (31P NMR, FT-IR) and chromatographic (GPC, EA) data on the starting lignin, as well as molar mass characteristics (mass-average molar mass (Mw) and polydispersity (D)) of the extracted lignin. Refer to Schwarz et al. [2] for a detailed description of the production of the organosolv residue and for further information on the raw material used for lignin extraction. The dataset is made publicly available and can be useful for extended lignin research and critical analyses.
Native Cellulose: Structure, Characterization and Thermal Properties
Poletto, Matheus; Ornaghi Júnior, Heitor L.; Zattera, Ademir J.
2014-01-01
In this work, the relationship between cellulose crystallinity, the influence of extractive content on lignocellulosic fiber degradation, and the correlation between chemical composition and the physical properties of ten types of natural fibers were investigated by FTIR spectroscopy, X-ray diffraction and thermogravimetry techniques. The results showed that higher extractive contents associated with lower crystallinity and lower cellulose crystallite size can accelerate the degradation process and reduce the thermal stability of the lignocellulosic fibers studied. On the other hand, the thermal decomposition of natural fibers is shifted to higher temperatures with increasing cellulose crystallinity and crystallite size. These results indicate that the cellulose crystallite size affects the thermal degradation temperature of natural fibers. This study showed that, through the methods used, prior information about the structure and properties of lignocellulosic fibers can be obtained before their use in composite formulations. PMID:28788179
Clustering XML Documents Using Frequent Subtrees
NASA Astrophysics Data System (ADS)
Kutty, Sangeetha; Tran, Tien; Nayak, Richi; Li, Yuefeng
This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
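The final two steps of the approach (term-distribution matrix, then k-way clustering) can be sketched compactly. In this hedged Python example the constrained content extracted via closed frequent subtrees is assumed to be available as one string per document, and scikit-learn's KMeans stands in for the k-way clustering algorithm used in the study.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# Hypothetical constrained content extracted from three XML documents.
constrained_content = [
    "film actor director cast",
    "actor award film festival",
    "protein gene sequence genome",
]
matrix = CountVectorizer().fit_transform(constrained_content)  # term distribution
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
print(labels)  # cluster assignment per document
```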
NASA Technical Reports Server (NTRS)
Smith, Michael A.; Kanade, Takeo
1997-01-01
Digital video is rapidly becoming important for education, entertainment, and a host of multimedia applications. With the size of the video collections growing to thousands of hours, technology is needed to effectively browse segments in a short time without losing the content of the video. We propose a method to extract the significant audio and video information and create a "skim" video which represents a very short synopsis of the original. The goal of this work is to show the utility of integrating language and image understanding techniques for video skimming by extraction of significant information, such as specific objects, audio keywords and relevant video structure. The resulting skim video is much shorter, where compaction is as high as 20:1, and yet retains the essential content of the original segment.
PropBase Query Layer: a single portal to UK subsurface physical property databases
NASA Astrophysics Data System (ADS)
Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham
2013-04-01
Until recently, the delivery of geological information for industry and the public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires the capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived, data sources. These data can be voxelated into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS have for many years been stored in relational databases to ensure their long-term continuity. However, these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets, making the extraction of physical properties from these databases difficult. The PropBase Query Layer has therefore been created to allow the simplified aggregation and extraction of all related data and its presentation in simple, mostly denormalized, tables that combine information from multiple databases into a single system. The structure from each relational database is denormalized into a generalised structure, so that each dataset can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which holds all of the data with the following attribution:
• a unique identifier
• the data source
• the unique identifier from the parent database for traceability
• the 3D location
• the property type
• the property value
• the units
• necessary qualifiers
• precision information and an audit trail
Data sources, property type and units are constrained by dictionaries, a key component of the structure which defines what properties and inheritance hierarchies are to be coded and also guides what is extracted from the structure and how. Data types served by the Query Layer include site-investigation-derived geotechnical data, hydrogeology datasets, regional geochemistry and geophysical logs, as well as lithological and borehole metadata. The size and complexity of the datasets, with multiple parent structures, require a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) that keeps the layer synchronised with the underlying databases, either as regular scheduled jobs (weekly, monthly, etc.) or invoked on demand.
The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
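For illustration, a denormalized query-layer row following the attribute list above might look like the record below. The key names are paraphrased assumptions; the actual PRB_DATA table uses Oracle column names and dictionary-constrained codes.

```python
# Illustrative shape of one flat PropBase-style row (field names assumed).
prb_row = {
    "id": 123456,                      # unique identifier
    "source": "GEOTECHNICAL_DB",       # data source, dictionary-constrained
    "parent_id": "BH/17/042-S3",       # identifier in the parent database
    "x": 451230.0, "y": 338760.0, "z": -12.5,  # 3D location
    "property_type": "POROSITY",       # property type, dictionary-constrained
    "value": 0.21,                     # property value
    "units": "fraction",               # units, dictionary-constrained
    "qualifier": None,                 # necessary qualifiers
    "precision": 0.01,                 # precision information
    "audit": "loaded 2013-01-15 from GEOTECH v4",  # audit trail
}
```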
Yang, Zhi; Wu, Youqian; Wu, Shihua
2016-01-29
Despite substantial developments in extraction and separation techniques, isolation of natural products from natural resources is still a challenging task. In this work, an efficient strategy for the extraction and isolation of multi-component natural products was developed by combining systematic two-phase liquid-liquid extraction with 13C NMR pattern recognition, followed by conical counter-current chromatography separation. A small-scale crude sample was first distributed into 9 systematic hexane-ethyl acetate-methanol-water (HEMWat) two-phase solvent systems to determine the optimum extraction solvents and the partition coefficients of the prominent components. The optimized solvent systems were then used in succession to enrich the hydrophilic and lipophilic components from the large-scale crude sample. Finally, the enriched component samples were further purified by a new conical counter-current chromatography (CCC). Because 13C NMR pattern recognition was used, the kinds and structures of the major components in the solvent extracts could be predicted; the method therefore collects the partition coefficients and the structural information of components in the selected two-phase solvents simultaneously. As an example, a cytotoxic extract of podophyllotoxins and flavonoids from Dysosma versipellis (Hance) was selected. After the systematic HEMWat solvent extraction and 13C NMR pattern recognition analyses, the crude extract of D. versipellis was first degreased with the upper phase of the HEMWat system (9:1:9:1, v/v) and then distributed in the two phases of the HEMWat system (2:8:2:8, v/v) to obtain the hydrophilic lower-phase extract and the lipophilic upper-phase extract, respectively. These extracts were further separated by conical CCC with the HEMWat systems (1:9:1:9 and 4:6:4:6, v/v). In total, 17 cytotoxic compounds were isolated and identified. Overall, the results suggest that the strategy is very efficient for the systematic extraction and isolation of biologically active components from complex biomaterials. Copyright © 2016 Elsevier B.V. All rights reserved.
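The solvent-system screening hinges on partition coefficients, which are simple ratios. A minimal sketch, assuming the phase concentrations come from peak integration of the small-scale distribution experiment:

```python
def partition_coefficient(c_upper, c_lower):
    """K = concentration in the upper phase / concentration in the lower phase."""
    return c_upper / c_lower

# K >> 1 marks a lipophilic component, K << 1 a hydrophilic one; values
# near 1 are convenient for counter-current chromatography separation.
print(partition_coefficient(0.8, 0.05))  # 16.0 -> strongly lipophilic
```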
Small Angle X-Ray Scattering from Lipid-Bound Myelin Basic Protein in Solution
Haas, H.; Oliveira, C. L. P.; Torriani, I. L.; Polverini, E.; Fasano, A.; Carlone, G.; Cavatorta, P.; Riccio, P.
2004-01-01
The structure of myelin basic protein (MBP), purified from the myelin sheath in both lipid-free (LF-MBP) and lipid-bound (LB-MBP) forms, was investigated in solution by small angle x-ray scattering. The water-soluble LF-MBP, extracted at pH < 3.0 from defatted brain, is the classical preparation of MBP, commonly regarded as an intrinsically unfolded protein. LB-MBP is a lipoprotein-detergent complex extracted from myelin with its native lipidic environment at pH > 7.0. Under all conditions, the scattering from the two protein forms was different, indicating different molecular shapes. For the LB-MBP, well-defined scattering curves were obtained, suggesting that the protein had a unique, compact (but not globular) structure. Furthermore, these data were compatible with earlier results from molecular modeling calculations of the MBP structure, which we have refined. In contrast, the LF-MBP data were in accordance with the expected open-coil conformation. The results represent the first direct structural information from x-ray scattering measurements on MBP in its native lipidic environment in solution. PMID:14695288
NASA Astrophysics Data System (ADS)
Su, Zhongqing; Ye, Lin
2004-08-01
The practical utilization of elastic waves, e.g. Rayleigh-Lamb waves, in high-performance structural health monitoring techniques is somewhat impeded by complicated wave dispersion phenomena, the existence of multiple wave modes, high susceptibility to diverse interferences, bulky sampled data and the difficulty of signal interpretation. An intelligent signal processing and pattern recognition (ISPPR) approach using the wavelet transform and artificial neural network algorithms was developed and implemented in a signal processing package (SPP). The ISPPR technique performs signal filtration, data compression, characteristic extraction, information mapping and pattern recognition, and is capable of extracting essential yet concise features from acquired raw wave signals to assist in structural health evaluation. For validation, the SPP was applied to the prediction of crack growth in an alloy structural beam and to the construction of a damage parameter database for defect identification in CF/EP composite structures. It was clearly apparent that elastic wave propagation-based damage assessment could be dramatically streamlined by introduction of the ISPPR technique.
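The ISPPR chain (wavelet-based feature extraction feeding a neural network classifier) can be sketched as follows. This is a compressed illustration under stated assumptions: the signals and damage labels are synthetic placeholders, PyWavelets supplies the wavelet transform, and a small scikit-learn network stands in for the package's ANN algorithms.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def wavelet_energy_features(signal, wavelet="db4", level=4):
    """Compress a raw wave signal into one energy value per wavelet band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

rng = np.random.default_rng(0)
signals = rng.normal(size=(40, 1024))   # placeholder Lamb-wave records
labels = rng.integers(0, 2, size=40)    # placeholder damage states
X = np.vstack([wavelet_energy_features(s) for s in signals])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, labels)
```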
Liu, Chao; Sun, Yonghai; Mao, Qian; Guo, Xiaolei; Li, Peng; Liu, Yang; Xu, Na
2016-01-01
Polysaccharides from Morchella esculenta have been proven to be functional and helpful for humans. The purpose of this study was to investigate the chemical structure and anti-proliferative and antitumor activities of a Morchella esculenta polysaccharide (MEP) extracted by pulsed electric field (PEF) in submerged fermentation. The endo-polysaccharide was separated and purified by column chromatography and gel permeation chromatography, and analyzed by gas chromatography. The MEP, with an average molecular weight of 81,835 Da, consisted of xylose, glucose, mannose, rhamnose and galactose in the ratio 5.4:5.0:6.5:7.8:72.3. The structure of MEP was further analyzed by Fourier-transform infrared spectroscopy and 1H and 13C liquid-state nuclear magnetic resonance spectroscopy. Apoptosis tests proved that MEP could inhibit the proliferation and growth of human colon cancer HT-29 cells in a time- and dose-dependent manner within 48 h. This study provides more information on the chemical structure of anti-proliferative polysaccharides isolated from Morchella esculenta. PMID:27338370
Ricci, Arianna; Parpinello, Giuseppina P; Olejar, Kenneth J; Kilmartin, Paul A; Versari, Andrea
2015-11-01
Attenuated total reflection Fourier transform infrared (FT-IR) spectroscopy was used to characterize 40 commercial tannins, including condensed and hydrolyzable chemical classes, provided as powder extracts from suppliers. Spectral data were processed to detect typical molecular vibrations of tannins bearing different chemical groups and of varying botanical origin (univariate qualitative analysis). The mid-infrared region between 4000 and 520 cm⁻¹ was analyzed, with a particular emphasis on the vibrational modes in the fingerprint region (1800-520 cm⁻¹), which provide detailed information about skeletal structures and specific substituents. The region 1800-1500 cm⁻¹ contained signals due to hydrolyzable structures, while bands due to condensed tannins appeared at 1300-900 cm⁻¹ and exhibited specific hydroxylation patterns useful to elucidate the structure of the flavonoid monomeric units. The spectra were investigated further using principal component analysis for discriminative purposes, to enhance the ability of infrared spectroscopy in the classification and quality control of commercial dried extracts and to enhance their industrial exploitation.
Towards Phenotyping of Clinical Trial Eligibility Criteria.
Löbe, Matthias; Stäubert, Sebastian; Goldberg, Colleen; Haffner, Ivonne; Winter, Alfred
2018-01-01
Medical plaintext documents contain important facts about patients, but they are rarely available for structured queries. The provision of structured information from natural language texts in addition to the existing structured data can significantly speed up the search for fulfilled inclusion criteria and thus improve the recruitment rate. This work is aimed at supporting clinical trial recruitment with text mining techniques to identify suitable subjects in hospitals. Based on the inclusion/exclusion criteria of 5 sample studies and a text corpus consisting of 212 doctor's letters and medical follow-up documentation from a university cancer center, a prototype was developed and technically evaluated using NLP procedures (UIMA) for the extraction of facts from medical free texts. It was found that although the extracted entities are not always correct (precision between 23% and 96%), they provide a decisive indication as to which patient file should be read preferentially. The prototype presented here demonstrates the technical feasibility. In order to find available, lucrative phenotypes, an in-depth evaluation is required.
NASA Astrophysics Data System (ADS)
Chalmin, E.; Farges, F.; Brown, G. E.
2009-01-01
High-resolution manganese K-edge X-ray absorption near edge structure spectra were collected on a set of 40 Mn-bearing minerals. The pre-edge feature information (position, area) was investigated to extract as much quantitative valence and symmetry information as possible for manganese in various “test” and “unknown” minerals and glasses. The samples present a range of manganese symmetry environments (tetrahedral, square planar, octahedral, and cubic) and valences (II to VII). The extraction of the pre-edge information is based on previous multiple scattering and multiplet calculations for model compounds. Using the method described in this study, a robust estimation of the manganese valence could be obtained from the pre-edge region at the 5% accuracy level. Applied to 20 “test” compounds (such as hausmannite and rancieite) and to 15 “unknown” compounds (such as axinite and birnessite), this method provides a quantitative estimate of the average valence of manganese in complex minerals and silicate glasses.
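One way to picture the valence estimation is a calibration from pre-edge centroid energy to known valence on model compounds, then applied to an unknown. The sketch below is hedged: the energies are invented, and the paper's procedure also uses pre-edge areas and the multiplet/multiple-scattering results rather than a bare linear fit.

```python
import numpy as np

# Hypothetical pre-edge centroids (eV) for model compounds of known valence.
known_centroids = np.array([6540.2, 6540.8, 6541.5, 6542.3])
known_valences = np.array([2, 3, 4, 7])            # Mn(II) ... Mn(VII)
slope, intercept = np.polyfit(known_centroids, known_valences, 1)

unknown_centroid = 6541.1                          # measured on an "unknown"
print(round(slope * unknown_centroid + intercept, 2))  # estimated mean valence
```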
Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S
2018-05-31
Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data analytic approach combining multiple microscopy modes, combining compositional information in infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra are different from FTIR spectra, but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure with any one technique.
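The analysis pattern (PCA to extract significant components, then principal component regression against the co-located current) can be sketched with synthetic stand-ins for the PiFM and cAFM data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
spectra = rng.normal(size=(500, 200))  # 500 pixels x 200 wavenumbers (placeholder)
current = rng.normal(size=500)         # co-located cAFM current (placeholder)

scores = PCA(n_components=5).fit_transform(spectra)  # significant components
pcr = LinearRegression().fit(scores, current)        # principal component regression
print(pcr.score(scores, current))                    # fit quality on this data
```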
Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource
Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa
2003-01-01
Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355
Cancer Patients' Informational Needs: Qualitative Content Analysis.
Heidari, Haydeh; Mardani-Hamooleh, Marjan
2016-12-01
Understanding the informational needs of cancer patients is a requirement to plan any educative care program for them. The aim of this study was to identify Iranian cancer patients' perceptions of informational needs. The study took a qualitative approach. Semi-structured interviews were held with 25 cancer patients in two teaching hospitals in Iran. Transcripts of the interviews underwent conventional content analysis, and categories were extracted. The results came under two main categories: disease-related informational needs and information needs related to daily life. Disease-related informational needs had two subcategories: obtaining information about the nature of disease and obtaining information about disease prognosis. Information needs related to daily life also had two subcategories: obtaining information about healthy lifestyle and obtaining information about regular activities of daily life. The findings provide deep understanding of cancer patients' informational needs in Iran.
Review of Extracting Information From the Social Web for Health Personalization
Karlsen, Randi; Bonander, Jason
2011-01-01
In recent years the Web has come into its own as a social platform where health consumers are actively creating and consuming Web content. Moreover, as the Web matures, consumers are gaining access to personalized applications adapted to their health needs and interests. The creation of personalized Web applications relies on extracted information about the users and the content to personalize. The Social Web itself provides many sources of information that can be used to extract information for personalization apart from traditional Web forms and questionnaires. This paper provides a review of different approaches for extracting information from the Social Web for health personalization. We reviewed research literature across different fields addressing the disclosure of health information in the Social Web, techniques to extract that information, and examples of personalized health applications. In addition, the paper includes a discussion of technical and socioethical challenges related to the extraction of information for health personalization. PMID:21278049
Wireless AE Event and Environmental Monitoring for Wind Turbine Blades at Low Sampling Rates
NASA Astrophysics Data System (ADS)
Bouzid, Omar M.; Tian, Gui Y.; Cumanan, K.; Neasham, J.
Integration of acoustic wireless technology in structural health monitoring (SHM) applications introduces new challenges due to requirements of high sampling rates, additional communication bandwidth, memory space, and power resources. To circumvent these challenges, this chapter proposes a novel solution: a wireless SHM technique built around acoustic emission (AE) and field-deployed on the structure of a wind turbine. The solution requires only a low sampling rate, below the Nyquist rate. In addition, features extracted from the aliased AE signals, rather than reconstructions of the original signals on board the wireless nodes, are exploited to monitor AE events, such as wind, rain, strong hail, and bird strikes under different environmental conditions, in conjunction with artificial AE sources. A time-feature extraction algorithm, together with the principal component analysis (PCA) method, is used to extract and classify the relevant information, which in turn is used to classify or recognise a testing condition represented by the response signals. This proposed novel technique yields a significant data reduction during the monitoring of wind turbine blades.
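A minimal sketch of the on-node feature extraction step, assuming each (aliased) AE record arrives as a NumPy array; the feature choices and threshold here are illustrative, not those of the deployed nodes, and the PCA classification stage is not shown:

```python
import numpy as np

def ae_time_features(x, threshold=0.1):
    """Peak, RMS, threshold-crossing count and crest factor of one AE record."""
    x = np.asarray(x, dtype=float)
    peak = float(np.max(np.abs(x)))
    rms = float(np.sqrt(np.mean(x ** 2)))
    counts = int(np.sum((np.abs(x[1:]) >= threshold) & (np.abs(x[:-1]) < threshold)))
    crest = peak / rms if rms > 0 else 0.0
    return np.array([peak, rms, counts, crest])

print(ae_time_features(np.sin(np.linspace(0, 20, 256))))
```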
Oil-Water Flow Investigations using Planar-Laser Induced Fluorescence and Particle Velocimetry
NASA Astrophysics Data System (ADS)
Ibarra, Roberto; Matar, Omar K.; Markides, Christos N.
2017-11-01
The study of the complex behaviour of immiscible liquid-liquid flow in pipes requires the implementation of advanced measurement techniques in order to extract detailed in situ information. Laser-based diagnostic techniques allow the extraction of high-resolution space- and time-resolved phase and velocity information, which helps improve the fundamental understanding of these flows and validate closure relations for advanced multiphase flow models. This work presents novel simultaneous planar laser-induced fluorescence and particle velocimetry measurements in stratified oil-water flows, using two laser light sheets at two different wavelengths for fluids with different refractive indices, at horizontal and upward pipe inclinations (<5°) in stratified flow conditions (i.e. separated layers). Complex flow structures are extracted from 2-D instantaneous velocity fields and are strongly dependent on the pipe inclination at low velocities. The analysis of mean wall-normal velocity profiles and velocity fluctuations suggests the presence of single- and counter-rotating vortices in the azimuthal direction, especially in the oil layer, which can be attributed to the influence of the interfacial waves. Funding from BP and the TMF Consortium is gratefully acknowledged.
Yang, Heejung; Lee, Dong Young; Kang, Kyo Bin; Kim, Jeom Yong; Kim, Sun Ok; Yoo, Young Hyo; Sung, Sang Hyun
2015-05-10
A dry purified extract of Panax ginseng (PEG) was prepared using a manufacturing process that includes column chromatography, acid hydrolysis, and an enzyme reaction. During the manufacturing process, the more polar ginsenosides were converted into less polar forms via cleavage of their sugar chains and structural modifications of the aglycones, such as hydroxylation and dehydroxylation. The structural changes of ginsenosides during the intermediate steps from dried ginseng extract (DGE) to PEG were monitored by ultra-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC-QTOF/MS). The 22 ginsenosides isolated from PEG were used as reference standards for determining unknown ginsenosides and, further, for suggesting metabolic markers. The elution order of the 22 ginsenosides, based on the type of aglycone and the location and number of sugar chains, can be used for the structural elucidation of unknown ginsenosides. This information can be used in a dereplication process for quick and efficient identification of ginsenoside derivatives in ginseng preparations. The dereplication approach aided the identification of metabolic markers in the UPLC-QTOF/MS chromatograms during the conversion process, together with multivariate analyses including principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) plots. These metabolic markers were identified by comparison with the dereplication information of the 22 ginsenoside reference standards, or they were assigned using the pattern of the MS/MS fragment ions. Consequently, the developed metabolic profiling approach using UPLC-QTOF/MS and multivariate analysis represents a new method for quality control as well as useful criteria for a similarity evaluation of the manufacturing process of ginseng preparations. Copyright © 2015 Elsevier B.V. All rights reserved.
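The dereplication step is, at its core, an accurate-mass lookup against the reference standards within a tolerance. A hedged sketch (the names and masses below are placeholders, not the paper's 22 standards):

```python
references = {"ginsenoside_A": 645.4341, "ginsenoside_B": 683.4759}  # m/z, hypothetical

def dereplicate(observed_mz, refs, tol_ppm=10.0):
    """Return reference compounds whose mass matches within tol_ppm."""
    return [name for name, mz in refs.items()
            if abs(observed_mz - mz) / mz * 1e6 <= tol_ppm]

print(dereplicate(645.4338, references))  # ['ginsenoside_A']
```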
NASA Astrophysics Data System (ADS)
Chen, Lei; Li, Dehua; Yang, Jie
2007-12-01
Constructing a virtual international strategy environment needs many kinds of information, covering economics, politics, military affairs, diplomacy, culture, science, etc. It is therefore important to build an efficient system for automatic information extraction, classification, recombination and analysis as a foundation and component of the military strategy hall. This paper first uses an improved Boost algorithm to classify the collected initial information, and then applies a strategy intelligence extraction algorithm to extract strategic intelligence from that information to help strategists analyze it.
Papenmeier, Frank; Schwan, Stephan
2016-02-01
Viewing objects with stereoscopic displays provides additional depth cues through binocular disparity supporting object recognition. So far, it was unknown whether this results from the representation of specific stereoscopic information in memory or a more general representation of an object's depth structure. Therefore, we investigated whether continuous object rotation acting as depth cue during encoding results in a memory representation that can subsequently be accessed by stereoscopic information during retrieval. In Experiment 1, we found such transfer effects from continuous object rotation during encoding to stereoscopic presentations during retrieval. In Experiments 2a and 2b, we found that the continuity of object rotation is important because only continuous rotation and/or stereoscopic depth but not multiple static snapshots presented without stereoscopic information caused the extraction of an object's depth structure into memory. We conclude that an object's depth structure and not specific depth cues are represented in memory. Copyright © 2015 Elsevier B.V. All rights reserved.
Smart Point Cloud: Definition and Remaining Challenges
NASA Astrophysics Data System (ADS)
Poux, F.; Hallot, P.; Neuville, R.; Billen, R.
2016-10-01
Dealing with coloured point clouds acquired from terrestrial laser scanners, this paper identifies the remaining challenges for a new data structure: the smart point cloud. This concept arises from the observation that massive and discretized spatial information from active remote sensing technology is often underused due to data mining limitations. The generalisation of point cloud data, together with the heterogeneity and temporality of such datasets, is the main issue regarding structure, segmentation, classification, and interaction for an immediate understanding. We propose to use both point cloud properties and human knowledge through machine learning to rapidly extract pertinent information, using user-centered information (smart data) rather than raw data. A review of feature detection, machine learning frameworks and database systems indexed both for mining queries and data visualisation is presented. Based on existing approaches, we propose a new flexible three-block framework built around device expertise, analytic expertise and domain-based reflection. This contribution serves as the first step towards the realisation of a comprehensive smart point cloud data structure.
Higgins, Denice; Rohrlach, Adam B.; Kaidonis, John; Townsend, Grant; Austin, Jeremy J.
2015-01-01
Major advances in genetic analysis of skeletal remains have been made over the last decade, primarily due to improvements in post-DNA-extraction techniques. Despite this, a key challenge for DNA analysis of skeletal remains is the limited yield of DNA recovered from these poorly preserved samples. Enhanced DNA recovery by improved sampling and extraction techniques would allow further advancements. However, little is known about the post-mortem kinetics of DNA degradation and whether the rate of degradation varies between nuclear and mitochondrial DNA or across different skeletal tissues. This knowledge, along with information regarding ante-mortem DNA distribution within skeletal elements, would inform sampling protocols and facilitate the development of improved extraction processes. Here we present a combined genetic and histological examination of DNA content and rates of DNA degradation in the different tooth tissues of 150 human molars over short-medium post-mortem intervals. DNA was extracted from coronal dentine, root dentine, cementum and pulp of 114 teeth via a silica column method, and the remaining 36 teeth were examined histologically. Real-time quantification assays based on two nuclear DNA fragments (67 bp and 156 bp) and one mitochondrial DNA fragment (77 bp) showed nuclear and mitochondrial DNA degraded exponentially, but at different rates, depending on post-mortem interval and soil temperature. In contrast to previous studies, we identified differential survival of nuclear and mtDNA in different tooth tissues. Furthermore, histological examination showed pulp and dentine were rapidly affected by loss of structural integrity, and pulp was completely destroyed in a relatively short time period. Conversely, cementum showed little structural change over the same time period. Finally, we confirm that targeted sampling of cementum from teeth buried for up to 16 months can provide a reliable source of nuclear DNA for STR-based genotyping using standard extraction methods, without the need for specialised equipment or large-volume demineralisation steps. PMID:25992635
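Since the copy numbers decay exponentially with post-mortem interval, the rate can be recovered with a standard curve fit. A sketch with invented data points, assuming qPCR copy numbers per sample:

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(t, n0, k):
    return n0 * np.exp(-k * t)          # N(t) = N0 * exp(-k t)

t_months = np.array([1, 2, 4, 8, 16], dtype=float)     # post-mortem interval
copies = np.array([9000, 7400, 5100, 2400, 600.0])     # placeholder qPCR counts
(n0, k), _ = curve_fit(decay, t_months, copies, p0=(10_000, 0.1))
print(n0, k)  # fitted initial copy number and decay rate
```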
High-Resolution Remote Sensing Image Building Extraction Based on Markov Model
NASA Astrophysics Data System (ADS)
Zhao, W.; Yan, L.; Chang, Y.; Gong, L.
2018-04-01
As resolution increases, remote sensing images carry a greater information load, more noise, and more complex feature geometry and texture, which makes the extraction of building information more difficult. To address this problem, this paper presents a building extraction method for high-resolution remote sensing images based on a Markov model. The method introduces Contourlet-domain map clustering and a Markov model to capture and enhance the contour and texture information of high-resolution remote sensing image features in multiple directions, and further designs a spectral feature index that can characterize "pseudo-buildings" in the building area. Through multi-scale segmentation and extraction of image features, fine extraction from the building area down to the individual building is realized. Experiments show that this method can suppress the noise of high-resolution remote sensing images, reduce the interference of non-target ground texture, and remove shadow, vegetation and other pseudo-building information; compared with traditional pixel-level information extraction, it performs better in terms of precision, accuracy and completeness of building extraction.
Gea, An; Stringano, Elisabetta; Brown, Ron H; Mueller-Harvey, Irene
2011-01-26
A rapid thiolytic degradation and cleanup procedure was developed for analyzing tannins directly in chlorophyll-containing sainfoin (Onobrychis viciifolia) plants. The technique proved suitable for complex tannin mixtures containing catechin, epicatechin, gallocatechin, and epigallocatechin flavan-3-ol units. The reaction time was standardized at 60 min to minimize the loss of structural information as a result of epimerization and degradation of terminal flavan-3-ol units. The results were evaluated by separate analysis of extractable and unextractable tannins, which accounted for 63.6-113.7% of the in situ plant tannins. It is of note that 70% aqueous acetone extracted tannins with a lower mean degree of polymerization (mDP) than was found for tannins analyzed in situ. Extractable tannins had mDP values that were between 4 and 29 units lower. The method was validated by comparing results from individual and mixed sample sets. The tannin composition of different sainfoin accessions covered a range of mDP values from 16 to 83, procyanidin/prodelphinidin (PC/PD) ratios from 19.2/80.8 to 45.6/54.4, and cis/trans ratios from 74.1/25.9 to 88.0/12.0. This is the first high-throughput screening method that is suitable for analyzing condensed tannin contents and structural composition directly in green plant tissue.
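The quantities reported above follow from simple ratios of the thiolysis products, where terminal units elute as free flavan-3-ols and extension units as their benzyl mercaptan adducts. A hedged arithmetic sketch (the input amounts are invented):

```python
def tannin_summary(terminal, extension, pc_units, pd_units, cis_units, trans_units):
    mdp = (terminal + extension) / terminal             # mean degree of polymerization
    pc_share = 100 * pc_units / (pc_units + pd_units)   # PC part of the PC/PD ratio
    cis_share = 100 * cis_units / (cis_units + trans_units)
    return mdp, pc_share, cis_share

print(tannin_summary(terminal=1.0, extension=30.0,
                     pc_units=30.0, pd_units=70.0,
                     cis_units=80.0, trans_units=20.0))  # (31.0, 30.0, 80.0)
```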
NASA Astrophysics Data System (ADS)
Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron
2017-05-01
This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.
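The tie-point measurement between adjacent frames uses standard optical-flow matching, which the OpenCV sketch below illustrates; the frames are assumed to be grayscale NumPy arrays, and the subsequent attitude estimation and Weighted Least Squares adjustment are not shown.

```python
import cv2

def tie_points(frame_a_gray, frame_b_gray, max_corners=400):
    """Track corner features from frame A to frame B (pyramidal Lucas-Kanade)."""
    pts_a = cv2.goodFeaturesToTrack(frame_a_gray, max_corners, 0.01, 10)
    pts_b, status, _err = cv2.calcOpticalFlowPyrLK(frame_a_gray, frame_b_gray,
                                                   pts_a, None)
    ok = status.ravel() == 1  # keep only successfully tracked points
    return pts_a[ok].reshape(-1, 2), pts_b[ok].reshape(-1, 2)
```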
v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text
Divita, Guy; Carter, Marjorie E.; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H.; Gundlapalli, Adi V.
2016-01-01
Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and to provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records. PMID:27683667
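The annotator/pipeline composition the framework describes can be pictured with a toy pipeline. This is a hedged Python sketch of the pattern only; the names are illustrative and do not reflect the real v3NLP Java API.

```python
class Document:
    def __init__(self, text):
        self.text = text
        self.annotations = []

def sentence_annotator(doc):
    doc.annotations.append(("sentences", doc.text.split(". ")))

def concept_annotator(doc):
    # toy tuned concept extractor for one condition of interest
    if "catheter" in doc.text.lower():
        doc.annotations.append(("concept", "indwelling urinary catheter"))

def run_pipeline(doc, annotators):
    for annotate in annotators:   # each annotator enriches the shared document
        annotate(doc)
    return doc.annotations

print(run_pipeline(Document("Foley catheter in place. No fever."),
                   [sentence_annotator, concept_annotator]))
```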
Pometti, Carolina L.; Bessega, Cecilia F.; Saidman, Beatriz O.; Vilardi, Juan C.
2014-01-01
Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other. PMID:24688293
Self-Supervised Chinese Ontology Learning from Online Encyclopedias
Hu, Fanghuai; Shao, Zhiqing; Ruan, Tong
2014-01-01
Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised-learning-based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and to enrich the learnt ontology, we also apply some machine-learning-based methods. First, we show, both statistically and experimentally, that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy) and train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two respects, scale and precision; manual evaluation shows that the ontology has excellent precision, and its high coverage is established by comparing SSCO with other well-known ontologies and knowledge bases; the experimental results also indicate that the self-supervised models clearly enrich SSCO. PMID:24715819
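The self-supervision idea is that page structure labels the training data for free. A minimal sketch, assuming redirect pages are available as an alias-to-target mapping (the data here is invented):

```python
# Every redirect title is treated as an automatically labeled positive
# synonym example; analogous rules can label hyponymy from category labels.
redirects = {"PRC": "People's Republic of China", "UK": "United Kingdom"}

def synonym_training_pairs(redirects):
    return [(alias, target, 1) for alias, target in redirects.items()]

print(synonym_training_pairs(redirects))
```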
How to integrate quantitative information into imaging reports for oncologic patients.
Martí-Bonmatí, L; Ruiz-Martínez, E; Ten, A; Alberich-Bayarri, A
2018-05-01
Nowadays, the images and information generated in imaging tests, as well as the reports that are issued, are digital and represent a reliable source of data. Reports can be classified into three main types according to their content and the type of information they include: organized (free text in natural language), predefined (with templates and guidelines elaborated with previously determined natural language, like that used in BI-RADS and PI-RADS), or structured (with drop-down menus displaying questions with various possible answers that have been agreed on with the rest of the multidisciplinary team, which use standardized lexicons and are structured in the form of a database with data that can be traced and exploited with statistical tools and data mining). The structured report, compatible with Management of Radiology Report Templates (MRRT), makes it possible to incorporate quantitative information related to the digital analysis of the data from the acquired images to accurately and precisely describe the properties and behavior of tissues by means of radiomics (characteristics and parameters). In conclusion, structured digital information (images, text, measurements, radiomic features, and imaging biomarkers) should be integrated into computerized reports so that they can be indexed in large repositories. Radiologic databanks are fundamental for exploiting health information, phenotyping lesions and diseases, and extracting conclusions in personalized medicine. Copyright © 2018 SERAM. Published by Elsevier España, S.L.U. All rights reserved.
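A structured report of the kind described reduces, at the data level, to coded answers plus quantitative biomarker values under an agreed template. An illustrative record (all field names invented for the sketch):

```python
structured_report = {
    "template": "liver-lesion-v2",      # agreed multidisciplinary template
    "lexicon": "RadLex",                # standardized lexicon assumed here
    "answers": {"lesion_present": "yes"},                  # drop-down answers
    "biomarkers": {"volume_ml": 4.2, "ADC_mean": 1.1e-3},  # quantitative data
}
```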
Super-pixel extraction based on multi-channel pulse coupled neural network
NASA Astrophysics Data System (ADS)
Xu, GuangZhu; Hu, Song; Zhang, Liu; Zhao, JingJing; Fu, YunXia; Lei, BangJun
2018-04-01
Super-pixel extraction techniques group pixels to form over-segmented image blocks according to the similarity among pixels. Compared with traditional pixel-based methods, super-pixel-based image description requires less computation and is easier to interpret, and it has been widely used in image processing and computer vision applications. The pulse coupled neural network (PCNN) is a biologically inspired model, which stems from the phenomenon of synchronous pulse release in the visual cortex of cats. Each PCNN neuron can correspond to a pixel of an input image, and the dynamic firing pattern of each neuron contains both the pixel feature information and its contextual spatial structure. In this paper, a new color super-pixel extraction algorithm based on a multi-channel pulse coupled neural network (MPCNN) is proposed. The algorithm adopts the block-dividing idea of the SLIC algorithm, dividing the image into blocks of equal size first. Then, for each image block, the adjacent pixels of each seed with similar color are classified as a group, named a super-pixel. Finally, post-processing is applied to those pixels or pixel blocks that have not been grouped. Experiments show that the proposed method can adjust the number of super-pixels and the segmentation precision via its parameters, and has good potential for super-pixel extraction.
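A strongly simplified sketch of the block-dividing idea borrowed from SLIC: seed one super-pixel per grid block and assign each pixel to the seed with the most similar color. The MPCNN firing dynamics that actually drive the grouping in the paper, and the spatial constraint on seed search, are omitted here.

```python
import numpy as np

def grid_superpixels(image, block=16):
    """Assign every pixel to the color-nearest grid seed (toy version)."""
    seeds = image[block // 2::block, block // 2::block].reshape(-1, 3).astype(float)
    h, w, _ = image.shape
    labels = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            dist = np.sum((seeds - image[y, x].astype(float)) ** 2, axis=1)
            labels[y, x] = int(np.argmin(dist))
    return labels

img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
print(np.unique(grid_superpixels(img)).size)  # number of super-pixels formed
```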
Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M
2012-10-01
In biological research, establishing the prior art by searching and collecting information already present in the domain is as important as the experiments themselves. To obtain a complete overview of the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is readily available and typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature is to generate a list of relevant publications in the field of interest and manually scan these texts for relevant information, which is very time consuming. It is more than likely that with this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real-world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have so far been used only briefly, in murine genomics and associated fields, leaving other fields of animal sciences, such as livestock genomics, behind. The aim of this work was to develop an information retrieval platform for the livestock domain, focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms and integrated into the NER platform ProMiner, which has been used successfully in the human genomics domain. In the performance tests conducted, the present system achieved fair performance, with a precision of 0.64, a recall of 0.74, and an F1 measure of 0.69 in a test scenario based on cattle literature.
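As a quick consistency check of the reported evaluation, the harmonic mean of the stated precision and recall reproduces the stated F1:

```python
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.64, 0.74), 2))  # 0.69, matching the reported F1 measure
```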
Balu, Rajkamal; Knott, Robert; Cowieson, Nathan P.; Elvin, Christopher M.; Hill, Anita J.; Choudhury, Namita R.; Dutta, Naba K.
2015-01-01
Rec1-resilin is the first recombinant resilin-mimetic protein polymer, synthesized from exon-1 of the Drosophila melanogaster gene CG15920 that has demonstrated unusual multi-stimuli responsiveness in aqueous solution. Crosslinked hydrogels of Rec1-resilin have also displayed remarkable mechanical properties including near-perfect rubber-like elasticity. The structural basis of these extraordinary properties is not clearly understood. Here we combine a computational and experimental investigation to examine structural ensembles of Rec1-resilin in aqueous solution. The structure of Rec1-resilin in aqueous solutions is investigated experimentally using circular dichroism (CD) spectroscopy and small angle X-ray scattering (SAXS). Both bench-top and synchrotron SAXS are employed to extract structural data sets of Rec1-resilin and to confirm their validity. Computational approaches have been applied to these experimental data sets in order to extract quantitative information about structural ensembles including radius of gyration, pair-distance distribution function, and the fractal dimension. The present work confirms that Rec1-resilin is an intrinsically disordered protein (IDP) that displays equilibrium structural qualities between those of a structured globular protein and a denatured protein. The ensemble optimization method (EOM) analysis reveals a single conformational population with partial compactness. This work provides new insight into the structural ensembles of Rec1-resilin in solution. PMID:26042819
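Of the ensemble quantities mentioned, the radius of gyration is the most directly computable: a Guinier fit of the low-q SAXS data, ln I(q) = ln I(0) - (Rg²/3)q². A sketch on synthetic data (the real analysis also derives the pair-distance distribution and fractal dimension):

```python
import numpy as np

q = np.linspace(0.01, 0.05, 20)      # 1/Angstrom, low-q Guinier region
rg_true = 25.0
intensity = 100 * np.exp(-(rg_true ** 2) * q ** 2 / 3)   # synthetic curve

slope, _ = np.polyfit(q ** 2, np.log(intensity), 1)      # ln I vs q^2
print(np.sqrt(-3 * slope))           # recovers Rg ~ 25 Angstrom
```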
Chen, W; Kowatch, R; Lin, S; Splaingard, M; Huang, Y
2015-01-01
Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semi-structured sleep study reports. The system was shown to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. In total, 26,550 concepts were extracted, 99% of them textual. 1.01 million facts were extracted from sleep study documents, including demographic information, sleep study lab results, medications, procedures, and diagnoses. The average accuracy of terminology parsing was over 83% when compared against expert annotations. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly, from a few weeks to a few seconds. Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use. PMID:26171080
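As an illustration of the numeric-concept parsing step, a minimal sketch (the concept names and report phrasing are hypothetical, not taken from the Nationwide Children's documents):

    import re

    # Hypothetical semi-structured fragment of a sleep study report.
    report = "Sleep efficiency: 84.2 %. Apnea-hypopnea index: 12.7 events/hour."

    # One regular expression per numeric concept of interest.
    patterns = {
        "sleep_efficiency": r"Sleep efficiency:\s*([\d.]+)\s*%",
        "ahi": r"Apnea-hypopnea index:\s*([\d.]+)",
    }

    facts = {}
    for concept, pattern in patterns.items():
        match = re.search(pattern, report, flags=re.IGNORECASE)
        if match:
            facts[concept] = float(match.group(1))

    print(facts)  # {'sleep_efficiency': 84.2, 'ahi': 12.7}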
NASA Astrophysics Data System (ADS)
Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.
2016-09-01
Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from these detailed, up-to-date data sources. Timely and accurate information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building strategies for sustainable development. The conventional techniques for extracting trees, ground surveying and interpretation of aerial photography, are associated with constraints such as labour-intensive field work and high cost, which can be overcome by means of integrated LiDAR and digital image datasets. In contrast to the predominant studies on tree extraction in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a semi-automated workflow for extracting urban trees from the integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper shows that the integrated datasets are a suitable technology and a viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition, and extent of trees in the study area, useful to city planners and other decision makers in order to understand how much canopy cover exists, to identify new planting, removal, or reforestation opportunities, and to determine which locations have the greatest need or potential to maximize the benefits of return on investment. It can also help track trends or changes in urban trees over time and inform future management decisions.
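A minimal sketch of the kind of rule that integrated LiDAR and multispectral data make possible, assuming red and near-infrared image bands and a LiDAR-derived normalized digital surface model (nDSM); the thresholds are illustrative assumptions, not the authors' parameters:

    import numpy as np

    def candidate_tree_mask(red, nir, ndsm, ndvi_min=0.3, height_min=2.0):
        """Flag pixels that are both vegetated (NDVI) and elevated (nDSM, metres)."""
        ndvi = (nir - red) / np.maximum(nir + red, 1e-6)
        return (ndvi > ndvi_min) & (ndsm > height_min)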
Ghahari, S. F.; Abazarsa, F.; Avci, O.; Çelebi, Mehmet; Taciroglu, E.
2016-01-01
The Robert A. Millikan Library is a reinforced concrete building with a basement level and nine stories above ground. Located on the campus of the California Institute of Technology (Caltech) in Pasadena, California, it is among the most densely instrumented buildings in the U.S. From the early days of its construction, it has been the subject of many investigations, especially regarding soil–structure interaction effects. It is well accepted that the structure interacts significantly with the surrounding soil, which implies that the true foundation input motions cannot be directly recorded during earthquakes because of inertial effects. Owing to this limitation, input–output modal identification methods are not applicable to this soil–structure system. On the other hand, conventional output-only methods typically assume the unknown input signals to be stationary white noise, which is not the case for earthquake excitations. Through the use of recently developed blind identification (i.e., output-only) methods, it has become possible to extract such information from the response signals recorded during earthquake excitations alone. In the present study, we employ such a blind identification method to extract the modal properties of the Millikan Library. We present some modes that have not been identified in the forced vibration tests of several studies to date. Then, to quantify the contribution of soil–structure interaction effects, we first create a detailed Finite Element (FE) model using available information about the superstructure, and subsequently update the soil–foundation system's dynamic stiffnesses at each mode such that the modal properties of the entire soil–structure system agree well with those obtained via output-only modal identification.
Athavale, Prashant; Xu, Robert; Radau, Perry; Nachman, Adrian; Wright, Graham A
2015-07-01
Images consist of structures of varying scales: large-scale structures such as flat regions, and small-scale structures such as noise, textures, and rapidly oscillatory patterns. In the hierarchical (BV, L2) image decomposition, Tadmor et al. (2004) start by extracting coarse-scale structures from a given image and successively extract finer structures from the residuals in each step of the iterative decomposition. We propose to begin instead by extracting the finest structures from the given image and then proceed to extract increasingly coarser structures. In most images, noise can be considered a fine-scale structure; thus, starting the image decomposition with finer scales, rather than large scales, leads to fast denoising. We note that our approach turns out to be equivalent to the nonstationary regularization of Scherzer and Weickert (2000). The continuous limit of this procedure leads to a time-scaled version of total variation (TV) flow. Motivated by specific clinical applications, we introduce an image-dependent weight in the regularization functional and study the corresponding weighted TV flow. We show that the edge-preserving property of the multiscale representation of an input image obtained with the weighted TV flow can be enhanced and localized by an appropriate choice of the weight. We use this to develop an efficient and edge-preserving denoising algorithm with control over speed and localization properties. We examine analytical properties of the weighted TV flow that give precise information about the denoising speed and the rate of change of the energy of the images. An additional contribution of the paper is the use of the images obtained at different scales for robust multiscale registration. We show that the inherently multiscale nature of the weighted TV flow improved performance for registration of noisy cardiac MRI images compared to other methods such as bilateral or Gaussian filtering. A clinical application of the multiscale registration algorithm is also demonstrated by aligning viability assessment magnetic resonance (MR) images from 8 patients with previous myocardial infarctions.
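For orientation, the hierarchical (BV, L2) decomposition referenced above can be written compactly (standard formulation from the cited literature, restated here): starting from v_{-1} = f, each step solves

    [u_k, v_k] = argmin over u + v = v_{k-1} of { |u|_BV + lambda_k ||v||_{L2}^2 },  with lambda_k = 2 lambda_{k-1},

so that f ~ u_0 + u_1 + ... + u_k accumulates progressively finer scales; the method proposed in this paper reverses that ordering, extracting the finest structures first.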
Paiardini, Alessandro; Bossa, Francesco; Pascarella, Stefano
2004-01-01
The wealth of biological information provided by structural and genomic projects opens new prospects of understanding life and evolution at the molecular level. In this work, it is shown how computational approaches can be exploited to pinpoint protein structural features that remain invariant upon long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this family typically display high-structural conservation despite low-sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set, using both sequence and structural information. The results of this study identified a structural pattern of hydrophobic contacts shared by all of the superfamily members of fold-type I enzymes and involved in native interactions. This profile highlights the presence of a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation, that correlates significantly (r = 0.70) with the extent of mean hydrophobic contact value of their apolar fraction. PMID:15498941
Automated detection of qualitative spatio-temporal features in electrocardiac activation maps.
Ironi, Liliana; Tentoni, Stefania
2007-02-01
This paper describes work aimed at realizing a tool for the automated interpretation of electrocardiac maps. Such maps can capture a number of electrical conduction pathologies, such as arrhythmia, that can be missed by the analysis of traditional electrocardiograms. However, their introduction into clinical practice is still far away, as their interpretation requires skills that belong to very few experts. An automated interpretation tool would therefore bridge the gap between the established research outcome and clinical practice, with a consequent great impact on health care. Qualitative spatial reasoning can play a crucial role in the identification of spatio-temporal patterns and salient features that characterize the heart's electrical activity. We adopted the spatial aggregation (SA) conceptual framework and an interplay of numerical and qualitative information to extract features from epicardial maps and to make them available for reasoning tasks. Our focus is on epicardial activation isochrone maps, as they are a synthetic representation of spatio-temporal aspects of the propagation of the electrical excitation. We provide a computational SA-based methodology to extract, from 3D epicardial data gathered over time, (1) the excitation wavefront structure, and (2) the salient features that characterize wavefront propagation and visually correspond to specific geometric objects. The proposed methodology provides a robust and efficient way to identify salient pieces of information in activation time maps. The hierarchical structure of the abstracted geometric objects, crucial in capturing the prominent information, facilitates the definition of the general rules necessary to infer the correlation between pathophysiological patterns and wavefront structure and propagation.
Local and global aspects of biological motion perception in children born at very low birth weight
Williamson, K. E.; Jakobson, L. S.; Saunders, D. R.; Troje, N. F.
2015-01-01
Biological motion perception can be assessed using a variety of tasks. In the present study, 8- to 11-year-old children born prematurely at very low birth weight (<1500 g) and matched, full-term controls completed tasks that required the extraction of local motion cues, the ability to perceptually group these cues to extract information about body structure, and the ability to carry out higher order processes required for action recognition and person identification. Preterm children exhibited difficulties in all 4 aspects of biological motion perception. However, intercorrelations between test scores were weak in both full-term and preterm children, a finding that supports the view that these processes are relatively independent. Preterm children also displayed more autistic-like traits than full-term peers. In preterm (but not full-term) children, these traits were negatively correlated with performance in the task requiring structure-from-motion processing, r(30) = −.36, p < .05, but positively correlated with the ability to extract identity, r(30) = .45, p < .05. These findings extend previous reports of vulnerability in systems involved in processing dynamic cues in preterm children and suggest that a core deficit in social perception/cognition may contribute to the development of social and behavioral difficulties even in members of this population who are functioning within the normal range intellectually. The results could inform the development of screening, diagnostic, and intervention tools. PMID:25103588
Integrated feature extraction and selection for neuroimage classification
NASA Astrophysics Data System (ADS)
Fan, Yong; Shen, Dinggang
2009-02-01
Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.
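A minimal sketch of iterative SVM-based feature selection in the same spirit (recursive feature elimination by linear-SVM weights stands in for the paper's constrained subspace learning step; the data here are synthetic):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import RFE

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 50))   # subjects x regional features
    y = rng.integers(0, 2, 100)          # diagnostic labels (e.g., AD vs control)

    # Linear SVM weights rank features; RFE iteratively drops the weakest.
    selector = RFE(SVC(kernel="linear"), n_features_to_select=10, step=5)
    selector.fit(X, y)
    informative = np.flatnonzero(selector.support_)  # indices of retained features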
A rapid extraction of landslide disaster information research based on GF-1 image
NASA Astrophysics Data System (ADS)
Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na
2015-08-01
In recent years, landslide disasters have occurred frequently because of seismic activity, bringing great harm to people's lives and attracting close attention from the state and extensive concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing images can effectively improve the accuracy of information extraction with their rich texture and geometric information. It is therefore feasible to extract information on large, earthquake-triggered landslides with severe surface damage. Taking Wenchuan County as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, employing the Estimation of Scale Parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution imagery and selecting spectral, textural, geometric, and landform features of the image, we establish extraction rules for landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.
Face recognition algorithm using extended vector quantization histogram features.
Yan, Yan; Lee, Feifei; Wu, Xueqian; Chen, Qiu
2018-01-01
In this paper, we propose a face recognition algorithm based on a combination of vector quantization (VQ) and Markov stationary features (MSF). The VQ algorithm has been shown to be an effective method for generating features; it extracts a codevector histogram as a facial feature representation for face recognition. Still, the VQ histogram features are unable to convey spatial structural information, which to some extent limits their usefulness in discrimination. To alleviate this limitation of VQ histograms, we utilize Markov stationary features (MSF) to extend the VQ histogram-based features so as to add spatial structural information. We demonstrate the effectiveness of our proposed algorithm by achieving recognition results superior to those of several state-of-the-art methods on publicly available face databases.
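A minimal sketch of the VQ codevector histogram idea (k-means stands in for codebook training; the block and codebook sizes are illustrative, and the MSF extension is not shown):

    import numpy as np
    from sklearn.cluster import KMeans

    def vq_histogram(image, codebook, block=4):
        """Histogram of codevector indices over non-overlapping image blocks."""
        h, w = image.shape
        patches = [image[i:i + block, j:j + block].ravel()
                   for i in range(0, h - block + 1, block)
                   for j in range(0, w - block + 1, block)]
        codes = codebook.predict(np.asarray(patches, dtype=float))
        return np.bincount(codes, minlength=codebook.n_clusters)

    # codebook = KMeans(n_clusters=256).fit(training_patches), where
    # training_patches is a hypothetical array of flattened training blocks.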
A tool for filtering information in complex systems
Tumminello, M.; Aste, T.; Di Matteo, T.; Mantegna, R. N.
2005-01-01
We introduce a technique to filter out complex data sets by extracting a subgraph of representative links. Such a filtering can be tuned up to any desired level by controlling the genus of the resulting graph. We show that this technique is especially suitable for correlation-based graphs, giving filtered graphs that preserve the hierarchical organization of the minimum spanning tree but containing a larger amount of information in their internal structure. In particular in the case of planar filtered graphs (genus equal to 0), triangular loops and four-element cliques are formed. The application of this filtering procedure to 100 stocks in the U.S. equity markets shows that such loops and cliques have important and significant relationships with the market structure and properties. PMID:16027373
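A minimal sketch of the correlation-based construction this filtering builds on, shown only up to the minimum spanning tree stage (the genus-controlled extension, such as the planar filtered graph, requires dedicated code and is not implemented here):

    import numpy as np
    import networkx as nx

    def correlation_mst(returns):
        """MST backbone of a correlation-based graph from a (T x N) return matrix,
        using the standard metric d_ij = sqrt(2 * (1 - rho_ij))."""
        corr = np.corrcoef(returns.T)
        dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
        graph = nx.from_numpy_array(dist)
        return nx.minimum_spanning_tree(graph)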
Semantic Technologies for Re-Use of Clinical Routine Data.
Kreuzthaler, Markus; Martínez-Costa, Catalina; Kaiser, Peter; Schulz, Stefan
2017-01-01
Routine patient data in electronic patient records are only partly structured, and an even smaller segment is coded, mainly for administrative purposes. Large parts are only available as free text. Transforming this content into a structured and semantically explicit form is a prerequisite for querying and information extraction. The core of the system architecture presented in this paper is based on SAP HANA in-memory database technology using the SAP Connected Health platform for data integration as well as for clinical data warehousing. A natural language processing pipeline analyses unstructured content and maps it to a standardized vocabulary within a well-defined information model. The resulting semantically standardized patient profiles are used for a broad range of clinical and research application scenarios.
A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents
2011-01-01
Background: A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up to date with all published studies on DDI. Methods: This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses, from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched against the generated sentences in order to extract DDIs. Results: We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%) but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Conclusions: Information extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, until now no approach had been carried out to extract DDIs from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDIs from biomedical texts. PMID:21489220
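A minimal sketch of the lexical pattern matching stage (the pattern and sentence below are generic illustrations, not the pharmacist-defined patterns of the paper):

    import re

    # One generic DDI expression: "<drug> increases/decreases ... of <drug>".
    pattern = re.compile(
        r"(?P<drug1>\w+)\s+(?:increases|decreases|inhibits|potentiates)\s+"
        r"the\s+\w+\s+of\s+(?P<drug2>\w+)",
        re.IGNORECASE,
    )

    sentence = "Ketoconazole increases the concentration of midazolam."
    match = pattern.search(sentence)
    if match:
        print(match.group("drug1"), "interacts with", match.group("drug2"))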
An automatic system to detect and extract texts in medical images for de-identification
NASA Astrophysics Data System (ADS)
Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael
2010-03-01
Recently, there has been an increasing need to share medical images for research purposes. To respect and preserve patient privacy, most medical images are de-identified by removing protected health information (PHI) before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for removing text from medical images. Many papers have been written on algorithms for text detection and extraction; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end users, it should be effective, accurate, and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes while keeping the anatomic structures intact. First, considering that text has strong contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computational optimization.
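A minimal sketch of the region-variance idea (window size and threshold are illustrative; the geometric post-processing and level-set extraction stages are omitted):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def high_variance_mask(image, window=15, threshold=500.0):
        """Local variance via E[x^2] - (E[x])^2; burned-in text scores high."""
        img = image.astype(float)
        mean = uniform_filter(img, window)
        mean_sq = uniform_filter(img ** 2, window)
        return (mean_sq - mean ** 2) > threshold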
KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process
NASA Technical Reports Server (NTRS)
Gettig, Gary A.
1988-01-01
Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.
Extracting Useful Semantic Information from Large Scale Corpora of Text
ERIC Educational Resources Information Center
Mendoza, Ray Padilla, Jr.
2012-01-01
Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…
Text mining by Tsallis entropy
NASA Astrophysics Data System (ADS)
Jamaati, Maryam; Mehri, Ali
2018-01-01
Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word-ranking metric for extracting the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word-ranking performance, at the same level as the best previous ranking methods.
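A minimal sketch of Tsallis-entropy word ranking (the value of q and the binning of word positions are free choices here; the paper's metric is based on the spatial distribution of word occurrences):

    import numpy as np

    def tsallis_entropy(p, q=1.5):
        """S_q = (1 - sum_i p_i^q) / (q - 1); recovers Shannon entropy as q -> 1."""
        p = p[p > 0]
        return (1.0 - np.sum(p ** q)) / (q - 1.0)

    def word_score(positions, doc_length, n_bins=20):
        """Score a word from the distribution of its positions in the document."""
        hist, _ = np.histogram(positions, bins=n_bins, range=(0, doc_length))
        return tsallis_entropy(hist / max(hist.sum(), 1))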
The 1984 NASA/ASEE summer faculty fellowship program
NASA Technical Reports Server (NTRS)
1984-01-01
The assessment of forest productivity and associated nitrogen flux in a number of conifer ecosystems is described. A baseline study of acid precipitation in the Sierra Nevada involves the extraction and integration of a number of data planes describing the terrain, soils, lithology, vegetation cover and structure, and microclimate of the region. The development of automated techniques to extract topographic networks (stream canyons and ridge lines) for use as a landscape skeleton to organize and integrate data sets into an efficient geographical information system is examined. The software is written in both FORTRAN and C and is portable to a number of different computer environments with minimal modification.
Long-term response of yellow-poplar to thinning in the southern Appalachian Mountains
Tara L. Keyser; Peter M. Brown
2014-01-01
As the focus of forest management on many public lands shifts away from timber production and extraction to habitat, restoration, and diversity-related objectives, it is important to understand the long-term effects that previous management activities have on structure and composition to better inform current management decisions. In this paper, we analyzed 40 years of...
Raboshchuk, Ganna; Nadeu, Climent; Jančovič, Peter; Lilja, Alex Peiró; Köküer, Münevver; Muñoz Mahamud, Blanca; Riverola De Veciana, Ana
2018-01-01
A large number of alarm sounds triggered by biomedical equipment occur frequently in the noisy environment of a neonatal intensive care unit (NICU) and play a key role in providing healthcare. In this paper, our work on the development of an automatic system for the detection of acoustic alarms in that difficult environment is presented. Such an automatic detection system is needed for investigating how a preterm infant reacts to auditory stimuli in the NICU environment and for improved real-time patient monitoring. The approach presented in this paper consists of using the available knowledge about each alarm class in the design of the detection system: information about the frequency structure is used in the feature extraction stage, and knowledge of the time structure is incorporated at the post-processing stage. Several alternative methods are compared for feature extraction, modeling, and post-processing. The detection performance is evaluated with real data recorded in the NICU of the hospital, using both frame-level and period-level metrics. The experimental results show that the inclusion of both spectral and temporal information improves the baseline detection performance by more than 60%. PMID:29404227
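A minimal sketch of frequency-structure features of the kind usable at the feature extraction stage (band edges and frame parameters are illustrative assumptions, not the paper's feature set):

    import numpy as np
    from scipy.signal import stft

    def band_energies(signal, fs, bands=((500, 1000), (1000, 2000), (2000, 4000))):
        """Per-frame energy in alarm-relevant frequency bands (Hz)."""
        freqs, _, spec = stft(signal, fs=fs, nperseg=512)
        power = np.abs(spec) ** 2
        return np.stack([power[(freqs >= lo) & (freqs < hi)].sum(axis=0)
                         for lo, hi in bands])  # shape: (n_bands, n_frames)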
Falahati, Farshad; Westman, Eric; Simmons, Andrew
2014-01-01
Machine learning algorithms and multivariate data analysis methods have been widely utilized in the field of Alzheimer's disease (AD) research in recent years. Advances in medical imaging and medical image analysis have provided a means to generate and extract valuable neuroimaging information. Automatic classification techniques provide tools to analyze this information and observe inherent disease-related patterns in the data. In particular, these classifiers have been used to discriminate AD patients from healthy control subjects and to predict conversion from mild cognitive impairment to AD. In this paper, recent studies are reviewed that have used machine learning and multivariate analysis in the field of AD research. The main focus is on studies that used structural magnetic resonance imaging (MRI), but studies that included positron emission tomography and cerebrospinal fluid biomarkers in addition to MRI are also considered. A wide variety of materials and methods has been employed in different studies, resulting in a range of different outcomes. Influential factors such as classifiers, feature extraction algorithms, feature selection methods, validation approaches, and cohort properties are reviewed, as well as key MRI-based and multi-modal based studies. Current and future trends are discussed.
NASA Astrophysics Data System (ADS)
Mallast, U.; Gloaguen, R.; Geyer, S.; Rödiger, T.; Siebert, C.
2011-08-01
In this paper we present a semi-automatic method to infer groundwater flow-paths based on the extraction of lineaments from digital elevation models. This method is especially adequate in remote and inaccessible areas where in-situ data are scarce. The combined method of linear filtering and object-based classification provides a lineament map with a high degree of accuracy. Subsequently, lineaments are differentiated into geological and morphological lineaments using auxiliary information and finally evaluated in terms of hydro-geological significance. Using the example of the western catchment of the Dead Sea (Israel/Palestine), the orientation and location of the differentiated lineaments are compared to characteristics of known structural features. We demonstrate that a strong correlation between lineaments and structural features exists. Using Euclidean distances between lineaments and wells provides an assessment criterion to evaluate the hydraulic significance of detected lineaments. Based on this analysis, we suggest that the statistical analysis of lineaments allows a delineation of flow-paths and thus significant information on groundwater movements. To validate the flow-paths we compare them to existing results of groundwater models that are based on well data.
The 2D analytic signal for envelope detection and feature extraction on ultrasound images.
Wachinger, Christian; Klein, Tassilo; Navab, Nassir
2012-08-01
The fundamental property of the analytic signal is the split of identity, meaning the separation of qualitative and quantitative information in the form of the local phase and the local amplitude, respectively. The structural representation of the local phase, independent of brightness and contrast, is especially interesting for numerous image processing tasks. Recently, the extension of the analytic signal from 1D to 2D, also covering intrinsic 2D structures, was proposed. We show the advantages of this improved concept on ultrasound RF and B-mode images. Specifically, we use the 2D analytic signal for the envelope detection of RF data. This leads to advantages in extracting the information-bearing signal from the modulated carrier wave. We illustrate this, first, by visual assessment of the images and, second, by performing goodness-of-fit tests against a Nakagami distribution, indicating a clear improvement in statistical properties. The evaluation is performed for multiple window sizes and parameter estimation techniques. Finally, we show that the 2D analytic signal allows for an improved estimation of local features on B-mode images.
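For orientation, the classical 1D counterpart of this envelope detection takes two lines via the Hilbert transform (shown as a point of reference; the paper's contribution is the 2D analytic signal, which this sketch does not implement):

    import numpy as np
    from scipy.signal import hilbert

    def envelope_1d(rf_line):
        """Envelope of an RF line as the magnitude of its 1D analytic signal."""
        return np.abs(hilbert(rf_line))

    # B-mode style display is then typically log-compressed, e.g.
    # 20 * np.log10(envelope_1d(rf_line) + 1e-12)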
Ensemble-based evaluation for protein structure models
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2016-01-01
Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is particularly important for evaluating computational protein structure models. Most model evaluation methods perform rigid-body superimposition of a structure model onto its crystal structure and measure the difference in the corresponding residue or atom positions between them. However, these methods neglect the intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, the disagreement of a model with the native structure needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that considers the flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or from molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein, described as a multivariate Gaussian distribution of atomic displacements, and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information on the practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633
Sieve-based relation extraction of gene regulatory networks from biological literature.
Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko
2015-01-01
Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions, and the results of related experiments. To extract them in an explicit, computer-readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for the extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network of the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable the extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings, which reduced the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher extraction accuracy. Analysis of the distances between different mention types in the text shows that our choice of transforming the data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system, as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains. PMID:26551454
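A minimal sketch of a linear-chain CRF over token features in the spirit described here, using the sklearn-crfsuite package (the features, labels, and toy sentence are illustrative, and the skip-mention transformation is omitted):

    import sklearn_crfsuite

    # Each sentence is a list of per-token feature dicts; labels mark relation roles.
    X_train = [[{"word": "sigK", "suffix3": "igK"},
                {"word": "activates", "suffix3": "tes"},
                {"word": "gerE", "suffix3": "erE"}]]
    y_train = [["AGENT", "TRIGGER", "TARGET"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X_train, y_train)
    print(crf.predict(X_train))  # [['AGENT', 'TRIGGER', 'TARGET']]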
Utilization of ontology look-up services in information retrieval for biomedical literature.
Vishnyakova, Dina; Pasche, Emilie; Lovis, Christian; Ruch, Patrick
2013-01-01
With the vast amount of biomedical data, we face the necessity to improve information retrieval processes in the biomedical domain. The use of biomedical ontologies has facilitated the combination of various data sources (e.g., scientific literature, clinical data repositories) by increasing the quality of information retrieval and reducing maintenance efforts. In this context, we developed Ontology Look-up Services (OLS) based on the NEWT and MeSH vocabularies. Our services were employed in information retrieval tasks such as gene/disease normalization. The implementation of the OLS services significantly accelerated the extraction of particular biomedical facts by structuring and enriching the data context. Precision in normalization tasks was boosted by about 20%.
Cary, Samantha K; Livshits, Maksim; Cross, Justin N; Ferrier, Maryline G; Mocko, Veronika; Stein, Benjamin W; Kozimor, Stosh A; Scott, Brian L; Rack, Jeffrey J
2018-04-02
Thenoyltrifluoroacetone (HTTA)-based extractions represent popular methods for separating microscopic amounts of transuranic actinides (i.e., Np and Pu) from macroscopic actinide matrixes (e.g., bulk uranium). It is well established that this procedure enables +4 actinides to be selectively removed from +3, +5, and +6 f-elements. However, even highly skilled and well-trained researchers find this process complicated and (at times) unpredictable. It is difficult to improve the HTTA extraction, or find alternatives, because little is understood about why this separation works; even the identities of the extracted species are unknown. In addressing this knowledge gap, we report here advances in the fundamental understanding of the HTTA-based extraction. This effort included comparatively evaluating HTTA complexation with +4 and +3 metals (M(IV) = Zr, Hf, Ce, Th, U, Np, and Pu vs M(III) = Ce, Nd, Sm, and Yb). We observed that +4 metals formed neutral complexes of the general formula M(IV)(TTA)4, whereas +3 metals formed anionic [M(III)(TTA)4]− species. Characterization of these M(TTA)4 complexes (net charge 0 or 1−) by UV-vis-NIR, IR, 1H and 19F NMR, single-crystal X-ray diffraction, and X-ray absorption spectroscopy (both near-edge and extended fine structure) was critical for determining that Np(IV)(TTA)4 and Pu(IV)(TTA)4 were the primary species extracted by HTTA. Furthermore, this information lays the foundation for developing an understanding of why the HTTA extraction works so well. The data suggest that solubility differences between M(IV)(TTA)4 and [M(III)(TTA)4]− are likely a major contributor to the selectivity of HTTA extractions for +4 cations over +3 metals. Moreover, these results will enable future studies focused on explaining the HTTA extraction's preference for +4 cations, which increases from Np(IV) to Pu(IV), Hf(IV), and Zr(IV).
Linan, Margaret K; Sottara, Davide; Freimuth, Robert R
2015-01-01
Pharmacogenomics (PGx) guidelines contain drug-gene relationships, therapeutic and clinical recommendations from which clinical decision support (CDS) rules can be extracted, rendered and then delivered through clinical decision support systems (CDSS) to provide clinicians with just-in-time information at the point of care. Several tools exist that can be used to generate CDS rules that are based on computer interpretable guidelines (CIG), but none have been previously applied to the PGx domain. We utilized the Unified Modeling Language (UML), the Health Level 7 virtual medical record (HL7 vMR) model, and standard terminologies to represent the semantics and decision logic derived from a PGx guideline, which were then mapped to the Health eDecisions (HeD) schema. The modeling and extraction processes developed here demonstrate how structured knowledge representations can be used to support the creation of shareable CDS rules from PGx guidelines.
Extracting Loop Bounds for WCET Analysis Using the Instrumentation Point Graph
NASA Astrophysics Data System (ADS)
Betts, A.; Bernat, G.
2009-05-01
Every calculation engine proposed in the literature of Worst-Case Execution Time (WCET) analysis requires upper bounds on loop iterations. Existing mechanisms to procure this information are either error prone, because they are gathered from the end-user, or limited in scope, because automatic analyses target very specific loop structures. In this paper, we present a technique that obtains bounds completely automatically for arbitrary loop structures. In particular, we show how to employ the Instrumentation Point Graph (IPG) to parse traces of execution (generated by an instrumented program) in order to extract bounds relative to any loop-nesting level. With this technique, therefore, non-rectangular dependencies between loops can be captured, allowing more accurate WCET estimates to be calculated. We demonstrate the improvement in accuracy by comparing WCET estimates computed through our HMB framework against those computed with state-of-the-art techniques.
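A minimal sketch of deriving loop bounds from execution traces (the event names and loop identifiers are hypothetical; a real IPG-based analysis works on instrumentation-point transitions and attributes counts per loop-nesting level):

    from collections import defaultdict

    def loop_bounds(trace):
        """Maximum iterations observed per loop header across all loop entries.

        trace: sequence of (event, loop_id) pairs, event in {"enter", "iter", "exit"}.
        """
        bounds = defaultdict(int)
        counts = defaultdict(int)
        for event, loop in trace:
            if event == "enter":
                counts[loop] = 0
            elif event == "iter":
                counts[loop] += 1
                bounds[loop] = max(bounds[loop], counts[loop])
        return dict(bounds)

    trace = [("enter", "L1"), ("iter", "L1"), ("iter", "L1"), ("exit", "L1"),
             ("enter", "L1"), ("iter", "L1"), ("exit", "L1")]
    print(loop_bounds(trace))  # {'L1': 2}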
Basic level scene understanding: categories, attributes and structures
Xiao, Jianxiong; Hays, James; Russell, Bryan C.; Patterson, Genevieve; Ehinger, Krista A.; Torralba, Antonio; Oliva, Aude
2013-01-01
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database which is a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image. PMID:24009590
Multi-dimensional super-resolution imaging enables surface hydrophobicity mapping
NASA Astrophysics Data System (ADS)
Bongiovanni, Marie N.; Godet, Julien; Horrocks, Mathew H.; Tosatto, Laura; Carr, Alexander R.; Wirthensohn, David C.; Ranasinghe, Rohan T.; Lee, Ji-Eun; Ponjavic, Aleks; Fritz, Joelle V.; Dobson, Christopher M.; Klenerman, David; Lee, Steven F.
2016-12-01
Super-resolution microscopy allows biological systems to be studied at the nanoscale, but has been restricted to providing only positional information. Here, we show that it is possible to perform multi-dimensional super-resolution imaging to determine both the position and the environmental properties of single-molecule fluorescent emitters. The method presented here exploits the solvatochromic and fluorogenic properties of nile red to extract both the emission spectrum and the position of each dye molecule simultaneously enabling mapping of the hydrophobicity of biological structures. We validated this by studying synthetic lipid vesicles of known composition. We then applied both to super-resolve the hydrophobicity of amyloid aggregates implicated in neurodegenerative diseases, and the hydrophobic changes in mammalian cell membranes. Our technique is easily implemented by inserting a transmission diffraction grating into the optical path of a localization-based super-resolution microscope, enabling all the information to be extracted simultaneously from a single image plane. PMID:27929085
A novel method to extract dark matter parameters from neutrino telescope data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Esmaili, Arman; Farzan, Yasaman, E-mail: arman@ipm.ir, E-mail: yasaman@theory.ipm.ac.ir
2011-04-01
Recently it has been shown that when the Dark Matter (DM) particles captured in the Sun directly annihilate into neutrino pairs, the oscillatory terms in the oscillation probability do not average to zero and can lead to a seasonal variation as the distance between the Sun and Earth changes in time. In this paper, we explore this feature as a novel method to extract information on the properties of dark matter. We show that by studying the variation of the flux over a few months, it would in principle be possible to derive the DM mass as well as new information on the flavor structure of the DM annihilation modes. In addition to analytic analysis, we present the results of our numerical calculations that take into account scattering and regeneration of neutrinos traversing the Sun.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, Adriana L.; Varga, Tamas
Branching structures such as lungs, blood vessels and plant roots play a critical role in life. The growth, structure, and function of these branching structures have an immense effect on our lives. Therefore, quantitative size information on such structures in their native environment is invaluable for studying their growth and the effect of the environment on them. X-ray computed tomography (XCT) has been an effective tool for in situ imaging and analysis of branching structures. We developed a cost-free tool that approximates the surface area and volume of branching structures. Our methodology of noninvasive imaging, segmentation and extraction of quantitative information is demonstrated through the analysis of a plant root in its soil medium from 3D tomography data. XCT data collected on a grass specimen were used to visualize its root structure. A suite of open-source software was employed to segment the root from the soil and determine its isosurface, which was used to calculate its volume and surface area. This methodology of processing 3D data is applicable to other branching structures, even when the structure of interest is of similar x-ray attenuation to its environment and difficulties arise with sample segmentation.
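As an illustration of the isosurface-based measurement, the following sketch uses open-source Python tools (scikit-image) to compute surface area and volume from an already-reconstructed XCT volume. The Otsu thresholding step and voxel size are illustrative assumptions; segmenting a root from soil of similar attenuation would, as the abstract notes, need more careful processing in practice.

```python
import numpy as np
from skimage import filters, measure

def root_surface_and_volume(volume, voxel_size_mm=0.05):
    """Approximate surface area and volume of a branching structure
    in a 3D XCT volume: crude threshold segmentation, then a marching
    cubes isosurface for area and a voxel count for volume."""
    mask = volume > filters.threshold_otsu(volume)
    verts, faces, _, _ = measure.marching_cubes(
        mask.astype(np.uint8), level=0.5, spacing=(voxel_size_mm,) * 3)
    surface_mm2 = measure.mesh_surface_area(verts, faces)
    volume_mm3 = mask.sum() * voxel_size_mm ** 3
    return surface_mm2, volume_mm3
```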
Golbamaki, Azadi; Benfenati, Emilio; Golbamaki, Nazanin; Manganaro, Alberto; Merdivan, Erinc; Roncaglioni, Alessandra; Gini, Giuseppina
2016-04-02
In this study, new molecular fragments associated with genotoxic and nongenotoxic carcinogens are introduced to estimate the carcinogenic potential of compounds. Two rule-based carcinogenesis models were developed with the aid of SARpy: model R (from rodents' experimental data) and model H (from human carcinogenicity data). SARpy's structural alert extraction method operates in a completely automated and unbiased manner with statistical significance. The carcinogenicity models developed in this study are collections of potentially carcinogenic fragments extracted from two carcinogenicity databases: the ANTARES carcinogenicity dataset, with information from bioassays on rats, and the combination of the ISSCAN and CGX datasets, which takes human-based assessments into account. The performance of these two models was evaluated in terms of cross-validation and external validation using a 258-compound case-study dataset. Combining the R and H predictions, and scoring a positive or negative result only when both models are concordant on a prediction, increased accuracy to 72% and specificity to 79% on the external test set. The carcinogenic fragments present in the two models were compared and analyzed from the point of view of chemical class. The results of this study show that the developed rule sets will be a useful tool for identifying new structural alerts of carcinogenicity and provide effective information on the molecular structures of carcinogenic chemicals.
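The concordance scoring can be illustrated with a short sketch; the function names and label encoding (1 = carcinogenic, 0 = non-carcinogenic, NaN = no call) are assumptions for illustration, not SARpy code.

```python
import numpy as np

def concordant_score(pred_r, pred_h):
    """Combine two binary classifiers: keep a prediction only where
    both models agree; disagreements become NaN (inconclusive)."""
    pred_r = np.asarray(pred_r, float)
    pred_h = np.asarray(pred_h, float)
    return np.where(pred_r == pred_h, pred_r, np.nan)

def accuracy_specificity(y_true, y_pred):
    """Accuracy and specificity over the concordant (non-NaN) subset."""
    m = ~np.isnan(y_pred)
    y_t, y_p = np.asarray(y_true, float)[m], y_pred[m]
    acc = (y_t == y_p).mean()
    neg = y_t == 0
    spec = (y_p[neg] == 0).mean()   # fraction of true negatives called negative
    return acc, spec
```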
Development of structural health monitoring techniques using dynamics testing
DOE Office of Scientific and Technical Information (OSTI.GOV)
James, G.H. III
Today's society depends upon many structures (such as aircraft, bridges, wind turbines, offshore platforms, buildings, and nuclear weapons) which are nearing the end of their design lifetime. Since these structures cannot be economically replaced, techniques for structural health monitoring must be developed and implemented. Modal and structural dynamics measurements hold promise for the global non-destructive inspection of a variety of structures since surface measurements of a vibrating structure can provide information about the health of the internal members without costly (or impossible) dismantling of the structure. In order to develop structural health monitoring for application to operational structures, developments in four areas have been undertaken within this project: operational evaluation, diagnostic measurements, information condensation, and damage identification. The developments in each of these four aspects of structural health monitoring have been exercised on a broad range of experimental data. This experimental data has been extracted from structures from several application areas which include aging aircraft, wind energy, aging bridges, offshore structures, structural supports, and mechanical parts. As a result of these advances, Sandia National Laboratories is in a position to perform further advanced development, operational implementation, and technical consulting for a broad class of the nation's aging infrastructure problems.
NASA Astrophysics Data System (ADS)
Ceder, Gerbrand
2007-03-01
The prediction of structure is a key problem in computational materials science that forms the platform on which rational materials design can be performed. Finding structure by traditional optimization methods on quantum mechanical energy models is not possible due to the complexity and high dimensionality of the coordinate space. An unusual, but efficient solution to this problem can be obtained by merging ideas from heuristic and ab initio methods: In the same way that scientists build empirical rules by observation of experimental trends, we have developed machine learning approaches that extract knowledge from a large set of experimental information and a database of over 15,000 first principles computations, and used these to rapidly direct accurate quantum mechanical techniques to the lowest energy crystal structure of a material. Knowledge is captured in a Bayesian probability network that relates the probability to find a particular crystal structure at a given composition to structure and energy information at other compositions. We show that this approach is highly efficient in finding the ground states of binary metallic alloys and can be easily generalized to more complex systems.
Price, Charles A.; Symonova, Olga; Mileyko, Yuriy; Hilley, Troy; Weitz, Joshua S.
2011-01-01
Interest in the structure and function of physical biological networks has spurred the development of a number of theoretical models that predict optimal network structures across a broad array of taxonomic groups, from mammals to plants. In many cases, direct tests of predicted network structure are impossible given the lack of suitable empirical methods to quantify physical network geometry with sufficient scope and resolution. There is a long history of empirical methods to quantify the network structure of plants, from roots, to xylem networks in shoots and within leaves. However, with few exceptions, current methods emphasize the analysis of portions of, rather than entire networks. Here, we introduce the Leaf Extraction and Analysis Framework Graphical User Interface (LEAF GUI), a user-assisted software tool that facilitates improved empirical understanding of leaf network structure. LEAF GUI takes images of leaves where veins have been enhanced relative to the background, and following a series of interactive thresholding and cleaning steps, returns a suite of statistics and information on the structure of leaf venation networks and areoles. Metrics include the dimensions, position, and connectivity of all network veins, and the dimensions, shape, and position of the areoles they surround. Available for free download, the LEAF GUI software promises to facilitate improved understanding of the adaptive and ecological significance of leaf vein network structure. PMID:21057114
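A minimal Python sketch of the kind of threshold-skeletonize-measure pipeline that LEAF GUI automates is given below. This is not the LEAF GUI code, only an approximation of the steps the abstract describes, with illustrative parameters.

```python
import numpy as np
from skimage import filters, morphology, measure

def vein_metrics(gray_leaf):
    """Crude leaf-network measurement: threshold a vein-enhanced image,
    clean it, then measure skeleton length and areole (enclosed region)
    areas. The min_size value is illustrative."""
    veins = gray_leaf > filters.threshold_otsu(gray_leaf)
    veins = morphology.remove_small_objects(veins, min_size=64)
    skeleton = morphology.skeletonize(veins)
    total_vein_px = int(skeleton.sum())        # proxy for total network length
    # regions enclosed by veins; note the image border region is included
    areoles = measure.label(~veins)
    areole_areas = [r.area for r in measure.regionprops(areoles)]
    return total_vein_px, areole_areas
```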
Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis
2017-09-01
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhu, Hongchun; Zhao, Yipeng; Liu, Haiying
2018-04-01
Scale is the basic attribute for expressing and describing spatial entities and phenomena. It is of theoretical significance for the study of gully structure information, the variable characteristics of watershed morphology, and development evolution at different scales. This research selected five different areas of China's Loess Plateau as the experimental region and used DEM data at different scales as the experimental data. First, the change rule of the characteristic parameters of the data at different scales was analyzed. The watershed structure information did not change along with a change in the data scale. This was proven by selecting the gully bifurcation ratio and fractal dimension as characteristic parameters of watershed structure information. Then, the change rule of the characteristic parameters of the gully structure at different analysis scales was analyzed by setting a sequence of analysis scales for gully extraction. The gully structure of the watershed changed with variations in the analysis scale, and the change rule was obvious when the gully level changed. Finally, the change rule of the characteristic parameters of the gully structure in different areas was analyzed. The gully fractal dimension showed a significant numerical difference across areas, whereas the variation of the gully bifurcation ratio was small. The change rule indicated that the development degree of the gully varied markedly between regions, but the morphological structure was basically similar.
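The gully fractal dimension used here is typically estimated by box counting; a compact sketch (illustrative, not the authors' implementation) is:

```python
import numpy as np

def box_counting_dimension(mask):
    """Estimate the fractal dimension of a binary gully network
    (e.g. extracted from a DEM) as the slope of log N(s) versus
    log(1/s) over a range of box sizes s."""
    n = min(mask.shape)
    sizes = [2 ** k for k in range(1, int(np.log2(n)))]
    counts = []
    for s in sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```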
NASA Technical Reports Server (NTRS)
Liu, J. T. C.
1986-01-01
Advances in the mechanics of boundary layer flow are reported. The physical problem of large-scale coherent structures in real, developing free turbulent shear flows is addressed from the nonlinear aspects of hydrodynamic stability. Whether fine-grained turbulence is present in the problem or absent, the problem lacks a small parameter. It is formulated on the basis of conservation principles, which embody the dynamics of the problem and are directed toward extracting the most physical information; it is emphasized, however, that approximations must also be involved.
Method for accurate growth of vertical-cavity surface-emitting lasers
Chalmers, Scott A.; Killeen, Kevin P.; Lear, Kevin L.
1995-01-01
We report a method for accurate growth of vertical-cavity surface-emitting lasers (VCSELs). The method uses a single reflectivity spectrum measurement to determine the structure of the partially completed VCSEL at a critical point of growth. This information, along with the extracted growth rates, allows imprecisions in growth parameters to be compensated for during growth of the remaining structure, which can then be completed with very accurate critical dimensions. Using this method, we can now routinely grow lasing VCSELs with Fabry-Perot cavity resonance wavelengths controlled to within 0.5%.
On the use of ANN interconnection weights in optimal structural design
NASA Technical Reports Server (NTRS)
Hajela, P.; Szewczyk, Z.
1992-01-01
The present paper describes the use of the interconnection weights of a multilayer, feedforward network to extract information pertinent to the mapping space that the network is assumed to represent. In particular, these weights can be used to determine an appropriate network architecture and to establish whether an adequate number of training patterns (input-output pairs) has been used for network training. The weight analysis also provides an approach to assess the influence of each input parameter on a selected output component. The paper shows the significance of this information in decomposition-driven optimal design.
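One standard way of turning interconnection weights into input-influence measures is Garson's algorithm; the sketch below shows that idea for a single-hidden-layer network and is an assumption about the flavor of analysis, not necessarily the authors' exact formulation.

```python
import numpy as np

def garson_importance(W_ih, W_ho, output=0):
    """Relative importance of each input on one output of a single-
    hidden-layer feedforward net, from the weights alone.
    W_ih: (n_inputs, n_hidden), W_ho: (n_hidden, n_outputs)."""
    # contribution of each input through each hidden node
    contrib = np.abs(W_ih) * np.abs(W_ho[:, output])
    # normalize each hidden node's column to a share per input
    contrib /= contrib.sum(axis=0, keepdims=True)
    importance = contrib.sum(axis=1)           # sum shares over hidden nodes
    return importance / importance.sum()       # fractions summing to 1
```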
Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.
2010-01-01
Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test (300) sets. A reference standard was developed and each report was annotated by experts with regard to the patient's personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods for extracting information, Dynamic-Window and ConText, were evaluated. Hx's classification responses using each of the two methods were measured against the reference standard. The average Cohen's weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy, scoring 96.2% with each method. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; both methods were comparable to the human benchmarks of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application's performance in classifying a mesothelioma patient's personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text clinical records for a variety of research or healthcare delivery organizations. PMID:21031012
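The window-based negation handling that both methods rely on can be sketched very roughly as follows; the trigger list, window size, and function are illustrative stand-ins for the real Dynamic-Window/ConText logic, which also handles scope terminators, experiencer, and temporality.

```python
import re

NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]
WINDOW = 6  # max tokens a trigger's scope extends forward (illustrative)

def is_negated(text, concept):
    """Crude check: does `concept` fall within a fixed token window
    after a negation trigger?"""
    tokens = re.findall(r"[a-z']+", text.lower())
    concept_tokens = concept.lower().split()
    for i in range(len(tokens)):
        for trig in NEGATION_TRIGGERS:
            trig_toks = trig.split()
            if tokens[i:i + len(trig_toks)] == trig_toks:
                start = i + len(trig_toks)
                window = tokens[start:start + WINDOW]
                if all(t in window for t in concept_tokens):
                    return True
    return False

# e.g. is_negated("family history negative for mesothelioma",
#                 "mesothelioma")  ->  True
```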
Misra, Dharitri; Chen, Siyuan; Thoma, George R.
2010-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U. S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
New approaches to health promotion and informatics education using Internet in the Czech Republic.
Zvárová, J
2005-01-01
The paper describes present-day information technology skills in the Czech Republic. It focuses on informatics education using the Internet, the ECDL concept, and the links between computer literacy among health care professionals and the quality of health care. Everyone understands that the main source of wealth of any nation is information management and the efficient transformation of information into knowledge. Completely new decisive factors for the economics of the near future are appearing, based on the circulation and exchange of information. It is clear that modern health care cannot be built without information and communication technologies. We discuss several approaches to contributing to topics of the information society in health care, namely the role of the electronic health record, structured information, extraction of information from free medical texts, and sharing knowledge stored in medical guidelines.
Sweet neutron crystallography.
Teixeira, S C M; Blakeley, M P; Leal, R M F; Gillespie, S M; Mitchell, E P; Forsyth, V T
2010-11-01
Extremely sweet proteins isolated from tropical fruit extracts are promising healthy alternatives to sugar and synthetic sweeteners. Sweetness, and taste in general, are, however, still poorly understood. The engineering of stable sweet proteins with tailored properties is made difficult by the lack of supporting high-resolution structural data. Experimental information on charge distribution, protonation states and solvent structure is vital for an understanding of the mechanism through which sweet proteins interact with taste receptors. Neutron studies of the crystal structures of sweet proteins allow a detailed study of these biophysical properties, as illustrated by a neutron study on the native protein thaumatin in which deuterium labelling was used to improve data quality.
Correlation filtering in financial time series (Invited Paper)
NASA Astrophysics Data System (ADS)
Aste, T.; Di Matteo, Tiziana; Tumminello, M.; Mantegna, R. N.
2005-05-01
We apply a method to filter relevant information from the correlation coefficient matrix by extracting a network of relevant interactions. This method succeeds in generating networks with the same hierarchical structure as the Minimum Spanning Tree, but containing a larger number of links, resulting in a richer network topology that allows loops and cliques. In Tumminello et al.,1 we have shown that this method, applied to a financial portfolio of 100 stocks in the US equity markets, is quite efficient at filtering relevant information about the clustering of the system and its hierarchical structure, both for the whole system and within each cluster. In particular, we have found that triangular loops and 4-element cliques have important and significant relations with the market structure and properties. Here we apply this filtering procedure to the analysis of correlations in two different kinds of interest rate time series (16 Eurodollar and 34 US interest rates).
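The Minimum Spanning Tree backbone referred to above can be computed directly from a correlation matrix using the standard distance transformation; a minimal sketch with networkx follows (the filtered graphs discussed in the abstract retain extra links, loops and cliques, on top of this tree).

```python
import numpy as np
import networkx as nx

def correlation_mst(corr, labels):
    """Build the MST of a set of assets from their correlation matrix,
    using the usual metric d_ij = sqrt(2 * (1 - rho_ij))."""
    d = np.sqrt(2.0 * (1.0 - np.asarray(corr, float)))
    G = nx.Graph()
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(labels[i], labels[j], weight=d[i, j])
    return nx.minimum_spanning_tree(G)
```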
SEGMENTATION OF MITOCHONDRIA IN ELECTRON MICROSCOPY IMAGES USING ALGEBRAIC CURVES.
Seyedhosseini, Mojtaba; Ellisman, Mark H; Tasdizen, Tolga
2013-01-01
High-resolution microscopy techniques have been used to generate large volumes of data with enough details for understanding the complex structure of the nervous system. However, automatic techniques are required to segment cells and intracellular structures in these multi-terabyte datasets and make anatomical analysis possible on a large scale. We propose a fully automated method that exploits both shape information and regional statistics to segment irregularly shaped intracellular structures such as mitochondria in electron microscopy (EM) images. The main idea is to use algebraic curves to extract shape features together with texture features from image patches. Then, these powerful features are used to learn a random forest classifier, which can predict mitochondria locations precisely. Finally, the algebraic curves together with regional information are used to segment the mitochondria at the predicted locations. We demonstrate that our method outperforms the state-of-the-art algorithms in segmentation of mitochondria in EM images.
Characterizing the Fundamental Intellectual Steps Required in the Solution of Conceptual Problems
NASA Astrophysics Data System (ADS)
Stewart, John
2010-02-01
At some level, the performance of a science class must depend on what is taught, the information content of the materials and assignments of the course. The introductory calculus-based electricity and magnetism class at the University of Arkansas is examined using a catalog of the basic reasoning steps involved in the solution of problems assigned in the class. This catalog was developed by sampling popular physics textbooks for conceptual problems. The solution to each conceptual problem was decomposed into its fundamental reasoning steps. These fundamental steps are then used to quantify the distribution of conceptual content within the course. Using this characterization technique, an exceptionally detailed picture of the information flow and structure of the class can be produced. The intellectual structure of published conceptual inventories is compared with the information presented in the class, and the dependence of conceptual performance on the details of coverage is extracted.
The CoFactor database: organic cofactors in enzyme catalysis.
Fischer, Julia D; Holliday, Gemma L; Thornton, Janet M
2010-10-01
Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. CoFactor provides a web interface to access hand-curated data extracted from the literature on organic enzyme cofactors in biocatalysis, as well as automatically collected information. CoFactor includes information on the conformational and solvent accessibility variation of the enzyme-bound cofactors, as well as mechanistic and structural information about the hosting enzymes. The database is publicly available and can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/CoFactor.
NASA Astrophysics Data System (ADS)
Guo, H., II
2016-12-01
Spatial distribution information on settlement places in mountainous areas is of great significance to earthquake emergency work, because most of the key earthquake-hazardous areas of China are located in mountainous terrain. Remote sensing has the advantages of large coverage and low cost, and it is an important way to obtain the spatial distribution of such settlement places. At present, most studies apply object-oriented methods that jointly consider geometric, spectral, and texture information to extract settlement-place information; in this article, semantic constraints are added on top of the object-oriented approach. The experimental data are one scene of imagery from a domestic high-resolution satellite (GF-1), with a resolution of 2 meters. The main processing consists of three steps: the first is pretreatment, including orthorectification and image fusion; the second is object-oriented information extraction, including image segmentation and information extraction; the last is the removal of erroneous elements under semantic constraints. To formulate these semantic constraints, the distribution characteristics of settlement places in mountainous areas must be analyzed, and the spatial-logic relations between settlement places and other objects must be considered. The accuracy assessment shows that the extraction accuracy of the object-oriented method is 49% and rises to 86% after the use of semantic constraints. As these figures show, the extraction method under semantic constraints can effectively improve the accuracy of settlement-place information extraction in mountainous areas. The results show that it is feasible to extract settlement-place information from GF-1 imagery, demonstrating that domestic high-resolution optical remote sensing imagery has practical value for earthquake emergency preparedness.
Multi-object segmentation framework using deformable models for medical imaging analysis.
Namías, Rafael; D'Amato, Juan Pablo; Del Fresno, Mariana; Vénere, Marcelo; Pirró, Nicola; Bellemare, Marc-Emmanuel
2016-08-01
Segmenting structures of interest in medical images is an important step in different tasks such as visualization, quantitative analysis, simulation, and image-guided surgery, among several other clinical applications. Numerous segmentation methods have been developed in the past three decades for extraction of anatomical or functional structures on medical imaging. Deformable models, which include the active contour models or snakes, are among the most popular methods for image segmentation, combining several desirable features such as inherent connectivity and smoothness. Even though different approaches have been proposed and significant work has been dedicated to the improvement of such algorithms, there are still challenging research directions, such as the simultaneous extraction of multiple objects and the integration of individual techniques. This paper presents a novel open-source framework called deformable model array (DMA) for the segmentation of multiple and complex structures of interest in different imaging modalities. While most active contour algorithms can extract one region at a time, DMA allows several deformable models to be integrated to deal with multiple segmentation scenarios. Moreover, it is possible to consider any existing explicit deformable model formulation and even to incorporate new active contour methods, allowing a suitable combination to be selected under different conditions. The framework also introduces a control module that coordinates the cooperative evolution of the snakes and is able to solve interaction issues toward the segmentation goal. Thus, DMA can implement complex object and multi-object segmentations in both 2D and 3D using the contextual information derived from the model interaction. These are important features for several medical image analysis tasks in which different but related objects need to be simultaneously extracted. Experimental results on both computed tomography and magnetic resonance imaging show that the proposed framework has a wide range of applications, especially in the presence of adjacent structures of interest or under intra-structure inhomogeneities, giving excellent quantitative results.
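A miniature version of the multi-snake idea, built on scikit-image's generic active contour rather than on DMA itself, might look like the sketch below; the smoothing, parameter values, and the absence of inter-snake coordination are simplifying assumptions.

```python
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def segment_multiple(image, initial_snakes, alpha=0.015, beta=10.0):
    """Evolve one active contour per target structure from its own
    initialization. A real framework would also coordinate the snakes
    to resolve overlaps and interactions."""
    smoothed = gaussian(image, sigma=2)
    results = []
    for snake0 in initial_snakes:            # one (N, 2) array per object
        results.append(active_contour(smoothed, snake0,
                                      alpha=alpha, beta=beta, gamma=0.001))
    return results

# Circles make convenient initializations, e.g.:
# t = np.linspace(0, 2 * np.pi, 200)
# snake0 = np.column_stack([cy + r * np.sin(t), cx + r * np.cos(t)])
```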
Ali, Anjum A; Dale, Anders M; Badea, Alexandra; Johnson, G Allan
2005-08-15
We present the automated segmentation of magnetic resonance microscopy (MRM) images of the C57BL/6J mouse brain into 21 neuroanatomical structures, including the ventricular system, corpus callosum, hippocampus, caudate putamen, inferior colliculus, internal capsule, globus pallidus, and substantia nigra. The segmentation algorithm operates on multispectral, three-dimensional (3D) MR data acquired at 90-microm isotropic resolution. Probabilistic information used in the segmentation is extracted from training datasets of T2-weighted, proton density-weighted, and diffusion-weighted acquisitions. Spatial information is employed in the form of prior probabilities of occurrence of a structure at a location (location priors) and the pairwise probabilities between structures (contextual priors). Validation using standard morphometry indices shows good consistency between automatically segmented and manually traced data. Results achieved in the mouse brain are comparable with those achieved in human brain studies using similar techniques. The segmentation algorithm shows excellent potential for routine morphological phenotyping of mouse models.
Protein secondary structure determination by constrained single-particle cryo-electron tomography.
Bartesaghi, Alberto; Lecumberry, Federico; Sapiro, Guillermo; Subramaniam, Sriram
2012-12-05
Cryo-electron microscopy (cryo-EM) is a powerful technique for 3D structure determination of protein complexes by averaging information from individual molecular images. The resolutions that can be achieved with single-particle cryo-EM are frequently limited by inaccuracies in assigning molecular orientations based solely on 2D projection images. Tomographic data collection schemes, however, provide powerful constraints that can be used to more accurately determine molecular orientations necessary for 3D reconstruction. Here, we propose "constrained single-particle tomography" as a general strategy for 3D structure determination in cryo-EM. A key component of our approach is the effective use of images recorded in tilt series to extract high-resolution information and correct for the contrast transfer function. By incorporating geometric constraints into the refinement to improve orientational accuracy of images, we reduce model bias and overrefinement artifacts and demonstrate that protein structures can be determined at resolutions of ∼8 Å starting from low-dose tomographic tilt series. Copyright © 2012 Elsevier Ltd. All rights reserved.
Autism, Context/Noncontext Information Processing, and Atypical Development
Skoyles, John R.
2011-01-01
Autism has been attributed to a deficit in contextual information processing. Attempts to understand autism in terms of such a defect, however, do not include more recent computational work upon context. This work has identified that context information processing depends upon the extraction and use of the information hidden in higher-order (or indirect) associations. Higher-order associations underlie the cognition of context rather than that of situations. This paper starts by examining the differences between higher-order and first-order (or direct) associations. Higher-order associations link entities not directly (as with first-order ones) but indirectly through all the connections they have via other entities. Extracting this information requires the processing of past episodes as a totality. As a result, this extraction depends upon specialised extraction processes separate from cognition. This information is then consolidated. Due to this difference, the extraction/consolidation of higher-order information can be impaired whilst cognition remains intact. Although not directly impaired, cognition will be indirectly impaired by knock on effects such as cognition compensating for absent higher-order information with information extracted from first-order associations. This paper discusses the implications of this for the inflexible, literal/immediate, and inappropriate information processing of autistic individuals. PMID:22937255
DOE Office of Scientific and Technical Information (OSTI.GOV)
Veličković, Dušan; Chu, Rosalie K.; Carrell, Alyssa A.
One critical aspect of mass spectrometry imaging (MSI) is the need to confidently identify detected analytes. While orthogonal tandem MS (e.g., LC-MS2) experiments from sample extracts can assist in annotating ions, the spatial information about these molecules is lost. Accordingly, this could lead to misleading conclusions, especially in cases where isobaric species exhibit different distributions within a sample. In this Technical Note, we employed a multimodal imaging approach, using matrix-assisted laser desorption/ionization (MALDI)-MSI and liquid extraction surface analysis (LESA)-MS2I, to confidently annotate and localize a broad range of metabolites involved in a tripartite symbiosis system of moss, cyanobacteria, and fungus. We found that the combination of these two imaging modalities generated very congruent ion images, providing the link between the highly accurate structural information offered by LESA and the high spatial resolution attainable by MALDI. These results demonstrate how this combined methodology could be very useful in differentiating metabolite routes in complex systems.
Extracting Rayleigh wave dispersion from ambient noise across the Indian Ocean
NASA Astrophysics Data System (ADS)
Ma, Z.; Dalton, C. A.
2016-12-01
Rayleigh wave dispersion extracted from ambient seismic noise has been widely used to image crustal and uppermost mantle structure. Applications of this approach in continental settings are abundant, but there have been relatively few studies within ocean basins. In this presentation, we will first demonstrate the feasibility of extracting high quality Rayleigh wave dispersion information from ambient noise across the entire Indian Ocean basin. Phase arrival times measured from ambient noise are largely consistent with the ones predicted from 2-D phase velocity maps that were determined from earthquake data alone. Secondly, we show that adding dispersion information extracted from ambient noise to existing earthquake data can indeed improve the resolution of phase velocity maps by about 20% in the western Indian Ocean region where the station distribution is the densest. High quality Rayleigh wave dispersion information can be obtained from stacking waveforms over less than two years at land stations and less than four years at island stations. After removing the age dependent average velocities, the 2-D phase velocity maps show slow anomalies associated with the Seychelles-Mascarene plateau. Forward modeling suggests that the crust is about 15-25 km thick in this area, which agrees with previous estimates obtained from gravity data. We also observe that the slow anomaly related to the Central Indian Ridge is asymmetric. The center of this slow anomaly lies to the west side of ridge, which is opposite to the ridge migration direction. This asymmetry probably reflects the interactions between the ridge and nearby hotspots.
Information processing for aerospace structural health monitoring
NASA Astrophysics Data System (ADS)
Lichtenwalner, Peter F.; White, Edward V.; Baumann, Erwin W.
1998-06-01
Structural health monitoring (SHM) technology provides a means to significantly reduce the life-cycle cost of aerospace vehicles by eliminating unnecessary inspections, minimizing inspection complexity, and providing accurate diagnostics and prognostics to support vehicle life extension. In order to accomplish this, a comprehensive SHM system will need to acquire data from a wide variety of diverse sensors including strain gages, accelerometers, acoustic emission sensors, crack growth gages, corrosion sensors, and piezoelectric transducers. Significant amounts of computer processing will then be required to convert this raw sensor data into meaningful information which indicates both the diagnostics of current structural integrity and the prognostics necessary for planning and managing the future health of the structure in a cost effective manner. This paper provides a description of the key types of information processing technologies required in an effective SHM system. These include artificial intelligence techniques such as neural networks, expert systems, and fuzzy logic for nonlinear modeling, pattern recognition, and complex decision making; signal processing techniques such as Fourier and wavelet transforms for spectral analysis and feature extraction; statistical algorithms for optimal detection, estimation, prediction, and fusion; and a wide variety of other algorithms for data analysis and visualization. The intent of this paper is to provide an overview of the role of information processing for SHM, discuss various technologies which can contribute to accomplishing this role, and present some example applications of information processing for SHM implemented at the Boeing Company.
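As a small example of the signal processing layer, the sketch below derives simple spectral features from a vibration record with NumPy; the band edges and feature choices are illustrative, not Boeing's implementation. Shifts in such features between a baseline and a later measurement can flag possible damage.

```python
import numpy as np

def spectral_features(signal, fs, bands=((0, 50), (50, 200), (200, 500))):
    """Dominant frequency and per-band RMS spectral energy of a
    vibration record sampled at fs Hz. Band edges (Hz) are illustrative."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    dominant = freqs[np.argmax(spectrum)]
    band_energy = [np.sqrt(np.mean(spectrum[(freqs >= lo) & (freqs < hi)] ** 2))
                   for lo, hi in bands]
    return dominant, band_energy
```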
Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play
ERIC Educational Resources Information Center
North, Jamie S.; Williams, A. Mark
2008-01-01
The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…
Can we replace curation with information extraction software?
Karp, Peter D
2016-01-01
Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.
Automatic definition of the oncologic EHR data elements from NCIT in OWL.
Cuggia, Marc; Bourdé, Annabel; Turlin, Bruno; Vincendeau, Sebastien; Bertaud, Valerie; Bohec, Catherine; Duvauferrier, Régis
2011-01-01
Semantic interoperability based on ontologies allows systems to combine their information and process it automatically. The ability to extract meaningful fragments from an ontology is key to ontology re-use, and the construction of a subset helps to structure clinical data entries. The aim of this work is to provide a method for extracting a set of concepts for a specific domain, in order to help define the data elements of an oncologic EHR. A generic extraction algorithm was developed to extract from the NCIT, for a specific disease (i.e., prostate neoplasm), all the concepts of interest into a sub-ontology. We compared the extracted concepts with the manually encoded concepts contained in the multi-disciplinary meeting report form (MDMRF). We extracted two sub-ontologies: sub-ontology 1 by using a single key concept and sub-ontology 2 by using 5 additional keywords. The coverage of the MDMRF concepts by sub-ontology 2 was 51%. The low coverage rate is due to the lack of definition or the misclassification of NCIT concepts. By providing a subset of concepts focused on a particular domain, this extraction method helps to optimize the binding process of data elements and to maintain and enrich a domain ontology.
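The fragment-extraction step can be pictured as a graph operation; the sketch below is a generic approximation, not the authors' algorithm, keeping each key concept together with its descendants and its ancestors so the fragment stays connected to the root.

```python
import networkx as nx

def extract_subontology(ontology, key_concepts):
    """Extract a domain-focused concept subset from a large ontology,
    modeled as a directed graph with edges parent -> child."""
    keep = set()
    for c in key_concepts:
        keep.add(c)
        keep |= nx.descendants(ontology, c)   # all specializations
        keep |= nx.ancestors(ontology, c)     # path back to the root
    return ontology.subgraph(keep).copy()
```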
A smart way to identify and extract repeated patterns of a layout
NASA Astrophysics Data System (ADS)
Wei, Fang; Gu, Tingting; Chu, Zhihao; Zhang, Chenming; Chen, Han; Zhu, Jun; Hu, Xinyi; Du, Chunshan; Wan, Qijian; Liu, Zhengfang
2018-03-01
As integrated circuit (IC) technology moves forward, the manufacturing process faces more and more challenges. Optical proximity correction (OPC) plays an important role in the whole manufacturing process. In deep sub-micron technology, OPC engineers not only need to guarantee that layout designs are manufacturable but also must control critical patterns more precisely to ensure a high-performance circuit. One task they would like to perform is consistency checking, since identical patterns under identical context should, in theory, have identical OPC results, as in SRAM regions. Consistency checking is essentially a technique of repeated-pattern identification and extraction followed by comparison of the derived patterns (i.e., the OPC results). The layout passed to the OPC team may not carry enough design hierarchy information, either because the original designs have undergone several layout processing steps or for other unknown reasons. This paper presents a generic way to identify and extract repeated layout structures in SRAM regions purely through layout pattern analysis, using Calibre Pattern Matching and Calibre equation-based DRC (eqDRC). Without Pattern Matching and eqDRC, it would take great effort to do this manually by trial and error, and it would be almost impossible to automate the pattern analysis process. Combining Pattern Matching and eqDRC opens a new way to implement this flow. The repeated patterns must have some fundamental features whose pitches can be measured in the horizontal and vertical directions separately by Calibre eqDRC; these features can also serve to generate anchor points, which become the starting points for Pattern Matching to capture patterns. The informative statistical report from the pattern search gives the match count for each captured pattern. Experiments show that this is an effective way of identifying and extracting repeated structures. The OPC results are derived layers on these repeated structures; by running the pattern search with design layers as pattern layers and OPC results as marker layers, their consistency is easy to compare.
Single particle maximum likelihood reconstruction from superresolution microscopy images
Verdier, Timothée; Gunzenhauser, Julia; Manley, Suliana; Castelnovo, Martin
2017-01-01
Point localization superresolution microscopy enables fluorescently tagged molecules to be imaged beyond the optical diffraction limit, reaching single molecule localization precisions down to a few nanometers. For small objects whose sizes are a few times this precision, localization uncertainty prevents the straightforward extraction of a structural model from the reconstructed images. We demonstrate in the present work that this limitation can be overcome at the single particle level, requiring no particle averaging, by using a maximum likelihood reconstruction (MLR) method perfectly suited to the stochastic nature of such superresolution imaging. We validate this method by extracting structural information from both simulated and experimental PALM data of immature virus-like particles of the Human Immunodeficiency Virus (HIV-1). MLR allows us to measure the radii of individual viruses with precision of a few nanometers and confirms the incomplete closure of the viral protein lattice. The quantitative results of our analysis are consistent with previous cryoelectron microscopy characterizations. Our study establishes the framework for a method that can be broadly applied to PALM data to determine the structural parameters for an existing structural model, and is particularly well suited to heterogeneous features due to its single particle implementation. PMID:28253349
Network structure of multivariate time series.
Lacasa, Lucas; Nicosia, Vincenzo; Latora, Vito
2015-10-21
Our understanding of a variety of phenomena in physics, biology and economics crucially depends on the analysis of multivariate time series. While a wide range of tools and techniques for time series analysis already exists, the increasing availability of massive data structures calls for new approaches to multidimensional signal processing. We present here a non-parametric method to analyse multivariate time series, based on the mapping of a multidimensional time series into a multilayer network, which allows information about a high-dimensional dynamical system to be extracted through the analysis of the structure of the associated multiplex network. The method is simple to implement, general, scalable, does not require ad hoc phase space partitioning, and is thus suitable for the analysis of large, heterogeneous and non-stationary time series. We show that simple structural descriptors of the associated multiplex networks allow us to extract and quantify nontrivial properties of coupled chaotic maps, including the transition between different dynamical phases and the onset of various types of synchronization. As a concrete example we then study financial time series, showing that a multiplex network analysis can efficiently discriminate crises from periods of financial stability, where standard methods based on time-series symbolization often fail.
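One concrete realization of the series-to-network mapping used in this line of work is the horizontal visibility graph, built per channel so that each channel becomes one layer of the multiplex network; treating it as the mapping here is an assumption of the sketch.

```python
import numpy as np

def horizontal_visibility_edges(x):
    """Horizontal visibility graph of one time series: nodes i and j
    are linked if every sample strictly between them is lower than
    both x[i] and x[j]."""
    x = np.asarray(x, float)
    edges = []
    for i in range(len(x) - 1):
        top = -np.inf                      # running max of intermediates
        for j in range(i + 1, len(x)):
            if top < min(x[i], x[j]):
                edges.append((i, j))
            top = max(top, x[j])
            if top >= x[i]:                # nothing later can see i anymore
                break
    return edges

# Layers of the multiplex network, one per channel of a (T, M) series:
# layers = [horizontal_visibility_edges(series[:, m]) for m in range(M)]
```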
Language translation, domain specific languages and ANTLR
NASA Technical Reports Server (NTRS)
Craymer, Loring; Parr, Terence
2002-01-01
We will discuss the features of ANTLR that make it an attractive tool for rapid development of domain specific language translators and present some practical examples of its use: extraction of information from the Cassini Command Language specification, the processing of structured binary data, and IVL--an English-like language for generating VRML scene graphs, which is used in configuring the jGuru.com server.
Reducing full one-loop amplitudes to scalar integrals at the integrand level
NASA Astrophysics Data System (ADS)
Ossola, Giovanni; Papadopoulos, Costas G.; Pittau, Roberto
2007-02-01
We show how to extract the coefficients of the 4-, 3-, 2- and 1-point one-loop scalar integrals from the full one-loop amplitude of arbitrary scattering processes. In a similar fashion, also the rational terms can be derived. Basically no information on the analytical structure of the amplitude is required, making our method appealing for an efficient numerical implementation.
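Schematically, the decomposition underlying the method can be written as follows; this is the standard amplitude-level form, with coefficient and integral labels chosen here for illustration. The d, c, b, a coefficients and the rational remainder R are what the integrand-level procedure extracts numerically.

```latex
% One-loop amplitude as a combination of known scalar integrals:
% I_4, I_3, I_2, I_1 are the box, triangle, bubble and tadpole
% scalar integrals; R is the rational remainder.
\mathcal{A}_{\text{1-loop}}
  = \sum_{i} d_i\, I_4^{(i)}
  + \sum_{i} c_i\, I_3^{(i)}
  + \sum_{i} b_i\, I_2^{(i)}
  + \sum_{i} a_i\, I_1^{(i)}
  + R .
```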
1976-03-01
This report summarizes the results of the research program on Image Analysis and Modeling supported by the Defense Advanced Research Projects Agency...The objective is to achieve a better understanding of image structure and to use this knowledge to develop improved image models for use in image ... analysis and processing tasks such as information extraction, image enhancement and restoration, and coding. The ultimate objective of this research is
Computer-based synthetic data to assess the tree delineation algorithm from airborne LiDAR survey
Lei Wang; Andrew G. Birt; Charles W. Lafon; David M. Cairns; Robert N. Coulson; Maria D. Tchakerian; Weimin Xi; Sorin C. Popescu; James M. Guldin
2013-01-01
Small Footprint LiDAR (Light Detection And Ranging) has been proposed as an effective tool for measuring detailed biophysical characteristics of forests over broad spatial scales. However, by itself LiDAR yields only a sample of the true 3D structure of a forest. In order to extract useful forestry relevant information, this data must be interpreted using mathematical...
Advances in Spectral-Spatial Classification of Hyperspectral Images
NASA Technical Reports Server (NTRS)
Fauvel, Mathieu; Tarabalka, Yuliya; Benediktsson, Jon Atli; Chanussot, Jocelyn; Tilton, James C.
2012-01-01
Recent advances in spectral-spatial classification of hyperspectral images are presented in this paper. Several techniques are investigated for combining both spatial and spectral information. Spatial information is extracted at the object (set of pixels) level rather than at the conventional pixel level. Mathematical morphology is first used to derive the morphological profile of the image, which includes characteristics about the size, orientation and contrast of the spatial structures present in the image. Then the morphological neighborhood is defined and used to derive additional features for classification. Classification is performed with support vector machines using the available spectral information and the extracted spatial information. Spatial post-processing is next investigated to build more homogeneous and spatially consistent thematic maps. To that end, three presegmentation techniques are applied to define regions that are used to regularize the preliminary pixel-wise thematic map. Finally, a multiple classifier system is defined to produce relevant markers that are exploited to segment the hyperspectral image with the minimum spanning forest algorithm. Experimental results conducted on three real hyperspectral images with different spatial and spectral resolutions and corresponding to various contexts are presented. They highlight the importance of spectral-spatial strategies for the accurate classification of hyperspectral images and validate the proposed methods.
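A compact sketch of the morphological-profile-plus-SVM idea follows; the structuring-element radii, kernel, and function names are illustrative choices, not the authors' pipeline.

```python
import numpy as np
from skimage.morphology import opening, closing, disk
from sklearn.svm import SVC

def morphological_profile(band, radii=(2, 4, 8)):
    """Stack openings and closings of one base band (e.g. a first
    principal component) with growing structuring elements, capturing
    the size and contrast of spatial structures."""
    layers = [band]
    for r in radii:
        layers += [opening(band, disk(r)), closing(band, disk(r))]
    return np.stack(layers, axis=-1)       # (rows, cols, 1 + 2*len(radii))

def train_spectral_spatial_svm(spectral_cube, band, labels_mask):
    """Concatenate spectral and spatial features per pixel and train
    an SVM on the labeled pixels (labels_mask > 0 holds class ids)."""
    feats = np.concatenate([spectral_cube, morphological_profile(band)],
                           axis=-1)
    X = feats[labels_mask > 0]
    y = labels_mask[labels_mask > 0]
    return SVC(kernel="rbf").fit(X, y)
```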
Quantitative pathology in virtual microscopy: history, applications, perspectives.
Kayser, Gian; Kayser, Klaus
2013-07-01
With the emerging success of commercially available personal computers and the rapid progress in the development of information technologies, morphometric analyses of static histological images have been introduced to improve our understanding of the biology of diseases such as cancer. The first applications were quantifications of immunohistochemical expression patterns. In addition to object counting and feature extraction, laws of thermodynamics have been applied in morphometric calculations termed syntactic structure analysis. Here, one has to consider that the information of an image can be calculated for separate hierarchical layers such as single pixels, clusters of pixels, segmented small objects, clusters of small objects, and objects of higher order composed of several small objects. Using syntactic structure analysis in histological images, functional states can be extracted and efficiency of labor in tissues can be quantified. Image standardization procedures, such as shading correction and color normalization, can overcome artifacts blurring clear thresholds. Morphometric techniques are not only useful to learn more about biological features of growth patterns, they can also be helpful in routine diagnostic pathology. In such cases, entropy calculations are applied in analogy to theoretical considerations concerning information content. Thus, regions with high information content can automatically be highlighted. Analysis of the "regions of high diagnostic value", in the context of clinical information, site of involvement and patient data (e.g., age, sex), can support histopathological differential diagnoses. It can be expected that quantitative virtual microscopy will open new possibilities for automated histological support. Automated integrated quantification of histological slides also serves for quality assurance. The development and theoretical background of morphometric analyses in histopathology are reviewed, as well as their application and potential future implementation in virtual microscopy. Copyright © 2012 Elsevier GmbH. All rights reserved.
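The entropy-based highlighting of "regions of high diagnostic value" can be sketched in a few lines; the neighborhood radius and quantile threshold are illustrative assumptions.

```python
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk
from skimage.util import img_as_ubyte

def diagnostic_value_map(gray_slide, radius=15, quantile=0.95):
    """Flag candidate high-information regions in a virtual slide by
    local Shannon entropy: texture-rich tissue scores high, empty
    glass scores low. Expects a grayscale image scaled to [0, 1]."""
    ent = entropy(img_as_ubyte(gray_slide), disk(radius))
    return ent >= np.quantile(ent, quantile)   # binary mask of hotspots
```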
Significance of structural changes in proteins: expected errors in refined protein structures.
Stroud, R. M.; Fauman, E. B.
1995-01-01
A quantitative expression key to evaluating significant structural differences or induced shifts between any two protein structures is derived. Because crystallography leads to reports of a single (or sometimes dual) position for each atom, the significance of any structural change based on comparison of two structures depends critically on knowing the expected precision of each median atomic position reported, and on extracting it for each atom, from the information provided in the Protein Data Bank and in the publication. The differences between structures of protein molecules that should be identical, and that are normally distributed, indicating that they are not affected by crystal contacts, were analyzed with respect to many potential indicators of structure precision, so as to extract, essentially by "machine learning" principles, a generally applicable expression involving the highest correlates. Eighteen refined crystal structures from the Protein Data Bank, in which there are multiple molecules in the crystallographic asymmetric unit, were selected and compared. The thermal B factor, the connectivity of the atom, and the ratio of the number of reflections to the number of atoms used in refinement correlate best with the magnitude of the positional differences between regions of the structures that otherwise would be expected to be the same. These results are embodied in a six-parameter equation that can be applied to any crystallographically refined structure to estimate the expected uncertainty in position of each atom. Structure change in a macromolecule can thus be referenced to the expected uncertainty in atomic position as reflected in the variance between otherwise identical structures with the observed values of correlated parameters. PMID:8563637
NASA Astrophysics Data System (ADS)
Knapmeyer-Endrun, Brigitte; Golombek, Matthew P.; Ohrnberger, Matthias
2017-10-01
The SEIS (Seismic Experiment for Interior Structure) instrument onboard the InSight mission will be the first seismometer directly deployed on the surface of Mars. From studies on the Earth and the Moon, it is well known that site amplification in low-velocity sediments on top of more competent rocks has a strong influence on seismic signals, but can also be used to constrain the subsurface structure. Here we simulate ambient vibration wavefields in a model of the shallow subsurface at the InSight landing site in Elysium Planitia and demonstrate how the high-frequency Rayleigh wave ellipticity can be extracted from these data and inverted for shallow structure. We find that, depending on model parameters, higher-mode ellipticity information can be extracted from single-station data, which significantly reduces uncertainties in inversion. Though the data are most sensitive to properties of the uppermost layer and show a strong trade-off between layer depth and velocity, it is possible to estimate the velocity and thickness of the sub-regolith layer by using reasonable constraints on regolith properties. Model parameters are best constrained if either higher-mode data can be used or additional constraints on regolith properties from seismic analysis of the hammer strokes of InSight's heat flow probe HP3 are available. In addition, the Rayleigh wave ellipticity can distinguish between models with a constant regolith velocity and models with a velocity increase in the regolith, information which is difficult to obtain otherwise.
Guardado Yordi, E; Matos, M J; Pérez Martínez, A; Tornes, A C; Santana, L; Molina, E; Uriarte, E
2017-08-01
Coumarins are a group of phytochemicals that may be beneficial or harmful to health depending on their type and dosage and on the matrix that contains them. Some of these compounds have been proven to display pro-oxidant and clastogenic activities. Therefore, in the current work, we have studied the coumarins present in food sources, extracted from the Phenol-Explorer database, in order to predict their clastogenic activity and identify structure-activity relationships and genotoxic structural alerts using alternative methods in the field of computational toxicology. It was necessary to compile information on the type and amount of coumarins in different food sources through the analysis of food composition databases available online. A virtual screening using a clastogenic model and different software, such as MODESLAB, ChemDraw and STATISTIC, was performed. As a result, a table of food composition was prepared and qualitative information was extracted from these data. The virtual screening showed that esterified substituents inactivate the molecules, while methoxyl and hydroxyl substituents contribute to their activity and constitute, together with the basic structures of the studied subclasses, clastogenic structural alerts. Chemical subclasses of simple coumarins and furocoumarins were classified as active (xanthotoxin, isopimpinellin, esculin, scopoletin, scopolin and bergapten). In silico genotoxicity was mainly predicted for coumarins found in beer, sherry, dried parsley, fresh parsley and raw celery stalks. The results obtained may be of interest for the future design of functional foods and dietary supplements. These studies constitute a reference for the genotoxic chemoinformatic analysis of bioactive compounds present in food composition databases.
Fragment-based prediction of skin sensitization using recursive partitioning
NASA Astrophysics Data System (ADS)
Lu, Jing; Zheng, Mingyue; Wang, Yong; Shen, Qiancheng; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2011-09-01
Skin sensitization is an important toxic endpoint in the risk assessment of chemicals. In this paper, structure-activity relationship analysis was performed on the skin sensitization potential of 357 compounds with local lymph node assay data. Structural fragments were extracted by GASTON (GrAph/Sequence/Tree extractiON) from the training set. Eight fragments with accuracy significantly higher than 0.73 (p < 0.1) were retained to make up an indicator fragment descriptor. The fragment descriptor and eight other physicochemical descriptors closely related to the endpoint were calculated to construct a recursive partitioning tree (RP tree) for classification. The balanced accuracies of the training set, test set I, and test set II in the leave-one-out model were 0.846, 0.800, and 0.809, respectively. The results highlight that the fragment-based RP tree is a preferable method for identifying skin sensitizers. Moreover, the selected fragments provide useful structural information for exploring sensitization mechanisms, and the RP tree creates a graphic tree to identify the most important properties associated with skin sensitization. Together they can provide some guidance for the design of drugs with lower sensitization levels.
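A hedged sketch of the descriptor-plus-recursive-partitioning setup, using scikit-learn's decision tree as the RP tree; the descriptor values and labels are synthetic placeholders, and GASTON's fragment mining is not reproduced:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 'frag' is an indicator (1 if any retained structural alert is present);
# the other columns stand in for the eight physicochemical descriptors,
# which are assumptions here.
rng = np.random.default_rng(0)
frag = rng.integers(0, 2, 357)                 # indicator fragment descriptor
props = rng.normal(size=(357, 8))              # placeholder physicochemical descriptors
X = np.column_stack([frag, props])
y = (frag | (props[:, 0] > 1)).astype(int)     # toy sensitizer labels

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)  # the RP tree
print(tree.score(X, y))
```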
Four types of ensemble coding in data visualizations.
Szafir, Danielle Albers; Haroz, Steve; Gleicher, Michael; Franconeri, Steven
2016-01-01
Ensemble coding supports rapid extraction of visual statistics about distributed visual information. Researchers typically study this ability with the goal of drawing conclusions about how such coding extracts information from natural scenes. Here we argue that a second domain can serve as another strong inspiration for understanding ensemble coding: graphs, maps, and other visual presentations of data. Data visualizations allow observers to leverage their ability to perform visual ensemble statistics on distributions of spatial or featural visual information to estimate actual statistics on data. We survey the types of visual statistical tasks that occur within data visualizations across everyday examples, such as scatterplots, and more specialized images, such as weather maps or depictions of patterns in text. We divide these tasks into four categories: identification of sets of values, summarization across those values, segmentation of collections, and estimation of structure. We point to unanswered questions for each category and give examples of cross-pollination between the two research areas in the current literature. Increased collaboration between the data visualization and perceptual psychology research communities can inspire new solutions to challenges in visualization while simultaneously exposing unsolved problems in perception research.
NASA Astrophysics Data System (ADS)
Wang, X.
2018-04-01
Tourism geological resources are of high value for scenic appreciation, scientific research, and public education, and need to be protected and rationally utilized. In the past, most remote sensing investigations of tourism geological resources used two-dimensional remote sensing interpretation methods, which made some geological heritages difficult to interpret and led to the omission of some information. The aim of this paper is to assess the value of a method that uses three-dimensional visual remote sensing images to extract information on geological heritages. The Skyline software system is applied to fuse 0.36 m aerial images and a 5 m interval DEM to establish a digital earth model. Based on three-dimensional shape, color tone, shadow, texture, and other image features, the distribution of tourism geological resources in Shandong Province and the locations of geological heritage sites were obtained, including geological structures, DaiGu landforms, granite landforms, volcanic landforms, sandy landforms, waterscapes, etc. The results show that features interpreted with this method are highly recognizable, making the interpretation more accurate and comprehensive.
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review
Sun, Wencheng; Li, Yangyang; Liu, Fang; Fang, Shengqun; Wang, Guoyan
2018-01-01
Currently, medical institutes generally use electronic medical records (EMRs) to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMRs have been recognized as a valuable resource for large-scale analysis. However, EMR data are characterized by diversity, incompleteness, redundancy, and privacy concerns, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and thus the data mining results. Different types of data require different processing technologies. Most structured data commonly need classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contain richer health information but require more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and emphatically analyzes the key techniques. In addition, we make an in-depth study of the applications developed based on text mining, together with the open challenges and research issues for future work. PMID:29849998
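As a minimal sketch of the NER step, using spaCy; the general English model is a stand-in (a clinical model, e.g. one from scispaCy, would be substituted for real EMR text):

```python
import spacy

# Placeholder model; requires `python -m spacy download en_core_web_sm`.
nlp = spacy.load("en_core_web_sm")
note = "Patient started on metformin 500 mg for type 2 diabetes."
doc = nlp(note)
for ent in doc.ents:
    # Named entities recognized in the note; a clinical model would emit
    # medication, dosage, and problem entities instead of generic labels.
    print(ent.text, ent.label_)
```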
Surface EMG signals based motion intent recognition using multi-layer ELM
NASA Astrophysics Data System (ADS)
Wang, Jianhui; Qi, Lin; Wang, Xiao
2017-11-01
The upper-limb rehabilitation robot is regarded as a useful tool to help patients with hemiplegia perform repetitive exercise. Surface electromyography (sEMG) signals contain motion information, as these electric signals are generated by and related to nerve-muscle activity. The sEMG signals, representing a human's intentions of active motion, are introduced into the rehabilitation robot system to recognize upper-limb movements. Traditionally, feature extraction is an indispensable step for drawing significant information from the original signals, and it is a tedious task requiring rich, domain-specific experience. This paper employs a deep learning scheme to extract the internal features of the sEMG signals using an Extreme Learning Machine based auto-encoder (ELM-AE). The information contained in the multi-layer structure of the ELM-AE is used as the high-level representation of the internal features of the sEMG signals, so that a simple ELM can post-process the extracted features, forming the entire multi-layer ELM (ML-ELM) algorithm. The method is then employed for sEMG-based neural intention recognition. The case studies show that the adopted deep learning algorithm (ELM-AE) yields higher classification accuracy than a Principal Component Analysis (PCA) scheme on five different types of upper-limb motions. This indicates the effectiveness and learning capability of the ML-ELM in such motion intent recognition applications.
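A minimal numpy sketch of one ELM-AE layer and a two-layer stack, assuming windowed sEMG feature vectors as input; layer sizes and the regularization constant are arbitrary choices, not the paper's settings:

```python
import numpy as np

def elm_autoencoder(X, n_hidden, reg=1e-3, seed=0):
    """One ELM-AE layer: random hidden mapping, closed-form output weights.

    beta is solved by regularized least squares so that H @ beta ~ X, and
    X @ beta.T is used as the learned feature representation.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                                   # random hidden layer
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return X @ beta.T                                        # features for the next layer

# Stacking two ELM-AE layers over windowed sEMG feature vectors;
# shapes chosen for illustration only.
X = np.random.randn(200, 64)        # 200 windows, 64 raw features
h1 = elm_autoencoder(X, 32)
h2 = elm_autoencoder(h1, 16)        # input to a final ELM classifier
```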
Role of core excitation in (d,p) transfer reactions
NASA Astrophysics Data System (ADS)
Deltuva, A.; Ross, A.; Norvaišas, E.; Nunes, F. M.
2016-10-01
Background: Recent work found that core excitations can be important in extracting structure information from (d,p) reactions. Purpose: Our objective is to systematically explore the role of core excitation in (d,p) reactions and to understand the origin of the dynamical effects. Method: Based on the particle-rotor model of n + 10Be, we generate a number of models with a range of separation energies (Sn = 0.1–5.0 MeV), while maintaining a significant core-excited component. We then apply the latest extension of the momentum-space-based Faddeev method, including dynamical core excitation in the reaction mechanism to all orders, to 10Be(d,p)11Be-like reactions, and study the excitation effects for beam energies Ed = 15–90 MeV. Results: We study the resulting angular distributions and the differences between the spectroscopic factor that would be extracted from the cross sections, when including dynamical core excitation in the reaction, and that of the original structure model. We also explore how different partial waves affect the final cross section. Conclusions: Our results show a strong beam-energy dependence of the extracted spectroscopic factors, which become smaller for intermediate beam energies. This dependence increases for loosely bound systems.
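For context, the spectroscopic factor referred to here is conventionally extracted as the ratio of the measured transfer cross section to the calculated single-particle one; a standard textbook relation, not specific to this paper's Faddeev treatment:

```latex
S_{\mathrm{exp}} \;=\; \frac{\left(d\sigma/d\Omega\right)_{\mathrm{exp}}}{\left(d\sigma/d\Omega\right)_{\mathrm{sp}}}
```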
Yamamoto, Kazuo; Iriyama, Yasutoshi; Hirayama, Tsukasa
2017-02-08
All-solid-state Li-ion batteries having incombustible solid electrolytes are promising energy storage devices because they have significant advantages in terms of safety, lifetime and energy density. Electrochemical reactions, namely, Li-ion insertion/extraction reactions, commonly occur around the nanometer-scale interfaces between the electrodes and solid electrolytes. Thus, transmission electron microscopy (TEM) is an appropriate technique to directly observe such reactions, providing important information for understanding the fundamental solid-state electrochemistry and improving battery performance. In this review, we introduce two types of TEM techniques for operando observations of battery reactions, spatially resolved electron energy-loss spectroscopy in a TEM mode for direct detection of the Li concentration profiles and electron holography for observing the electric potential changes due to Li-ion insertion/extraction reactions. We visually show how Li-ion insertion/extractions affect the crystal structures, electronic structures, and local electric potential during the charge-discharge processes in these batteries. © The Author 2016. Published by Oxford University Press on behalf of The Japanese Society of Microscopy. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of Mobile Mapping System for 3D Road Asset Inventory.
Sairam, Nivedita; Nagarajan, Sudhagar; Ornitz, Scott
2016-03-12
Asset management is an important component of an infrastructure project. A significant cost is involved in maintaining and updating asset information. Data collection is the most time-consuming task in the development of an asset management system. In order to reduce the time and cost involved in data collection, this paper proposes a low-cost Mobile Mapping System equipped with a laser scanner and cameras. First, the feasibility of low-cost sensors for 3D asset inventory is discussed by deriving appropriate sensor models. Then, through calibration procedures, the respective alignments of the laser scanner, cameras, Inertial Measurement Unit, and GPS (Global Positioning System) antenna are determined. The efficiency of this Mobile Mapping System is evaluated by mounting it on a truck and a golf cart. Using the derived sensor models, geo-referenced images and 3D point clouds are produced. After validating the quality of the derived data, the paper provides a framework to extract road assets both automatically and manually, using techniques implementing RANSAC plane fitting and edge extraction algorithms. Finally, the scope of such extraction techniques is discussed, along with a sample GIS (Geographic Information System) database structure for a unified 3D asset inventory. PMID:26985897
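A minimal sketch of RANSAC plane fitting of the kind used for road-surface extraction; the iteration count and inlier tolerance are illustrative assumptions:

```python
import numpy as np

def ransac_plane(points, n_iter=500, tol=0.05, seed=1):
    """Fit a plane to a 3D point cloud by RANSAC (illustrative sketch).

    Returns (unit normal n, offset d) for the model n.x + d = 0 with the
    most inliers within distance `tol` (meters here, by assumption).
    """
    rng = np.random.default_rng(seed)
    best_inliers, best_model = 0, None
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                     # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ p0
        inliers = np.abs(points @ n + d) < tol
        if inliers.sum() > best_inliers:
            best_inliers, best_model = inliers.sum(), (n, d)
    return best_model

# e.g. road-surface extraction: the dominant plane in a tile of points
# points = np.loadtxt("tile.xyz")  # hypothetical input file
```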
Semi-Automatic Terminology Generation for Information Extraction from German Chest X-Ray Reports.
Krebs, Jonathan; Corovic, Hamo; Dietrich, Georg; Ertl, Max; Fette, Georg; Kaspar, Mathias; Krug, Markus; Stoerk, Stefan; Puppe, Frank
2017-01-01
Extraction of structured data from textual reports is an important subtask for building medical data warehouses for research and care. Many medical and most radiology reports are written in a telegraphic style, with a concatenation of noun phrases describing the presence or absence of findings. Therefore a lexico-syntactic approach is promising, in which key terms and their relations are recognized and mapped onto a predefined standard terminology (ontology). We propose a two-phase algorithm for terminology matching: in the first pass, a local terminology for recognition is derived as close as possible to the terms used in the radiology reports. In the second pass, the local terminology is mapped to a standard terminology. In this paper, we report on an algorithm for the first step, the semi-automatic generation of the local terminology, and evaluate the algorithm with radiology reports of chest X-ray examinations from Würzburg University Hospital. With an effort of about 20 hours of work by a radiologist as domain expert and 10 hours of meetings, a local terminology with about 250 attributes and various value patterns was built. In an evaluation with 100 randomly chosen reports, it achieved an F1 score of about 95% for information extraction.
Distributed control using linear momentum exchange devices
NASA Technical Reports Server (NTRS)
Sharkey, J. P.; Waites, Henry; Doane, G. B., III
1987-01-01
MSFC has successfully employed the Vibrational Control of Space Structures (VCOSS) Linear Momentum Exchange Devices (LMEDs), an outgrowth of the Air Force Wright Aeronautical Laboratory (AFWAL) program, in a distributed control experiment. The control experiment was conducted in MSFC's Ground Facility for Large Space Structures Control Verification (GF/LSSCV). The GF/LSSCV's test article was well suited for this experiment in that the LMEDs could be judiciously placed on the ASTROMAST. The LMED placements were such that vibrational mode information could be extracted from the accelerometers on the LMEDs. The LMED accelerometer information was processed by the control algorithms so that the LMED masses could be accelerated to produce forces that would damp the vibrational modes of interest. Experimental results are presented showing the LMEDs' capabilities.
Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna
2017-10-01
Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel functionality, alter existing functionality, and improve stability and foldability. With a view to facilitating thorough analysis and effective search options for extracting and comparing loops for sequence and structural compatibility, we developed LoopX, a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database, equipped with a graphical user interface, provides diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen-bonding patterns and backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features, (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops, have also been incorporated into the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
Structure-seeking multilinear methods for the analysis of fMRI data.
Andersen, Anders H; Rayens, William S
2004-06-01
In comprehensive fMRI studies of brain function, the data structures often contain higher-order ways such as trial, task condition, subject, and group in addition to the intrinsic dimensions of time and space. While multivariate bilinear methods such as principal component analysis (PCA) have been used successfully for extracting information about spatial and temporal features in data from a single fMRI run, the need to unfold higher-order data sets into bilinear arrays has led to decompositions that are nonunique and to the loss of multiway linkages and interactions present in the data. These additional dimensions or ways can be retained in multilinear models to produce structures that are unique and which admit interpretations that are neurophysiologically meaningful. Multiway analysis of fMRI data from multiple runs of a bilateral finger-tapping paradigm was performed using the parallel factor (PARAFAC) model. A trilinear model was fitted to a data cube of dimensions voxels by time by run. Similarly, a quadrilinear model was fitted to a higher-way structure of dimensions voxels by time by trial by run. The spatial and temporal response components were extracted and validated by comparison to results from traditional SVD/PCA analyses based on scenarios of unfolding into lower-order bilinear structures.
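A hedged sketch of the trilinear fit, using the tensorly library's parafac routine (API as in recent tensorly versions); the cube dimensions and rank are placeholders, not the study's values:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Trilinear PARAFAC on a voxels x time x run cube; dimensions are
# placeholders, not those of the study.
cube = tl.tensor(np.random.randn(5000, 120, 6))    # voxels x time x run
weights, (spatial, temporal, run) = parafac(cube, rank=3)

# Each rank-1 component pairs one spatial map (spatial[:, r]) with one
# time course (temporal[:, r]) and per-run loadings (run[:, r]) -- the
# uniqueness that unfolding to a bilinear PCA would lose.
```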
Applications of AMPS-1D for solar cell simulation
NASA Astrophysics Data System (ADS)
Zhu, Hong; Kalkan, Ali Kaan; Hou, Jingya; Fonash, Stephen J.
1999-03-01
The AMPS-1D PC computer program is now used by over 70 groups world-wide for detector and solar cell analysis. It has proved to be a very powerful tool in understanding device operation and physics for single crystal, poly-crystalline and amorphous structures. For example, AMPS-1D has been successful in explaining the "red kink" [1] and the "transient effect" in CdS/CIGS poly-crystalline solar cells. It has been used to show that thin film poly-Si structures, with reasonable light trapping, are capable of competitive solar cell conversion efficiencies. In the case of a-Si:H structures, it has been used, for example, to settle the discrepancies in bandgap measurement, to predict the effective QE>1 phenomenon later seen in these materials [2], to determine the relative roles of interface and bulk properties, and to point the direction toward 16% triple junction structures. In general AMPS-1D is used for cell and detector design, material parameter sensitivity studies, and parameter extraction. Recently we have shown that it can be used to determine optimum structure and light and voltage biasing conditions in the material parameter extraction function. Information on AMPS can be found at www.psu.edu/dept/AMPS/amps_web/AMPS.html and at other web sites set up by user groups.
NASA Astrophysics Data System (ADS)
David, Peter; Hansen, Nichole; Nolan, James J.; Alcocer, Pedro
2015-05-01
The growth in text data available online is accompanied by a growth in the diversity of available documents. Corpora with extreme heterogeneity in terms of file formats, document organization, page layout, text style, and content are common. The absence of meaningful metadata describing the structure of online and open-source data leads to text extraction results that contain no information about document structure and are cluttered with page headers and footers, web navigation controls, advertisements, and other items that are typically considered noise. We describe an approach to document structure and metadata recovery that uses visual analysis of documents to infer the communicative intent of the author. Our algorithm identifies the components of documents such as titles, headings, and body content, based on their appearance. Because it operates on an image of a document, our technique can be applied to any type of document, including scanned images. Our approach to document structure recovery considers a finer-grained set of component types than prior approaches. In this initial work, we show that a machine learning approach to document structure recovery using a feature set based on the geometry and appearance of images of documents achieves a 60% greater F1-score than a baseline random classifier.
Structural Chemistry of Human RNA Methyltransferases.
Schapira, Matthieu
2016-03-18
RNA methyltransferases (RNMTs) play important roles in RNA stability, splicing, and epigenetic mechanisms. They constitute a promising target class that is underexplored by the medicinal chemistry community. Information of relevance to drug design can be extracted from the rich structural coverage of human RNMTs. In this work, the structural chemistry of this protein family is analyzed in depth. Unlike most methyltransferases, RNMTs generally feature a substrate-binding site that is largely open onto the cofactor-binding pocket, favoring the design of bisubstrate inhibitors. Substrate purines or pyrimidines are often sandwiched between hydrophobic walls that can accommodate planar ring systems. When the substrate base is lying on a shallow surface, a 5' flanking base is sometimes anchored in a druggable cavity. The cofactor-binding site is structurally more diverse than in protein methyltransferases and more druggable in SPOUT than in Rossmann-fold enzymes. Finally, conformational plasticity observed at both the substrate- and cofactor-binding sites may be a challenge for structure-based drug design. The landscape drawn here may inform ongoing efforts toward the discovery of the first human RNMT inhibitors.
The dynamics of information-driven coordination phenomena: A transfer entropy analysis
Borge-Holthoefer, Javier; Perra, Nicola; Gonçalves, Bruno; González-Bailón, Sandra; Arenas, Alex; Moreno, Yamir; Vespignani, Alessandro
2016-01-01
Data from social media provide unprecedented opportunities to investigate the processes that govern the dynamics of collective social phenomena. We consider an information theoretical approach to define and measure the temporal and structural signatures typical of collective social events as they arise and gain prominence. We use the symbolic transfer entropy analysis of microblogging time series to extract directed networks of influence among geolocalized subunits in social systems. This methodology captures the emergence of system-level dynamics close to the onset of socially relevant collective phenomena. The framework is validated against a detailed empirical analysis of five case studies. In particular, we identify a change in the characteristic time scale of the information transfer that flags the onset of information-driven collective phenomena. Furthermore, our approach identifies an order-disorder transition in the directed network of influence between social subunits. In the absence of clear exogenous driving, social collective phenomena can be represented as endogenously driven structural transitions of the information transfer network. This study provides results that can help define models and predictive algorithms for the analysis of societal events based on open source data. PMID:27051875
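A minimal numpy sketch of symbolic transfer entropy between two time series, with ordinal-pattern symbols of embedding dimension m; the toy series and parameters are illustrative, not the paper's estimator settings:

```python
import numpy as np
from collections import Counter

def symbolize(x, m=3):
    """Map a series to ordinal-pattern symbols of embedding dimension m."""
    return [tuple(np.argsort(x[i:i + m])) for i in range(len(x) - m + 1)]

def transfer_entropy(x, y, m=3):
    """Symbolic transfer entropy T(Y -> X) in bits, a minimal sketch."""
    sx, sy = symbolize(x, m), symbolize(y, m)
    triples = Counter(zip(sx[1:], sx[:-1], sy[:-1]))   # (x_next, x_now, y_now)
    pairs_xy = Counter(zip(sx[:-1], sy[:-1]))
    pairs_xx = Counter(zip(sx[1:], sx[:-1]))
    singles = Counter(sx[:-1])
    n = sum(triples.values())
    te = 0.0
    for (xn, xo, yo), c in triples.items():
        p_joint = c / n
        p_cond_xy = c / pairs_xy[(xo, yo)]             # p(x_next | x_now, y_now)
        p_cond_x = pairs_xx[(xn, xo)] / singles[xo]    # p(x_next | x_now)
        te += p_joint * np.log2(p_cond_xy / p_cond_x)
    return te

# Toy example: y lags x, so the x -> y direction should dominate.
x = np.random.randn(1000)
y = np.roll(x, 1) + 0.5 * np.random.randn(1000)
print(transfer_entropy(y, x), transfer_entropy(x, y))  # T(x->y) > T(y->x) expected
```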
Linguistic feature analysis for protein interaction extraction
2009-01-01
Background: The rapid growth of the amount of publicly available reports on biomedical experimental results has recently caused a boost of text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extracted from text. However, only few attempts have been made to evaluate the contribution of the different feature types. In this work, we contribute to this evaluation by studying the relative importance of deep syntactic features, i.e., grammatical relations, shallow syntactic features (part-of-speech information), and lexical features. For this purpose, we use a recently proposed approach based on support vector machines with structured kernels. Results: Our results reveal that the contribution of the different feature types varies for the different data sets on which the experiments were conducted. The smaller the training corpus compared to the test data, the more important the role of grammatical relations becomes. Moreover, classifiers based on deep syntactic information prove to be more robust on heterogeneous texts where no or only limited common vocabulary is shared. Conclusion: Our findings suggest that grammatical relations play an important role in the interaction extraction task. Moreover, the net advantage of adding lexical and shallow syntactic features is small relative to the number of added features. This implies that efficient classifiers can be built by using only a small fraction of the features that are typically used in recent approaches. PMID:19909518
Automatic retinal blood vessel parameter calculation in spectral domain optical coherence tomography
NASA Astrophysics Data System (ADS)
Wehbe, Hassan; Ruggeri, Marco; Jiao, Shuliang; Gregori, Giovanni; Puliafito, Carmen A.
2007-02-01
Measurement of retinal blood vessel parameters, such as the blood flow in the vessels, may have significant impact on the study and diagnosis of glaucoma, a leading blinding disease worldwide. Optical coherence tomography (OCT) is a noninvasive imaging technique that can provide not only microscopic structural imaging of the retina but also functional information such as the blood flow velocity in the retina. The aim of this study is to automatically extract retinal blood vessel parameters, including the 3D orientation and diameter of each vessel as well as the corresponding absolute blood flow velocity. The parameters were extracted from circular OCT scans around the optic disc. By removing the surface reflection through simple segmentation of the circular OCT scans, a blood vessel shadowgram can be generated. The lateral coordinates and the diameter of each blood vessel are extracted from the shadowgram through a series of signal processing steps. Upon determination of the lateral position and the vessel diameter, the coordinate of each blood vessel in the depth direction is calculated in combination with the Doppler information for the vessel. The extraction of the vessel coordinates and diameter makes it possible to calculate the orientation of the vessel with respect to the direction of the incident sample light, which in turn can be used to calculate the absolute blood flow velocity and the flow rate.
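For reference, the conversion from measured Doppler shift to absolute flow speed uses the angle θ between the probe beam and the vessel recovered from the 3D orientation; a standard Doppler OCT relation, with f_D the Doppler frequency shift, λ0 the center wavelength, and n the tissue refractive index (notation assumed, not taken from the paper):

```latex
v_{\mathrm{abs}} \;=\; \frac{f_{D}\,\lambda_{0}}{2\,n\cos\theta}
```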
Bayesian depth estimation from monocular natural images.
Su, Che-Chun; Cormack, Lawrence K; Bovik, Alan C
2017-05-01
Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world that the vision system likely exploits to compute perceived depth, monocularly as well as binocularly. Toward understanding how this might be accomplished, we propose a Bayesian model of monocular depth computation that recovers detailed 3D scene structures by extracting reliable, robust, depth-sensitive statistical features from single natural images. These features are derived using well-accepted univariate natural scene statistics (NSS) models and recent bivariate/correlation NSS models that describe the relationships between 2D photographic images and their associated depth maps. This is accomplished by building a dictionary of canonical local depth patterns from which NSS features are extracted as prior information. The dictionary is used to create a multivariate Gaussian mixture (MGM) likelihood model that associates local image features with depth patterns. A simple Bayesian predictor is then used to form spatial depth estimates. The depth results produced by the model, despite its simplicity, correlate well with ground-truth depths measured by a current-generation terrestrial light detection and ranging (LIDAR) scanner. Such a strong form of statistical depth information could be used by the visual system when creating overall estimated depth maps incorporating stereopsis, accommodation, and other conditions. Indeed, even in isolation, the Bayesian predictor delivers depth estimates that are competitive with state-of-the-art "computer vision" methods that utilize highly engineered image features and sophisticated machine learning algorithms.
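A hedged sketch of the multivariate Gaussian mixture likelihood step, with scikit-learn; the feature dimensionality, component count, and random features stand in for the paper's NSS features and canonical depth patterns:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Per-patch feature vectors; 8 dimensions and 10 components are placeholders.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 8))
gmm = GaussianMixture(n_components=10).fit(feats)

# For a new patch, posterior responsibilities over mixture components play
# the role of p(depth pattern | image features); a Bayesian predictor would
# average the canonical local depth patterns with these weights.
resp = gmm.predict_proba(rng.normal(size=(1, 8)))
```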
NASA Astrophysics Data System (ADS)
Curilem, Millaray; Huenupan, Fernando; Beltrán, Daniel; San Martin, Cesar; Fuentealba, Gustavo; Franco, Luis; Cardona, Carlos; Acuña, Gonzalo; Chacón, Max; Khan, M. Salman; Becerra Yoma, Nestor
2016-04-01
Automatic pattern recognition applied to seismic signals from volcanoes may assist seismic monitoring by reducing the workload of analysts, allowing them to focus on more challenging activities, such as producing reports, implementing models, and understanding volcanic behaviour. In a previous work, we proposed a structure for automatic classification of seismic events at Llaima volcano, one of the most active volcanoes in the Southern Andes, located in the Araucanía Region of Chile. A database of events taken from three monitoring stations on the volcano was used to create a classification structure, independent of which station provided the signal. The database included three types of volcanic events: tremor, long period, and volcano-tectonic, plus a contrast group containing other types of seismic signals. In the present work, we maintain the same classification scheme, but we consider the stations' information separately in order to assess whether the complementary information provided by different stations improves the performance of the classifier in recognising seismic patterns. This paper proposes two strategies for combining the information from the stations: i) combining the features extracted from the signals from each station and ii) combining the classifiers of each station. In the first case, the features extracted from the signals from each station are combined to form the input for a single classification structure. In the second, a decision stage combines the results of the classifiers for each station to give a unique output. The results confirm that the station-dependent strategies that combine the features and the classifiers from several stations improve the classification performance, and that the combination of the features provides the best performance. The results show an average improvement of 9% in classification accuracy when compared with the station-independent method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Russina, Olga; Macchiagodena, Marina; Kirchner, Barbara
2015-01-01
Here we report the first structural and dynamic investigation of ethylammonium nitrate (EAN), a representative protic ionic liquid, and dimethylsulfoxide (DMSO). Using joint X-ray and neutron diffraction, we exploit the EPSR approach to extract structural information at the atomistic level. EAN/DMSO turns out to be homogeneous at microscopic scales, and indications are found for the existence of a structural leitmotiv with stoichiometric composition 2DMSO:1EAN. Dielectric spectroscopy is used to access the relaxation map of the DMSO:EAN = 60:40 mixture. No crystallisation is detected, and three relaxation processes could be characterised. Overall this study provides new indications of strict analogies between water and ethylammonium nitrate. © 2014 Elsevier B.V. All rights reserved.
Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths
NASA Astrophysics Data System (ADS)
Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka
2010-11-01
The size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it is very difficult for information retrieval tools to fully utilize Web information. XML pages, which provide structural information to users to some extent, offer a partial solution to this problem. Without efficient indexes, however, query processing can be quite inefficient due to exhaustive traversal of the XML data. In this paper, an improved content-centric variant of the Content-Aware DataGuide, an indexing technique for XML databases, is proposed that uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to changes in the query workload, so the overhead of reconstruction can be minimized. Frequently used paths are extracted by running a sequential pattern mining algorithm over subsequent queries in the query workload. After this, the data structures are incrementally updated. This indexing technique proves efficient: partial matching queries can be executed quickly, and users obtain more relevant documents in their results.
NASA Astrophysics Data System (ADS)
Kitaura, Francisco-Shu
2016-10-01
One of the main goals in cosmology is to understand how the Universe evolves, how it forms structures, why it expands, and what the nature of dark matter and dark energy is. In the next decade, large and expensive observational projects will provide information on the structure and distribution of many millions of galaxies at different redshifts, enabling us to make great progress in answering these questions. However, these data require a very special and complex set of analysis tools to extract the maximum valuable information. Statistical inference techniques are being developed to bridge the gaps between theory, simulations, and observations. In particular, we discuss the efforts to address the question: what is the underlying nonlinear matter distribution and dynamics at any cosmic time corresponding to a set of observed galaxies in redshift space? An accurate reconstruction of the initial conditions encodes the full phase-space information at any later cosmic time (given a particular structure formation model and a set of cosmological parameters). We present advances in solving this problem in a self-consistent way with Big Data techniques of the Cosmic Web.
NPIDB: Nucleic acid-Protein Interaction DataBase.
Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V
2013-01-01
The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.
City of Flagstaff Project: Ground Water Resource Evaluation, Remote Sensing Component
Chavez, Pat S.; Velasco, Miguel G.; Bowell, Jo-Ann; Sides, Stuart C.; Gonzalez, Rosendo R.; Soltesz, Deborah L.
1996-01-01
Many regions, cities, and towns in the Western United States need new or expanded water resources because of both population growth and increased development. Any tools or data that can help in the evaluation of an area's potential water resources must be considered for this increasingly critical need. Remotely sensed satellite images and subsequent digital image processing have been under-utilized in ground water resource evaluation and exploration. Satellite images can be helpful in detecting and mapping an area's regional structural patterns, including major fracture and fault systems, two important geologic settings for an area's surface to ground water relations. Within the United States Geological Survey's (USGS) Flagstaff Field Center, expertise and capabilities in remote sensing and digital image processing have been developed over the past 25 years through various programs. For the City of Flagstaff project, this expertise and these capabilities were combined with traditional geologic field mapping to help evaluate ground water resources in the Flagstaff area. Various enhancement and manipulation procedures were applied to the digital satellite images; the results, in both digital and hardcopy format, were used for field mapping and analyzing the regional structure. Relative to surface sampling, remotely sensed satellite and airborne images have improved spatial coverage that can help study, map, and monitor the earth's surface at local and/or regional scales. Advantages offered by remotely sensed satellite image data include: (1) a synoptic/regional view compared to both aerial photographs and ground sampling; (2) cost effectiveness; (3) high spatial resolution and coverage compared to ground sampling; and (4) relatively high temporal coverage on a long-term basis. Remotely sensed images contain both spectral and spatial information. The spectral information provides various properties and characteristics about the surface cover at a given location or pixel (that is, vegetation and/or soil type). The spatial information gives the distribution, variation, and topographic relief of the cover types from pixel to pixel. Therefore, the main characteristics that determine a pixel's brightness/reflectance and, consequently, the digital number (DN) assigned to the pixel, are the physical properties of the surface and near surface, the cover type, and the topographic slope. In this application, the ability to detect and map lineaments, especially those related to fractures and faults, is critical. Therefore, the extraction of spatial information from the digital images was of prime interest in this project. The spatial information varies among the different spectral bands available; in particular, a near infrared spectral band is better than a visible band when extracting spatial information in highly vegetated areas. In this study, both visible and near infrared bands were analyzed and used to extract the desired spatial information from the images. The wide swath coverage of remotely sensed satellite digital images makes them ideal for regional analysis and mapping. Since locating and mapping highly fractured and faulted areas is a major requirement for ground water resource evaluation and exploration, this aspect of satellite images was considered critical; it allowed us to stand back (actually up about 440 miles), look at, and map the regional structural setting of the area.
The main focus of the remote sensing and digital image processing component of this project was to use both remotely sensed digital satellite images and a Digital Elevation Model (DEM) to extract spatial information related to the structural and topographic patterns in the area. The data types used were digital satellite images collected by the United States' Landsat Thematic Mapper (TM) and the French Systeme Probatoire d'Observation de la Terre (SPOT) imaging systems, along with a DEM of the Flagstaff region. The digital image processing was carried out with the USGS Mini Image Processing System.
Longitudinal Analysis of New Information Types in Clinical Notes
Zhang, Rui; Pakhomov, Serguei; Melton, Genevieve B.
2014-01-01
It is increasingly recognized that redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous, significant, and may negatively impact the secondary use of these notes for research and patient care. We investigated several automated methods to identify redundant versus relevant new information in clinical reports. These methods may provide a valuable approach to extract clinically pertinent information and further improve the accuracy of clinical information extraction systems. In this study, we used UMLS semantic types to extract several types of new information, including problems, medications, and laboratory information. Automatically identified new information highly correlated with manual reference standard annotations. Methods to identify different types of new information can potentially help to build more robust information extraction systems for clinical researchers, as well as aid clinicians and researchers in navigating clinical notes more effectively and in quickly identifying information pertaining to changes in health states. PMID:25717418
Robust watermark technique using masking and Hermite transform.
Coronel, Sandra L Gomez; Ramírez, Boris Escalante; Mosqueda, Marco A Acevedo
2016-01-01
The following paper evaluates a watermark algorithm designed for digital images that uses a perceptive mask and a normalization process, thus preventing detection by the human eye, as well as ensuring robustness against common processing and geometric attacks. The Hermite transform is employed because it allows a perfect reconstruction of the image while incorporating human visual system properties; moreover, it is based on derivatives of Gaussian functions. The applied watermark represents information about the digital image's proprietor. The extraction process is blind, because it does not require the original image. The following techniques were utilized in the evaluation of the algorithm: peak signal-to-noise ratio, the structural similarity index average, the normalized cross-correlation, and bit error rate. Several watermark extraction tests were performed against geometric and common processing attacks. These allowed us to identify how many bits of the watermark can be modified while still permitting adequate extraction.
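Two of the four evaluation metrics have simple closed forms; a minimal sketch (the peak value of 255 assumes 8-bit images):

```python
import numpy as np

def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio between host and watermarked images (dB)."""
    mse = np.mean((original.astype(float) - marked.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def bit_error_rate(bits_in, bits_out):
    """Fraction of watermark bits flipped by an attack."""
    return np.mean(np.asarray(bits_in) != np.asarray(bits_out))
```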
Building Facade Reconstruction by Fusing Terrestrial Laser Points and Images
Pu, Shi; Vosselman, George
2009-01-01
Laser data and optical data have a complementary nature for three-dimensional feature extraction, and efficient integration of the two data sources leads to more reliable and automated extraction of three-dimensional features. This paper presents a semiautomatic building facade reconstruction approach, which efficiently combines information from terrestrial laser point clouds and close-range images. A building facade's general structure is discovered and established using the planar features from the laser data. Strong lines in the images are then extracted using a Canny extractor and the Hough transform, and compared with current model edges for necessary improvement. Finally, textures with optimal visibility are selected and applied according to accurate image orientations. Solutions to several challenging problems throughout the combined reconstruction, such as referencing between laser points and multiple images and automated texturing, are described. The limitations and remaining work of this approach are also discussed. PMID:22408539
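A minimal OpenCV sketch of the strong-line extraction step (Canny edges followed by a probabilistic Hough transform); the file name and all thresholds are illustrative assumptions:

```python
import cv2
import numpy as np

# Strong-line extraction from a facade image; thresholds are illustrative.
img = cv2.imread("facade.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=80, minLineLength=60, maxLineGap=5)
# Each lines[i][0] is (x1, y1, x2, y2); these 2D segments would then be
# compared against projected model edges from the laser-derived planes.
```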
Optimal Information Extraction of Laser Scanning Dataset by Scale-Adaptive Reduction
NASA Astrophysics Data System (ADS)
Zang, Y.; Yang, B.
2018-04-01
3D laser scanning is widely used to collect surface information of objects. For various applications, we need to extract a point cloud of good perceptual quality from the scanned points. Most existing methods extract important points at a fixed scale; however, the geometric features of a 3D object arise at various geometric scales. We propose a multi-scale construction method based on radial basis functions. For each scale, important points are extracted from the point cloud based on their importance. We apply a perceptual metric, Just-Noticeable-Difference, to measure the degradation of each geometric scale. Finally, scale-adaptive optimal information extraction is realized. Experiments are undertaken to evaluate the effectiveness of the proposed method, suggesting a reliable solution for optimal information extraction from objects.
Senger, Stefan; Bartek, Luca; Papadatos, George; Gaulton, Anna
2015-12-01
First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever-increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated by using the latter approach are now available, but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases. When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51% were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as starting point, i.e. establishing if for a list of compounds the databases provide the links between chemical structures and patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59%, respectively, of the compound-patent pairs obtained from Reaxys. In our comparison of automatically generated vs. manually curated patent chemistry databases, the former successfully provided approximately 60% of links between chemical structure and patents. It needs to be stressed that only a very limited number of patents and compound-patent pairs were used for our comparison. Nevertheless, our results will hopefully help to manage expectations of users of patent chemistry databases of this type and provide a useful framework for more studies like ours as well as guide future developments of the workflows used for the automated extraction of chemical structures from patents. The challenges we have encountered whilst performing this study highlight that more needs to be done to make such assessments easier. Above all, more adequate, preferably open access to relevant 'gold standards' is required.
Information extraction during simultaneous motion processing.
Rideaux, Reuben; Edwards, Mark
2014-02-01
When confronted with multiple moving objects the visual system can process them in two stages: an initial stage in which a limited number of signals are processed in parallel (i.e. simultaneously) followed by a sequential stage. We previously demonstrated that during the simultaneous stage, observers could discriminate between presentations containing up to 5 vs. 6 spatially localized motion signals (Edwards & Rideaux, 2013). Here we investigate what information is actually extracted during the simultaneous stage and whether the simultaneous limit varies with the detail of information extracted. This was achieved by measuring the ability of observers to extract varied information from low detail, i.e. the number of signals presented, to high detail, i.e. the actual directions present and the direction of a specific element, during the simultaneous stage. The results indicate that the resolution of simultaneous processing varies as a function of the information which is extracted, i.e. as the information extraction becomes more detailed, from the number of moving elements to the direction of a specific element, the capacity to process multiple signals is reduced. Thus, when assigning a capacity to simultaneous motion processing, this must be qualified by designating the degree of information extraction. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Hoehndorf, Robert; Alshahrani, Mona; Gkoutos, Georgios V; Gosline, George; Groom, Quentin; Hamann, Thomas; Kattge, Jens; de Oliveira, Sylvia Mota; Schmidt, Marco; Sierra, Soraya; Smets, Erik; Vos, Rutger A; Weiland, Claus
2016-11-14
The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text. We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities. The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies, but rather serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.
Ertl, Peter; Patiny, Luc; Sander, Thomas; Rufener, Christian; Zasso, Michaël
2015-01-01
Wikipedia, the world's largest and most popular encyclopedia, is an indispensable source of chemistry information. It contains, among others, entries for over 15,000 chemicals, including metabolites, drugs, agrochemicals, and industrial chemicals. To provide easy access to this wealth of information, we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia. We extracted chemical structures from entries in Wikipedia and implemented a web system allowing structure and similarity searching on these data. The whole search and visualization system is written in JavaScript, and therefore can run locally within a web page and does not require a central server. The Wikipedia Chemical Structure Explorer is accessible online at www.cheminfo.org/wikipedia and is also available as an open source project from GitHub for local installation. The web-based Wikipedia Chemical Structure Explorer provides a useful resource for research as well as for chemical education, enabling both researchers and students to search chemistry easily and identify relevant information in Wikipedia. The tool can also help to improve the quality of chemical entries in Wikipedia by providing potential contributors with a regularly updated list of entries with problematic structures. Last but not least, this search system is a nice example of how modern web technology can be applied in the field of cheminformatics. Graphical abstract: Wikipedia Chemical Structure Explorer allows substructure and similarity searches on molecules referenced in Wikipedia.
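As a sketch of the two search operations the explorer exposes, here in Python with RDKit rather than the tool's own JavaScript; the SMILES entries are illustrative:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# A tiny stand-in for the extracted structure collection.
mols = [Chem.MolFromSmiles(s) for s in ("c1ccccc1O", "CCO", "c1ccccc1N")]

# Substructure search: which molecules contain a benzene ring?
query = Chem.MolFromSmiles("c1ccccc1")
hits = [m for m in mols if m.HasSubstructMatch(query)]

# Similarity search: Tanimoto similarity on Morgan fingerprints.
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]
sim = DataStructs.TanimotoSimilarity(fps[0], fps[2])
```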
Zeng, Jijiao; Tong, Zhaohui; Wang, Letian; Zhu, J Y; Ingram, Lonnie
2014-02-01
The structure of lignin from sugarcane bagasse after a pilot-scale dilute phosphoric acid plus steam explosion pretreatment, and the effect of ethanol-extracted lignin on subsequent cellulose hydrolysis, were investigated. The lignin structural changes caused by pretreatment were identified using advanced nondestructive techniques: gel permeation chromatography (GPC) and quantitative (13)C and 2-D nuclear magnetic resonance (NMR) spectroscopy. The structural analysis revealed that the ethanol-extractable lignin preserved the basic lignin structure but had relatively fewer β-O-4 linkages, a lower syringyl/guaiacyl (S/G) unit ratio, a lower p-coumarate/ferulate ratio, and fewer other end structures. The results also indicated that approximately 8% of the mass was extracted by pure ethanol. The bagasse after ethanol extraction gave an approximately 22% higher glucose yield after enzymatic hydrolysis compared to pretreated bagasse without extraction. Copyright © 2013 Elsevier Ltd. All rights reserved.
Querying databases of trajectories of differential equations: Data structures for trajectories
NASA Technical Reports Server (NTRS)
Grossman, Robert
1989-01-01
One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories γ of a dynamical system evolving in R^N are stored in a database. Let η ⊂ R^N denote a parameterized path in Euclidean space, and let ‖·‖ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.
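As a minimal sketch of the setting, not of the paper's actual data structure, the following Python stores trajectories as sampled arrays in R^N and answers nearest-trajectory queries under a discrete L2 or sup norm; all names and the common-sampling assumption are illustrative.

import numpy as np

class TrajectoryDB:
    def __init__(self):
        self._trajs = {}  # id -> (T, N) array of samples in R^N

    def insert(self, traj_id, samples):
        self._trajs[traj_id] = np.asarray(samples, dtype=float)

    def query_nearest(self, path, norm="l2"):
        """Return the id of the stored trajectory closest to a query path
        (assumed to share the same sampling grid)."""
        path = np.asarray(path, dtype=float)
        def dist(traj):
            diff = traj - path
            if norm == "sup":
                return np.max(np.linalg.norm(diff, axis=1))
            return np.sqrt(np.mean(np.sum(diff**2, axis=1)))
        return min(self._trajs, key=lambda k: dist(self._trajs[k]))

db = TrajectoryDB()
t = np.linspace(0.0, 2 * np.pi, 100)
db.insert("sine", np.column_stack([t, np.sin(t)]))
db.insert("line", np.column_stack([t, 0.5 * t]))
print(db.query_nearest(np.column_stack([t, np.sin(t) + 0.05])))  # -> "sine"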
Grazing-incidence small angle x-ray scattering studies of nanoscale polymer gratings
NASA Astrophysics Data System (ADS)
Doxastakis, Manolis; Suh, Hyo Seon; Chen, Xuanxuan; Rincon Delgadillo, Paulina A.; Wan, Lingshu; Williamson, Lance; Jiang, Zhang; Strzalka, Joseph; Wang, Jin; Chen, Wei; Ferrier, Nicola; Ramirez-Hernandez, Abelardo; de Pablo, Juan J.; Gronheid, Roel; Nealey, Paul
2015-03-01
Grazing-Incidence Small Angle X-ray Scattering (GISAXS) offers the ability to probe large sample areas, providing three-dimensional structural information in high detail in a thin-film geometry. In this study we apply GISAXS to structures formed at one step of the LiNe (Liu-Nealey) flow, which uses chemical patterns for directed self-assembly of block copolymer films. Experiments conducted at Argonne National Laboratory provided scattering patterns probing film characteristics in directions both parallel and normal to the surface. We demonstrate new computational methods for constructing models from the measured scattering. Such analysis allows structural characteristics to be extracted in unprecedented detail.
Geologic information from satellite images
NASA Technical Reports Server (NTRS)
Lee, K.; Knepper, D. H.; Sawatzky, D. L.
1974-01-01
Extracting geologic information from ERTS and Skylab/EREP images is best done by a geologist trained in photo-interpretation. The information is at a regional scale, and three basic types are available: rock and soil, geologic structures, and landforms. Discrimination between alluvium and sedimentary or crystalline bedrock, and between units in thick sedimentary sequences is best, primarily because of topographic expression and vegetation differences. Discrimination between crystalline rock types is poor. Folds and fractures are the best displayed geologic features. They are recognizable by topographic expression, drainage patterns, and rock or vegetation tonal patterns. Landforms are easily discriminated by their familiar shapes and patterns. Several examples demonstrate the applicability of satellite images to tectonic analysis and petroleum and mineral exploration.
Borges, Cleber N; Bruns, Roy E; Almeida, Aline A; Scarminio, Ieda S
2007-07-09
A composite simplex centroid-simplex centroid mixture design is proposed for simultaneously optimizing two mixture systems. The complementary model is formed by multiplying special cubic models for the two systems. The design was applied to the simultaneous optimization of both mobile-phase chromatographic mixtures and extraction mixtures for the Camellia sinensis Chinese tea plant. The extraction mixtures investigated contained varying proportions of ethyl acetate, ethanol and dichloromethane, while the mobile phase was made up of varying proportions of methanol, acetonitrile and a methanol-acetonitrile-water (MAW) 15%:15%:70% mixture. The experiments were block randomized corresponding to a split-plot error structure to minimize laboratory work and reduce environmental impact. Coefficients of an initial saturated model were obtained using Scheffé-type equations. A cumulative probability graph was used to determine an approximate reduced model. The split-plot error structure was then introduced into the reduced model by applying generalized least squares equations, with variance components calculated using the restricted maximum likelihood approach. A model was developed to calculate the number of peaks observed with the chromatographic detector at 210 nm. A 20-term model contained essentially all the statistical information of the initial model and had a root mean square calibration error of 1.38. The model was used to predict the number of peaks eluted in chromatograms obtained from extraction solutions corresponding to axial points of the simplex centroid design. The significant model coefficients are interpreted in terms of interacting linear, quadratic and cubic effects of the mobile-phase and extraction-solution components.
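The sketch below illustrates, with made-up coefficients, what a Scheffé special cubic model for a three-component mixture looks like and how two such models multiply to form the composite response described above; it is not fitted to the paper's data.

import numpy as np

def special_cubic(x, b):
    """Scheffé special cubic model for a 3-component mixture x (sums to 1).
    b holds 7 coefficients: 3 linear, 3 binary, 1 ternary."""
    x1, x2, x3 = x
    return (b[0]*x1 + b[1]*x2 + b[2]*x3
            + b[3]*x1*x2 + b[4]*x1*x3 + b[5]*x2*x3
            + b[6]*x1*x2*x3)

# Composite response for the two mixture systems (extraction solvent and
# mobile phase): the product of the two special cubic models.
b_extr = np.array([4.0, 6.0, 5.0, 1.5, -0.8, 2.0, 3.0])   # illustrative only
b_mob  = np.array([3.0, 2.5, 4.5, 0.5,  1.0, -1.5, 2.0])  # illustrative only

extraction = (0.4, 0.3, 0.3)   # ethyl acetate, ethanol, dichloromethane
mobile     = (0.5, 0.2, 0.3)   # methanol, acetonitrile, MAW
print(special_cubic(extraction, b_extr) * special_cubic(mobile, b_mob))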
Adib, Adiana Mohamed; Jamaludin, Fadzureena; Kiong, Ling Sui; Hashim, Nuziah; Abdullah, Zunoliza
2014-08-05
Baeckea frutescens, locally known as Cucur atap, is used as an antibacterial, antidysentery, antipyretic and diuretic agent. In Malaysia and Indonesia, it is used as an ingredient of the traditional medicine given to mothers during confinement. A three-step infrared (IR) macro-fingerprinting method combining conventional IR spectra and second-derivative spectra with two-dimensional infrared correlation spectroscopy (2D-IR) has proved to be an effective way to examine complicated mixtures such as herbal medicines. This study investigated the feasibility of employing multi-step IR spectroscopy to study the main constituents of B. frutescens and its different extracts (extracted in turn by chloroform, ethyl acetate, methanol and water). The findings indicated that FT-IR and 2D-IR can provide holistic information on the variation of chemical constituents. The structural information of the samples indicated that B. frutescens and its extracts contain a large amount of flavonoids, since characteristic absorption peaks of flavonoids, such as ~1600 cm(-1), ~1500 cm(-1), ~1450 cm(-1), and ~1270 cm(-1), can be observed. The macroscopic fingerprint characteristics of the FT-IR and 2D-IR spectra not only provide information on the main chemical constituents of medicinal materials and their different extracts, but also allow comparison of compositional differences among similar samples. In conclusion, the multi-step IR macro-fingerprint method is rapid, effective, visual and accurate for pharmaceutical research. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Chen, J.; Chen, W.; Dou, A.; Li, W.; Sun, Y.
2018-04-01
A new information extraction method for damaged buildings, rooted in an optimal feature space, is put forward on the basis of the traditional object-oriented method. In this new method, the ESP (estimation of scale parameter) tool is used to optimize image segmentation. The distance matrix and minimum separation distance of all classes of surface features are then calculated from selected samples to find the optimal feature space, which is finally applied to extract damaged buildings from post-earthquake imagery. The overall extraction accuracy reaches 83.1%, with a kappa coefficient of 0.813. The new method greatly improves extraction accuracy and efficiency compared with the traditional object-oriented method, and has good potential for wider use in damaged-building information extraction. In addition, the new method can be applied to post-earthquake images of damaged buildings at different resolutions, and thus to seek the optimal observation scale for damaged buildings through accuracy evaluation. The results suggest that the optimal observation scale is between 1 m and 1.2 m, which provides a reference for future information extraction of damaged buildings.
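As a hedged illustration of the optimal-feature-space idea, the sketch below ranks candidate feature pairs on synthetic data by the minimum pairwise separation between class means; the paper's actual distance-matrix computation over selected image objects may differ in detail.

import itertools
import numpy as np

rng = np.random.default_rng(0)
# Synthetic samples: rows are image objects, columns are candidate features.
classes = {
    "intact":    rng.normal([0.2, 0.5, 0.3, 0.7], 0.05, size=(50, 4)),
    "damaged":   rng.normal([0.6, 0.4, 0.8, 0.2], 0.05, size=(50, 4)),
    "bare soil": rng.normal([0.7, 0.9, 0.2, 0.4], 0.05, size=(50, 4)),
}

def min_separation(feature_idx):
    """Smallest Euclidean distance between class means in the feature subspace."""
    means = [samples[:, feature_idx].mean(axis=0) for samples in classes.values()]
    return min(np.linalg.norm(a - b) for a, b in itertools.combinations(means, 2))

# The optimal feature space maximizes the minimum class separation.
best = max(itertools.combinations(range(4), 2), key=min_separation)
print("best feature pair:", best, "separation:", round(min_separation(best), 3))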
Bahrami, Yadollah; Franco, Christopher M. M.
2015-01-01
Sea cucumbers produce numerous compounds with a wide range of chemical structural diversity. Among these, saponins are the most diverse and include sulfated, non-sulfated, acetylated and methylated congeners with different aglycone and sugar moieties. In this study, MALDI and ESI tandem mass spectrometry, in the positive ion mode, were used to elucidate the structure of new saponins extracted from the viscera of H. lessoni. Fragmentation of the aglycone provided structural information on the presence of the acetyl group. The presence of the O-acetyl group was confirmed by observing the mass transition of 60 u corresponding to the loss of a molecule of acetic acid. Ion fingerprints from the glycosidic cleavage provided information on the mass of the aglycone (core), and the sequence and type of monosaccharides that constitute the sugar moiety. The tandem mass spectra of the saponin precursor ions [M + Na]+ provided a wealth of detailed structural information on the glycosidic bond cleavages. As a result, and in conjunction with existing literature, we characterized the structure of five new acetylated saponins, Lessoniosides A–E, along with two non-acetylated saponins Lessoniosides F and G at m/z 1477.7, which are promising candidates for future drug development. The presented strategy allows a rapid, reliable and complete analysis of native saponins. PMID:25603350
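The 60 u mass transition can be checked mechanically in a peak list; the sketch below, using made-up fragment masses and an assumed mass tolerance, flags fragments consistent with a loss of acetic acid from the precursor.

ACETIC_ACID = 60.021  # monoisotopic mass of CH3COOH in u
TOL = 0.02            # assumed mass tolerance in u

def find_neutral_losses(precursor_mz, fragment_mzs, loss=ACETIC_ACID, tol=TOL):
    """Return fragments consistent with the precursor losing `loss` u."""
    return [mz for mz in fragment_mzs if abs(precursor_mz - mz - loss) <= tol]

# Hypothetical [M + Na]+ precursor and fragment peaks (illustrative only).
precursor = 1477.7
fragments = [1459.7, 1417.68, 1315.6, 1145.5]
print(find_neutral_losses(precursor, fragments))  # -> [1417.68]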
Iterative cross section sequence graph for handwritten character segmentation.
Dawoud, Amer
2007-08-01
The iterative cross section sequence graph (ICSSG) is an algorithm for handwritten character segmentation. It expands the cross section sequence graph concept by applying it iteratively at equally spaced thresholds. The iterative thresholding reduces the information loss associated with image binarization. ICSSG preserves the characters' skeletal structure by preventing interference from pixels that cause flooding between adjacent characters' segments. Improving the structural quality of the characters' skeletons facilitates better feature extraction and classification, which improves the overall performance of optical character recognition (OCR). Experimental results showed significant improvements in OCR recognition rates compared to other well-established segmentation algorithms.
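A minimal sketch of the iterative-thresholding idea, under assumed details and not the full ICSSG algorithm: binarize a grayscale line image at equally spaced thresholds and inspect the vertical cross-section profile at each, so that segmentation evidence is not tied to a single binarization.

import numpy as np

def iterative_cross_sections(gray, n_thresholds=4):
    """Yield (threshold, column-profile) pairs at equally spaced thresholds.

    gray: 2D array with values in [0, 255], dark ink on a light background.
    The column profile counts foreground pixels per column; runs of empty
    columns are candidate cut points between characters.
    """
    lo, hi = gray.min(), gray.max()
    for t in np.linspace(lo, hi, n_thresholds + 2)[1:-1]:
        binary = gray < t              # foreground = darker than threshold
        yield t, binary.sum(axis=0)    # vertical cross-section profile

# Tiny synthetic "image": two dark blobs separated by a light gap.
img = np.full((10, 12), 255.0)
img[2:8, 1:4] = 30.0    # character 1
img[2:8, 8:11] = 90.0   # character 2 (fainter)
for t, profile in iterative_cross_sections(img):
    print(f"t={t:6.1f}  profile={profile.tolist()}")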
The structure and infrastructure of the global nanotechnology literature
NASA Astrophysics Data System (ADS)
Kostoff, Ronald N.; Stump, Jesse A.; Johnson, Dustin; Murday, James S.; Lau, Clifford G. Y.; Tolles, William M.
2006-08-01
Text mining is the extraction of useful information from large volumes of text. A text mining analysis of the global open nanotechnology literature was performed. Records from the Science Citation Index (SCI)/Social SCI were analyzed to provide the infrastructure of the global nanotechnology literature (prolific authors/journals/institutions/countries, most cited authors/papers/journals) and the thematic structure (taxonomy) of the global nanotechnology literature, from a science perspective. Records from the Engineering Compendex (EC) were analyzed to provide a taxonomy from a technology perspective. The Far Eastern countries have expanded nanotechnology publication output dramatically in the past decade.
Hively, Lee M.
2014-09-16
Data collected from devices and from human physiological monitoring may be used to forewarn of critical events, such as machine or structural failure, or, in the case of brain or heart wave data, stroke. By monitoring the data and determining what values are indicative of an impending failure, one can provide adequate notice of the failure in order to take preventive measures. This disclosure teaches a computer-based method to convert dynamical numeric data representing physical objects (unstructured data) into discrete phase-space states, and hence into a graph (structured data), for extraction of condition change.
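A minimal sketch of the disclosed pipeline, with all parameter choices assumed: embed a numeric time series into discrete phase-space states and count state-to-state transitions as a graph, whose change relative to a baseline can serve as a condition-change indicator.

from collections import Counter
import numpy as np

def phase_space_graph(x, n_bins=4, delay=1, dim=2):
    """Time-delay embed x, quantize each vector to a symbol, count transitions."""
    x = np.asarray(x, dtype=float)
    # Equiprobable bins over the signal's range.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    symbols = np.digitize(x, edges)
    # Each state is a tuple of `dim` delayed symbols (a discrete phase-space point).
    states = [tuple(symbols[i + j * delay] for j in range(dim))
              for i in range(len(x) - (dim - 1) * delay)]
    # The graph is represented as weighted edge counts between consecutive states.
    return Counter(zip(states[:-1], states[1:]))

t = np.linspace(0, 8 * np.pi, 400)
baseline = phase_space_graph(np.sin(t))
faulty = phase_space_graph(np.sin(t) + 0.4 * np.sin(5 * t))  # changed dynamics
# A simple dissimilarity between graphs: symmetric difference of edge sets.
changed = set(baseline) ^ set(faulty)
print(f"{len(changed)} edges differ between baseline and test graphs")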
Method for accurate growth of vertical-cavity surface-emitting lasers
Chalmers, S.A.; Killeen, K.P.; Lear, K.L.
1995-03-14
The authors report a method for accurate growth of vertical-cavity surface-emitting lasers (VCSELs). The method uses a single reflectivity spectrum measurement to determine the structure of the partially completed VCSEL at a critical point of growth. This information, along with the extracted growth rates, allows imprecisions in growth parameters to be compensated for during growth of the remaining structure, which can then be completed with very accurate critical dimensions. Using this method, they can now routinely grow lasing VCSELs with Fabry-Perot cavity resonance wavelengths controlled to within 0.5%. 4 figs.
Review: Magnetic resonance imaging techniques in ophthalmology
Fagan, Andrew J.
2012-01-01
Imaging the eye with magnetic resonance imaging (MRI) has proved difficult due to the eye’s propensity to move involuntarily over typical imaging timescales, obscuring the fine structure in the eye due to the resulting motion artifacts. However, advances in MRI technology help to mitigate such drawbacks, enabling the acquisition of high spatiotemporal resolution images with a variety of contrast mechanisms. This review aims to classify the MRI techniques used to date in clinical and preclinical ophthalmologic studies, describing the qualitative and quantitative information that may be extracted and how this may inform on ocular pathophysiology. PMID:23112569
BilKristal 2.0: A tool for pattern information extraction from crystal structures
NASA Astrophysics Data System (ADS)
Okuyan, Erhan; Güdükbay, Uğur
2014-01-01
We present a revised version of the BilKristal tool of Okuyan et al. (2007). We migrated the development environment to Microsoft Visual Studio 2005 to resolve compatibility issues, added multi-core CPU support, and improved the graphics functions for better performance. Discovered bugs were fixed, and export functionality to a material visualization tool was added.
ERIC Educational Resources Information Center
Hao, Jiangang; Smith, Lawrence; Mislevy, Robert; von Davier, Alina; Bauer, Malcolm
2016-01-01
Extracting information efficiently from game/simulation-based assessment (G/SBA) logs requires two things: a well-structured log file and a set of analysis methods. In this report, we propose a generic data model specified as an extensible markup language (XML) schema for the log files of G/SBAs. We also propose a set of analysis methods for…
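For flavor only, the sketch below parses a toy XML event log with Python's standard library; the element and attribute names are hypothetical and are not the schema proposed in the report.

import xml.etree.ElementTree as ET

log_xml = """
<log session="s-001" learner="anon-42">
  <event t="12.5" type="move">   <detail key="room" value="lab"/>   </event>
  <event t="14.1" type="action"> <detail key="tool" value="probe"/> </event>
  <event t="20.9" type="answer"> <detail key="item" value="q3"/>    </event>
</log>
"""

root = ET.fromstring(log_xml)
for event in root.iter("event"):
    # Collect the key/value detail pairs attached to each timestamped event.
    details = {d.get("key"): d.get("value") for d in event.iter("detail")}
    print(float(event.get("t")), event.get("type"), details)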
Development of Markup Language for Medical Record Charting: A Charting Language.
Jung, Won-Mo; Chae, Younbyoung; Jang, Bo-Hyoung
2015-01-01
Many efforts are currently under way to collect electronic medical records (EMRs). However, structuring the data format for EMRs is an especially labour-intensive task for practitioners. Here we propose a new markup language for medical record charting (called Charting Language), which borrows useful properties from programming languages. With Charting Language, text data recorded in dynamic clinical situations can easily be used for information extraction.
Web-Scale Search-Based Data Extraction and Integration
2011-10-17
differently, posing challenges for aggregating this information. For example, for the task of finding population for cities in Benin, we were faced with … merged record. Our GeoMerging algorithm attempts to address various ambiguity challenges: • For name: The name of a hospital is not a unique … departments in the same building. For agent-extractor results from structured sources, our GeoMerging algorithm overcomes these challenges using a two
Reducing Labeling Effort for Structured Prediction Tasks
2005-01-01
correctly annotated for the instance to be of use to the learner. Traditional active learning addresses this problem by optimizing the order in which the … than for others. We propose a new active learning paradigm which reduces not only how many instances the annotator must label, but also how difficult … We validate this active learning framework in an interactive information extraction system, reducing the total number of annotation actions by 22%.
SPECTRAL LINE DE-CONFUSION IN AN INTENSITY MAPPING SURVEY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheng, Yun-Ting; Bock, James; Bradford, C. Matt
2016-12-01
Spectral line intensity mapping (LIM) has been proposed as a promising tool to efficiently probe the cosmic reionization and the large-scale structure. Without detecting individual sources, LIM makes use of all available photons and measures the integrated light in the source confusion limit to efficiently map the three-dimensional matter distribution on large scales as traced by a given emission line. One particular challenge is the separation of desired signals from astrophysical continuum foregrounds and line interlopers. Here we present a technique to extract large-scale structure information traced by emission lines from different redshifts, embedded in a three-dimensional intensity mapping data cube. The line redshifts are distinguished by the anisotropic shape of the power spectra when projected onto a common coordinate frame. We consider the case where high-redshift [C ii] lines are confused with multiple low-redshift CO rotational lines. We present a semi-analytic model for [C ii] and CO line estimates based on the cosmic infrared background measurements, and show that with a modest instrumental noise level and survey geometry, the large-scale [C ii] and CO power spectrum amplitudes can be successfully extracted from a confusion-limited data set, without external information. We discuss the implications and limits of this technique for possible LIM experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shur, V. Ya., E-mail: vladimir.shur@urfu.ru; Zelenovskiy, P. S.
2014-08-14
The application of the most effective methods of domain visualization in model uniaxial ferroelectrics of the lithium niobate (LN) and lithium tantalate (LT) family, and in relaxor strontium-barium niobate (SBN), is reviewed in this paper. We demonstrate the synergetic effect of the joint use of optical, confocal Raman, and piezoelectric force microscopies, which provides extraction of unique information about the formation of micro- and nanodomain structures. The methods have been applied to the investigation of various types of domain structures of increasing complexity: (1) periodical domain structures in LN and LT, (2) nanodomain structures in LN, LT, and SBN, (3) nanodomain structures in LN with a modified surface layer, and (4) dendrite domain structures in LN. The self-assembled appearance of quasi-regular nanodomain structures under highly non-equilibrium switching conditions is also considered.
Gazal, Giath; Tola, Ahmed W; Fareed, Wamiq M; Alnazzawi, Ahmad A; Zafar, Muhammad S
2016-04-01
To evaluate the value of visual information in reducing dental fear and anxiety in patients undergoing tooth extraction under local anesthesia (LA), a total of 64 patients were randomly allocated to one of the study groups after reading the information sheet and signing the formal consent. Patients in the control group received only verbal information and routine warnings; patients in the study group were shown a tooth extraction video. The level of dental fear and anxiety was reported by the patients on standard 100 mm visual analog scales (VAS), ranging from "no dental fear and anxiety" (0 mm) to "severe dental fear and anxiety" (100 mm). Dental fear and anxiety were assessed pre-operatively, after the visual/verbal information, and post-extraction. There was a significant difference between the mean dental fear and anxiety scores of the two groups post-extraction (p < 0.05): patients in the tooth extraction video group were more comfortable after dental extraction than those in the verbal information and routine warning group. For the tooth extraction video group, there were significant decreases in dental fear and anxiety scores between the pre-operative scores and either the post-video information scores or the post-operative scores (p < 0.05). Younger patients recorded higher dental fear and anxiety scores than older ones (p < 0.05). Dental fear and anxiety associated with dental extractions under local anesthesia can be reduced by showing a tooth extraction video to patients preoperatively.
NASA Astrophysics Data System (ADS)
Reppert, Michael; Tokmakoff, Andrei
The structural characterization of intrinsically disordered peptides (IDPs) presents a challenging biophysical problem. Extreme heterogeneity and rapid conformational interconversion make traditional methods difficult to interpret. Due to its ultrafast (ps) shutter speed, amide I vibrational spectroscopy has received considerable interest as a novel technique to probe IDP structure and dynamics. Historically, amide I spectroscopy has been limited to delivering global secondary-structure information. More recently, however, the method has been adapted to study structure at the local level through the incorporation of isotope labels into the protein backbone at specific amide bonds. Thanks to the acute sensitivity of amide I frequencies to local electrostatic interactions, particularly hydrogen bonds, spectroscopic data on isotope-labeled residues directly report on local peptide conformation. Quantitative information can be extracted using electrostatic frequency maps, which translate molecular dynamics trajectories into amide I spectra for comparison with experiment. Here we present our recent efforts in the development of a rigorous approach to incorporating amide I spectroscopic restraints into refined molecular dynamics structural ensembles using maximum entropy and related approaches. By combining force field predictions with experimental spectroscopic data, we construct refined structural ensembles for a family of short, strongly disordered, elastin-like peptides in aqueous solution.
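A minimal sketch of the maximum-entropy reweighting step on synthetic numbers: tilt the weights of MD frames, w_i proportional to exp(lambda * s_i), so that the ensemble-averaged observable matches an experimental target. The observable here is a stand-in for an isotope-labeled amide I frequency, and all values are illustrative.

import numpy as np

def maxent_reweight(s, target, lam_range=(-1.0, 1.0), iters=200):
    """Bisection on the tilt parameter so the weighted mean of s hits target."""
    s = np.asarray(s, dtype=float)
    def weights(lam):
        a = lam * s
        w = np.exp(a - a.max())   # shift for numerical stability
        return w / w.sum()
    lo, hi = lam_range
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The tilted mean increases monotonically with lambda.
        if float(weights(mid) @ s) < target:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))

rng = np.random.default_rng(1)
# Per-frame predicted observable from a force-field ensemble (illustrative).
predicted = rng.normal(1650.0, 8.0, size=1000)   # cm^-1
w = maxent_reweight(predicted, target=1645.0)
print(round(float(w @ predicted), 2))            # ~1645.0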
NASA Astrophysics Data System (ADS)
Ye, L.; Xu, X.; Luan, D.; Jiang, W.; Kang, Z.
2017-07-01
Crater-detection approaches can be divided into four categories: manual recognition, shape-profile fitting algorithms, machine-learning methods, and geological-information-based analysis using terrain and spectral data. The mainstream approach is shape-profile fitting: many researchers use illumination gradient information to fit standard circles by least-squares methods. Although this approach has achieved good results, it has difficulty identifying craters with poor visibility or complex structure and composition, and its recognition accuracy is hard to improve because of multiple solutions and noise interference. To address this problem, we propose a method for the automatic extraction of impact craters based on the spectral characteristics of lunar rocks and minerals: 1) Under sunlit conditions, impact craters are extracted from MI (Multiband Imager) data by condition matching, yielding crater positions and diameters. 2) Regolith is ejected when the lunar surface is impacted, and one of the elements of lunar regolith is iron; incorrectly extracted craters can therefore be removed by checking whether a candidate crater lacks iron. 3) Correctly extracted craters are divided into two types, simple and complex, according to their diameters. 4) Titanium information is obtained, the titanium distribution of each complex crater is matched against a normal distribution curve, and the goodness of fit is calculated and thresholded; complex craters are thus divided into those whose titanium distribution follows a normal curve and those whose distribution does not. We validated the proposed method with MI data acquired by SELENE. Experimental results demonstrate that the proposed method performs well in the test area.
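Step 4 can be illustrated with a simple goodness-of-fit score; the sketch below, on synthetic titanium values and with an assumed threshold, fits a normal curve to a crater's abundance histogram and computes R^2 against it.

import numpy as np
from scipy import stats

def gaussian_fit_r2(values, bins=20):
    """R^2 between a histogram of `values` and its best-fit normal density."""
    density, edges = np.histogram(values, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu, sigma = stats.norm.fit(values)
    fitted = stats.norm.pdf(centers, mu, sigma)
    ss_res = np.sum((density - fitted) ** 2)
    ss_tot = np.sum((density - density.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)
normal_ti = rng.normal(3.0, 0.4, 500)                 # wt% Ti, illustrative
bimodal_ti = np.concatenate([rng.normal(2.2, 0.2, 250),
                             rng.normal(4.0, 0.2, 250)])
THRESHOLD = 0.8  # assumed cut-off; the paper sets its own threshold
for name, ti in [("crater A", normal_ti), ("crater B", bimodal_ti)]:
    r2 = gaussian_fit_r2(ti)
    print(name, round(r2, 3), "normal" if r2 >= THRESHOLD else "non-normal")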
Learning Predictive Statistics: Strategies and Brain Mechanisms.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-08-30
When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to changes in the environment's statistics. We provide evidence for an alternate route for learning complex temporal statistics: extracting the most probable outcome in a given context is implemented by interactions between executive and motor corticostriatal mechanisms compared with visual corticostriatal circuits (including hippocampal cortex) that support learning of the exact temporal statistics. Copyright © 2017 Wang et al.
Automated prediction of protein function and detection of functional sites from structure.
Pazos, Florencio; Sternberg, Michael J E
2004-10-12
Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated with functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and is hence applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with that of the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation to 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.